Controlled Vocabulary

The Top 12 Myths about Embedded Photo Metadata


A number of myths and misconceptions surround the practice of embedding information such as IPTC, IPTC-IIM, XMP or even Exif into a digital image file (such as JPEG, TIFF, Photoshop, DNG and other Raw files). A number of applications and utilities can do this easily and safely, but first, let's take a look at the list.

  1. Embedded photo metadata is something that is hard to read unless you have Photoshop or some other professional software application.
  2. If I embed my copyright (or caption or keywords, or other metadata field) into my images it will always be there.
  3. Embedding photo metadata adds a lot of disk space overhead to an image file.
  4. If I embed photo metadata in the images I place on my website, then search engines will be able to spider them and use the caption and other information to rate/rank my images.
  5. Images that I upload to my social media or photo sharing sites will still retain my embedded photo metadata.
  6. Removing embedded photo metadata is against the law.
  7. All embedded metadata is the same.
  8. Picasa (or iPhoto) writes all my captions into my embedded metadata as soon as I enter them.
  9. Adding copyright and contact information to images on the Internet makes websites load slowly.
  10. Adding photo metadata, like copyright and contact information, is difficult to do and time consuming.
  11. Metadata is always stored inside the image file (OR Metadata is always stored outside the image file).
  12. The topic of metadata is beyond the competence of the everyday user.

Let me stress, if it wasn't clear before, that the above statements are common misunderstandings about photo metadata, meaning that they are not correct. If you want to understand why, read on for a short summary of each, as well as links to other resources that should help you understand.

An illustration of the "Metadata" overhead discussed in item 3 below.
Both images below were saved using Adobe Photoshop's "Save for Web & Devices" feature at a 60 percent quality setting, as JPEGs with the ICC profile embedded.

Using All Metadata setting - "on disk" size is 39.2 kb
View metadata list for drp2091169-sfw-q60-all-wicc.jpg
Using Copyright Metadata setting - "on disk" size is 29.4 kb
View metadata list for drp2091169-sfw-q60-copyright.jpg

By purging nearly all metadata with the exception of the copyright notice, you can save 9.8 kb in this specific instance. Check out the lists above to see just how much data is being stored in each image, and what additional information can be stored in that extra 9.8 kb. The same image saved using the None setting in Save for Web takes up 27.7 kb on disk, so adding the Copyright Notice metadata alone only adds 1.7 kb of data above that used by the image and the ICC profile in this instance. Keep in mind that Photoshop's "Save for Web & Devices" only stores the metadata in the XMP format, and includes more information than the type stored using the legacy IPTC-IIM format.
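The arithmetic behind these figures can be checked with a few lines of Python. This is only a sketch: the three sizes are the ones quoted in the illustration, while the function name `overhead` is made up for this example.

```python
# On-disk sizes in kb, as quoted in the illustration above.
SIZE_ALL_METADATA = 39.2    # "All Metadata" setting
SIZE_COPYRIGHT_ONLY = 29.4  # "Copyright" setting
SIZE_NO_METADATA = 27.7     # "None" setting (pixels plus ICC profile)


def overhead(size_kb, baseline_kb=SIZE_NO_METADATA):
    """Return (extra kb, extra percent) relative to the metadata-free file."""
    extra = size_kb - baseline_kb
    return round(extra, 1), round(100 * extra / baseline_kb, 1)


# Copyright notice only: 1.7 kb more than the metadata-free file.
print(overhead(SIZE_COPYRIGHT_ONLY))
# Full metadata set: 11.5 kb more than the metadata-free file.
print(overhead(SIZE_ALL_METADATA))
# Difference between the two settings: 9.8 kb.
print(round(SIZE_ALL_METADATA - SIZE_COPYRIGHT_ONLY, 1))
```

These percentages apply to a small, heavily compressed web image. The same number of kilobytes measured against a multi-megabyte original would be a far smaller share.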

1. Embedded metadata is hard to read
There are many imaging applications and utilities, such as Adobe Photoshop, Bridge, Lightroom, Expression Media, Photo Mechanic and others, that make it easy to both read and enter various forms of photo metadata into your images. Reading metadata that is contained in images is much easier than embedding it, and can be done with a number of free utilities. Common utilities for Mac OS X include Apple Preview and Spotlight. On the Windows platform you can use IrfanView or Microsoft Pro Photo Tools, and those using Windows 7 will find a basic set of image data available directly in Windows Explorer. It's even possible to read the metadata from images on the web using an online service, like the one built by Jeffrey Friedl that leverages Phil Harvey's ExifTool, which can show you all sorts of information in your image files, including GPS. This particular tool can even be installed in the toolbar of many popular internet browsers, so that revealing this information is a one-click operation.

2. Embedded metadata will always be there
Unfortunately, there is no way to "lock" your embedded photo metadata into your images. The closest you will find to a locking mechanism is in the METAmachine application, which prevents the user from changing the Creator and Copyright Notice fields if there is a previous entry. The embedded metadata in a digital image is fragile, and some applications either don't know about, or don't respect, the work that was done to store this information along with the image pixels. In some cases, simply uploading an image to a website, or having it processed online to a different size, might result in a partial or total loss of metadata (see item 5 below).

3. Embedded metadata adds a lot of disk space overhead
A reasonable amount of information about an image takes up surprisingly little disk space, as it's mostly plain text. Various outfits that sell or give away software to "strip" your images of metadata so they will be "leaner and meaner" on the Internet like to perpetuate this meme, claiming that embedded photo metadata adds a large amount of "overhead" to a file. In reality, unless you are filling in every single metadata field in Photoshop CS5 or Lightroom 3 (which include the IPTC Extension), the additional disk space required to store that data in your original high-resolution files will be a tiny fraction of the space used by the pixels in your image. In most cases, adding basic copyright, creator, contact info and a three or four sentence caption will only add about 2 to 4 kb to your file size. For a 20 or 30 MB TIFF file, that's infinitesimal by comparison. See the illustration above for one example of what to expect for much smaller images being saved for use on the web.

4. Embedded metadata can be read by the search engines
There is no evidence, from the tests I've conducted or from other reports I've seen, that would lead me to believe that the various embedded metadata schemas (IPTC, XMP, or Exif) are being read or used by the major search engines (i.e. Google, Bing, and Yahoo). It is possible that they may be reading this information, but so far there is no reason to assume that it is being used as part of their ranking algorithms. That doesn't mean that embedding metadata is not a good practice. It's just that, at this point in time, it has little value for those wishing to enhance their Search Engine Optimization (SEO). If you are interested in more details on this issue, see the Why Embedded Photo Metadata Won't Help Your SEO (at least without some help) article.

5. Images uploaded to social media/photo sharing sites will retain my embedded metadata
As of late 2010, over half of the various social media or photo sharing sites either remove all embedded metadata on upload, or remove it from images that are processed to intermediate preview and thumbnail images. For details on that issue, see the Controlled Vocabulary Survey regarding the Preservation of Photo Metadata by Social Media Websites. Users need to test and verify their online services to ensure that metadata is preserved. If not, they need to ask their services why they are not preserving their photo metadata. As stressed in the Metadata Manifesto, "systems need to preserve ownership metadata by default and discourage removal of other metadata by warning users about the legal implications of removal."

6. Removing embedded metadata is against the law
This is a tricky subject. There are lots of different "fields" within the various embedded metadata types or schemas mentioned above. If you are the owner of the image, it's up to you what to include or edit. If the image is one you are managing for someone else, or simply using, then you should be careful about what you edit or remove. Some of the fields, such as the Copyright Notice, Source, Creator, and Contact Info, comprise what is referred to as "Copyright Management Information", and removal of these is against the law in the United States under the Digital Millennium Copyright Act (DMCA). Other jurisdictions may have similar laws, so you might want to check with an Intellectual Property attorney before making changes to embedded metadata in images that don't belong to you. Removal of other fields, such as Caption/Description, Title, Headline, or Keywords, in a digital image file isn't necessarily going to land you in hot water of a legal type, but it will make it harder for those that may be legitimately using a digital file to find it, or to know what is going on in the image.

7. All embedded metadata is the same
There are actually many different types of photo metadata which peacefully co-exist (for the most part) in your digital images. They make up an alphabet soup of acronyms such as IPTC, IPTC-IIM, XMP or even Exif. Some of these, like Exif, are auto-generated, while the rest are mostly "user-entered" (though that process can be done in batch-mode operations to hundreds or thousands of images at a time). Some metadata, like IPTC-IIM, is stored in a binary form, while other metadata, like XMP, is written in a form more similar to the HTML of this web page. Thus it may be possible to have the name of the photographer stored three times in the same image: in Exif, in IPTC-IIM and in XMP (IPTC Core). Are these the same, even if the data entered is the same? There are a few fields that are "shared" between the different schemas, so you could say that those are the same, but this is only the case for a few (the IPTC Core schema shares a few fields with Dublin Core, and the IPTC Extension schema shares a couple of fields with PLUS; see the Metadata Field Guide if you want to know which). Member companies of the Metadata Working Group are working to make sure that the information in these various schemas can easily interoperate regardless of where and how it is stored.
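To make the "separate containers" idea concrete, here is a minimal sketch in Python (standard library only) that walks the header segments of a JPEG and reports which metadata blocks it finds. It relies on the usual JPEG layout, where Exif and XMP each sit in their own APP1 segment and IPTC-IIM normally lives inside Photoshop's APP13 block. The function names and the synthetic sample are invented for illustration. The sketch does not decode any field values, and it does not check whether the Photoshop block actually contains IPTC records.

```python
import struct

# Leading bytes that identify what an APPn segment carries.
EXIF_SIG = b"Exif\x00\x00"
XMP_SIG = b"http://ns.adobe.com/xap/1.0/\x00"
IRB_SIG = b"Photoshop 3.0\x00"

SIGNATURES = [
    (0xE1, EXIF_SIG, "Exif"),
    (0xE1, XMP_SIG, "XMP"),
    (0xED, IRB_SIG, "Photoshop IRB (usual home of IPTC-IIM)"),
]


def list_metadata_segments(jpeg_bytes):
    """Return (label, payload size in bytes) for each recognised segment."""
    if jpeg_bytes[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG: missing SOI marker")
    found = []
    pos = 2
    while pos + 4 <= len(jpeg_bytes):
        if jpeg_bytes[pos] != 0xFF:
            break  # lost sync; stop rather than guess
        marker = jpeg_bytes[pos + 1]
        if marker == 0xFF:  # fill byte before a marker
            pos += 1
            continue
        if marker in (0xDA, 0xD9):  # start of scan / end of image
            break
        # The 2-byte big-endian length includes the length field itself.
        (length,) = struct.unpack(">H", jpeg_bytes[pos + 2:pos + 4])
        payload = jpeg_bytes[pos + 4:pos + 2 + length]
        for seg_marker, signature, label in SIGNATURES:
            if marker == seg_marker and payload.startswith(signature):
                found.append((label, len(payload)))
        pos += 2 + length
    return found


def make_segment(marker, payload):
    """Build one JPEG segment (used here only to fake a sample file)."""
    return b"\xff" + bytes([marker]) + struct.pack(">H", len(payload) + 2) + payload


# Synthetic stand-in for a real file; the payload text is placeholder data.
sample = (
    b"\xff\xd8"
    + make_segment(0xE1, EXIF_SIG + b"binary TIFF structure")
    + make_segment(0xED, IRB_SIG + b"binary IIM records")
    + make_segment(0xE1, XMP_SIG + b"<x:xmpmeta>...</x:xmpmeta>")
    + b"\xff\xda"
)
for label, size in list_metadata_segments(sample):
    print(label, size)
```

To try it on a real image, pass the result of `open(path, "rb").read()` to `list_metadata_segments`. A file written by a metadata-aware application will often report all three blocks.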

8. Picasa (or iPhoto) writes all my captions and keywords into my embedded metadata as soon as I enter them
The simple truth is that whether or not this happens depends a lot on the file format and the program used. While Picasa is a very useful program (and one I recommend to most of my family members), it is designed to work primarily with JPEG images. Version 3.8 of Picasa can read a number of the various metadata fields in JPEG images, and even a few fields in TIFF files, but it can only write your captions and keywords to JPEGs at present. None of the versions of iPhoto I've looked at so far enter your caption info into the image when you write the caption; that is only done at the time when you "export" the image, and only to specific file formats. If you aren't sure what is being done and care to verify, you can check the files after you add info (and before you export) by using the online tool built by Jeffrey Friedl that leverages Phil Harvey's ExifTool.

9. Adding copyright and contact information to images on the Internet makes websites load slowly
Metadata does take up some storage space in a digital file. However, when compared to the image pixels, it's generally quite small for most high-resolution images from digital SLR cameras. When you make the image sizes smaller, the space occupied by the pixels may shrink dramatically, while the space to store the metadata does not change at all (unless you opt to remove some of the information). If you are displaying a number of small thumbnails (say, less than 150 pixels on the long dimension), and each of those thumbnails has the full set of metadata you had embedded in your original high-resolution image, then it could be that the metadata takes up more space than the pixels. So if you had a page of, say, 500 to 1000 thumbnails, the addition of that embedded photo metadata could increase the overall load time of the page.

However, that same amount of metadata in a 600 pixel wide preview might only increase that file size by 1 or 2 percent, which really isn't nearly as big a deal as the lean and mean purists would have you believe. So if you are concerned about the speed of your website, it may be worth testing before removing all metadata from your thumbnails or preview images. The difficulty at present with that idea is that the tools used to manage and remove metadata are very simple and typically don't offer fine-grained solutions that would allow you to easily remove specific metadata fields while leaving others. In addition, removal of critical fields (like the Copyright Notice, Creator, or Contact Info fields) will make it difficult, if not impossible, for others to know where that image came from once it's removed from its original location or downloaded from the Internet. If some version of an Orphan Works bill passes in the near future, you may wish you hadn't pared your images down by removing all metadata, especially if you see others using your images without your permission.
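For a rough feel for the numbers in this section, the back-of-the-envelope calculation can be written out in Python. The 2 to 4 kb range for basic metadata and the 500 to 1000 thumbnail counts come from this article. The pixel sizes (3 kb for a tiny thumbnail, 200 kb for a larger preview) are purely hypothetical values chosen for illustration, as are the function names.

```python
def page_overhead_kb(thumbnail_count, metadata_kb_per_image):
    """Total extra kb a page carries when every thumbnail keeps its metadata."""
    return thumbnail_count * metadata_kb_per_image


def metadata_share(metadata_kb, pixel_kb):
    """Fraction of a file's size that is metadata rather than pixels."""
    return metadata_kb / (metadata_kb + pixel_kb)


# Extra page weight for 500 or 1000 thumbnails carrying 2 or 4 kb each.
for count in (500, 1000):
    for kb in (2, 4):
        print(count, "thumbnails x", kb, "kb =", page_overhead_kb(count, kb), "kb")

# Hypothetical pixel sizes: the same 4 kb of metadata dominates a tiny
# thumbnail but is a small share of a larger preview.
print(round(100 * metadata_share(4, 3)), "percent of a 3 kb thumbnail")
print(round(100 * metadata_share(4, 200)), "percent of a 200 kb preview")
```

Measuring your own thumbnails and previews is the only way to know which case applies to your site.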

10. Adding embedded photo metadata, like copyright and contact information, is difficult to do and time consuming
Whether or not adding embedded photo metadata is painful has everything to do with the application you use. If you are using a free application or utility that doesn't allow for storing values, or sets of values, in a template, and instead requires you to type in each entry one character at a time, then it will be time consuming. However, there are a number of professional applications that make it easy to save repetitive information, like your name, copyright notice, contact info, etc., into metadata templates, and even to apply this information in batch operations. See the various "Meta-tutorials" on the PhotoMetadata site to see how easily this can be done. Some of the tutorials even have video versions if you want to take a break from reading. If you are comfortable with "Command-Line" applications, there are free utilities such as ExifTool that can add or modify the metadata in a batch of files in a very short time. The International Press Telecommunications Council (IPTC) has posted a list of various Software Applications that support the IPTC-IIM, IPTC Core and IPTC Extension metadata schemas that is worth investigating as well.
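As one illustration of the command-line route, the sketch below uses Python to assemble and run an ExifTool command that writes a creator name and copyright notice into the Exif, IPTC-IIM and XMP blocks of every .jpg file in a folder. It assumes ExifTool is installed and on your PATH. The function names, folder name and sample values are hypothetical, and you should try it on copies of your files first.

```python
import shutil
import subprocess


def build_exiftool_args(folder, creator, notice):
    """Assemble the ExifTool command line as a list of arguments."""
    return [
        "exiftool",
        "-ext", "jpg",                      # only touch .jpg files
        f"-EXIF:Artist={creator}",          # creator, Exif block
        f"-IPTC:By-line={creator}",         # creator, IPTC-IIM block
        f"-XMP-dc:Creator={creator}",       # creator, XMP block
        f"-EXIF:Copyright={notice}",        # copyright, Exif block
        f"-IPTC:CopyrightNotice={notice}",  # copyright, IPTC-IIM block
        f"-XMP-dc:Rights={notice}",         # copyright, XMP block
        folder,
    ]


def stamp_folder(folder, creator, notice):
    """Run ExifTool on the folder; raises if ExifTool is not available."""
    if shutil.which("exiftool") is None:
        raise RuntimeError("exiftool is not installed or not on PATH")
    # By default ExifTool keeps each unmodified file as "<name>_original".
    return subprocess.run(build_exiftool_args(folder, creator, notice), check=True)


# Show the command that would be run (hypothetical folder and values).
print(" ".join(build_exiftool_args("shoot_2010", "Jane Doe", "(c) 2010 Jane Doe")))
```

Calling `stamp_folder("shoot_2010", "Jane Doe", "(c) 2010 Jane Doe")` would apply the same values to the whole folder in one pass, which is the kind of batch operation described above.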

The real reason to take the time to embed photo metadata — especially copyright and contact info — is that it provides a "trail of breadcrumbs" for tracking down the source of an image. Placing a credit line or other type of ownership information below an image on a web page is all well and good. However, as soon as someone "right-clicks & downloads the image" that contextual information surrounding the image on the page is gone and lost forever. Metadata that is embedded in the image can travel along with the image, regardless of where it goes.

11. Metadata is always stored inside the image file (OR Metadata is always stored outside the image file)
There is no set answer to this question, so it's important to understand what the application you are using does with your information and where it's stored. First of all, it's not always possible for the information to be stored inside the image, as not all image file formats support the embedding of metadata. JPEG, TIFF, Photoshop (PSD), and Digital Negative (DNG) files do allow for the embedding of metadata and are widely supported. Proprietary RAW file formats may allow you to embed metadata, but not all applications can or will do this. A number of applications, such as those in the Adobe Creative Suite and Lightroom, will create a small text file that has the same name as the image file, but with a .XMP extension. Other applications, when instructed, will write the data out to some other kind of text file. This is what Apple's Final Cut Server does: it saves information from its database in XML form in a companion text file stored with the image or video.
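The sidecar naming convention described above is easy to check for in a script. The following Python sketch assumes the sidecar sits in the same folder as the image and uses a lowercase `.xmp` extension; the function names and the file name `DSC_0001.NEF` are only examples.

```python
from pathlib import Path


def sidecar_path(image_path):
    """Same folder and base name as the image, with an .xmp extension."""
    return Path(image_path).with_suffix(".xmp")


def has_sidecar(image_path):
    """True if a sidecar file exists next to the image."""
    return sidecar_path(image_path).is_file()


# Hypothetical Raw file name; prints the expected sidecar location.
print(sidecar_path("shoot/DSC_0001.NEF"))
```

If you copy or move Raw files by hand, the sidecar has to travel with them, or the metadata stored in it is left behind.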

Software applications may reference the information in the image, or they might store it in their own internal database. Image browsers such as Adobe Bridge, Photo Mechanic, Breeze Browser and FotoStation immediately write your metadata to the image and have to read it from the image when you search (though some browsers, like Bridge, can "cache" the information locally). Browsers show you the images that are in a specific folder at that time; but have limited use once the drive or media on which they are located is no longer connected or accessible.

Contrast that with image cataloging programs, which know where your images are located when they are fed into the application. Many cataloging applications, such as Apple Aperture, Adobe Lightroom, Phase One Expression Media, Canto Cumulus, or Extensis Portfolio, will read in existing metadata from a digital image file and store it in their own local database. Any information (metadata) you enter will be stored in that local database, along with a note on the path to where the original file was first encountered. That is why all of these applications recommend that you do not move the files by any means outside the program itself. Some of these applications will allow you to synchronize the information in the internal database with the original file (and some, like Lightroom, can be set to do this automatically). Some will only add the metadata at the time that you export a version of that file to share (and only if that file format supports embedded metadata). In some instances (such as with video, or other formats which don't support embedded metadata), it is appropriate that the metadata is only stored in the database, as there is no way to embed it in the file. However, storing information in an internal database is by no means the only way.
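A toy model shows why catalog programs warn against moving files behind their backs. The Python sketch below is not how any of the applications named above are implemented. It is a hypothetical one-table SQLite catalog that stores a caption alongside the path where the file was first seen, and it loses track of the file once that file is renamed outside the catalog.

```python
import os
import sqlite3
import tempfile


def make_catalog():
    """Create an in-memory catalog holding a path and caption per image."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE images (path TEXT PRIMARY KEY, caption TEXT)")
    return db


def add_image(db, path, caption):
    """Record the caption in the catalog only; the file is not modified."""
    db.execute("INSERT INTO images (path, caption) VALUES (?, ?)", (path, caption))


def missing_files(db):
    """Paths the catalog remembers that no longer exist on disk."""
    rows = db.execute("SELECT path FROM images ORDER BY path").fetchall()
    return [path for (path,) in rows if not os.path.exists(path)]


with tempfile.TemporaryDirectory() as folder:
    original = os.path.join(folder, "photo.jpg")
    open(original, "wb").close()  # empty stand-in for an image file
    catalog = make_catalog()
    add_image(catalog, original, "Sunset over the harbor")
    print(missing_files(catalog))  # nothing missing yet
    # Rename the file outside the catalog: the caption is now orphaned.
    os.rename(original, os.path.join(folder, "renamed.jpg"))
    print(missing_files(catalog) == [original])
    catalog.close()
```

The caption in this model never reaches the image file itself, which is the situation described above for applications that only embed metadata on export or synchronization.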

12. The topic of metadata is beyond the understanding of the everyday user
If you've made it this far, then you already know and understand a lot more about metadata than many photographers or image users. If you make images with a digital camera, then you have probably learned that a better understanding of embedded photo metadata will make it easier for you to store, find and share your images, now and in the future. If you'd like to know more, please take a look at some of the other sections of this website, and be sure to visit Photometadata.org.

Many thanks to Richard Wagner, Bob Stromberg, and others from the Controlled Vocabulary forum who contributed ideas for this article.

Initial posting: November 18, 2010

 
