XMP Tools

Introduction

XMP Tools is a Python library that provides basic XMP support for RDFLib, including parsing, modification, and serialization. XMP is Adobe’s metadata format, based on RDF. Put simply, XMP metadata is RDF serialized as RDF/XML and “wrapped” within a special XML element.
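To make the “wrapping” concrete, here is a minimal sketch that uses only the standard library to pull the rdf:RDF element out of an XMP packet. The sample packet and the helper name extract_rdf are made up for illustration; this is not how XMP Tools itself parses XMP.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# A tiny, hand-written XMP packet (illustrative only)
SAMPLE_PACKET = """<x:xmpmeta xmlns:x="adobe:ns:meta/">
  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    <rdf:Description rdf:about=""
        xmlns:xmp="http://ns.adobe.com/xap/1.0/"
        xmp:Rating="3"/>
  </rdf:RDF>
</x:xmpmeta>"""

def extract_rdf(packet):
    """Return the rdf:RDF element inside an XMP packet, or None."""
    root = ET.fromstring(packet)
    rdf_tag = "{%s}RDF" % RDF_NS
    if root.tag == rdf_tag:
        # the packet was bare RDF/XML without the wrapper
        return root
    # otherwise look for rdf:RDF directly inside the wrapper element
    return root.find(rdf_tag)

rdf = extract_rdf(SAMPLE_PACKET)
descriptions = rdf.findall("{%s}Description" % RDF_NS)
```

Everything inside the extracted element is ordinary RDF/XML, which is why an RDF library can take over from there.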

For details about the format, see Adobe’s XMP documentation.

The parser and the serializer are implemented as RDFLib plugins. Because RDFLib offers limited extensibility, we have copied some of its methods and modified them. The plugins register themselves as format="xmp". Normally, you do not need to know this, as we provide convenience functionality for reading and writing XMP (see below).

Installation

XMP Tools is available on PyPI as the package xmptools. It can be installed simply via

pip install xmptools

It can also be installed directly from, say, PyCharm, using its package installation dialog.

Note that xmptools now depends on the package rdfhelpers.

Usage Examples

To read an XMP sidecar file:

from xmptools import XMPMetadata
xmp = XMPMetadata("file:///foo/bar/baz.xmp")

To write it back after modification:

xmp.write()

To write it back somewhere else:

xmp.write("file:///foo/bar/bazbaz.xmp")

If you want to serialize in some other format, use the mechanisms that RDFLib offers:

import sys
xmp.serialize(destination=sys.stdout.buffer, format="turtle")

A potential problem (from the RDF standpoint) with XMP sidecar files is that the included metadata statements are about the sidecar file, not the actual image file. If you want a graph where the metadata statements are about the image file, do this:

xmp_new = xmp.adjustImageURI()

To read the metadata from an existing image:

xmp, path = XMPMetadata.fromFile("./tests/images/testtiff.tif")

Note that you must first run the unit tests to generate the file test.xmp, because the examples in this documentation use it as their input. More complete scenarios are available in docs/examples/.

Add one new dc:subject keyword to an XMP file:

from xmptools import XMPMetadata, makeFileURI, DC

def addDCSubject(xmp_file, new_subject):
    # read in metadata
    xmp = XMPMetadata(makeFileURI(xmp_file))
    # fetch existing dc:subject keywords
    keywords = xmp.getContainerItems(DC.subject)
    if new_subject not in keywords:
        # set the new list of keywords
        xmp.setContainerItems(DC.subject, keywords + [new_subject])
        # set the new modification date
        xmp.setDate()
        # write the metadata back to the file
        xmp.write()
        print("New XMP file written")
    else:
        print("Keyword already included, no file written")

addDCSubject("../tests/test.xmp", "Boeing 737")

Use a SPARQL query to transform XMP metadata into a simpler graph and serialize it:

  1. Transform dc:subject and dc:creator from containers to repeated properties.

  2. Construct a URI for the image file, based on the URI of the XMP sidecar file, and use it as the subject of all the statements.

from xmptools import XMPMetadata, makeFileURI
from rdflib import Graph
import sys

def queryAndTransform(xmp_file):
    # read in metadata
    xmp = XMPMetadata(makeFileURI(xmp_file))
    # use a SPARQL CONSTRUCT query to build a new graph
    results = xmp.query("""
        PREFIX dc: <http://purl.org/dc/elements/1.1/>
        PREFIX Iptc4xmpCore: <http://iptc.org/std/Iptc4xmpCore/1.0/xmlns/>
        PREFIX crs: <http://ns.adobe.com/camera-raw-settings/1.0/>
        CONSTRUCT { ?file dc:subject ?subject ; dc:creator ?creator ; Iptc4xmpCore:Location ?loc }
        WHERE {
            # get the image file name
            ?xmp crs:RawFileName ?raw .
            # extract dc:subject and dc:creator from containers and make them repeated properties
            OPTIONAL { ?xmp dc:subject [ a rdf:Bag ; !a ?subject ] . }
            OPTIONAL { ?xmp dc:creator [ a rdf:Seq ; !a ?creator ] . }
            # extract location
            OPTIONAL { ?xmp Iptc4xmpCore:Location ?loc . }
            # find the filename extension of the original image file
            BIND (STRAFTER(STR(?raw), ".") AS ?ext)
            # construct a new URI using the image file extension
            # note that we make the tacit assumption that the metadata file has extension "xmp" (length=3)
            BIND (IRI(CONCAT(SUBSTR(STR(?xmp), 1, STRLEN(STR(?xmp))-3), ?ext)) AS ?file)
        }""")
    graph = Graph()
    # insert query results into a new Graph instance
    for statement in results:
        graph.add(statement)
    return graph

g = queryAndTransform("../tests/test.xmp")
g.serialize(destination=sys.stdout.buffer, format="turtle")

XMP Tools API

Namespaces

xmptools.XMP

An rdflib.Namespace instance for the standard xmp: namespace.

xmptools.EXIF, xmptools.CRS, xmptools.DC, xmptools.DCT, xmptools.PHOTOSHOP

These namespaces are provided as a convenience as they are often needed in XMP metadata manipulation. More predefined namespaces are available in RDFLib.

Classes

class xmptools.XMPMetadata(path=)

This is the main “entry point” to the functionality XMP Tools offers. It is a subclass of rdflib.Graph, so you can use it wherever you can use RDFLib graphs.

The parameter path defaults to None; if a different value is passed, we assume it points to an XMP sidecar file and try to read and parse its contents. If path is None, an empty graph is created. If a path was passed, self.url is initialized with a URL (an rdflib.URIRef instance) corresponding to path. This URL is considered to be the resource the contained metadata statements are about; note that when the metadata is read from a sidecar file, the statements are (perhaps a bit confusingly) about the sidecar file, not about the corresponding image file. This is just the way Adobe XMP works; it is not like they really understood the idea of RDF particularly well. The method adjustImageURI() can be used to mitigate this problem.

@classmethod
XMPMetadata.fromFile(path, ignore_sidecar_if_pdf=)

The most “general” way of acquiring metadata, using the following logic:

  1. Assume path points to an image/PDF file, create a corresponding XMP sidecar path and attempt to read its contents. Skip this step if the file is a PDF file (as identified by its file extension) and ignore_sidecar_if_pdf is True (the default).

  2. Failing #1, attempt to read metadata directly from the image/PDF file.

Returns an XMPMetadata instance and the path from which the metadata was actually read (either the file provided or its metadata sidecar), or (None, None) if all attempts failed.
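The lookup order above can be sketched with the standard library alone. This is only an illustration: the function name pick_metadata_source is hypothetical, it assumes that the sidecar sits next to the image with the extension replaced by ".xmp", and it uses file existence as a stand-in for "reading succeeded".

```python
import tempfile
from pathlib import Path

def pick_metadata_source(path, ignore_sidecar_if_pdf=True):
    """Return the file to read metadata from, or None if neither exists.

    Assumption: the sidecar has the same name as the image, with the
    extension replaced by ".xmp".
    """
    path = Path(path)
    is_pdf = path.suffix.lower() == ".pdf"
    if not (is_pdf and ignore_sidecar_if_pdf):
        sidecar = path.with_suffix(".xmp")
        if sidecar.is_file():
            return sidecar      # step 1: the sidecar wins
    if path.is_file():
        return path             # step 2: fall back to the file itself
    return None

with tempfile.TemporaryDirectory() as tmp:
    image = Path(tmp) / "photo.tif"
    image.write_bytes(b"")
    first_name = pick_metadata_source(image).name    # no sidecar yet
    image.with_suffix(".xmp").write_text("")
    second_name = pick_metadata_source(image).name   # sidecar now exists
```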

@classmethod
XMPMetadata.fromImageFile(path)

Reads metadata embedded in an image file. Supports JPEG, TIFF, and DNG formats. Returns an XMPMetadata instance and the path of the actual file from which the metadata was read. This could be either the file provided or its sidecar file.

@classmethod
XMPMetadata.fromJPEG(path)

Reads metadata embedded in a JPEG file. Verifies that the file indeed is in JPEG format. Returns an XMPMetadata instance.
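For readers curious about what “embedded in a JPEG file” means, here is a rough sketch of locating an XMP packet in JPEG data. It relies on two facts about the formats: a JPEG starts with the SOI marker FF D8, and XMP is stored in an APP1 segment whose payload begins with the namespace string shown below. The function name find_xmp_in_jpeg and the fake test data are ours; we make no claim that XMP Tools works this way internally.

```python
import struct

XMP_HEADER = b"http://ns.adobe.com/xap/1.0/\x00"

def find_xmp_in_jpeg(data):
    """Return the raw XMP packet bytes from JPEG data, or None."""
    if data[:2] != b"\xff\xd8":             # SOI marker: is this a JPEG?
        raise ValueError("not a JPEG file")
    pos = 2
    while pos + 4 <= len(data):
        # each segment: 2-byte marker, then 2-byte big-endian length
        # (the length counts itself but not the marker)
        marker, length = struct.unpack(">HH", data[pos:pos + 4])
        if marker >> 8 != 0xFF or marker == 0xFFDA:
            break                           # malformed data or start of scan
        payload = data[pos + 4:pos + 2 + length]
        if marker == 0xFFE1 and payload.startswith(XMP_HEADER):
            return payload[len(XMP_HEADER):]
        pos += 2 + length
    return None

# Build a minimal fake JPEG: SOI, one APP0 segment, one APP1/XMP segment, EOI
packet = b"<x:xmpmeta/>"
app0 = b"\xff\xe0" + struct.pack(">H", 2 + 2) + b"ab"
app1 = (b"\xff\xe1"
        + struct.pack(">H", 2 + len(XMP_HEADER) + len(packet))
        + XMP_HEADER + packet)
fake_jpeg = b"\xff\xd8" + app0 + app1 + b"\xff\xd9"
found = find_xmp_in_jpeg(fake_jpeg)
```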

XMPMetadata.read()

Reads and parses the metadata from the file that self.url points to, assuming the XMP format.

XMPMetadata.write(destination=)

Writes the metadata into the file pointed to by destination (a URL). If destination is None (the default), uses the value of self.url instead.

XMPMetadata.getDate(predicate=)

Reads the specified predicate (which defaults to xmp:MetadataDate) and returns its value as a datetime instance (even if the literal it finds was just “plain”). If no value is found, returns None. Note that xmptools uses its own ISO 8601 parser, because datetime.datetime.fromisoformat fails for many genuine Adobe XMP timestamps.
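To illustrate why a custom parser helps, here is a small, lenient ISO 8601 parser written for this document. It is not the parser XMP Tools uses; the name parse_xmp_date and the accepted forms (year only, year-month, optional seconds, fractions of any length, Z or a numeric offset) are our assumptions about what XMP timestamps can look like. For instance, Python versions before 3.11 reject a trailing "Z" in fromisoformat.

```python
import re
from datetime import datetime, timedelta, timezone

_XMP_DATE = re.compile(
    r"^(\d{4})"                       # year
    r"(?:-(\d{2})"                    # optional month
    r"(?:-(\d{2})"                    # optional day
    r"(?:T(\d{2}):(\d{2})"            # optional time: hours and minutes
    r"(?::(\d{2})(?:\.(\d+))?)?"      # optional seconds and fraction
    r"(Z|[+-]\d{2}:\d{2})?"           # optional time zone designator
    r")?)?)?$")

def parse_xmp_date(text):
    """Parse a (possibly truncated) ISO 8601 timestamp into a datetime."""
    match = _XMP_DATE.match(text.strip())
    if match is None:
        raise ValueError("unrecognized timestamp: %r" % text)
    year, month, day, hour, minute, second, fraction, zone = match.groups()
    # pad or truncate the fraction to microseconds
    micro = int((fraction or "0").ljust(6, "0")[:6])
    tzinfo = None
    if zone == "Z":
        tzinfo = timezone.utc
    elif zone:
        sign = -1 if zone[0] == "-" else 1
        offset = timedelta(hours=int(zone[1:3]), minutes=int(zone[4:6]))
        tzinfo = timezone(sign * offset)
    return datetime(int(year), int(month or 1), int(day or 1),
                    int(hour or 0), int(minute or 0), int(second or 0),
                    micro, tzinfo)

full = parse_xmp_date("2021-03-04T12:34:56.12+02:00")
month_only = parse_xmp_date("2021-03")
```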

XMPMetadata.setDate(timestamp=, predicate=)

Writes a timestamp into the specified predicate, replacing any existing triple in the graph. The timestamp parameter must be a datetime instance or an ISO 8601-formatted string, and defaults to the current time (i.e., the value of datetime.utcnow()). This method can be used when metadata is modified and written back to the sidecar file, but it is not called automatically. The parameter predicate defaults to XMP.MetadataDate.

XMPMetadata.findDateCreated(predicates=, error=)

Tries to find when the image was created by trying, in order, different time properties (parameter predicates, defaults to a sequence of xmp:CreateDate, exif:DateTimeOriginal, photoshop:DateCreated). Returns two values: the date (typically a datetime instance, but possibly a date instance) and the successful predicate used. If no date is found, raises an error. The parameter error is used to provide the exception class (it defaults to DataNotFound); if None is passed for error, returns None, None.
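The “try predicates in order” behaviour, including the role of the error parameter, can be sketched without any RDF machinery. Here a plain dictionary stands in for the graph, and both first_available and the local DataNotFound class are stand-ins defined for this example only.

```python
class DataNotFound(Exception):
    """Stand-in for the exception class mentioned above."""

def first_available(values, keys, error=DataNotFound):
    """Return (value, key) for the first key that has a value."""
    for key in keys:
        value = values.get(key)
        if value is not None:
            return value, key
    if error is None:
        # the caller asked for a quiet failure
        return None, None
    raise error("none of %s found" % ", ".join(keys))

metadata = {"exif:DateTimeOriginal": "2021-03-04T12:34:56"}
order = ["xmp:CreateDate", "exif:DateTimeOriginal", "photoshop:DateCreated"]
date, predicate = first_available(metadata, order)
```

Returning the successful key alongside the value lets the caller see which property the date actually came from.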

@classmethod
XMPMetadata.fileModificationDate(path)

Given an image file path, find the latest modification date of the image: this could be either the modification date of the image file itself, or the modification date of the corresponding XMP sidecar file. If the path designates the sidecar file specifically, only that file is considered.
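A minimal sketch of this “latest of the two files” rule, using only the standard library. The helper name latest_modification is hypothetical, and we again assume that the sidecar shares the image’s name with the extension ".xmp".

```python
import os
import tempfile
from datetime import datetime
from pathlib import Path

def latest_modification(path):
    """Return the newest modification time of a file and its sidecar."""
    path = Path(path)
    candidates = [path]
    if path.suffix.lower() != ".xmp":
        # only image paths also consider the (assumed) sidecar file
        candidates.append(path.with_suffix(".xmp"))
    times = [c.stat().st_mtime for c in candidates if c.is_file()]
    if not times:
        return None
    return datetime.fromtimestamp(max(times))

with tempfile.TemporaryDirectory() as tmp:
    image = Path(tmp) / "photo.tif"
    sidecar = image.with_suffix(".xmp")
    image.write_bytes(b"")
    sidecar.write_bytes(b"")
    os.utime(image, (3_000_000, 3_000_000))      # image modified later
    os.utime(sidecar, (2_000_000, 2_000_000))
    newest = latest_modification(image)
    sidecar_only = latest_modification(sidecar)
```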

XMPMetadata.getContainerItems(predicate)

Reads the value of the specified predicate and, assuming its value is an RDF container, returns a list of the container’s values (as strings, further assuming that the values were all literals). This method is useful with properties such as dc:subject or dc:creator.
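As background, the items of an RDF container hang off numbered membership properties (rdf:_1, rdf:_2, and so on). The sketch below shows how such items can be collected in order; it uses plain tuples with prefixed names as strings instead of RDFLib terms, and container_items is a hypothetical helper, not part of XMP Tools.

```python
import re

MEMBERSHIP = re.compile(r"^rdf:_(\d+)$")

def container_items(triples, container):
    """Return the items of a container, ordered by their rdf:_n index."""
    numbered = []
    for subject, predicate, value in triples:
        match = MEMBERSHIP.match(predicate)
        if subject == container and match:
            # sort numerically, so that rdf:_10 comes after rdf:_2
            numbered.append((int(match.group(1)), value))
    return [value for _index, value in sorted(numbered)]

triples = [
    ("_:seq", "rdf:type", "rdf:Seq"),
    ("_:seq", "rdf:_10", "Zebra"),
    ("_:seq", "rdf:_2", "Dog"),
    ("_:seq", "rdf:_1", "Cat"),
]
items = container_items(triples, "_:seq")
```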

XMPMetadata.setContainerItems(predicate, values, newtype=)

Sets the values (the “items”) of an RDF container to literals that correspond to the strings in the list values. If values is None or an empty list, removes the container and the linking predicate altogether. If no container exists beforehand, creates a new one (using newtype as the container’s RDF type; it defaults to RDF.Seq) and links it.

XMPMetadata.container2repeated(predicate, new_predicate=, value_mapper=, remove_predicate=, target_graph=)

Transforms items of a container (e.g., an rdf:Seq) to repeated properties. The container is assumed to be the value of the property predicate, and unless new_predicate is specified, the same predicate is used for the transformed values. The parameter value_mapper defaults to identity (i.e., a no-op), but can be used to transform the container items to something else. If remove_predicate is True, the old predicate and its value are removed; it defaults to False.

If target_graph is specified (it defaults to self), the new data is inserted there. The value of target_graph is returned as the value of this method.

This method can be used to transform, say, dc:subject to a more conveniently accessible predicate; value mapping allows the string values to be transformed to, say, SKOS concepts. Here is a simple example:

[] dc:subject [a rdf:Seq ;
               rdf:_1 "Cat" ;
               rdf:_2 "Dog"] .

would be transformed to

[] ex:hasTag "Cat" ;
   ex:hasTag "Dog" ;
   dc:subject [a rdf:Seq ;
               rdf:_1 "Cat" ;
               rdf:_2 "Dog"] .

by calling container2repeated(DC.subject, new_predicate=EX.hasTag). Note that you probably should not be using this if the order in the original container is significant.
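The same transformation can be mimicked on plain tuples to show where value_mapper fits in. This is a toy model for illustration only: triples are string tuples, container_to_repeated is a made-up function, and the upper-casing mapper merely stands in for something more useful, such as a lookup of SKOS concepts.

```python
import re

MEMBER = re.compile(r"^rdf:_(\d+)$")

def container_to_repeated(triples, subject, predicate, new_predicate=None,
                          value_mapper=lambda value: value):
    """Add one (subject, new_predicate, item) triple per container item."""
    new_predicate = new_predicate or predicate
    result = set(triples)               # the original triples are kept
    for s, p, container in triples:
        if s == subject and p == predicate:
            for cs, cp, item in triples:
                if cs == container and MEMBER.match(cp):
                    result.add((subject, new_predicate, value_mapper(item)))
    return result

triples = {
    ("_:img", "dc:subject", "_:seq"),
    ("_:seq", "rdf:type", "rdf:Seq"),
    ("_:seq", "rdf:_1", "Cat"),
    ("_:seq", "rdf:_2", "Dog"),
}
plain = container_to_repeated(triples, "_:img", "dc:subject", "ex:hasTag")
mapped = container_to_repeated(triples, "_:img", "dc:subject", "ex:hasTag",
                               value_mapper=str.upper)
```

Because the result is a set of triples, any ordering present in the container is lost, which is the reason for the caveat above.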

XMPMetadata.repeated2container(predicate, new_predicate=, newtype=, value_mapper=, remove_predicate=, source_graph=)

This method effectively “reverses” the operation of container2repeated. The value of newtype (it defaults to RDF.Seq) gives the type of the new container created. Data is read from the graph given as source_graph, which defaults to self.

XMPMetadata.adjustImageURI(new_extension=)

Modifies the graph so that its statements are about the image file, not about the sidecar file. The parameter new_extension should be the filename extension of the image file; the image URI is constructed by taking the existing sidecar URI and substituting the new extension. If new_extension is not specified, the extension of the filename specified in crs:RawFileName (in the metadata) is used; if crs:RawFileName is not present, an error is raised.
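The extension substitution described here amounts to simple string manipulation. The sketch below shows the idea with two hypothetical helpers, swap_extension and extension_of; the example file names are invented, and the real method of course also rewrites the statements in the graph.

```python
import posixpath

def swap_extension(uri, new_extension):
    """Replace the filename extension at the end of a URI."""
    base, _old_extension = posixpath.splitext(uri)
    return base + "." + new_extension.lstrip(".")

def extension_of(filename):
    """Return the extension of a filename, without the leading dot."""
    return posixpath.splitext(filename)[1].lstrip(".")

sidecar_uri = "file:///foo/bar/baz.xmp"
# pretend that crs:RawFileName in the metadata said "baz.CR2"
image_uri = swap_extension(sidecar_uri, extension_of("baz.CR2"))
```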

XMPMetadata.cbd(resource=, target=)

Computes the Concise Bounded Description of this XMP metadata, using xmptools.cbd() (see below). Unless some additional data has been added to this XMP instance, the CBD should be an identical graph. The parameter resource is the node for which the CBD is computed; it defaults to self.url. The parameter target is a graph into which the CBD is inserted; it defaults to a new instance of rdflib.Graph.

If you want to produce an XMPMetadata instance from a larger graph (say, a graph into which you have inserted XMP metadata from multiple images), use xmptools.cbd() instead and pass a newly created XMPMetadata instance as the target.
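For readers unfamiliar with the term, a Concise Bounded Description consists of all statements about a resource plus, recursively, all statements about any blank nodes reachable from it. The following simplified sketch works on string tuples, ignores the reification part of the CBD definition, and treats any string starting with "_:" as a blank node; it is an illustration, not the implementation behind xmptools.cbd().

```python
def concise_bounded_description(triples, resource):
    """Collect statements about resource, following blank nodes."""
    description = set()
    pending = [resource]
    seen = set()
    while pending:
        node = pending.pop()
        if node in seen:
            continue                    # guard against blank node cycles
        seen.add(node)
        for triple in triples:
            subject, _predicate, obj = triple
            if subject == node:
                description.add(triple)
                # simplification: blank nodes are strings starting with "_:"
                if obj.startswith("_:"):
                    pending.append(obj)
    return description

# metadata about two images merged into one "graph"
merged = {
    ("file:///a.tif", "dc:subject", "_:b1"),
    ("_:b1", "rdf:_1", "Cat"),
    ("file:///a.tif", "xmp:Rating", "3"),
    ("file:///b.tif", "dc:subject", "_:b2"),
    ("_:b2", "rdf:_1", "Dog"),
}
about_a = concise_bounded_description(merged, "file:///a.tif")
```

Only the statements belonging to the first image end up in about_a, including the contents of its keyword container.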

XMPMetadata.archive(filename)

Serializes and gzip-compresses the metadata. The parameter filename should be the path of the target file, without the .gz extension; if the file has no extension, .xmp is added as well (so that the resulting files are always named *.xmp.gz). The resulting file can be uncompressed using the normal gzip shell command.
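The naming rule and the compression step can be sketched as follows. The function archive_text is a hypothetical stand-in that compresses an arbitrary string; the actual method serializes the graph first.

```python
import gzip
import tempfile
from pathlib import Path

def archive_text(text, filename):
    """Write text to a gzip file named after filename; return its path."""
    target = Path(filename)
    if target.suffix == "":
        # no extension given: add ".xmp" before ".gz"
        target = target.with_suffix(".xmp")
    target = target.with_name(target.name + ".gz")
    with gzip.open(target, "wt", encoding="utf-8") as stream:
        stream.write(text)
    return target

with tempfile.TemporaryDirectory() as tmp:
    archived = archive_text("<x:xmpmeta/>", Path(tmp) / "photo")
    archived_name = archived.name
    # reading the file back shows that it is an ordinary gzip file
    with gzip.open(archived, "rt", encoding="utf-8") as stream:
        restored = stream.read()
```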

XMPMetadata.getThumbnails(predicate=)

Returns a list of thumbnail images (JPEGs, instances of PIL.Image.Image) if those are contained in the XMP metadata. Uses the value of predicate to find the container of thumbnails in the metadata; predicate defaults to XMP.Thumbnails.

Functions

xmptools.makeFileURI(path)

Takes a file path and returns a corresponding file: URL, as a string. You can pass this string to rdflib.URIRef() if you need an actual URI reference object.

xmptools.makeFilePath(uri, scheme=)

Takes a URL and returns a corresponding file path. The URL must use the scheme specified (defaults to "file").
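The round trip between paths and file: URLs can be approximated with urllib.parse. The two functions below are our own sketches (note the different names) and assume absolute POSIX-style paths; the library functions may well handle more cases, such as relative paths.

```python
from urllib.parse import quote, unquote, urlparse

def make_file_uri(path):
    """Turn an absolute POSIX path into a file: URL string."""
    if not path.startswith("/"):
        raise ValueError("this sketch only handles absolute paths")
    return "file://" + quote(path)      # percent-encode spaces etc.

def make_file_path(uri, scheme="file"):
    """Turn a URL back into a path, checking the scheme first."""
    parts = urlparse(uri)
    if parts.scheme != scheme:
        raise ValueError("expected a %s: URL, got %r" % (scheme, uri))
    return unquote(parts.path)

uri = make_file_uri("/foo/bar/my photo.xmp")
path = make_file_path(uri)
```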

Using internal functionality

If you want to access some functionality in XMP Tools that is not “exported”, you can always do something like this:

from xmptools.xmptools import adjustNodes

Of course, we offer no guarantees about whether undocumented functionality will stay the same across version changes.