XMP Tools
Introduction
XMP Tools is a Python library that provides basic XMP support for RDFLib, including parsing, modification, and serialization. XMP is Adobe’s metadata format, based on RDF. Put simply, XMP metadata is RDF serialized as RDF/XML and “wrapped” within a special XML element.
Adobe’s XMP documentation can be found here.
The parser and the serializer are implemented as RDFLib plugins. Because of the limited extensibility of RDFLib, we have copied some methods from RDFLib and modified them. The plugins register themselves as format="xmp". Normally, you do not have to know this, as we provide convenience functionality for reading and writing XMP (see below).
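To make the “wrapping” concrete, here is a minimal sketch that uses only the Python standard library (not XMP Tools itself) to peel the x:xmpmeta wrapper off a packet and reach the rdf:RDF element inside. The sample packet is hand-written for illustration, and the helper name extract_rdf is an assumption of this sketch, not part of the xmptools API.

```python
import xml.etree.ElementTree as ET

# Namespace URIs of the XMP wrapper element and of RDF itself.
X_NS = "adobe:ns:meta/"
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
XMP_NS = "http://ns.adobe.com/xap/1.0/"

# A tiny hand-written packet, for illustration only.
SAMPLE_PACKET = """<?xpacket begin="" id="W5M0MpCehiHzreSzNTczkc9d"?>
<x:xmpmeta xmlns:x="adobe:ns:meta/">
  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    <rdf:Description rdf:about=""
        xmlns:xmp="http://ns.adobe.com/xap/1.0/"
        xmp:Rating="3"/>
  </rdf:RDF>
</x:xmpmeta>
<?xpacket end="w"?>"""


def extract_rdf(packet):
    """Return the rdf:RDF element wrapped inside x:xmpmeta (hypothetical helper)."""
    root = ET.fromstring(packet)
    if root.tag != f"{{{X_NS}}}xmpmeta":
        raise ValueError("not an XMP packet: missing x:xmpmeta wrapper")
    rdf = root.find(f"{{{RDF_NS}}}RDF")
    if rdf is None:
        raise ValueError("x:xmpmeta contains no rdf:RDF element")
    return rdf


rdf = extract_rdf(SAMPLE_PACKET)
description = rdf.find(f"{{{RDF_NS}}}Description")
rating = description.get(f"{{{XMP_NS}}}Rating")
```

What remains after unwrapping is ordinary RDF/XML, which is why an RDF parser can take over from that point.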
Installation
XMP Tools is available on PyPI as the package xmptools. It can be installed simply via
pip install xmptools
It can also be installed directly from, say, PyCharm.
Note that xmptools now depends on the package rdfhelpers.
Usage Examples
To read an XMP sidecar file:
from xmptools import XMPMetadata
xmp = XMPMetadata("file:///foo/bar/baz.xmp")
To write it back after modification:
xmp.write()
To write it back somewhere else:
xmp.write("file:///foo/bar/bazbaz.xmp")
If you want to serialize in some other format, use the mechanisms that RDFLib offers:
import sys
xmp.serialize(destination=sys.stdout.buffer, format="turtle")
A potential problem (from the RDF standpoint) with XMP sidecar files is that the included metadata statements are about the sidecar file, not the actual image file. If you want a graph where the metadata statements are about the image file, do this:
xmp_new = xmp.adjustImageURI()
To read the metadata from an existing image:
xmp, path = XMPMetadata.fromFile("./tests/images/testtiff.tif")
Note that you must first run the unit tests to generate the file test.xmp because the examples in this documentation use that as their input. More complete scenarios are available in docs/examples/.
Add one new dc:subject keyword to an XMP file:
from xmptools import XMPMetadata, makeFileURI, DC
def addDCSubject(xmp_file, new_subject):
    # read in metadata
    xmp = XMPMetadata(makeFileURI(xmp_file))
    # fetch existing dc:subject keywords
    keywords = xmp.getContainerItems(DC.subject)
    if new_subject not in keywords:
        # set the new list of keywords
        xmp.setContainerItems(DC.subject, keywords + [new_subject])
        # set the new modification date
        xmp.setDate()
        # write the metadata back to the file
        xmp.write()
        print("New XMP file written")
    else:
        print("Keyword already included, no file written")

addDCSubject("../tests/test.xmp", "Boeing 737")
Use a SPARQL query to transform XMP metadata into a simpler graph and serialize it:
- Transform dc:subject and dc:creator from containers to repeated properties.
- Construct a URI for the image file, based on the URI of the XMP sidecar file, and use it as the subject of all the statements.
from xmptools import XMPMetadata, makeFileURI
from rdflib import Graph
import sys
def queryAndTransform(xmp_file):
    # read in metadata
    xmp = XMPMetadata(makeFileURI(xmp_file))
    # use a SPARQL CONSTRUCT query to build a new graph
    results = xmp.query("""
        PREFIX dc: <http://purl.org/dc/elements/1.1/>
        PREFIX Iptc4xmpCore: <http://iptc.org/std/Iptc4xmpCore/1.0/xmlns/>
        PREFIX crs: <http://ns.adobe.com/camera-raw-settings/1.0/>
        CONSTRUCT { ?file dc:subject ?subject ; dc:creator ?creator ; Iptc4xmpCore:Location ?loc }
        WHERE {
            # get the image file name
            ?xmp crs:RawFileName ?raw .
            # extract dc:subject and dc:creator from containers and make them repeated properties
            OPTIONAL { ?xmp dc:subject [ a rdf:Bag ; !a ?subject ] . }
            OPTIONAL { ?xmp dc:creator [ a rdf:Seq ; !a ?creator ] . }
            # extract location
            OPTIONAL { ?xmp Iptc4xmpCore:Location ?loc . }
            # find the filename extension of the original image file
            BIND (STRAFTER(STR(?raw), ".") AS ?ext)
            # construct a new URI using the image file extension
            # note that we make the tacit assumption that the metadata file has extension "xmp" (length=3)
            BIND (IRI(CONCAT(SUBSTR(STR(?xmp), 1, STRLEN(STR(?xmp))-3), ?ext)) AS ?file)
        }""")
    graph = Graph()
    # insert query results into a new Graph instance
    for statement in results:
        graph.add(statement)
    return graph

g = queryAndTransform("../tests/test.xmp")
g.serialize(destination=sys.stdout.buffer, format="turtle")
XMP Tools API
Namespaces
xmptools.XMP
An rdflib.Namespace instance for the standard xmp: namespace.
xmptools.EXIF, xmptools.CRS, xmptools.DC, xmptools.DCT, xmptools.PHOTOSHOP
These namespaces are provided as a convenience as they are often needed in XMP metadata manipulation. More predefined namespaces are available in RDFLib.
Classes
class xmptools.XMPMetadata(path=)
This is the main “entry point” to the functionality XMP Tools offers. It is a subclass of rdflib.Graph and thus you can easily use it wherever you can use RDFLib graphs.
The parameter path defaults to None; if a different value is passed, we assume it points to an XMP sidecar file and we try to read and parse the contents. If path is None, an empty graph is created. If a path was passed, self.url is initialized with a URL (an rdflib.URIRef instance) corresponding to path. This URL is considered to be the resource the contained metadata statements are about; note that when the metadata is read from a sidecar file, the statements are (perhaps a bit confusingly) about the sidecar file, not about the corresponding image file. This is just the way Adobe XMP works; it is not like they really understood the idea of RDF particularly well. The method adjustImageURI() can be used to mitigate this problem.
@classmethod
XMPMetadata.fromFile(path, ignore_sidecar_if_pdf=)
The most “general” way of acquiring metadata, using the following logic:
1. Assume path points to an image/PDF file, create a corresponding XMP sidecar path and attempt to read its contents. Skip this step if the file is a PDF file (as identified by its file extension) and ignore_sidecar_if_pdf is True (the default).
2. Failing #1, attempt to read metadata directly from the image/PDF file.
Returns an XMPMetadata instance and the path from which the metadata was actually read (either the file provided or its metadata sidecar), or (None, None) if all attempts failed.
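The fallback order described above can be sketched without XMP Tools at all. In the following illustration, load_metadata, read_sidecar and read_embedded are hypothetical names: the two readers are stand-ins that return None on failure, and the sidecar path is assumed to be the original path with its extension replaced by .xmp. It shows only the control flow, not how xmptools is implemented.

```python
import os


def load_metadata(path, read_sidecar, read_embedded, ignore_sidecar_if_pdf=True):
    """Try the sidecar first, then the file itself (illustrative control flow)."""
    stem, extension = os.path.splitext(path)
    is_pdf = extension.lower() == ".pdf"
    if not (is_pdf and ignore_sidecar_if_pdf):
        sidecar = stem + ".xmp"  # assumed sidecar naming convention
        metadata = read_sidecar(sidecar)
        if metadata is not None:
            return metadata, sidecar
    # Step 1 was skipped or failed: fall back to embedded metadata.
    metadata = read_embedded(path)
    if metadata is not None:
        return metadata, path
    return None, None


def sidecar_reader(sidecar_path):
    """Hypothetical reader that always finds a sidecar."""
    return {"source": "sidecar"}


def embedded_reader(file_path):
    """Hypothetical reader that always finds embedded metadata."""
    return {"source": "embedded"}


def failing_reader(any_path):
    """Hypothetical reader that never finds anything."""
    return None


from_sidecar = load_metadata("photo.tif", sidecar_reader, embedded_reader)
from_image = load_metadata("photo.tif", failing_reader, embedded_reader)
from_pdf = load_metadata("report.pdf", sidecar_reader, embedded_reader)
nothing = load_metadata("photo.tif", failing_reader, failing_reader)
```

Returning the path alongside the metadata lets the caller tell which of the two sources was actually used.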
@classmethod
XMPMetadata.fromImageFile(path)
Reads metadata embedded in an image file. Supports JPEG, TIFF, and DNG formats. Returns an XMPMetadata instance and the path of the actual file from which the metadata was read. This could be either the file provided or its sidecar file.
@classmethod
XMPMetadata.fromJPEG(path)
Reads metadata embedded in a JPEG file. Verifies that the file indeed is in JPEG format. Returns an XMPMetadata instance.
XMPMetadata.read()
Reads and parses the metadata from the file that self.url points to, assuming the XMP format.
XMPMetadata.write(destination=)
Writes the metadata into the file pointed to by destination (a URL). If destination is None (the default), uses the value of self.url instead.
XMPMetadata.getDate(predicate=)
Reads the specified predicate (which defaults to xmp:MetadataDate) and returns it as a datetime instance (even if the literal it finds was just “plain”). If no value is found, returns None. Note that xmptools uses its own ISO 8601 parser, because datetime.datetime.fromisoformat fails for many genuine Adobe XMP timestamps.
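To illustrate why a dedicated parser helps, here is a minimal sketch of a tolerant ISO 8601 parser that accepts partial timestamps such as a bare year or a time without seconds. This is not the parser xmptools ships; the function name parse_xmp_date, the regular expression, and the choice to fill missing components with the earliest possible value are all assumptions of this sketch.

```python
import re
from datetime import datetime, timedelta, timezone

# The year is mandatory; every later component is optional.
_XMP_DATE = re.compile(
    r"^(?P<year>\d{4})"
    r"(?:-(?P<month>\d{2})"
    r"(?:-(?P<day>\d{2})"
    r"(?:T(?P<hour>\d{2}):(?P<minute>\d{2})"
    r"(?::(?P<second>\d{2})(?:\.(?P<fraction>\d+))?)?"
    r"(?P<tz>Z|[+-]\d{2}:\d{2})?"
    r")?)?)?$"
)


def parse_xmp_date(text):
    """Parse a possibly partial ISO 8601 timestamp (hypothetical helper)."""
    match = _XMP_DATE.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognized timestamp: {text!r}")
    parts = match.groupdict()
    # Truncate or pad fractional seconds to exactly six digits (microseconds).
    fraction = (parts["fraction"] or "0")[:6].ljust(6, "0")
    tzinfo = None
    if parts["tz"] == "Z":
        tzinfo = timezone.utc
    elif parts["tz"]:
        sign = -1 if parts["tz"][0] == "-" else 1
        hours, minutes = int(parts["tz"][1:3]), int(parts["tz"][4:6])
        tzinfo = timezone(sign * timedelta(hours=hours, minutes=minutes))
    # Missing components default to the start of the period.
    return datetime(
        int(parts["year"]),
        int(parts["month"] or 1),
        int(parts["day"] or 1),
        int(parts["hour"] or 0),
        int(parts["minute"] or 0),
        int(parts["second"] or 0),
        int(fraction),
        tzinfo=tzinfo,
    )


year_only = parse_xmp_date("2019")
no_seconds = parse_xmp_date("2019-07-04T12:30")
full = parse_xmp_date("2019-07-04T12:30:05.25+02:00")
```

Timestamps without a time zone designator come back as naive datetime instances, so callers that compare values should decide how to treat them.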
XMPMetadata.setDate(timestamp=, predicate=)
Writes a timestamp into the specified predicate. Replaces any existing triple in the graph. The timestamp parameter must be a datetime instance or an ISO 8601-formatted string, and defaults to the current time (i.e., the value of datetime.utcnow()). This method can be used when metadata is modified and is written back to the sidecar file, but it is not called automatically. The parameter predicate defaults to XMP.MetadataDate.
XMPMetadata.findDateCreated(predicates=, error=)
Tries to find when the image was created by trying, in order, different time properties (parameter predicates, defaults to a sequence of xmp:CreateDate, exif:DateTimeOriginal, photoshop:DateCreated). Returns two values: the date (typically a datetime instance, but possibly a date instance) and the successful predicate used. If no date is found, raises an error. The parameter error is used to provide the exception class (it defaults to DataNotFound); if None is passed for error, returns None, None.
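The “try each predicate in order” behaviour, including the two ways of reporting failure, can be sketched with a plain dictionary standing in for the graph. Everything here is illustrative: find_first is a hypothetical name, the DataNotFound class is a local stand-in rather than the one xmptools defines, and predicates are written as prefixed strings instead of URI references.

```python
class DataNotFound(Exception):
    """Local stand-in for an exception class of the same name."""


DEFAULT_ORDER = ("xmp:CreateDate", "exif:DateTimeOriginal", "photoshop:DateCreated")


def find_first(properties, predicates=DEFAULT_ORDER, error=DataNotFound):
    """Return (value, predicate) for the first predicate that has a value."""
    for predicate in predicates:
        value = properties.get(predicate)
        if value is not None:
            return value, predicate
    if error is None:
        # The caller asked for a quiet failure.
        return None, None
    raise error(f"none of {list(predicates)} is present")


# A dictionary stands in for the metadata graph in this sketch.
sample = {"exif:DateTimeOriginal": "2019-07-04T12:30:05"}
found = find_first(sample)
quiet = find_first({}, error=None)
try:
    find_first({})
    raised = False
except DataNotFound:
    raised = True
```

Passing the exception class as a parameter lets the caller choose between an exception and a sentinel result without a separate flag.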
@classmethod
XMPMetadata.fileModificationDate(path)
Given an image file path, find the latest modification date of the image: this could be either the modification date of the image file itself, or the modification date of the corresponding XMP sidecar file. If the path designates the sidecar file specifically, only that file is considered.
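As a rough standard-library sketch of the same idea (not the xmptools implementation), the following compares the modification times of an image and its sidecar. The function name latest_modification and the assumption that the sidecar shares the image's base name with an .xmp extension are specific to this sketch.

```python
import os
import tempfile
from datetime import datetime, timezone


def latest_modification(path):
    """Newest modification time among a file and its sidecar, if any."""
    stem, extension = os.path.splitext(path)
    candidates = [path]
    if extension.lower() != ".xmp":
        sidecar = stem + ".xmp"  # assumed sidecar naming convention
        if os.path.exists(sidecar):
            candidates.append(sidecar)
    newest = max(os.path.getmtime(candidate) for candidate in candidates)
    return datetime.fromtimestamp(newest, tz=timezone.utc)


# Demonstration with two empty files whose timestamps are set explicitly.
with tempfile.TemporaryDirectory() as folder:
    image = os.path.join(folder, "photo.tif")
    sidecar = os.path.join(folder, "photo.xmp")
    for file_path, mtime in ((image, 1_500_000_000), (sidecar, 1_000_000_000)):
        open(file_path, "w").close()
        os.utime(file_path, (mtime, mtime))
    image_result = latest_modification(image)
    sidecar_result = latest_modification(sidecar)
```

In the demonstration the image is newer than its sidecar, so asking about the image yields the image's timestamp, while asking about the sidecar considers only the sidecar.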
XMPMetadata.getContainerItems(predicate)
Reads the value of the specified predicate and, assuming its value is an RDF container, returns a list of the container’s values (as strings, further assuming that the values were all literals). This method is useful with properties such as dc:subject or dc:creator.
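For readers unfamiliar with RDF containers, the sketch below shows what such a container looks like in RDF/XML and how its items can be listed with nothing but the standard library. It does not use XMP Tools; the sample document is hand-written, and container_items is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC = "http://purl.org/dc/elements/1.1/"

# Hand-written sample: dc:subject whose value is an rdf:Bag of two keywords.
SAMPLE = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:dc="{DC}">
  <rdf:Description rdf:about="">
    <dc:subject>
      <rdf:Bag>
        <rdf:li>Cat</rdf:li>
        <rdf:li>Dog</rdf:li>
      </rdf:Bag>
    </dc:subject>
  </rdf:Description>
</rdf:RDF>"""


def container_items(rdf_xml, property_tag):
    """List the literal items of the container(s) found under property_tag."""
    root = ET.fromstring(rdf_xml)
    items = []
    for prop in root.iter(property_tag):
        # The child of the property element is the container itself
        # (rdf:Bag, rdf:Seq or rdf:Alt); its rdf:li children are the items.
        for container in prop:
            items.extend(li.text for li in container.findall(f"{{{RDF}}}li"))
    return items


keywords = container_items(SAMPLE, f"{{{DC}}}subject")
```

In the RDF data model each rdf:li becomes a numbered membership property (rdf:_1, rdf:_2, and so on), which is what makes plain property access on containers awkward.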
XMPMetadata.setContainerItems(predicate, values, newtype=)
Sets the values (the “items”) of an RDF container to literals that correspond to the strings in the list values. If values is None or an empty list, removes the container and the linking predicate altogether. If no container exists beforehand, creates a new one (using newtype as the container’s RDF type; it defaults to RDF.Seq) and links it.
XMPMetadata.container2repeated(self, predicate, new_predicate=, value_mapper=, remove_predicate=, target_graph=)
Transforms items of a container (e.g., an rdf:Seq) to repeated properties. The container is assumed to be the value of the property predicate, and unless new_predicate is specified, the same predicate is used for the transformed values. The parameter value_mapper defaults to identity (i.e., a no-op), but can be used to transform the container items to something else. If remove_predicate is True, the old predicate and its value are removed; it defaults to False.
If target_graph is specified (it defaults to self), the new data is inserted there. The value of target_graph is returned as the value of this method.
This method can be used to transform, say, dc:subject to a more conveniently accessible predicate; value mapping allows the string values to be transformed to, say, SKOS concepts. Here is a simple example:
[] dc:subject [a rdf:Seq ;
               rdf:_1 "Cat" ;
               rdf:_2 "Dog"] .
would be transformed to
[] ex:hasTag "Cat" ;
   ex:hasTag "Dog" ;
   dc:subject [a rdf:Seq ;
               rdf:_1 "Cat" ;
               rdf:_2 "Dog"] .
by calling container2repeated(DC.subject, new_predicate=EX.hasTag). Note that you probably should not be using this if the order in the original container is significant.
XMPMetadata.repeated2container(self, predicate, new_predicate=, newtype=, value_mapper=, remove_predicate=, source_graph=)
This method effectively “reverses” the operation of container2repeated. The value of newtype (it defaults to RDF.Seq) gives the type of the new container created. Data is read from the graph given as source_graph; it defaults to self.
XMPMetadata.adjustImageURI(new_extension=)
Modifies the graph so that its statements are about the image file, not about the sidecar file. The parameter new_extension should be the filename extension of the image file; the image URI is constructed by taking the existing sidecar URI and substituting the new extension. If new_extension is not specified, the extension of the filename specified in crs:RawFileName (in the metadata) is used; if crs:RawFileName is not present, an error is raised.
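The extension substitution itself is simple string work on the URI. The following standard-library sketch shows one way to do it; swap_extension is a hypothetical helper, not a function exported by xmptools, and it covers only the URI rewriting, not the graph update.

```python
import posixpath
from urllib.parse import urlsplit, urlunsplit


def swap_extension(uri, new_extension):
    """Return uri with its filename extension replaced by new_extension."""
    parts = urlsplit(uri)
    # Split off the old extension from the path component only.
    stem, _old_extension = posixpath.splitext(parts.path)
    new_path = stem + "." + new_extension.lstrip(".")
    return urlunsplit(parts._replace(path=new_path))


image_uri = swap_extension("file:///foo/bar/baz.xmp", "dng")
dotted_dir = swap_extension("file:///a/b.c/photo.xmp", ".tif")
```

Working on the parsed path rather than on the raw string keeps dots in directory names from being mistaken for the extension separator.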
XMPMetadata.cbd(resource=, target=)
Computes the Concise Bounded Description of this XMP metadata, using xmptools.cbd() (see below). Unless some additional data has been added to this XMP instance, the CBD should be an identical graph. The parameter resource is the node for which the CBD is computed; it defaults to self.url. The parameter target is a graph into which the CBD is inserted; it defaults to a new instance of rdflib.Graph.
If you want to produce an XMPMetadata instance from a larger graph (say, a graph into which you have inserted XMP metadata from multiple images), use xmptools.cbd() instead and pass a newly created XMPMetadata instance as the target.
XMPMetadata.archive(filename)
Serializes and gzip-compresses the metadata. The parameter filename should be the path of the target file, without the .gz extension; if the file has no extension, .xmp is also added (so that the resulting files are always named *.xmp.gz). The resulting file can be uncompressed using the normal gzip shell command.
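The naming rule can be sketched with the standard library as follows. This is an illustration, not the xmptools implementation: archive_path and write_archive are hypothetical names, the payload is passed in as bytes rather than serialized from a graph, and a filename that already carries some other extension is simply given a .gz suffix, which is an assumption of this sketch.

```python
import gzip
import os
import tempfile
from pathlib import Path


def archive_path(filename):
    """Apply the naming rule: add .xmp if there is no extension, then .gz."""
    path = Path(filename)
    if path.suffix == "":
        path = path.with_suffix(".xmp")
    return path.with_name(path.name + ".gz")


def write_archive(filename, payload):
    """Gzip-compress payload (bytes) into the file named by archive_path()."""
    target = archive_path(filename)
    with gzip.open(target, "wb") as stream:
        stream.write(payload)
    return target


# Demonstration: write into a temporary directory and read the data back.
with tempfile.TemporaryDirectory() as folder:
    written = write_archive(os.path.join(folder, "meta"), b"<x:xmpmeta/>")
    written_name = written.name
    with gzip.open(written, "rb") as stream:
        round_trip = stream.read()
```

Because the output is an ordinary gzip stream, any gzip-aware tool can read it back.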
XMPMetadata.getThumbnails(predicate=)
Returns a list of thumbnail images (JPEGs, instances of PIL.Image.Image) if those are contained in the XMP metadata. Uses the value of predicate to find the container of thumbnails from the metadata; predicate defaults to XMP.Thumbnails.
Functions
xmptools.makeFileURI(path)
Takes a file path and returns a corresponding file: URL, as a string. You can pass this string to rdflib.URIRef() if you need an actual URI reference object.
xmptools.makeFilePath(uri, scheme=)
Takes a URL and returns a corresponding file path. The URL must use the scheme specified (defaults to "file").
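For orientation, here is a standard-library sketch of the same round trip between paths and file: URLs. It is not how xmptools implements these functions: path_to_file_uri and file_uri_to_path are hypothetical names, and the sketch assumes absolute POSIX-style paths.

```python
from urllib.parse import quote, unquote, urlsplit


def path_to_file_uri(path):
    """Percent-encode an absolute POSIX path and prefix it with file://."""
    return "file://" + quote(path)


def file_uri_to_path(uri, scheme="file"):
    """Return the decoded path of uri, insisting on the expected scheme."""
    parts = urlsplit(uri)
    if parts.scheme != scheme:
        raise ValueError(f"expected a {scheme}: URL, got {uri!r}")
    return unquote(parts.path)


uri = path_to_file_uri("/foo/bar/my file.xmp")
path = file_uri_to_path(uri)
```

Percent-encoding matters as soon as a path contains spaces or non-ASCII characters, which is common for image folders.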
Using internal functionality
If you want to access some functionality in XMP Tools that is not “exported”, you can always do something like this:
from xmptools.xmptools import adjustNodes
Of course, we offer no guarantees about whether undocumented functionality will stay the same across version changes.