
The hyperxmp package

Scott Pakin

[email protected] November 22, 2020

Abstract

hyperxmp makes it easy for an author to include XMP metadata in a PDF document produced by LaTeX. hyperxmp integrates seamlessly with hyperref and requires virtually no modifications to a document that already specifies document metadata through hyperref’s mechanisms.

1 Introduction

Adobe Systems, Inc. has been promoting XMP [5] (eXtensible Metadata Platform) as a standard way to include metadata within a document. The idea behind XMP is that it is an XML-based description of various document attributes and is embedded as uncompressed, unencoded text within the document it describes. Because the metadata is stored this way, it is independent of the document’s file format. That is, regardless of whether a document is in PDF, JPEG, HTML, or any other format, it is trivial for a program (or human) to locate, extract, and, using any standard XML parser, process the embedded XMP metadata.

As of this writing there are few tools that actually do process XMP. However, it is easy to imagine future support existing in file browsers for displaying not only a document’s filename but also its title, list of authors, description, and other metadata.

This is too abstract! Give me an example. Consider a LaTeX document with three authors (Jack Napier, Edward Nigma, and Harvey Dent) named in the LaTeX source in the usual way: “\author{Jack Napier \and Edward Nigma \and Harvey Dent}”. With hyperxmp, the generated PDF file will contain, among other information, the following stanza of XMP code embedded within it:

<dc:creator>
  <rdf:Seq>
    <rdf:li>Jack Napier</rdf:li>
    <rdf:li>Edward Nigma</rdf:li>
    <rdf:li>Harvey Dent</rdf:li>
  </rdf:Seq>
</dc:creator>

(This document corresponds to hyperxmp v5.9, dated 2020/11/22.)

In the preceding code, the dc namespace refers to the Dublin Core schema, a collection of metadata properties. The dc:creator property surrounds the list of authors. The rdf namespace is the Resource Description Framework, which defines rdf:Seq as an ordered list of values. Each author is represented by an individual list item (rdf:li), making it easy for an XML parser to separate the authors’ names.

Remember that XMP code is stored as metadata. It does not appear when viewing or printing the PDF file. Rather, it is intended to make it easy for computer applications to identify and categorize the document.
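As a minimal sketch of the scenario just described, the following document relies only on \author and sets no hyperxmp options. The title and body text are placeholders invented for illustration; the load order (hyperxmp before hyperref) mirrors the complete example in Section 2.2.

```latex
% Minimal sketch: the author list comes from \author alone.
\documentclass{article}
\usepackage{hyperxmp}
\usepackage{hyperref}

\title{A sample document}  % placeholder title, not from the manual
\author{Jack Napier \and Edward Nigma \and Harvey Dent}

\begin{document}
\maketitle
Body text.  % placeholder
\end{document}
```

Compiled with one of the supported engines, this should produce a dc:creator sequence like the one shown above, with one rdf:li entry per author.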

1.1 Supported metadata

hyperxmp knows how to embed all of the following types of metadata within a document:

• address of primary author (Iptc4xmpCore:CreatorContactInfo.CiAdrExtadr, Iptc4xmpCore:CreatorContactInfo.CiAdrCity, Iptc4xmpCore:CreatorContactInfo.CiAdrRegion, Iptc4xmpCore:CreatorContactInfo.CiAdrPcode, and Iptc4xmpCore:CreatorContactInfo.CiAdrCtry)
• author(s) (dc:creator)
• base URL for relative references (xmp:BaseURL)
• book edition (prism:bookEdition)
• copyright (dc:rights and xmpRights:Marked)
• date (dc:date, xmp:CreateDate, xmp:ModifyDate, and xmp:MetadataDate)
• DOI (prism:doi)
• email address(es) of primary author (Iptc4xmpCore:CreatorContactInfo.CiEmailWork)
• file format (dc:format)
• file name of main LaTeX source file (dc:source)
• file size in bytes (prism:byteCount)
• ISBN (prism:isbn)
• ISSN, both print (prism:issn) and electronic (prism:eIssn)
• issue number of parent publication (prism:number)
• journal article version (jav:journal_article_version)
• keywords (pdf:Keywords and dc:subject)
• language used (dc:language)
• license URL (xmpRights:WebStatement)
• metadata writer (photoshop:CaptionWriter)
• page count (prism:pageCount)
• page range(s) (prism:pageRange)
• PDF version (pdf:PDFVersion)
• PDF-generating tool (pdf:Producer and xmp:CreatorTool)
• PDF/A version and conformance level (pdfaid:part and pdfaid:conformance)
• PDF/UA version (pdfuaid:part)
• PDF/X standard compliance (pdfxid:GTS_PDFXVersion)
• position/title of primary author (photoshop:AuthorsPosition)
• publication name of parent publication (prism:publicationName)
• publisher of the document (dc:publisher)
• rendition variation of the document (xmpMM:RenditionClass)
• summary (dc:description)
• subtitle (prism:subtitle)
• telephone number(s) of primary author (Iptc4xmpCore:CreatorContactInfo.CiTelWork)
• title (dc:title)
• trapping of colors (pdf:trapped)
• type of document (dc:type)
• type of parent publication (prism:aggregationType)
• unique identifier for the document (dc:identifier)
• URL of the document (prism:url)
• URL(s) of the primary author (Iptc4xmpCore:CreatorContactInfo.CiUrlWork)
• UUID for the document (xmpMM:DocumentID)

• UUID for the document instance (xmpMM:InstanceID)
• version identifier for the document (xmpMM:VersionID)
• volume number of parent publication (prism:volume)

More types of metadata may be added in a future release.

(a) pdfx (separate .xmpdata file)

\Title{Baking through the ages}
\Author{A. Baker\sep C. Kneader}
\Language{en-GB}
\Keywords{cookies\sep muffins\sep cakes}
\Publisher{Baking International}

(b) hyperxmp (main document)

\hypersetup{%
  pdftitle={Baking through the ages},
  pdfauthor={A. Baker, C. Kneader},
  pdflang={en-GB},
  pdfkeywords={cookies, muffins, cakes},
  pdfpublisher={Baking International}
}

Figure 1: Comparison of pdfx and hyperxmp

1.2 Comparisons with similar packages

xmpincl

In short, xmpincl is more flexible but hyperxmp is easier to use. With xmpincl, the author manually constructs a file of arbitrary XMP data and the package merely embeds it within the generated PDF file. With hyperxmp, the author specifies values for various predefined metadata types and the package formats those values as XMP and embeds the result within the generated PDF file.

xmpincl can embed XMP only when running under pdfLaTeX and only when in PDF-generating mode. hyperxmp additionally works with a few other PDF-producing LaTeX backends.

hyperxmp and xmpincl can complement each other. An author may want to use hyperxmp to produce a basic set of XMP code, then extract the XMP code from the PDF file with a text editor, augment the XMP code with any metadata not supported by hyperxmp, and use xmpincl to include the modified XMP code in the PDF file.

pdfx

The main difference between hyperxmp and pdfx is that hyperxmp tries to integrate as seamlessly as possible into an existing document. It leverages hyperref’s \hypersetup command and many of \hypersetup’s options and defines its own options in a compatible manner. In contrast, pdfx requires the user to create a separate \jobname.xmpdata file containing pdfx-defined commands for each metadata element.

Figure 1 adapts an example appearing in the pdfx manual to hyperxmp. The two are comparable line-by-line in terms of how one specifies the title, author, document language, keywords, and publisher. However, hyperxmp implicitly writes a wealth of additional metadata into the XMP packet such as the document date, creation date, creator tool, file format, PDF version, and unique document and instance IDs. In fact, if a document omits all of the code shown in Figure 1(b), it will still store the \title and \author data in the XMP packet.

One can therefore summarize the difference between hyperxmp and pdfx as follows: pdfx requires the author to be fully explicit about the document’s metadata while hyperxmp allows some metadata to be specified implicitly, automatically inferring it when possible. In general, hyperxmp tries to simplify the author’s task as much as possible.

2 Usage

hyperxmp works by postprocessing some of the package options honored by hyperref. To use hyperxmp, merely put a \usepackage{hyperxmp} in your document’s preamble. That line can appear anywhere before the hyperref PDF options are specified (i.e., with either \usepackage[...]{hyperref} or \hypersetup{...}). hyperxmp will construct its XMP data using the following hyperref options:

• baseurl

• pdfauthor

• pdfcreationdate

• pdfkeywords

• pdflang

• pdfmoddate

• pdfproducer

• pdfsubject

• pdftitle

• pdftrapped

hyperxmp instructs hyperref also to accept the following options, which have meaning only to hyperxmp:

• pdfaconformance

• pdfapart

• pdfauthortitle

• pdfbookedition

• pdfbytes

• pdfcaptionwriter

• pdfcontactaddress

• pdfcontactcity

• pdfcontactcountry

• pdfcontactemail

• pdfcontactphone

• pdfcontactpostcode

• pdfcontactregion

• pdfcontacturl

• pdfcopyright

• pdfdate

• pdfdocumentid

• pdfdoi

• pdfeissn

• pdfidentifier

• pdfinstanceid

• pdfisbn

• pdfissn

• pdfissuenum

• pdflicenseurl

• pdfmetadate

• pdfmetalang

• pdfnumpages

• pdfpagerange

• pdfpublication

• pdfpublisher

• pdfpubstatus

• pdfpubtype

• pdfrendition

• pdfsource

• pdfsubtitle


• pdftype

• pdfuapart

• pdfurl

• pdfversionid

• pdfvolumenum

• pdfxstandard
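The two option lists above can be mixed freely in a single \hypersetup call. The following preamble fragment is a sketch only: the option names come from the lists above, while every value is a placeholder invented for illustration.

```latex
% Sketch: hyperref-defined and hyperxmp-defined options side by side.
\usepackage{hyperxmp}
\usepackage{hyperref}

\hypersetup{%
  % Options defined by hyperref and reused by hyperxmp
  pdftitle={A sample report},            % placeholder value
  pdfauthor={A. Author, B. Author},      % placeholder value
  pdfkeywords={metadata, sample},        % placeholder value
  % Options that have meaning only to hyperxmp
  pdfcopyright={Copyright (C) 2020, A. Author},               % placeholder value
  pdflicenseurl={http://creativecommons.org/licenses/by/4.0/} % placeholder value
}
```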

2.1 Option descriptions

The document title is specified as normal for hyperref with pdftitle, but see Note 7 for instructions on how to specify a title in multiple languages. If pdftitle is not specified it will inherit its value from the document’s \title. hyperxmp introduces a complementary pdfsubtitle option:

pdftitle={Frankenstein},
pdfsubtitle={The Modern Prometheus},

Unfortunately, the subtitle can appear in only one language. It is assumed to be in the same language as the document (pdflang), but this can be overridden by preceding the text with a bracketed ISO 639-1 two-letter language code and an optional ISO 3166-1 two-letter region code. See the example below for pdfpublication.

hyperref’s pdfauthor option specifies the document’s author(s). See Note 4 for a discussion of the correct syntax. If pdfauthor is not specified it will inherit its value from the document’s \author. pdfauthortitle indicates the primary author’s position or title. pdfcaptionwriter specifies the name of the person who added the metadata to the document.

The next eight items describe how to contact the person or institution responsible for the document (the “contact”). pdfcontactaddress is the contact’s street address and can include the institution name if the contact is an institution; pdfcontactcity is the contact’s city; pdfcontactcountry is the contact’s country; pdfcontactemail is the contact’s email address (or multiple, comma-separated email addresses); pdfcontactphone is the contact’s telephone number (or multiple, comma-separated telephone numbers); pdfcontactpostcode is the contact’s postal code; pdfcontactregion is the contact’s state or province; and pdfcontacturl is the contact’s URL (or multiple, comma-separated URLs).

pdfcopyright defines the copyright text, and pdflicenseurl identifies a URL that points to the document’s license agreement.

pdfmetalang indicates the natural language in which certain metadata (specifically, the document’s title, subject, and copyright statement) are written. The language should be specified using an IETF language tag [11], for example, “en” for English, “en-US” for specifically United States English, “de” for German, and so forth. If pdfmetalang is not specified, hyperxmp assumes the metadata language is the same as the document language (hyperref’s pdflang option). If neither pdfmetalang nor pdflang is specified, hyperxmp uses only “x-default” as the metadata language.
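To illustrate the distinction, the following sketch describes a German-language document whose metadata is written in English. The title text is a placeholder invented for illustration.

```latex
% Sketch: document body in German, metadata in English.
\hypersetup{%
  pdflang={de},       % language of the document itself
  pdfmetalang={en},   % language of the title, subject, and copyright text
  pdftitle={A study of metadata}  % placeholder title, written in English
}
```

If the pdfmetalang line were omitted, the description above implies that the title would instead be tagged as German, following pdflang.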

XMP can include a universally unique identifier (UUID) for each document and for each instance of a given document. By default, hyperxmp assigns a version 4 (i.e., pseudorandom) UUID [12] for each of these. However, a document can alternatively specify a particular document identifier using pdfdocumentid and (not normally recommended) a particular instance identifier using pdfinstanceid. These should be of the form uuid:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx, where “x” is a lowercase hexadecimal digit. For example, uuid:53ab7f19-a48c-5177-8bb2-403ad907f632 is a valid argument to pdfdocumentid (or pdfinstanceid). See Leach, Mealling, and Salz’s UUID specification document for details on how to produce the various forms of UUIDs [12]. A more freeform mechanism than pdfinstanceid for versioning documents is available via pdfversionid. The version specified by pdfversionid can be incremented as 1, 2, 3, ...; identified with a hierarchical numbering scheme (e.g., this document is versioned 5.9 to match the package version); or labeled using any other approach. One possibility is to use a revision number or commit hash from the version-control software maintaining the document.

For example, the \gitVer macro from the gitver package is an expandable (see Note 8) version of the current Git hash that can suitably be passed to pdfversionid. If not specified, pdfversionid defaults to 1.
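A short sketch of the identifier options follows. The UUID is the sample value quoted above, and the version string follows the hierarchical scheme this manual uses for itself; a real document would substitute its own values.

```latex
% Sketch: fixed document identifier plus a hierarchical version label.
\hypersetup{%
  pdfdocumentid={uuid:53ab7f19-a48c-5177-8bb2-403ad907f632},
  pdfversionid={5.9}
  % pdfinstanceid is deliberately left unset so that hyperxmp
  % generates a fresh instance UUID, as recommended above.
}
```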

Already-published documents can be identified in a number of ways. pdfisbn specifies the ISBN. pdfissn refers to the ISSN of the print version of the document while pdfeissn refers to the ISSN of the electronic version of the document. pdfdoi specifies the DOI and should include only the DOI name without any URL prefix. For example, specify pdfdoi={10.1145/3149526.3149532}, not pdfdoi={https://doi.org/10.1145/3149526.3149532}. pdfurl points to the complete URL for the document. In contrast, baseurl points one level up and is used to resolve relative URLs.
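The following sketch shows the DOI convention in context. The DOI is the one quoted above; using its resolver address as pdfurl is an assumption made for illustration, since a document may equally point pdfurl at wherever its PDF is actually hosted.

```latex
% Sketch: bare DOI name in pdfdoi, full address in pdfurl.
\hypersetup{%
  pdfdoi={10.1145/3149526.3149532},               % DOI name only, no URL prefix
  pdfurl={https://doi.org/10.1145/3149526.3149532} % assumed location of the document
}
```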

pdfidentifier provides an alternative mechanism to uniquely identify a document. Its advantage relative to pdfisbn, pdfissn, pdfdoi, etc. is its flexibility; any of a wide variety of identification types can be used.¹ pdfidentifier’s disadvantage is that it allows only a single identifier per document. For example, a document could use pdfidentifier=urn:iso:std:32000:ed-1:v1:en to identify itself as version 1 of English-language ISO standard 32000-1, but then this same document could not also use pdfidentifier to identify itself by DOI (info:doi/...), ISBN (urn:ISBN:...), etc. (It can still use the options described in the previous paragraph, though.) If pdfidentifier is not specified explicitly, hyperxmp will use the first non-empty value out of the DOI, electronic ISSN, print ISSN, and ISBN, or skip the identifier entirely if all of those are empty.

Already-published documents can further be identified by the publication in which they appear. pdfpublication specifies the title of the journal, magazine, or other parent document. The title language is assumed to be the same as the document language (pdflang) but can be overridden by preceding the text with a bracketed ISO 639-1 two-letter language code and an optional ISO 3166-1 two-letter region code. For example, pdfpublication={[fr]Charlie Hebdo} indicates a French-language title. Were the language or pronunciation differences significant, fr-FR would indicate specifically the French spoken in France, as opposed to that spoken in, say, Canada (fr-CA) or Belgium (fr-BE). The publisher itself can be named using pdfpublisher.

¹ See, for example, https://www.iana.org/assignments/urn-namespaces/urn-namespaces.xhtml for the urn: URI scheme and http://info-uri.info/registry/ for the info: URI scheme.

Table 1: Valid arguments for pdfpubstatus

Value   Meaning
AO      Author’s Original
SMUR    Submitted Manuscript Under Review
AM      Accepted Manuscript
P       Proof
VoR     Version of Record
CVoR    Corrected Version of Record
EVoR    Enhanced Version of Record

pdfpubtype indicates the type of publication in which the document was published. This should be one of the PRISM aggregation types [9] such as book, journal, magazine, manual, report, or whitepaper.

For publications in journals, magazines, and similar periodicals, a document can specify the volume number with pdfvolumenum and the issue number within the volume with pdfissuenum. pdfpagerange indicates the page numbers at which the document appears within the publication. The intention is that this be a comma-separated list of dash-separated ranges, as in pdfpagerange={1,4-5}. See Note 9 for advice on how to assign pdfpagerange semi-automatically. A journal article’s publication status can be indicated with pdfpubstatus. This option expects to take one of the values listed in Table 1. See the NISO/ALPSP Journal Article Versions recommendation [1] for an explanation of each of those values and when to use them.

For books, pdfbookedition names the edition of the book. This is specified as text, not a number. As with pdfpublication (above), pdfbookedition accepts a bracketed language code, as in pdfbookedition={[en]Second edition}.

XMP metadata can include a number of dates (in fact, timestamps, as they include both date and time components). pdfdate specifies the document date. It is analogous to the LaTeX \date command, and, like \date, defaults to the date the document was built. It must be specified in either XMP format [5] or PDF format [4]. XMP dates are written in the form yyyy-mm-ddThh:mm:ss+tt:tt.² A W3C recommendation [15] discusses this format in more detail, but as an example, 14 hours, 15 minutes, 9 seconds past midnight U.S. Mountain Daylight Time (UTC-6) on the 23rd day of September in the year 2014 should be written as 2014-09-23T14:15:09-06:00. This can be truncated (with loss of information) to 2014-09-23T14:15:09, 2014-09-23T14:15, 2014-09-23, 2014-09, or 2014 but no other subsets. PDF dates are written in the form D:yyyymmddhhmmss+tt’tt’. The same date in the preceding example would be written as D:20140923141509-06’00’ in PDF format.

² Although allowed by XMP, hyperxmp does not currently accept fractions of a second in timestamps.

The document’s creation date, modification date, and metadata date are normally set automatically, but pdfcreationdate, pdfmoddate, and pdfmetadate can be used to override the defaults. Like pdfdate, pdfmetadate can be specified in either XMP or PDF format. However, because hyperref defines pdfcreationdate and pdfmoddate and expects these to be written as PDF dates, hyperxmp likewise accepts these two dates only in PDF format. Note that it’s rare that a document would need to specify any of pdfcreationdate, pdfmoddate, or pdfmetadate.
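The two date formats can be compared side by side in the following sketch, which reuses the sample timestamp from the text. Setting pdfcreationdate is shown only to illustrate the format restriction; as noted above, it is rarely needed.

```latex
% Sketch: the same instant written in XMP format and in PDF format.
\hypersetup{%
  % pdfdate accepts either format; XMP format is used here.
  pdfdate={2014-09-23T14:15:09-06:00},
  % pdfcreationdate is defined by hyperref and takes PDF format only.
  pdfcreationdate={D:20140923141509-06'00'}
}
```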

pdftype describes the type of document being produced. This refers to “the nature or genre of the resource” [5] such as poem, novel, or working paper, as opposed to the file format (always application/pdf when generated by hyperxmp). Although pdftype can be assigned an arbitrary piece of text, the XMP specification recommends selecting types from a “controlled vocabulary” such as the DCMI Type Vocabulary [6]. The DCMI Type Vocabulary currently consists of only Collection, Dataset, Event, Image, InteractiveResource, MovingImage, PhysicalObject, Service, Software, Sound, StillImage, and Text. pdftype defaults to Text, which refers to “books, letters, dissertations, poems, newspapers, articles, archives of mailing lists,” [6] and other forms of text, all things LaTeX is commonly used to typeset.

Sometimes a base document is rendered in different forms. pdfrendition indicates the particular rendition the current document instance represents. The value should come from the following controlled vocabulary [5]: default, draft, low-res, proof, screen, and thumbnail. hyperxmp’s default value is default, which indicates the master document, unless the draft option is passed to \documentclass, in which case hyperxmp defaults to draft.

hyperxmp honors hyperref’s pdftrapped option. A document can indicate whether it employs color trapping by specifying pdftrapped=True or pdftrapped=False. (pdftrapped=Unknown is also allowed.)

pdfapart and pdfaconformance are used in conjunction with hyperref’s pdfa option to claim a particular PDF/A standard by which the document abides. They default to pdfapart=1 and pdfaconformance=B, indicating the PDF/A-1b standard. These can be changed (with caution) to assert that the document abides by a different standard (e.g., PDF/A-2u). A document that conforms to the PDF/UA standard can use pdfuapart to indicate the PDF/UA conformance level. For example, pdfuapart=1 asserts that the document respects PDF/UA-1. pdfxstandard indicates the particular PDF/X standard by which the document abides. Unlike pdfapart and pdfaconformance, which accept a number and a letter, respectively, pdfxstandard expects a textual identification of a standard name. The following are the acceptable PDF/X standard names as of this writing.

• PDF/X-1a:2001
• PDF/X-1a:2003
• PDF/X-3:2002
• PDF/X-3:2003
• PDF/X-4
• PDF/X-4p
• PDF/X-5g
• PDF/X-5n
• PDF/X-5pg

For example, one can specify pdfxstandard={PDF/X-4} or pdfxstandard={PDF/X-3:2003}, but specifying pdfxstandard={PDF/X-3} will not pass PDF/X validation. Note that at the time of this writing the use of the PDF/X-4p, PDF/X-5n, and PDF/X-5pg standards has not been tested.
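As a sketch of how the PDF/A options combine with hyperref’s pdfa option, the following preamble fragment claims the PDF/A-2u standard mentioned above. Note that these options only assert conformance in the metadata; they do not by themselves make the rest of the document conform.

```latex
% Sketch: claiming PDF/A-2u instead of the default PDF/A-1b.
\usepackage{hyperxmp}
\usepackage[pdfa]{hyperref}

\hypersetup{%
  pdfapart=2,         % PDF/A part number
  pdfaconformance=U   % PDF/A conformance level
}
```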


Rarely needed options

pdfsource overrides the name of the LaTeX source file. It defaults to \jobname.tex but can be replaced by any other string. If pdfsource is given an empty argument, no document source will be specified at all.

The number of pages in the published, print version of the document can be expressed with pdfnumpages. This is computed automatically when the document is built using either pdfLaTeX or LuaLaTeX.

The pdfbytes option expresses the document’s file size in bytes. The intention is for this to be used to display an estimate of download time to a user or to serve as a quick check on whether a file was transmitted correctly between systems. pdfbytes is computed automatically by both pdfLaTeX and LuaLaTeX, using the file size from the previous build of the document.

It is usually more convenient to provide values for all of the options presented in this section using hyperref’s \hypersetup command than on the \usepackage command line. See the hyperref manual for more information.

2.2 A complete example

The following is a sample LaTeX document that provides values for most of the metadata options that hyperxmp recognizes:

\documentclass{article}
\usepackage[utf8]{inputenc}
\usepackage{hyperxmp}
\usepackage[unicode]{hyperref}

\title{%
  On a heuristic viewpoint concerning the production and
  transformation of light}
\author{Albert Einstein}
\date{March 17, 1905}

\hypersetup{%
  pdftitle={%
    On a heuristic viewpoint concerning the production and
    transformation of light},
  pdfsubtitle={[en-US]Putting that bum Maxwell in his place},
  pdfauthor={Albert Einstein},
  pdfauthortitle={\xmpquote{Technical Assistant\xmpcomma\ Level III}},
  pdfdate={1905-03-17},
  pdfcopyright={Copyright (C) 1905, Albert Einstein},
  pdfsubject={photoelectric effect},
  pdfkeywords={energy quanta, Hertz effect, quantum physics},
  pdflicenseurl={http://creativecommons.org/licenses/by-nc-nd/3.0/},
  pdfcaptionwriter={Scott Pakin},
  pdfcontactaddress={Kramgasse 49},
  pdfcontactcity={Bern},
  pdfcontactpostcode={3011},
  pdfcontactcountry={Switzerland},
  pdfcontactphone={031 312 00 91},
  pdfcontactemail={[email protected]},
  pdfcontacturl={%
    http://einstein.biz/,
    https://www.facebook.com/AlbertEinstein
  },
  pdfdocumentid={uuid:6d1ac9ec-4ff2-515a-954b-648eeb4853b0},
  pdfversionid={2.998e8},
  pdfpublication={[de]Annalen der Physik},
  pdfpublisher={Wiley-VCH},
  pdfpubtype={journal},
  pdfvolumenum={322},
  pdfissuenum={6},
  pdfpagerange={132-148},
  pdfissn={0003-3804},
  pdfeissn={1521-3889},
  pdfpubstatus={VoR},
  pdflang={en},
  pdfmetalang={en},
  pdfurl={http://www.physik.uni-augsburg.de/annalen/history/einstein-papers/1905_17_132-148.pdf},
  pdfdoi={10.1002/andp.19053220607},
  pdfidentifier={info:lccn/50013519}
}

\XMPLangAlt{de}{pdftitle={Über einen die Erzeugung und Verwandlung
  des Lichtes betreffenden heuristischen Gesichtspunkt}}

\begin{document}
\maketitle

A profound formal difference exists between the theoretical concepts
that physicists have formed about gases and other ponderable bodies,
and Maxwell’s theory of electromagnetic processes in so-called empty
space\dots
\end{document}

Compile the document to PDF using any of the following approaches:

• pdfLaTeX
• LuaLaTeX
• XeLaTeX
• LaTeX+Dvipdfm
• LaTeX+Dvips+Adobe Acrobat Distiller

Unfortunately, the LaTeX+Dvips+Ghostscript path doesn’t work. Ghostscript bug report #690066, closed with “wontfix” status on 2012-05-28, explains that Ghostscript doesn’t honor the Metadata tag needed to inject a custom XMP packet. Instead, Ghostscript fabricates an XMP packet of its own based on the metadata it finds in the PDF file’s Info dictionary (Author, Title, Subject, and Keywords).

Once the document is compiled, the resulting PDF file will contain an XMP packet that looks something like that shown in Appendix A. Figure 2 is a screenshot of the XMP metadata as it appears in Adobe Acrobat’s “Advanced” metadata dialog box. Further clicking on the “Advanced” item within that dialog box displays all of the document’s metadata sorted by schema as shown in Figure 3.

Figure 2: XMP metadata as it appears in Adobe Acrobat

Figure 3: Additional XMP metadata as it appears in Adobe Acrobat

2.3 Usage notes

Note 1: Conflicting metadata in PDF/A documents

A PDF file includes an Info dictionary containing Author, Title, Subject, and Keywords keys. The hyperref package’s pdfauthor, pdftitle, pdfsubject, and pdfkeywords options assign values to those keys. The hyperxmp package additionally uses those options to assign values to various XMP metadata: dc:creator, dc:title, dc:description, and pdf:Keywords. The PDF/A specification indicates that values that appear in both the PDF Info dictionary and XMP packet must match. The problem is that in XMP, the author and keywords can be proper lists, as in

<dc:creator>
  <rdf:Seq>
    <rdf:li>Curly Howard</rdf:li>
    <rdf:li>Larry Fine</rdf:li>
    <rdf:li>Moe Howard</rdf:li>
  </rdf:Seq>
</dc:creator>

while in PDF, the author and keywords are specified as flat strings. Alas, there is no definition of how a list should be collapsed to a flat string: “Curly Howard, Larry Fine, Moe Howard” or “Curly Howard; Larry Fine; Moe Howard” or something else. I have not yet found a form of flat string that passes all PDF/A validators. Furthermore, when Adobe Acrobat (at least Adobe Acrobat DC (2019) and earlier versions) converts a PDF file to PDF/A format, it does so by discarding all but the first author, which is an unsatisfying solution.

Starting with version 4.0, hyperxmp’s solution is to suppress writing metadata to the PDF Info dictionary and write it only to the XMP packet. (hyperxmp v5.0+ is more sophisticated. It suppresses only the author and keyword lists.) This appears to pacify PDF/A validators yet retains the author and keyword lists in their non-truncated form. If desired, the Info dictionary can be retained by passing the keeppdfinfo option to \hypersetup.
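A sketch of the opt-out follows. The author names are the ones used in the example above, and keeppdfinfo is passed as a bare key, which is an assumption about its exact syntax based on the description above.

```latex
% Sketch: keep the author and keyword entries in the PDF Info dictionary.
\hypersetup{%
  keeppdfinfo,
  pdfauthor={Curly Howard, Larry Fine, Moe Howard}
}
```

Bear in mind that retaining the Info dictionary reintroduces the risk, described in this note, that a PDF/A validator will flag a mismatch between the flat Info strings and the XMP lists.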

Note 2: Acrobat multiline-field bug

The IPTC Photo Metadata schema states that “the [contact] address is a multiline field” [10]. hyperxmp converts commas in pdfcontactaddress’s argument to line breaks in the generated XML. Unfortunately, a bug in Adobe Acrobat (at least in Adobe Acrobat DC (2019) and earlier versions) causes that PDF reader to discard line breaks in the contact address. Interestingly, Adobe Illustrator CS5 correctly displays the contact address. If you find Adobe Acrobat’s behavior bothersome, you can redefine the \xmplinesep macro as a string to use as an address-line separator. For example, the following replaces all commas appearing in pdfcontactaddress’s argument with semicolons:

\renewcommand*{\xmplinesep}{;}

Note 3: Object compression

One intention of XMP is that metadata embedded in a file be readable even without knowledge of the file’s format. That is, the metadata are expected to appear as plain text. Although hyperxmp does its best to honor that intention, it faces a few challenges:

1. When run with versions of LuaLaTeX earlier than 0.85, hyperxmp leaves all PDF objects uncompressed. This is due to LuaLaTeX treating object compression as a global parameter, unlike pdfLaTeX, which treats it as a local parameter. Hence, when hyperxmp requests that the XMP packet be left uncompressed, LuaLaTeX in fact leaves all PDF streams uncompressed. Beginning with version 3.0, hyperxmp includes a workaround that correctly leaves only the XMP metadata uncompressed, but this workaround is implemented only for LuaLaTeX v0.85 onwards.

2. XeLaTeX (or, more precisely, the xdvipdfmx back end) exhibits the opposite problem. It compresses all PDF objects, including the ones containing XMP metadata. While Adobe Acrobat can still detect and utilize the XMP metadata, non-PDF-aware applications are unlikely to see the metadata. Three options to consider are to (1) use a different program (e.g., LuaLaTeX), (2) pass the --output-driver="xdvipdfmx -z0" option to XeLaTeX to instruct xdvipdfmx to turn off all compression (which will of course make the PDF file substantially larger), or (3) postprocess the generated PDF file by loading it into the commercial version of Adobe Acrobat and re-saving it with the Save As... menu option.

Note 4: Literal commas

hyperxmp splits the pdfauthor and pdfkeywords lists at commas. Therefore, when specifying pdfauthor and pdfkeywords, you should separate items with commas. Also, omit “and” and other text that does not belong to any list item. The following examples should serve as clarification:

Wrong: pdfauthor={Jack Napier, Edward Nigma, and Harvey Dent}
Wrong: pdfauthor={Jack Napier; Edward Nigma; Harvey Dent}
Right: pdfauthor={Jack Napier, Edward Nigma, Harvey Dent}

If you need to include a literal comma within an author or keyword list (where commas normally separate list items) or a street address (where commas normally separate lines), use the \xmpcomma macro to represent it, and wrap the entire entry containing the comma within \xmpquote{...} as shown below:

pdfauthor={\xmpquote{Jack Napier\xmpcomma\ Jr.},
           \xmpquote{Edward Nigma\xmpcomma\ PhD},
           \xmpquote{Harvey Dent\xmpcomma\ Esq.}}

pdfcontactaddress={Office of the President,
                   \xmpquote{Wayne Enterprises\xmpcomma\ Inc.},
                   One Wayne Blvd}

As of version 2.2 of hyperxmp, it is acceptable to use \xmpcomma and \xmpquote within any hyperxmp option, not just in those in which a comma normally serves as a separator (i.e., lists and multiline fields). Outside of cases in which a comma serves as a separator, \xmpcomma is treated as an ordinary comma, and \xmpquote returns its argument unmodified. Hence, it is legitimate to use \xmpcomma and \xmpquote in cases like the following:

pdfauthortitle={\xmpquote{Psychiatrist\xmpcomma\ Arkham Asylum}}

(Like most hyperxmp options, pdfauthortitle inserts its argument unmodified in an XMP tag.) When in doubt, use \xmpcomma and \xmpquote; it should always be safe to do so.

Version 2.4 of hyperxmp introduces a convenience macro called \xmptilde. \xmptilde expands to a literal tilde character instead of the nonbreaking space that “~” normally represents. Use it to represent URLs such as http://www.pakin.org/~scott/ (“http://www.pakin.org/\xmptilde scott/”) in options such as baseurl, pdfcontacturl, and pdflicenseurl.
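In context, the tilde example reads as follows. The URL is the one quoted above; the choice of pdfcontacturl as the option carrying it is arbitrary.

```latex
% Sketch: a URL containing a literal tilde.
\hypersetup{%
  % The space after \xmptilde only ends the macro name;
  % it does not appear in the resulting URL.
  pdfcontacturl={http://www.pakin.org/\xmptilde scott/}
}
```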

Note 5: Unicode support

Unicode support is provided via the hyperref package. If you specify unicode=true either as a hyperref option or as an argument to the \hypersetup command, the document can include Unicode characters in its XMP fields.
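A minimal sketch of the two ways to enable Unicode support follows. The title is a shortened piece of the German title from the complete example in Section 2.2, and the inputenc line mirrors that example; newer LaTeX releases may not need it.

```latex
% Sketch: enabling Unicode in XMP fields.
\usepackage[utf8]{inputenc}
\usepackage{hyperxmp}
\usepackage[unicode]{hyperref}   % variant 1: as a hyperref package option

\hypersetup{%
  unicode=true,                  % variant 2: as a \hypersetup argument
  pdftitle={Über die Erzeugung des Lichtes}  % shortened sample title
}
```

Either variant alone should suffice; both are shown here only for comparison.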

Note 6: Automatically specified metadata hyperxmpattempts to identify certain metadata automatically. The hope is that in many cases, an author can simply include \usepackage{hyperxmp} in a document’s preamble and benefit from a modicum ofxmpmetadata with no additional effort.

Currently, pdftitle defaults to the document’s title as specified by \title{...}. pdfauthor defaults to the document’s author(s) as specified by \author{...}. pdfdate defaults to the current date and time. pdfmetalang defaults to the same value as pdflang if non-empty, “x-default” otherwise. hyperxmp recognizes some class-specific metadata as well, such as that provided via the Koma letter classes (e.g., scrlttr2) and the ACM article class (acmart).
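A minimal sketch of a document that relies only on these defaults might look as follows. The title is a placeholder; no metadata options are set explicitly, so the title and author list would be picked up from \title and \author as described above.

```latex
\documentclass{article}
\usepackage{hyperxmp}
\title{A Placeholder Title}              % hypothetical title
\author{Jack Napier \and Edward Nigma}
\begin{document}
\maketitle
Body text.
\end{document}
```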

If a document uses either the babel or polyglossia packages, it is recommended that it not explicitly set pdflang. pdflang accepts only a single language name, while hyperxmp can automatically query babel and polyglossia for a list of all languages used in the document and include this list in an XMP dc:language element.

Note 7: Multilingual metadata The pdfmetalang option specifies the language in which the document’s metadata is written. It defaults to the value of pdflang, which specifies the document language. As of version 3.3 of hyperxmp, it is possible to include certain metadata (specifically, the document’s title, subject, and copyright statement) in more than one language. The \XMPLangAlt macro provides this functionality. Usage is as follows:

\XMPLangAlt{⟨language⟩}{⟨option⟩=⟨text⟩, ...}

where ⟨language⟩ is an ISO 639-1 two-letter language code with an optional ISO 3166-1 two-letter region code (e.g., “en” for English or “en-US” for specifically US English); ⟨option⟩ is one of “pdftitle”, “pdfsubject”, or “pdfcopyright”; and ⟨text⟩ is the text as expressed in the specified language. By way of example, the following code provides the document title in English, then specifies an alternative title to use in four other languages:

\hypersetup{%
  pdfmetalang={en},
  pdftitle={English title}
}
\XMPLangAlt{de}{pdftitle={Deutscher Titel}}
\XMPLangAlt{fr}{pdftitle={Titre fran\c{c}ais}}
\XMPLangAlt{it}{pdftitle={Titolo italiano}}
\XMPLangAlt{rm}{pdftitle={Titel rumantsch}}
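Because the second argument of \XMPLangAlt is a comma-separated list of ⟨option⟩=⟨text⟩ pairs, several fields can presumably be translated in one call. The sketch below assumes exactly that; the subject strings are invented for illustration.

```latex
\hypersetup{%
  pdfmetalang={en},
  pdftitle={English title},
  pdfsubject={English subject}      % hypothetical subject
}
% One call supplies both the German title and the German subject.
\XMPLangAlt{de}{%
  pdftitle={Deutscher Titel},
  pdfsubject={Deutsches Thema}%
}
```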

Note 8: Expandable arguments All arguments passed to hyperxmp options must be expandable, in TeX terminology. This implies that any macros that are used in arguments are limited to a relatively small set of operations (such as conditionals and macro expansion) and must produce a string of text. Code (such as macro definitions and arithmetic operations) will be written to XMP as code, not as the result of executing the code.

By way of example, the macros provided by the texdate package for typesetting dates are not expandable (at least at the time of this writing). Hence, the \printfdate{Y} in the following code snippet is not replaced by the current year, as one might expect:

\usepackage{texdate}
\initcurrdate
\hypersetup{%
  pdfcopyright={Copyright \textcopyright\ \printfdate{Y}, Scott Pakin}
}

Rather, it generates a dc:rights tag of the form “Copyright © =2=0=by-1by=02020, Scott Pakin”. The garbage in that line corresponds to the remnants of the \printfdate code after expanding all of the TeX primitives and certain other control sequences it uses to the empty string. For example, “\global\advance\texd@yr by-1” expands to “by-1”.

It is not possible to determine a priori whether or not a macro is expandable. The best advice is to carefully inspect the XMP packet in the output file to ensure that any macros used in arguments to hyperxmp options produced the expected output.
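For contrast, here is a sketch of an expandable alternative for the copyright year. It substitutes TeX's own \the\year for texdate's \printfdate (a swapped-in technique, not something hyperxmp prescribes); \the\year expands directly to the digits of the current year, so it should survive the expansion step.

```latex
\hypersetup{%
  % \the\year is fully expandable: it becomes the digits of the current year.
  pdfcopyright={Copyright \textcopyright\ \the\year, Scott Pakin}
}
```

As advised above, the resulting XMP packet is still worth inspecting to confirm the output.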

Note 9: Semi-automatic page ranges Although pdfpagerange is intended to refer to pages in the final, published version of a document, it would be convenient for the range to be generated automatically when producing a standalone PDF file that is not intended to be incorporated into a book, journal, or other publication (or if it is known that the pages will not be renumbered for publication). One approach is to use the totpages package to help generate pdfpagerange. For documents numbered from 1 to n, a simple

\hypersetup{%
  pdfpagerange={1-\ref*{TotPages}}
}

should suffice. A bit more effort is needed for documents that change numbering schemes, such as using lowercase Roman numerals for the front matter and Arabic numerals for the main matter and back matter. One approach is to use \label to mark the first and last page of each numbering scheme and specify pdfpagerange as in the following:

\hypersetup{%
  pdfpagerange={%
    \pageref*{page:begin-front}-\pageref*{page:end-front},%
    1-\pageref*{TotPages}%
  }
}

I don’t know how unnumbered pages (e.g., blank pages and the title page) are supposed to be handled. I suppose blank pages can be omitted from pdfpagerange, and the title page can be either omitted or listed as title, for example.

It appears that at least with version 2.00 of totpages, the TotPages label is not defined until after the \begin{document}. Consequently, using TotPages within a \hypersetup invocation in the document’s preamble will produce “??” as the page count in the XMP packet. The solution is either to assign pdfpagerange after the \begin{document} or to ask LaTeX to do that on your behalf:

\AtBeginDocument{%
  \hypersetup{%
    pdfpagerange={1-\ref*{TotPages}}
  }%
}
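Putting the pieces of this note together, a document with Roman-numbered front matter might be arranged roughly as follows. This is only a sketch: the book class, the chapter titles, and the exact placement of the two \label commands are assumptions chosen for illustration, and the labels resolve only after a second LaTeX run.

```latex
\documentclass{book}
\usepackage{totpages}
\usepackage{hyperref}
\usepackage{hyperxmp}
% Defer the assignment so that the labels are known when it is made.
\AtBeginDocument{%
  \hypersetup{%
    pdfpagerange={%
      \pageref*{page:begin-front}-\pageref*{page:end-front},%
      1-\pageref*{TotPages}%
    }%
  }%
}
\begin{document}
\frontmatter
\chapter{Preface}\label{page:begin-front}  % first Roman-numbered page
Preface text.\label{page:end-front}        % last Roman-numbered page
\mainmatter
\chapter{First chapter}
Main text.
\end{document}
```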

Note 10: Automatic computation of the PDF byte count The PRISM Basic Metadata schema [8] defines a prism:byteCount property that indicates the PDF file size in bytes. hyperxmp computes this value automatically when the document is built using LuaLaTeX but not when using any other TeX engine. Note that hyperxmp uses the file size from the previous run of LuaLaTeX because the new PDF file is not yet complete. Consequently, one extra compilation is needed for the byte count to converge relative to the number of compilations that would otherwise be required.

Starting with hyperxmp v5.9, the hyperxmp distribution includes a Perl script called hyperxmp-add-bytecount that edits a PDF file in place, adding or replacing the prism:byteCount property with one that specifies the final file size. (The script was in fact introduced with hyperxmp v5.8 and was then called add_byteCount.) Run the script as “hyperxmp-add-bytecount ⟨filename.pdf⟩”.

# Route every build command through mycmd.
foreach my $cmd ( "latex", "lualatex", "pdflatex", "xelatex",
                  "dvipdf", "xdvipdfmx", "ps2pdf" ) {
    ${$cmd} = "internal mycmd ${$cmd}";
}

# Run the original command, then update the byte count if a PDF file was produced.
sub mycmd {
    my $retval = system @_;
    if ( $$Pdest =~ /\.pdf$/ ) {
        system 'hyperxmp-add-bytecount', $$Pdest;
    }
    return $retval;
}

Figure 4: latexmk configuration-file code for automatically invoking hyperxmp-add-bytecount every time a PDF file is generated

The latexmk build tool can be configured to run hyperxmp-add-bytecount automatically every time a PDF file is generated. Simply add the code shown in Figure 4 to your latexmk configuration file. See the latexmk manual for information on configuration-file naming on different operating systems and explanations of the hook functions used in Figure 4.

Even though hyperxmp can compute the byte count automatically when run from LuaLaTeX, users of latexmk need to use configuration-file code like that shown in Figure 4. Otherwise, latexmk would compile the document one time too few for the byte count to converge. It is recommended that those who use both latexmk and hyperxmp configure latexmk to be hyperxmp-aware.

3 Implementation

This section presents the commented LaTeX source code for hyperxmp. Read this section only if you want to learn how hyperxmp is implemented.

One thing to bear in mind when reading the hyperxmp source code is that different actions occur at different times throughout document processing:

1. \usepackage{hyperxmp}: hyperxmp parses package options, defines a number of commands, loads various helper packages, and assigns default values to most XMP fields.

2. \begin{document}: hyperxmp loads certain packages such as hyperref and ifdraft and queries natural-language information from babel and polyglossia that becomes available only at the end of the preamble.

3. \end{document}: hyperxmp finalizes certain data that are known only at the end of the document, such as the page count, and writes the XMP packet to the PDF file.


3.1 Initial preparation

\hyxmp@dq@code The ngerman package redefines “"” as an active character, which causes problems for hyperxmp when it tries to use that character. We therefore save the double-quote character’s current category code in \hyxmp@dq@code and mark the character as category code 12 (“other”). The original category code is restored at the end of the package code (Section 3.8).

\edef\hyxmp@dq@code{\the\catcode`\"}
\catcode`\"=12

\hyxmp@at@end The \hyxmp@at@end macro includes code at the end of the document. When available (as is the case in most modern TeX backends), \AtEndDocument works well enough. Otherwise, we invoke \AtEndDvi from the atenddvi package, which is robust but requires an additional LaTeX run.

\@ifundefined{AtEndDocument}{%
  \RequirePackage{atenddvi}
  \let\hyxmp@at@end=\AtEndDvi
}{%
  \let\hyxmp@at@end=\AtEndDocument
}

\hyxmp@set@jobname Given an expanded \jobname followed by \relax, invoke the \hyxmp@set@jobname@dbl macro if the job name is surrounded by double quotes and the \hyxmp@set@jobname@plain macro otherwise.

\def\hyxmp@set@jobname#1\relax{%
  \@ifnextchar"{\hyxmp@set@jobname@dbl}{\hyxmp@set@jobname@plain}#1\relax
}

\hyxmp@set@jobname@dbl
\hyxmp@jobname
Set \hyxmp@jobname to #1, discarding the surrounding double quotes.

\def\hyxmp@set@jobname@dbl"#1"\relax{\xdef\hyxmp@jobname{#1}}

\hyxmp@set@jobname@plain
\hyxmp@jobname
Set \hyxmp@jobname to #1.

\def\hyxmp@set@jobname@plain#1\relax{\xdef\hyxmp@jobname{#1}}

Define \hyxmp@jobname as a sanitized version of \jobname. The problem with using \jobname directly is that it surrounds the filename with double quotes if it contains a space character. For example, a source file named my-file.tex results in a \jobname of “my-file”, but a source file named my file.tex results in a \jobname of “"my file"”. Trying to access "my file".log (as is done on page 51) will fail because the filename does not in fact contain literal double quotes.

\expandafter\hyxmp@set@jobname\jobname\relax
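The quote-stripping technique can be tried in isolation. The sketch below mirrors the three macros above under hypothetical \demo@... names and feeds them literal strings instead of \jobname; it assumes that the double quote has its usual category code (12), i.e., that no package has made it active.

```latex
\documentclass{article}
\makeatletter
% Dispatch on whether the text up to \relax starts with a double quote.
\def\demo@strip#1\relax{%
  \@ifnextchar"{\demo@strip@dbl}{\demo@strip@plain}#1\relax
}
% Quoted case: the delimited argument drops the surrounding quotes.
\def\demo@strip@dbl"#1"\relax{\def\demo@result{#1}}
% Plain case: keep the text as is.
\def\demo@strip@plain#1\relax{\def\demo@result{#1}}

\demo@strip"my file"\relax
\typeout{[\demo@result]}% writes [my file] to the log
\demo@strip my-file\relax
\typeout{[\demo@result]}% writes [my-file] to the log
\makeatother
\begin{document}
See the log file.
\end{document}
```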

\hyxmp@aep@toks In order for hyperxmp to be loaded safely during \AtEndPreamble we need to ensure that we perform no \AtEndPreamble actions until all top-level macro definitions have been made. The most straightforward approach would be to move all of hyperxmp’s \AtEndPreamble stanzas to the end of the package. However, this degrades readability of the source code. For instance, an \AtEndPreamble stanza related to integration with hyperref could no longer appear in the “Integration with hyperref” section (Section 3.2). Hence, we instead store in a token list, \hyxmp@aep@toks, each \AtEndPreamble stanza as we encounter it. This token list is evaluated as one of the package’s final actions (Section 3.8).

\newtoks{\hyxmp@aep@toks}
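The general pattern of accumulating deferred code in a token register can be sketched as follows. The \demo@... names are hypothetical, and how hyperxmp itself appends to and finally evaluates \hyxmp@aep@toks is not shown in this section, so the \AtEndPreamble call below is an assumption about one plausible way to do it.

```latex
\documentclass{article}
\usepackage{etoolbox}  % provides \AtEndPreamble
\makeatletter
\newtoks\demo@toks
% Append #1 to the token list without executing it yet.
\newcommand{\demo@defer}[1]{%
  \demo@toks=\expandafter{\the\demo@toks #1}%
}
\demo@defer{\typeout{first deferred action}}
\demo@defer{\typeout{second deferred action}}
% Final action: run everything collected so far, in order, at the end of the preamble.
\AtEndPreamble{\the\demo@toks}
\makeatother
\begin{document}
See the log file.
\end{document}
```

The benefit is the one described above: each stanza can be written next to the code it relates to, yet nothing runs until all definitions are in place.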

3.2 Integration with hyperref

An important design decision underlying hyperxmp is that the package should integrate seamlessly with hyperref. To that end, hyperxmp takes XMP metadata from hyperref’s baseurl, pdfauthor, pdfkeywords, pdflang, pdfproducer, pdfsubject, pdftrapped, and pdftitle options. It also introduces a number of new options, which are listed on pages 5–6. For consistency with hyperref’s document-metadata naming conventions (which are in turn based on LaTeX’s document-metadata naming conventions), we do not prefix metadata-related macro names with our package-specific \hyxmp@ prefix. That is, we use names like \@pdfcopyright instead of \hyxmp@pdfcopyright.

We load a bunch of helper packages: kvoptions for package-option processing, pdfescape and stringenc for re-encoding Unicode strings, intcalc for performing integer calculations (division and modulo), iftex for determining which TeX engine is being used, ifmtarg for testing if a macro argument is empty or all spaces, etoolbox for dynamically patching existing commands (specifically, hyperref’s \PDF@FinishDoc), and ifthen for convenient string comparisons.

\RequirePackage{kvoptions}
\RequirePackage{pdfescape}
\RequirePackage{stringenc}
\RequirePackage{intcalc}
\RequirePackage{iftex}
\RequirePackage{ifmtarg}
\RequirePackage{etoolbox}
\RequirePackage{ifthen}

There are a few places where hyperxmp can take advantage of LuaTeX features. To simplify the use of LuaTeX we load the luacode package.

\ifLuaTeX
  \RequirePackage{luacode}
\fi

\@ifmtargexp
\@ifnotmtargexp
\@ifmtarg and \@ifnotmtarg do not expand their first argument. Define \@ifmtargexp and \@ifnotmtargexp as expanding versions of those macros.

\def\@ifmtargexp#1{\expandafter\@ifmtarg\expandafter{#1}}
\def\@ifnotmtargexp#1{\expandafter\@ifnotmtarg\expandafter{#1}}

\@if@def@and@nonempty This macro combines \@ifundefined and \@ifmtargexp. If the macro named #1 is both defined and non-empty, evaluate #2. Otherwise, evaluate #3.

\newcommand*{\@if@def@and@nonempty}[3]{%
  \@ifundefined{#1}{#3}{%
    \expandafter\@ifmtargexp\expandafter{\csname#1\endcsname}{#3}{#2}%
  }%
}

\hyxmp@pdfstringdef
\hyxmp@textunderscore
Because hyperxmp uses underscores to represent hard spaces, we need “\_” to map initially to something other than an underscore, in particular the ASCII NAK (^^U) character. To accomplish this, we wrap hyperref’s \pdfstringdef macro with our own version that temporarily does the proper substitution. Later in the execution, after underscores have been replaced with spaces, we replace NAK characters with underscores.

\newcommand{\hyxmp@pdfstringdef}[2]{%
  \let\hyxmp@textunderscore=\textunderscore
  \let\textunderscore=\hyxmp@uscore
  \pdfstringdef{#1}{#2}%
  \let\textunderscore=\hyxmp@textunderscore
}

\@pdfdatetime Prepare to store the document’s date and (optionally) time. Whether specified by the author in XMP format or PDF format (see Section 3.4.2), we always store \@pdfdatetime as an XMP-format string.

\def\@pdfdatetime{}
\define@key{Hyp}{pdfdate}{%
  \begingroup
    \Hy@unicodefalse

\next Expand pdfdate’s argument and convert it to XMP format.

    \edef\next{%
      \noexpand\hyxmp@pdfstringdef\noexpand\@pdfdatetime{%
        \noexpand\hyxmp@as@xmp@date{#1}}%
    }%
    \next
  \endgroup
}

\@pdfmetadatetime Prepare to store the document’s metadata date and (optionally) time. Whether specified by the author in XMP format or PDF format (see Section 3.4.2), we always store \@pdfmetadatetime as an XMP-format string.

\def\@pdfmetadatetime{}
\define@key{Hyp}{pdfmetadate}{%
  \begingroup
    \Hy@unicodefalse

\next Expand pdfmetadate’s argument and convert it to XMP format.

    \edef\next{%
      \noexpand\hyxmp@pdfstringdef\noexpand\@pdfmetadatetime{%
        \noexpand\hyxmp@as@xmp@date{#1}}%
    }%
    \next
  \endgroup
}

\@pdfcopyright Prepare to store the document’s copyright statement.

\def\@pdfcopyright{}
\define@key{Hyp}{pdfcopyright}{\hyxmp@pdfstringdef\@pdfcopyright{#1}}

\@pdftype Prepare to store the document’s logical type, which defaults to “Text”.

\def\@pdftype{Text}
\define@key{Hyp}{pdftype}{\hyxmp@pdfstringdef\@pdftype{#1}}

\@pdflicenseurl Prepare to store the URL containing the document’s license agreement.

\def\@pdflicenseurl{}
\define@key{Hyp}{pdflicenseurl}{\hyxmp@pdfstringdef\@pdflicenseurl{#1}}

\@pdfauthortitle Prepare to store the author’s position/title (e.g., Staff Writer).

\def\@pdfauthortitle{}
\define@key{Hyp}{pdfauthortitle}{\hyxmp@pdfstringdef\@pdfauthortitle{#1}}

\@pdfcaptionwriter Prepare to store the name of the person who inserted the hyperxmp metadata.

\def\@pdfcaptionwriter{}
\define@key{Hyp}{pdfcaptionwriter}{\hyxmp@pdfstringdef\@pdfcaptionwriter{#1}}

\@pdfmetalang Prepare to store the natural language of the document’s metadata, typically as an ISO 639-1 two-letter abbreviation.

\def\@pdfmetalang{}
\define@key{Hyp}{pdfmetalang}{\hyxmp@pdfstringdef\@pdfmetalang{#1}}
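From the document author's side, each key defined above is set through \hypersetup. The following sketch uses the option names from this section; all of the values are placeholders chosen for illustration.

```latex
\hypersetup{%
  pdfcopyright={Copyright (C) 2020, Scott Pakin},              % placeholder text
  pdftype={Text},                                              % the default value
  pdflicenseurl={http://creativecommons.org/licenses/by/4.0/}, % placeholder URL
  pdfauthortitle={Staff Writer},
  pdfcaptionwriter={Scott Pakin},                              % placeholder name
  pdfmetalang={en}
}
```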

\hyxmp@no@bad@parts Complain about a bad pdfapart or pdfuapart if given trailing non-digits after a part number.

\def\hyxmp@no@bad@parts#1\relax{%
  \@ifnotmtarg{#1}{%
    \PackageWarning{hyperxmp}{pdfapart and pdfuapart must be numeric}%
  }%
}

\@pdfapart Prepare to store the PDF/A part ID, which defaults to “1” if pdfa is passed to hyperref.

\def\@pdfapart{}
\define@key{Hyp}{pdfapart}{%
  \afterassignment\hyxmp@no@bad@parts\@tempcnta=0#1\relax
  \hyxmp@pdfstringdef\@pdfapart{\the\@tempcnta}%
}

\@pdfaconformance Prepare to store the PDF/A conformance ID, which defaults to “b” if pdfa is passed to hyperref and \@pdfapart is empty.

\def\@pdfaconformance{}
\define@key{Hyp}{pdfaconformance}{%
  \uppercase{\hyxmp@pdfstringdef\@pdfaconformance{#1}}%
}

\@pdfuapart Prepare to store the PDF/UA part ID.

\def\@pdfuapart{}
\define@key{Hyp}{pdfuapart}{%
  \afterassignment\hyxmp@no@bad@parts\@tempcnta=0#1\relax
  \hyxmp@pdfstringdef\@pdfuapart{\the\@tempcnta}%
}
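The \afterassignment idiom used by pdfapart and pdfuapart can be studied on its own. In the sketch below (hypothetical \demo@... names), TeX reads as many digits as it can into \@tempcnta; the leading 0 guarantees a valid number even for an empty argument, and whatever is left before \relax is handed to the checking macro.

```latex
\documentclass{article}
\makeatletter
% Receives the tokens that follow the number, up to \relax.
\def\demo@rest#1\relax{%
  \def\demo@leftover{#1}%
  \ifx\demo@leftover\@empty
    \typeout{part \the\@tempcnta: ok}%
  \else
    \typeout{part \the\@tempcnta: unexpected trailing text}%
  \fi
}
\newcommand*{\demo@parse}[1]{%
  \afterassignment\demo@rest\@tempcnta=0#1\relax
}
\demo@parse{2}%   nothing follows the digits
\demo@parse{2x}%  the "x" is left over after the number
\makeatother
\begin{document}
See the log file.
\end{document}
```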

\hyxmp@set@pdfx@major Parse pdfxstandard as “PDF/X-⟨major⟩⟨other⟩”, setting \hyxmp@pdfx@major to ⟨major⟩.

\newcommand*{\hyxmp@set@pdfx@major}[1]{\hyxmp@set@pdfx@major@i#1!}

\hyxmp@set@pdfx@major@i This is the first helper macro for \hyxmp@set@pdfx@major. It stores the PDF/X major version in \@tempcnta.

\def\hyxmp@set@pdfx@major@i PDF/X-{%
  \afterassignment\hyxmp@set@pdfx@major@ii
  \@tempcnta=%
}

\hyxmp@set@pdfx@major@ii
\hyxmp@pdfx@major
This is the second helper macro for \hyxmp@set@pdfx@major. It copies the PDF/X major version from \@tempcnta to \hyxmp@pdfx@major and discards the rest of the PDF/X standard string.

\def\hyxmp@set@pdfx@major@ii#1!{%
  \edef\hyxmp@pdfx@major{\the\@tempcnta}%
}

\hyxmp@check@std Compare a user-provided string to a fixed string. (Assumption: Both are names of PDF/X standard versions.) If they match, neutralize \next by setting it to \relax; we assume \next was previously defined to issue an “unrecognized standard” warning message.

\newcommand*\hyxmp@check@std[2]{%
  \ifthenelse{\equal{#1}{#2}}%
    {\global\let\next=\relax}%
    {}%
}%

\@pdfxstandard Prepare to store the PDF/X standard.

\def\@pdfxstandard{}
\def\hyxmp@pdfx@major{}
\define@key{Hyp}{pdfxstandard}{%
  \hyxmp@pdfstringdef\@pdfxstandard{#1}%

\next Issue a warning message if the PDF/X standard named by the user does not appear in a list of known PDF/X standards. This is to caution the user that hyperxmp generates standard-specific XMP metadata and it can only guess at the correct format for new standard versions. (See the comments on page 74 above the definition of \hyxmp@pdfx@id@schema, for example.)

  \gdef\next{%
    \PackageWarning{hyperxmp}{Unrecognized PDF/X standard `#1'}%
  }%
  \hyxmp@check@std{#1}{PDF/X-1a:2001}%
  \hyxmp@check@std{#1}{PDF/X-1a:2003}%
  \hyxmp@check@std{#1}{PDF/X-3:2002}%
  \hyxmp@check@std{#1}{PDF/X-3:2003}%
  \hyxmp@check@std{#1}{PDF/X-4}%
  \hyxmp@check@std{#1}{PDF/X-4p}%
  \hyxmp@check@std{#1}{PDF/X-5g}%
  \hyxmp@check@std{#1}{PDF/X-5n}%
  \hyxmp@check@std{#1}{PDF/X-5pg}%
  \next

\hyxmp@pdfx@major Parse the PDF/X major version number from pdfxstandard and assign it to \hyxmp@pdfx@major.

  \hyxmp@set@pdfx@major{#1}%
}
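As a usage sketch based on the code above, the following setting names one of the standards in the known list. Going by the definitions shown, it should produce no “Unrecognized PDF/X standard” warning and should leave \hyxmp@pdfx@major set to 1.

```latex
\hypersetup{%
  % Matches the "PDF/X-1a:2003" entry checked by \hyxmp@check@std.
  pdfxstandard={PDF/X-1a:2003}
}
```

A string that is not in the list still has its major version parsed; it merely triggers the warning first.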

\@pdfsource Prepare to store the document’s source, which defaults to the value of \jobname with a .tex extension.

\edef\@pdfsource{\hyxmp@jobname.tex}
\define@key{Hyp}{pdfsource}{\hyxmp@pdfstringdef\@pdfsource{#1}}

\hyxmp@DocumentID Prepare to store a UUID that represents the document.

\def\hyxmp@DocumentID{}
\define@key{Hyp}{pdfdocumentid}{\hyxmp@pdfstringdef\hyxmp@DocumentID{#1}}

\hyxmp@InstanceID Prepare to store a UUID that represents the current instance of the document.

\def\hyxmp@InstanceID{}
\define@key{Hyp}{pdfinstanceid}{\hyxmp@pdfstringdef\hyxmp@InstanceID{#1}}

\@pdfversionid Prepare to store a string that represents the current version of the document. It defaults to “1”.

\def\@pdfversionid{1}
\define@key{Hyp}{pdfversionid}{\hyxmp@pdfstringdef\@pdfversionid{#1}}

\ifdraft
\next
Use the ifdraft package to determine if this is a draft or final document. The challenge here is that we want to use ifdraft if it’s already loaded, load it if not, and not break any incompatible, author-defined \ifdraft macros that may occur either before or after the \usepackage{hyperxmp}. Our solution begins by defining a new group. Then, if ifdraft is not yet loaded, we locally undefine \ifdraft and load the package. In this case, we later “unload” the package by setting \ver@ifdraft.sty to \relax.

\begingroup
  \@ifpackageloaded{ifdraft}{%
    \let\next=\relax
  }{%
    \let\ifdraft=\relax
    \RequirePackage{ifdraft}%
    \def\next{%
      \expandafter\global\expandafter\let\csname ver@ifdraft.sty\endcsname=\relax
    }%
  }%

\@pdfrendition Prepare to store a tag describing how this rendition of the document differs from the master. The default value is default, which indicates the master document, except in the case of \documentclass[draft], for which \@pdfrendition defaults to draft.

  \ifdraft{%
    \gdef\@pdfrendition{draft}%
  }{%
    \gdef\@pdfrendition{default}%
  }
  \next
\endgroup
\define@key{Hyp}{pdfrendition}{\hyxmp@pdfstringdef\@pdfrendition{#1}}

\@pdfpublication Prepare to store the name of the publication in which the document was published.

\def\@pdfpublication{}
\define@key{Hyp}{pdfpublication}{\hyxmp@pdfstringdef\@pdfpublication{#1}}

\@pdfpubtype Prepare to store the type of the publication in which the document was published.

\def\@pdfpubtype{}
\define@key{Hyp}{pdfpubtype}{\hyxmp@pdfstringdef\@pdfpubtype{#1}}

\@pdfbytes Prepare to store the size of the file in bytes.

\def\@pdfbytes{}
\define@key{Hyp}{pdfbytes}{\hyxmp@pdfstringdef\@pdfbytes{#1}}

\@pdfnumpages Prepare to store the number of pages in the file.

\def\@pdfnumpages{}
\define@key{Hyp}{pdfnumpages}{\hyxmp@pdfstringdef\@pdfnumpages{#1}}

\@pdfissn Prepare to store the ISSN of the publication in which the document was published.

\def\@pdfissn{}
\define@key{Hyp}{pdfissn}{\hyxmp@pdfstringdef\@pdfissn{#1}}

\@pdfeissn Prepare to store the ISSN of the electronic version of the publication in which the document was published.

\def\@pdfeissn{}
\define@key{Hyp}{pdfeissn}{\hyxmp@pdfstringdef\@pdfeissn{#1}}

\@pdfisbn Prepare to store the ISBN of the publication in which the document was published.

\def\@pdfisbn{}
\define@key{Hyp}{pdfisbn}{\hyxmp@pdfstringdef\@pdfisbn{#1}}

\@pdfbookedition Prepare to store the edition of the book in which the document was published.

\def\@pdfbookedition{}
\define@key{Hyp}{pdfbookedition}{\hyxmp@pdfstringdef\@pdfbookedition{#1}}

\@pdfpublisher Prepare to store the name of the document’s publisher.

\def\@pdfpublisher{}
\define@key{Hyp}{pdfpublisher}{\hyxmp@pdfstringdef\@pdfpublisher{#1}}

\@pdfvolumenum Prepare to store the volume identifier of the publication in which the document was published.

\def\@pdfvolumenum{}
\define@key{Hyp}{pdfvolumenum}{\hyxmp@pdfstringdef\@pdfvolumenum{#1}}

\@pdfissuenum Prepare to store the identifier of the issue within a volume of the publication in which the document was published.

\def\@pdfissuenum{}
\define@key{Hyp}{pdfissuenum}{\hyxmp@pdfstringdef\@pdfissuenum{#1}}

\@pdfpagerange Prepare to store the document’s range of pages within the publication in which the document was published.

\def\@pdfpagerange{}
\define@key{Hyp}{pdfpagerange}{\hyxmp@pdfstringdef\@pdfpagerange{#1}}

\@pdfdoi Prepare to store a DOI that represents the current instance of the document.

\def\@pdfdoi{}
\define@key{Hyp}{pdfdoi}{\hyxmp@pdfstringdef\@pdfdoi{#1}}

\@pdfurl Prepare to store a URL that represents where the document can be found. Note that we do not prepend baseurl to the value provided.

\def\@pdfurl{}
\define@key{Hyp}{pdfurl}{\hyxmp@pdfstringdef\@pdfurl{#1}}

\@pdfidentifier Prepare to store an identifier that uniquely represents the document.

\def\@pdfidentifier{}
\define@key{Hyp}{pdfidentifier}{\hyxmp@pdfstringdef\@pdfidentifier{#1}}

\@pdfsubtitle Prepare to store the document’s subtitle.

\def\@pdfsubtitle{}
\define@key{Hyp}{pdfsubtitle}{\hyxmp@pdfstringdef\@pdfsubtitle{#1}}

\@pdfpubstatus Prepare to store the document’s journal article version.

\def\@pdfpubstatus{}
\define@key{Hyp}{pdfpubstatus}{\hyxmp@pdfstringdef\@pdfpubstatus{#1}}

The following eight macros (\@pdfcontactaddress, \@pdfcontactcity, \@pdfcontactregion, \@pdfcontactpostcode, \@pdfcontactcountry, \@pdfcontactphone, \@pdfcontactemail, and \@pdfcontacturl) together specify how to contact the person or institution responsible for the document.

\@pdfcontactaddress Prepare to store a street address for the document’s contact person/institution. The IPTC standard defines this as follows:

The contact information address part. Comprises an optional company name and all required information to locate the building or postbox to which mail should be sent. To that end, the address is a multiline field.

For consistency with the rest of hyperxmp, we use commas to separate terms, in this case, lines of the address. The author can use \xmpquote and \xmpcomma to include literal commas.

\def\@pdfcontactaddress{}
\define@key{Hyp}{pdfcontactaddress}{%
  \let\xmpcomma=\hyxmp@comma
  \def\xmpquote##1{##1}%
  \hyxmp@pdfstringdef\@pdfcontactaddress{#1}%
  \def\xmpcomma{,}%
  \let\xmpquote=\relax
}

\@pdfcontactcity Prepare to store the city of the document’s contact person/institution.

\def\@pdfcontactcity{}
\define@key{Hyp}{pdfcontactcity}{\hyxmp@pdfstringdef\@pdfcontactcity{#1}}

\@pdfcontactregion Prepare to store the state or province of the document’s contact person/institution.

\def\@pdfcontactregion{}
\define@key{Hyp}{pdfcontactregion}{\hyxmp@pdfstringdef\@pdfcontactregion{#1}}

\@pdfcontactpostcode Prepare to store the postal code of the document’s contact person/institution.

\def\@pdfcontactpostcode{}
\define@key{Hyp}{pdfcontactpostcode}{\hyxmp@pdfstringdef\@pdfcontactpostcode{#1}}

\@pdfcontactcountry Prepare to store the country of the document’s contact person/institution.

\def\@pdfcontactcountry{}
\define@key{Hyp}{pdfcontactcountry}{\hyxmp@pdfstringdef\@pdfcontactcountry{#1}}

\@pdfcontactphone Prepare to store the telephone number of the document’s contact person/institution.

\def\@pdfcontactphone{}
\define@key{Hyp}{pdfcontactphone}{\hyxmp@pdfstringdef\@pdfcontactphone{#1}}

\@pdfcontactemail Prepare to store the email address of the document’s contact person/institution.

\def\@pdfcontactemail{}
\define@key{Hyp}{pdfcontactemail}{\hyxmp@pdfstringdef\@pdfcontactemail{#1}}

\@pdfcontacturl Prepare to store the URL of the document’s contact person/institution.

\def\@pdfcontacturl{}
\define@key{Hyp}{pdfcontacturl}{\hyxmp@pdfstringdef\@pdfcontacturl{#1}}

\hyxmp@no@info@lists Prevent hyperref from writing Author and Keywords into the Info dictionary. This prevents conflicts between the PDF metadata and the XMP metadata that cause PDF/A validation to fail. The PDF metadata can be restored by passing the keeppdfinfo option to \hypersetup.

\def\hyxmp@no@info@lists{%

\hyxmp@suppress@pdf@info
\next
If \patchcmd fails for any reason (most likely, a modification to the hyperref package), our fallback is to prevent hyperref from writing any data to the PDF Info dictionary.

  \def\hyxmp@suppress@pdf@info{%
    \global\let\PDF@FinishDoc=\@empty
    \PackageWarningNoLine{hyperxmp}{%
      Suppressing the _entire_ PDF Info dictionary.\MessageBreak
      Please notify the hyperxmp maintainer%
    }%
  }%
  \let\next=\relax
  \patchcmd
    {\PDF@FinishDoc}%
    {/Author(\@pdfauthor)}%
    {}%
    {}%
    {\let\next=\hyxmp@suppress@pdf@info}%
  \patchcmd
    {\PDF@FinishDoc}%
    {/Keywords(\@pdfkeywords)}%
    {}%
    {}%
    {\let\next=\hyxmp@suppress@pdf@info}%
  \next
}
\define@key{Hyp}{keeppdfinfo}[true]{%
  \gdef\hyxmp@no@info@lists{}%
}
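For an author who does want Author and Keywords to remain in the Info dictionary, a minimal sketch of opting out of the suppression is shown below. The key is declared above with a default value of true, so it can be given without a value.

```latex
\hypersetup{%
  keeppdfinfo  % redefines \hyxmp@no@info@lists to do nothing
}
```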

We need to capture list arguments (viz. pdfauthor and pdfkeywords) before hyperref converts them to PDFDocEncoding. Otherwise, \xmpcomma is permanently replaced with a comma, and we lose our ability to change it to a \hyxmp@comma. We therefore need to augment hyperref’s option processing with our own. Because hyperref has not yet been loaded we need to ensure that our augmentation gets loaded in the future: after the \usepackage{hyperref} but before options are passed to that package.

For lack of a better approach, hyperxmp redefines \ProcessKeyvalOptions to alter the way hyperref processes pdfauthor and pdfkeywords. This is somewhat heavy-handed as it gets executed for every subsequently loaded package that uses \ProcessKeyvalOptions, but at least it does what we need. hyperxmp also redefines \hypersetup to do the same thing. This is required in case hyperref is loaded before hyperxmp.

\hyxmp@pdfauthor
\hyxmp@pdfkeywords
Prepare to store the name of the author and a list of keywords.

\def\hyxmp@pdfauthor{}
\def\hyxmp@pdfkeywords{}

\hyxmp@redefine@Hyp If not already redefined, redefine hyperref’s pdfauthor and pdfkeywords options to properly handle \xmpcomma and \xmpquote.

\newcommand*{\hyxmp@redefine@Hyp}{%

\hyxmp@Hyp@pdfauthor Store the old definition of \KV@Hyp@pdfauthor in \hyxmp@Hyp@pdfauthor, but only if we see that \KV@Hyp@pdfauthor is defined and \hyxmp@Hyp@pdfauthor isn’t. Otherwise, we’d be defining \hyxmp@Hyp@pdfauthor in terms of itself and creating an infinite loop.

  \@ifundefined{KV@Hyp@pdfauthor}{}{%
    \@ifundefined{hyxmp@Hyp@pdfauthor}{%
      \expandafter\let\expandafter\hyxmp@Hyp@pdfauthor
        \csname KV@Hyp@pdfauthor\endcsname
    }{}%
  }%

\KV@Hyp@pdfauthor
\xmpcomma
\xmpquote
\hyxmp@and
\and
\hyxmp@pdfauthor
\@pdfauthor
Redefine \KV@Hyp@pdfauthor to process its argument twice. The first time, \xmpcomma is defined as a placeholder character (\hyxmp@comma) and \xmpquote as the identity function. The result is stored in \hyxmp@pdfauthor for use in structured lists (those surrounding each entry with <rdf:li>). The second time, \xmpcomma is defined as an ordinary comma, and \xmpquote is defined as a macro that puts its argument within double quotes. The result is stored in \@pdfauthor for use in unstructured lists (those in which the entire list appears within a single pair of tags). In case pdfauthor is left unspecified and we copy \author’s argument to pdfauthor, we temporarily redefine \and as the list separator when producing a structured list and as “and” when producing an unstructured list.

  \define@key{Hyp}{pdfauthor}{%
    \let\xmpcomma=\hyxmp@comma
    \def\xmpquote####1{####1}%
    \let\hyxmp@and=\and
    \def\and{,}%
    \hyxmp@Hyp@pdfauthor{##1}%
    \global\let\hyxmp@pdfauthor=\@pdfauthor
    \def\and{and\space}%
    \def\xmpcomma{,}%
    \def\xmpquote####1{"####1"}%
    \hyxmp@Hyp@pdfauthor{##1}%
    \def\xmpcomma{,}%
    \let\xmpquote=\relax
    \let\and=\hyxmp@and
  }%
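The effect of processing one argument twice under different definitions can be reproduced outside the package. The following standalone sketch uses hypothetical \demo@... names and a visible “<COMMA>” marker in place of \hyxmp@comma (whose actual definition is not shown in this section); it uses \space rather than “\ ” only so that the log output is easy to read.

```latex
\documentclass{article}
\makeatletter
\def\demo@authors{\xmpquote{Jack Napier\xmpcomma\space Jr.},
                  \xmpquote{Harvey Dent\xmpcomma\space Esq.}}
% Pass 1: placeholder comma, \xmpquote as the identity function.
% Only real separators remain as commas, so the list can be split safely.
\begingroup
  \def\xmpcomma{<COMMA>}\def\xmpquote#1{#1}%
  \xdef\demo@structured{\demo@authors}%
\endgroup
% Pass 2: ordinary comma, \xmpquote wraps each entry in double quotes.
\begingroup
  \def\xmpcomma{,}\def\xmpquote#1{"#1"}%
  \xdef\demo@unstructured{\demo@authors}%
\endgroup
\typeout{\demo@structured}%   Jack Napier<COMMA> Jr., Harvey Dent<COMMA> Esq.
\typeout{\demo@unstructured}% "Jack Napier, Jr.", "Harvey Dent, Esq."
\makeatother
\begin{document}
See the log file.
\end{document}
```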

\hyxmp@Hyp@pdfkeywords The previous block of code now repeats for the keyword list, starting by storing the old definition of \KV@Hyp@pdfkeywords in \hyxmp@Hyp@pdfkeywords.


The City makes every attempt to keep its published records up to date; however the subject document may have been superseded. by a more recently

The City makes every attempt to keep its published records up to date; however the subject document may have been superseded. by a more recently

The City makes every attempt to keep its published records up to date; however the subject document may have been superseded. by a more recently

“Resilience is built through the everyday, every minute habits and exercises that punctuate our daily