
Metadata & XMP

PDF Oxide reads document-level metadata from multiple sources: the PDF header (version), the trailer and catalog dictionaries, XMP metadata streams (ISO 16684), and page label definitions. The XmpExtractor parses the Dublin Core, XMP Core, PDF, and XMP Rights namespaces, plus any custom properties.

Use version() and catalog() for basic document properties, XmpExtractor::extract() for rich metadata, and PageLabelExtractor::extract() for page numbering schemes.

Quick Example

Python

from pdf_oxide import PdfDocument

doc = PdfDocument("report.pdf")
major, minor = doc.version()
print(f"PDF {major}.{minor}, {doc.page_count()} pages")

Node.js

const { PdfDocument } = require("pdf-oxide");

const doc = new PdfDocument("report.pdf");
const { major, minor } = doc.getVersion();
console.log(`PDF ${major}.${minor}, ${doc.pageCount()} pages`);
doc.close();

Go

import pdfoxide "github.com/yfedoseev/pdf_oxide/go"

doc, _ := pdfoxide.Open("report.pdf")
defer doc.Close()
major, minor, _ := doc.Version()
pages, _ := doc.PageCount()
fmt.Printf("PDF %d.%d, %d pages\n", major, minor, pages)

C#

using PdfOxide.Core;

using var doc = PdfDocument.Open("report.pdf");
var (major, minor) = doc.Version;
Console.WriteLine($"PDF {major}.{minor}, {doc.PageCount} pages");

WASM

const doc = new WasmPdfDocument(bytes);
const version = doc.version();
console.log(`PDF ${version}, ${doc.pageCount()} pages`);

Rust

use pdf_oxide::PdfDocument;

let mut doc = PdfDocument::open("report.pdf")?;
let (major, minor) = doc.version();
println!("PDF {}.{}", major, minor);
println!("Pages: {}", doc.page_count()?);

API Reference

version() -> (u8, u8)

Get the PDF version from the file header.

Returns: A tuple of (major, minor), e.g., (1, 7) for PDF 1.7 or (2, 0) for PDF 2.0.
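
To make the return value concrete, here is a minimal sketch of reading the `%PDF-M.N` marker from raw bytes. It is an illustration only, not PDF Oxide's implementation: `parse_header_version` is a hypothetical helper, and it assumes the marker sits at byte offset 0 with single-digit major and minor numbers.

```rust
/// Hypothetical helper: parse the `%PDF-M.N` marker at the start of a file.
/// Returns `None` if the marker is missing or malformed.
fn parse_header_version(bytes: &[u8]) -> Option<(u8, u8)> {
    // The header must begin with the literal marker "%PDF-".
    let rest = bytes.strip_prefix(b"%PDF-")?;
    // Expect "<digit>.<digit>" next, e.g. "1.7" or "2.0".
    match rest {
        [major @ b'0'..=b'9', b'.', minor @ b'0'..=b'9', ..] => {
            Some((*major - b'0', *minor - b'0'))
        }
        _ => None,
    }
}
```

Because the result is a plain tuple, version checks can use ordinary tuple comparison, for example `version >= (1, 5)`.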


catalog() -> Result<Object>

Get the document catalog dictionary. The catalog is the root of the PDF object hierarchy and contains references to the page tree, outlines, names, and other document-level structures.

Rust

use pdf_oxide::PdfDocument;

let mut doc = PdfDocument::open("report.pdf")?;
let catalog = doc.catalog()?;
if let Some(dict) = catalog.as_dict() {
    for (key, _) in dict {
        println!("Catalog key: {}", key);
    }
}

trailer() -> &Object

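Get the document trailer dictionary. The trailer holds the entries a reader needs to start parsing: the catalog reference (/Root), the info dictionary reference (/Info), the document ID (/ID), the encryption dictionary reference (/Encrypt), and, for incrementally updated files, the offset of the previous cross-reference section (/Prev).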

Rust

use pdf_oxide::PdfDocument;

let doc = PdfDocument::open("report.pdf")?;
let trailer = doc.trailer();
println!("Trailer: {:?}", trailer);

XmpExtractor::extract(doc) -> Result<Option<XmpMetadata>>

Extract XMP (Extensible Metadata Platform) metadata from the document’s metadata stream. XMP provides richer metadata than the traditional Info dictionary, using standard XML namespaces.

| Parameter | Type | Description |
|---|---|---|
| doc | &mut PdfDocument | The PDF document |

Returns: Some(XmpMetadata) if XMP data is present, None otherwise.
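
The `Result<Option<_>>` return type distinguishes three outcomes: extraction failed, the document has no XMP data, or metadata was found. The sketch below shows one way to handle all three. `describe` is a hypothetical helper written against generic types rather than the real `XmpMetadata`, and the message wording is an assumption.

```rust
use std::fmt::{Debug, Display};

/// Hypothetical helper: turn a `Result<Option<T>, E>` into a status message.
fn describe<T: Debug, E: Display>(outcome: Result<Option<T>, E>) -> String {
    match outcome {
        // Metadata was present and parsed.
        Ok(Some(value)) => format!("metadata found: {:?}", value),
        // The call succeeded, but there is no XMP data.
        Ok(None) => "no XMP metadata".to_string(),
        // The extraction itself failed.
        Err(err) => format!("extraction failed: {}", err),
    }
}
```

Treating `Ok(None)` as a normal case rather than an error keeps callers from aborting on the many PDFs that carry no XMP packet.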

XmpMetadata Fields

Dublin Core namespace (dc:)

| Field | Type | Description |
|---|---|---|
| dc_title | Option<String> | Document title |
| dc_creator | Vec<String> | Authors/creators list |
| dc_description | Option<String> | Document description |
| dc_subject | Vec<String> | Subject keywords |
| dc_language | Option<String> | Document language (e.g., "en-US") |
| dc_rights | Option<String> | Copyright statement |
| dc_format | Option<String> | MIME format (e.g., "application/pdf") |

XMP Core namespace (xmp:)

| Field | Type | Description |
|---|---|---|
| xmp_creator_tool | Option<String> | Tool used to create the document |
| xmp_create_date | Option<String> | Creation date (ISO 8601) |
| xmp_modify_date | Option<String> | Last modification date |
| xmp_metadata_date | Option<String> | Metadata modification date |
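
The date fields are returned as strings, so callers that need calendar components have to parse them. The following is a minimal sketch, assuming the value follows the ISO 8601 profile used by XMP, which permits truncated forms such as "2024" or "2024-03". `parse_xmp_date` and `bounded` are hypothetical helpers, and the sketch ignores the time and time zone portion entirely.

```rust
/// Hypothetical helper: parse an optional numeric date field in `1..=max`.
/// A missing field is fine (`Some(None)`); a malformed one rejects the date (`None`).
fn bounded(part: Option<&str>, max: u8) -> Option<Option<u8>> {
    match part {
        None => Some(None),
        Some(text) => {
            let value = text.parse::<u8>().ok()?;
            if (1..=max).contains(&value) { Some(Some(value)) } else { None }
        }
    }
}

/// Hypothetical helper: split an XMP date string into (year, month, day).
fn parse_xmp_date(text: &str) -> Option<(u16, Option<u8>, Option<u8>)> {
    // Keep only the date portion before the 'T' separator, if there is one.
    let date = text.trim().split('T').next()?;
    let mut parts = date.split('-');
    let year = parts.next()?.parse::<u16>().ok()?;
    let month = bounded(parts.next(), 12)?;
    let day = bounded(parts.next(), 31)?;
    Some((year, month, day))
}
```

For production use, a dedicated date-time crate is likely a better fit than hand-rolled parsing, since it also handles time zones and per-month day limits.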

PDF namespace (pdf:)

| Field | Type | Description |
|---|---|---|
| pdf_producer | Option<String> | PDF producer application |
| pdf_keywords | Option<String> | Keywords string |
| pdf_version | Option<String> | PDF version from XMP (may differ from header) |
| pdf_trapped | Option<String> | Trapping status |
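
Since `pdf_version` may differ from the header version, a consistency check can be useful. This sketch compares the `(major, minor)` tuple returned by `version()` with the XMP string. It assumes the XMP value has the form "1.7"; `parse_version` and `versions_agree` are hypothetical helpers, not part of PDF Oxide.

```rust
/// Hypothetical helper: parse a version string such as "1.7" into (major, minor).
fn parse_version(text: &str) -> Option<(u8, u8)> {
    let (major, minor) = text.trim().split_once('.')?;
    Some((major.parse::<u8>().ok()?, minor.parse::<u8>().ok()?))
}

/// Hypothetical helper: compare the header version with the XMP `pdf_version` field.
/// Returns `None` when the XMP value is absent or cannot be parsed.
fn versions_agree(header: (u8, u8), xmp_pdf_version: Option<&str>) -> Option<bool> {
    let xmp = parse_version(xmp_pdf_version?)?;
    Some(xmp == header)
}
```

With a real document, the second argument would come from something like `xmp.pdf_version.as_deref()`.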

XMP Rights namespace (xmpRights:)

| Field | Type | Description |
|---|---|---|
| xmp_rights_usage_terms | Option<String> | Usage terms |
| xmp_rights_marked | Option<bool> | Whether the document is marked as rights-managed |
| xmp_rights_web_statement | Option<String> | Web statement URL |

Other

| Field | Type | Description |
|---|---|---|
| custom | HashMap<String, String> | Custom properties (namespace:property to value) |
| raw_xml | Option<String> | The original XMP XML packet |
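
Because `custom` keys take the form namespace:property, they can be grouped by namespace for display. The sketch below shows one way to do that with the standard library. `group_by_namespace` is a hypothetical helper, and filing keys without a colon under an empty namespace is an assumption of this sketch.

```rust
use std::collections::{BTreeMap, HashMap};

/// Hypothetical helper: group "namespace:property" keys by their namespace prefix.
fn group_by_namespace(
    custom: &HashMap<String, String>,
) -> BTreeMap<String, Vec<(String, String)>> {
    let mut groups: BTreeMap<String, Vec<(String, String)>> = BTreeMap::new();
    for (key, value) in custom {
        // Split at the first colon; keys without one go under the empty namespace.
        let (namespace, property) = key.split_once(':').unwrap_or(("", key.as_str()));
        groups
            .entry(namespace.to_string())
            .or_default()
            .push((property.to_string(), value.clone()));
    }
    // HashMap iteration order is unspecified, so sort each group for stable output.
    for properties in groups.values_mut() {
        properties.sort();
    }
    groups
}
```

Using a `BTreeMap` for the result keeps the namespaces themselves in sorted order when printed.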

Rust

use pdf_oxide::PdfDocument;
use pdf_oxide::extractors::xmp::XmpExtractor;

let mut doc = PdfDocument::open("report.pdf")?;
if let Some(xmp) = XmpExtractor::extract(&mut doc)? {
    if let Some(title) = &xmp.dc_title {
        println!("Title: {}", title);
    }
    for creator in &xmp.dc_creator {
        println!("Author: {}", creator);
    }
    if let Some(tool) = &xmp.xmp_creator_tool {
        println!("Created with: {}", tool);
    }
    if let Some(date) = &xmp.xmp_create_date {
        println!("Created: {}", date);
    }
    if let Some(producer) = &xmp.pdf_producer {
        println!("Producer: {}", producer);
    }
}

WASM

const doc = new WasmPdfDocument(bytes);
const xmp = doc.xmpMetadata();

if (xmp) {
  console.log(`Title: ${xmp.dc_title}`);
  console.log(`Authors: ${xmp.dc_creator}`);
  console.log(`Created with: ${xmp.xmp_creator_tool}`);
  console.log(`Created: ${xmp.xmp_create_date}`);
  console.log(`Producer: ${xmp.pdf_producer}`);
}
doc.free();

Python

from pdf_oxide import PdfDocument

doc = PdfDocument("report.pdf")
xmp = doc.xmp_metadata()

if xmp:
    print(f"Title: {xmp.get('dc_title')}")
    print(f"Authors: {xmp.get('dc_creator')}")
    print(f"Created with: {xmp.get('xmp_creator_tool')}")
    print(f"Created: {xmp.get('xmp_create_date')}")
    print(f"Producer: {xmp.get('pdf_producer')}")

<!-- Node.js: no equivalent on PdfDocumentImpl — xmp metadata not exposed in js/src/index.ts -->

Go

doc, _ := pdfoxide.Open("report.pdf")
defer doc.Close()
xmp, _ := doc.XmpMetadata() // returns JSON string
fmt.Println(xmp)

C#

using var doc = PdfDocument.Open("report.pdf");
var xmp = doc.GetXmpMetadata(); // returns JSON string
Console.WriteLine(xmp);

Pdf Convenience Methods

The high-level Pdf API provides shortcut methods for common metadata queries.

xmp_metadata() -> Result<Option<XmpMetadata>>

Get the full XMP metadata object.

xmp_title() -> Result<Option<String>>

Get just the document title from XMP.

xmp_creators() -> Result<Vec<String>>

Get the list of creators/authors from XMP.

Rust

use pdf_oxide::api::Pdf;

let mut pdf = Pdf::open("report.pdf")?;

if let Some(title) = pdf.xmp_title()? {
    println!("Title: {}", title);
}

let creators = pdf.xmp_creators()?;
for creator in &creators {
    println!("Author: {}", creator);
}

PageLabelExtractor::extract(doc) -> Result<Vec<PageLabelRange>>

Extract page label definitions from the document. Page labels define how page numbers are displayed (e.g., Roman numerals for front matter, Arabic numerals for body).

| Parameter | Type | Description |
|---|---|---|
| doc | &mut PdfDocument | The PDF document |

Returns: A vector of PageLabelRange definitions.

PageLabelRange Fields

| Field | Type | Description |
|---|---|---|
| start_page | usize | First page index (zero-based) this range applies to |
| style | PageLabelStyle | Numbering style |
| prefix | Option<String> | Label prefix string |
| start_number | u32 | Starting number for this range |

PageLabelStyle Variants

| Variant | Description | Example |
|---|---|---|
| DecimalArabic | Arabic numerals | 1, 2, 3 |
| UppercaseRoman | Uppercase Roman | I, II, III |
| LowercaseRoman | Lowercase Roman | i, ii, iii |
| UppercaseLetters | Uppercase letters | A, B, C |
| LowercaseLetters | Lowercase letters | a, b, c |
| None | No numbering (prefix only) | |
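
To show how a range definition turns into a display label, here is a self-contained sketch. The types are local stand-ins that mirror the fields and variants documented above, not the library's own definitions, and `roman`, `letters`, and `page_label` are hypothetical helpers. The sketch assumes the applicable range is the last one starting at or before the page, and that letter styles repeat as A to Z, then AA to ZZ, as in the PDF specification. Returning `None` for an uncovered page is a choice of this sketch and may differ from the library's behavior.

```rust
/// Local stand-in mirroring the documented variants.
#[derive(Debug, Clone, Copy, PartialEq)]
enum PageLabelStyle {
    DecimalArabic,
    UppercaseRoman,
    LowercaseRoman,
    UppercaseLetters,
    LowercaseLetters,
    None,
}

/// Local stand-in mirroring the documented fields.
struct PageLabelRange {
    start_page: usize,
    style: PageLabelStyle,
    prefix: Option<String>,
    start_number: u32,
}

/// Uppercase Roman numeral for `n` (empty string for 0).
fn roman(mut n: u32) -> String {
    const TABLE: [(u32, &str); 13] = [
        (1000, "M"), (900, "CM"), (500, "D"), (400, "CD"), (100, "C"), (90, "XC"),
        (50, "L"), (40, "XL"), (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I"),
    ];
    let mut out = String::new();
    for &(value, symbol) in TABLE.iter() {
        while n >= value {
            out.push_str(symbol);
            n -= value;
        }
    }
    out
}

/// Letter label: A..Z for 1..=26, then AA..ZZ for 27..=52, and so on.
fn letters(n: u32, base: u8) -> String {
    if n == 0 {
        return String::new();
    }
    let letter = (base + ((n - 1) % 26) as u8) as char;
    let repeats = ((n - 1) / 26 + 1) as usize;
    letter.to_string().repeat(repeats)
}

/// Compute the display label for a zero-based page index.
fn page_label(ranges: &[PageLabelRange], page: usize) -> Option<String> {
    // Pick the last range that starts at or before `page`.
    let range = ranges
        .iter()
        .filter(|r| r.start_page <= page)
        .max_by_key(|r| r.start_page)?;
    let number = range.start_number + (page - range.start_page) as u32;
    let numeral = match range.style {
        PageLabelStyle::DecimalArabic => number.to_string(),
        PageLabelStyle::UppercaseRoman => roman(number),
        PageLabelStyle::LowercaseRoman => roman(number).to_lowercase(),
        PageLabelStyle::UppercaseLetters => letters(number, b'A'),
        PageLabelStyle::LowercaseLetters => letters(number, b'a'),
        PageLabelStyle::None => String::new(),
    };
    Some(format!("{}{}", range.prefix.as_deref().unwrap_or(""), numeral))
}
```

With a LowercaseRoman range starting at page 0 and a DecimalArabic range starting at page 3, pages 0 through 4 yield i, ii, iii, 1, 2.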

Pdf Page Label Convenience Methods

page_labels() -> Result<Vec<PageLabelRange>>

Get all page label range definitions.

page_label(page) -> Result<String>

Get the display label for a specific zero-based page index.

Rust

use pdf_oxide::api::Pdf;

let mut pdf = Pdf::open("book.pdf")?;

// Get all label ranges
let ranges = pdf.page_labels()?;
for range in &ranges {
    println!(
        "Pages from {}: {:?} style, prefix={:?}, start={}",
        range.start_page, range.style, range.prefix, range.start_number
    );
}

// Get label for a specific page
let label = pdf.page_label(0)?;
println!("Page 0 label: {}", label); // e.g., "i" or "Cover"

WASM

const doc = new WasmPdfDocument(bytes);
const labels = doc.pageLabels();

for (const range of labels) {
  console.log(`Pages from ${range.start_page}: style=${range.style}, prefix=${range.prefix}`);
}
doc.free();

Python

from pdf_oxide import PdfDocument

doc = PdfDocument("book.pdf")
labels = doc.page_labels()

for range in labels:
    print(f"Pages from {range['start_page']}: style={range['style']}, prefix={range['prefix']}")

<!-- Node.js: no equivalent on PdfDocumentImpl — pageLabels not exposed on class, only via properties mixin -->

Go

doc, _ := pdfoxide.Open("book.pdf")
defer doc.Close()
labels, _ := doc.PageLabels() // returns JSON string
fmt.Println(labels)

C#

using var doc = PdfDocument.Open("book.pdf");
var labels = doc.GetPageLabels(); // returns JSON string
Console.WriteLine(labels);

Advanced Examples

Display complete document metadata

use pdf_oxide::PdfDocument;
use pdf_oxide::extractors::xmp::XmpExtractor;

let mut doc = PdfDocument::open("report.pdf")?;

// Basic info
let (major, minor) = doc.version();
println!("PDF Version: {}.{}", major, minor);
println!("Pages: {}", doc.page_count()?);

// XMP metadata
if let Some(xmp) = XmpExtractor::extract(&mut doc)? {
    println!("\nXMP Metadata:");
    println!("  Title:       {:?}", xmp.dc_title);
    println!("  Authors:     {:?}", xmp.dc_creator);
    println!("  Description: {:?}", xmp.dc_description);
    println!("  Keywords:    {:?}", xmp.pdf_keywords);
    println!("  Creator:     {:?}", xmp.xmp_creator_tool);
    println!("  Producer:    {:?}", xmp.pdf_producer);
    println!("  Created:     {:?}", xmp.xmp_create_date);
    println!("  Modified:    {:?}", xmp.xmp_modify_date);
    println!("  Language:    {:?}", xmp.dc_language);
    println!("  Rights:      {:?}", xmp.dc_rights);

    if !xmp.custom.is_empty() {
        println!("\n  Custom properties:");
        for (key, value) in &xmp.custom {
            println!("    {}: {}", key, value);
        }
    }
}

Access raw XMP XML

use pdf_oxide::PdfDocument;
use pdf_oxide::extractors::xmp::XmpExtractor;

let mut doc = PdfDocument::open("report.pdf")?;
if let Some(xmp) = XmpExtractor::extract(&mut doc)? {
    if let Some(xml) = &xmp.raw_xml {
        std::fs::write("metadata.xml", xml)?;
        println!("Raw XMP saved ({} bytes)", xml.len());
    }
}

Generate page number display strings

use pdf_oxide::api::Pdf;

let mut pdf = Pdf::open("thesis.pdf")?;
let page_count = pdf.page_count()?;

for i in 0..page_count {
    let label = pdf.page_label(i)?;
    println!("Physical page {} -> display label '{}'", i + 1, label);
}
// Example output:
//   Physical page 1 -> display label 'i'
//   Physical page 2 -> display label 'ii'
//   Physical page 3 -> display label 'iii'
//   Physical page 4 -> display label '1'
//   Physical page 5 -> display label '2'