Skip to content

Metadata & XMP

PDF Oxide reads document-level metadata from multiple sources: the PDF header (version), the trailer and catalog dictionaries, XMP metadata streams (ISO 16684), and page label definitions. The XmpExtractor parses the Dublin Core, XMP Core, PDF, and XMP Rights namespaces, plus any custom properties.

Use version() and catalog() for basic document properties, XmpExtractor::extract() for rich metadata, and PageLabelExtractor for page numbering schemes.

Quick Example

Python

from pdf_oxide import PdfDocument

doc = PdfDocument("report.pdf")
major, minor = doc.version()
print(f"PDF {major}.{minor}, {doc.page_count()} pages")

Node.js

const { PdfDocument } = require("pdf-oxide");

const doc = new PdfDocument("report.pdf");
const { major, minor } = doc.getVersion();
console.log(`PDF ${major}.${minor}, ${doc.pageCount()} pages`);
doc.close();

Go

import pdfoxide "github.com/yfedoseev/pdf_oxide/go"

doc, _ := pdfoxide.Open("report.pdf")
defer doc.Close()
major, minor, _ := doc.Version()
pages, _ := doc.PageCount()
fmt.Printf("PDF %d.%d, %d pages\n", major, minor, pages)

C#

using PdfOxide.Core;

using var doc = PdfDocument.Open("report.pdf");
var (major, minor) = doc.Version;
Console.WriteLine($"PDF {major}.{minor}, {doc.PageCount} pages");

WASM

const doc = new WasmPdfDocument(bytes);
const version = doc.version();
console.log(`PDF ${version}, ${doc.pageCount()} pages`);

Rust

use pdf_oxide::PdfDocument;

let mut doc = PdfDocument::open("report.pdf")?;
let (major, minor) = doc.version();
println!("PDF {}.{}", major, minor);
println!("Pages: {}", doc.page_count()?);

PHP

use PdfOxide\PdfDocument;

$doc = PdfDocument::open("report.pdf");
$v = $doc->version(); // ['major' => int, 'minor' => int]
echo "PDF {$v['major']}.{$v['minor']}, {$doc->pageCount()} pages\n";
$doc->close();

Ruby

require "pdf_oxide"

PdfOxide::PdfDocument.open("report.pdf") do |doc|
  puts "PDF #{doc.pdf_version}, #{doc.page_count} pages"
end

C++

#include <pdf_oxide/pdf_oxide.hpp>

auto doc = pdf_oxide::Document::open("report.pdf");
auto v = doc.version();
std::cout << "PDF " << static_cast<int>(v.major) << "."
          << static_cast<int>(v.minor) << ", " << doc.page_count() << " pages\n";

Swift

import PdfOxide

let doc = try Document.open("report.pdf")
let v = try doc.version()
print("PDF \(v.major).\(v.minor), \(try doc.pageCount()) pages")

Dart

import 'package:pdf_oxide/pdf_oxide.dart';

final doc = PdfDocument.open('report.pdf');
final v = doc.version;
print('PDF ${v.major}.${v.minor}, ${doc.pageCount} pages');
doc.close();

R

library(pdfoxide)

doc <- pdf_open("report.pdf")
v <- pdf_version(doc)
cat(sprintf("PDF %d.%d, %d pages\n", v$major, v$minor, pdf_page_count(doc)))

Julia

using PdfOxide

doc = open_document("report.pdf")
v = version(doc)
println("PDF $(v.major).$(v.minor), $(page_count(doc)) pages")

Zig

const pdf_oxide = @import("pdf_oxide");

var doc = try pdf_oxide.Document.open("report.pdf");
const v = doc.version();
std.debug.print("PDF {d}.{d}, {d} pages\n", .{ v.major, v.minor, try doc.pageCount() });

Objective-C

#import "POXPdfOxide.h"
NSError *err = nil;

POXDocument *doc = [POXDocument openPath:@"report.pdf" error:&err];
POXVersion v = [doc version];
printf("PDF %d.%d, %ld pages\n", v.major, v.minor, (long)[doc pageCountError:&err]);

Elixir

{:ok, doc} = PdfOxide.open("report.pdf")
%{major: maj, minor: min} = PdfOxide.version(doc)
{:ok, pages} = PdfOxide.page_count(doc)
IO.puts("PDF #{maj}.#{min}, #{pages} pages")

API Reference

version() -> (u8, u8)

Get the PDF version from the file header.

Returns: A tuple of (major, minor), e.g., (1, 7) for PDF 1.7 or (2, 0) for PDF 2.0.


catalog() -> Result<Object>

Get the document catalog dictionary. The catalog is the root of the PDF object hierarchy and contains references to the page tree, outlines, names, and other document-level structures.

Rust

let mut doc = PdfDocument::open("report.pdf")?;
let catalog = doc.catalog()?;
if let Some(dict) = catalog.as_dict() {
    for (key, _) in dict {
        println!("Catalog key: {}", key);
    }
}

trailer() -> &Object

Get the document trailer dictionary. The trailer contains the cross-reference table location, document ID, encryption dictionary reference, and info dictionary reference.

Rust

let doc = PdfDocument::open("report.pdf")?;
let trailer = doc.trailer();
println!("Trailer: {:?}", trailer);

XmpExtractor::extract(doc) -> Result<Option<XmpMetadata>>

Extract XMP (Extensible Metadata Platform) metadata from the document’s metadata stream. XMP provides richer metadata than the traditional Info dictionary, using standard XML namespaces.

Parameter Type Description
doc &mut PdfDocument The PDF document

Returns: Some(XmpMetadata) if XMP data is present, None otherwise.

XmpMetadata Fields

Dublin Core namespace (dc:)

Field Type Description
dc_title Option<String> Document title
dc_creator Vec<String> Authors/creators list
dc_description Option<String> Document description
dc_subject Vec<String> Subject keywords
dc_language Option<String> Document language (e.g., "en-US")
dc_rights Option<String> Copyright statement
dc_format Option<String> MIME format (e.g., "application/pdf")

XMP Core namespace (xmp:)

Field Type Description
xmp_creator_tool Option<String> Tool used to create the document
xmp_create_date Option<String> Creation date (ISO 8601)
xmp_modify_date Option<String> Last modification date
xmp_metadata_date Option<String> Metadata modification date

PDF namespace (pdf:)

Field Type Description
pdf_producer Option<String> PDF producer application
pdf_keywords Option<String> Keywords string
pdf_version Option<String> PDF version from XMP (may differ from header)
pdf_trapped Option<String> Trapping status

XMP Rights namespace (xmpRights:)

Field Type Description
xmp_rights_usage_terms Option<String> Usage terms
xmp_rights_marked Option<bool> Whether marked with rights
xmp_rights_web_statement Option<String> Web statement URL

Other

Field Type Description
custom HashMap<String, String> Custom properties (namespace:property to value)
raw_xml Option<String> The original XMP XML packet

Rust

use pdf_oxide::extractors::xmp::XmpExtractor;

let mut doc = PdfDocument::open("report.pdf")?;
if let Some(xmp) = XmpExtractor::extract(&mut doc)? {
    if let Some(title) = &xmp.dc_title {
        println!("Title: {}", title);
    }
    for creator in &xmp.dc_creator {
        println!("Author: {}", creator);
    }
    if let Some(tool) = &xmp.xmp_creator_tool {
        println!("Created with: {}", tool);
    }
    if let Some(date) = &xmp.xmp_create_date {
        println!("Created: {}", date);
    }
    if let Some(producer) = &xmp.pdf_producer {
        println!("Producer: {}", producer);
    }
}

WASM

const doc = new WasmPdfDocument(bytes);
const xmp = doc.xmpMetadata();

if (xmp) {
  console.log(`Title: ${xmp.dc_title}`);
  console.log(`Authors: ${xmp.dc_creator}`);
  console.log(`Created with: ${xmp.xmp_creator_tool}`);
  console.log(`Created: ${xmp.xmp_create_date}`);
  console.log(`Producer: ${xmp.pdf_producer}`);
}
doc.free();

Python

doc = PdfDocument("report.pdf")
xmp = doc.xmp_metadata()

if xmp:
    print(f"Title: {xmp.get('dc_title')}")
    print(f"Authors: {xmp.get('dc_creator')}")
    print(f"Created with: {xmp.get('xmp_creator_tool')}")
    print(f"Created: {xmp.get('xmp_create_date')}")
    print(f"Producer: {xmp.get('pdf_producer')}")

<!-- Node.js: no equivalent on PdfDocumentImpl — xmp metadata not exposed in js/src/index.ts -->

Go

doc, _ := pdfoxide.Open("report.pdf")
defer doc.Close()
xmp, _ := doc.XmpMetadata() // returns JSON string
fmt.Println(xmp)

C#

using var doc = PdfDocument.Open("report.pdf");
var xmp = doc.GetXmpMetadata(); // returns JSON string
Console.WriteLine(xmp);

C++

auto doc = pdf_oxide::Document::open("report.pdf");
std::string xmp = doc.get_xmp_metadata(); // raw XMP XML packet
std::cout << xmp << "\n";

Swift

let doc = try Document.open("report.pdf")
let xmp = try doc.xmpMetadata() // raw XMP XML packet
print(xmp)

Dart

final doc = PdfDocument.open('report.pdf');
final xmp = doc.getXmpMetadata(); // raw XMP XML packet
print(xmp);
doc.close();

R

doc <- pdf_open("report.pdf")
xmp <- pdf_get_xmp_metadata(doc) # XMP metadata as JSON
cat(xmp, "\n")

Julia

doc = open_document("report.pdf")
xmp = get_xmp_metadata(doc) # XMP metadata string
println(xmp)

Zig

var doc = try pdf_oxide.Document.open("report.pdf");
const xmp = try doc.xmpMetadata(a); // caller owns the slice
defer a.free(xmp);
std.debug.print("{s}\n", .{xmp});

Objective-C

POXDocument *doc = [POXDocument openPath:@"report.pdf" error:&err];
NSString *xmp = [doc xmpMetadataWithError:&err];
printf("%s\n", xmp.UTF8String);

Elixir

{:ok, doc} = PdfOxide.open("report.pdf")
xmp = PdfOxide.xmp_metadata(doc) # XMP metadata as an XML/JSON string
IO.puts(xmp)

Pdf Convenience Methods

The high-level Pdf API provides shortcut methods for common metadata queries.

xmp_metadata() -> Result<Option<XmpMetadata>>

Get the full XMP metadata object.

xmp_title() -> Result<Option<String>>

Get just the document title from XMP.

xmp_creators() -> Result<Vec<String>>

Get the list of creators/authors from XMP.

Rust

use pdf_oxide::api::Pdf;

let mut pdf = Pdf::open("report.pdf")?;

if let Some(title) = pdf.xmp_title()? {
    println!("Title: {}", title);
}

let creators = pdf.xmp_creators()?;
for creator in &creators {
    println!("Author: {}", creator);
}

PageLabelExtractor::extract(doc) -> Result<Vec<PageLabelRange>>

Extract page label definitions from the document. Page labels define how page numbers are displayed (e.g., Roman numerals for front matter, Arabic numerals for body).

Parameter Type Description
doc &mut PdfDocument The PDF document

Returns: A vector of PageLabelRange definitions.

PageLabelRange Fields

Field Type Description
start_page usize First page index this range applies to
style PageLabelStyle Numbering style
prefix Option<String> Label prefix string
start_number u32 Starting number for this range

PageLabelStyle Variants

Variant Description Example
DecimalArabic Arabic numerals 1, 2, 3
UppercaseRoman Uppercase Roman I, II, III
LowercaseRoman Lowercase Roman i, ii, iii
UppercaseLetters Uppercase letters A, B, C
LowercaseLetters Lowercase letters a, b, c
None No numbering (prefix only)

Pdf Page Label Convenience Methods

page_labels() -> Result<Vec<PageLabelRange>>

Get all page label range definitions.

page_label(page) -> Result<String>

Get the display label for a specific page index.

Rust

use pdf_oxide::api::Pdf;

let mut pdf = Pdf::open("book.pdf")?;

// Get all label ranges
let ranges = pdf.page_labels()?;
for range in &ranges {
    println!(
        "Pages from {}: {:?} style, prefix={:?}, start={}",
        range.start_page, range.style, range.prefix, range.start_number
    );
}

// Get label for a specific page
let label = pdf.page_label(0)?;
println!("Page 0 label: {}", label); // e.g., "i" or "Cover"

WASM

const doc = new WasmPdfDocument(bytes);
const labels = doc.pageLabels();

for (const range of labels) {
  console.log(`Pages from ${range.start_page}: style=${range.style}, prefix=${range.prefix}`);
}
doc.free();

Python

doc = PdfDocument("book.pdf")
labels = doc.page_labels()

for range in labels:
    print(f"Pages from {range['start_page']}: style={range['style']}, prefix={range['prefix']}")

<!-- Node.js: no equivalent on PdfDocumentImpl — pageLabels not exposed on class, only via properties mixin -->

Go

doc, _ := pdfoxide.Open("book.pdf")
defer doc.Close()
labels, _ := doc.PageLabels() // returns JSON string
fmt.Println(labels)

C#

using var doc = PdfDocument.Open("book.pdf");
var labels = doc.GetPageLabels(); // returns JSON string
Console.WriteLine(labels);

C++

auto doc = pdf_oxide::Document::open("book.pdf");
std::string labels = doc.get_page_labels(); // JSON string
std::cout << labels << "\n";

Swift

let doc = try Document.open("book.pdf")
let labels = try doc.pageLabels() // JSON string
print(labels)

Dart

final doc = PdfDocument.open('book.pdf');
final labels = doc.getPageLabels(); // JSON string
print(labels);
doc.close();

R

doc <- pdf_open("book.pdf")
labels <- pdf_get_page_labels(doc) # JSON string
cat(labels, "\n")

Julia

doc = open_document("book.pdf")
labels = get_page_labels(doc) # JSON string
println(labels)

Zig

var doc = try pdf_oxide.Document.open("book.pdf");
const labels = try doc.pageLabels(a); // JSON string; caller owns the slice
defer a.free(labels);
std.debug.print("{s}\n", .{labels});

Objective-C

POXDocument *doc = [POXDocument openPath:@"book.pdf" error:&err];
NSString *labels = [doc pageLabelsWithError:&err];
printf("%s\n", labels.UTF8String);

Elixir

{:ok, doc} = PdfOxide.open("book.pdf")
labels = PdfOxide.page_labels(doc) # JSON string
IO.puts(labels)

get_producer — the document producer

The producer is the tool that generated the PDF (/Info.Producer). The editor surface exposes it as a read/write accessor: read it with get_producer / Producer, write it with the matching setter (which persists to /Info.Producer on save). The XMP equivalent is pdf_producer on XmpMetadata above.

The accessor is backed by the C ABI function document_editor_get_producer:

char *document_editor_get_producer(DocumentEditor *handle, int32_t *error_code);

It returns a caller-owned C string (free with free_string), or null when no producer is set.

Rust

use pdf_oxide::editor::DocumentEditor;

let mut editor = DocumentEditor::open("report.pdf")?;
if let Some(producer) = editor.producer()? {
    println!("Producer: {}", producer);
}

Go

import pdfoxide "github.com/yfedoseev/pdf_oxide/go"

editor, _ := pdfoxide.OpenEditor("report.pdf")
defer editor.Close()
producer, _ := editor.Producer()
fmt.Printf("Producer: %s\n", producer)

C#

using PdfOxide.Core;

using var editor = DocumentEditor.Open("report.pdf");
Console.WriteLine($"Producer: {editor.Producer}");

Swift

import PdfOxide

let editor = try DocumentEditor.open("report.pdf")
let producer = try editor.getProducer()
print("Producer: \(producer)")

PHP

use PdfOxide\DocumentEditor;

$editor = DocumentEditor::open("report.pdf");
echo "Producer: " . $editor->getProducer() . "\n";

C++

auto editor = pdf_oxide::DocumentEditor::open("report.pdf");
std::cout << "Producer: " << editor.get_producer() << "\n";

Dart

final editor = DocumentEditor.open('report.pdf');
print('Producer: ${editor.getProducer()}');

R

editor <- pdf_editor_open("report.pdf")
cat("Producer:", pdf_editor_get_producer(editor), "\n")

Julia

editor = open_editor("report.pdf")
println("Producer: ", get_producer(editor))

Zig

var editor = try pdf_oxide.Document.openEditor("report.pdf");
const producer = try editor.getProducer(a); // caller owns the slice
defer a.free(producer);
std.debug.print("Producer: {s}\n", .{producer});

Objective-C

POXDocumentEditor *editor = [POXDocumentEditor openEditor:@"report.pdf" error:&err];
NSString *producer = [editor producerError:&err];
printf("Producer: %s\n", producer.UTF8String);

Elixir

{:ok, editor} = PdfOxide.open_editor("report.pdf")
producer = PdfOxide.get_producer(editor)
IO.puts("Producer: #{producer}")

Binding coverage. get_producer lives on the editor (DocumentEditor), not the read-only PdfDocument. It is exposed in Rust (editor.producer()), Go (editor.Producer()), C# (editor.Producer property), Swift (editor.getProducer()), and the C ABI (document_editor_get_producer). The setter (set_producer / SetProducer / Producer = ...) persists changes on save. The accessor is compiled out of the WASM target.


embedded_fonts — fonts used on a page

embedded_fonts lists the fonts referenced by a page’s content stream, deriving each font’s name, embedding status, and subset status from the page’s text spans. (A subset font is detected from the standard 6-letter-prefix-plus-+ naming convention, e.g. ABCDEF+Helvetica.) It is backed by the C ABI function pdf_document_get_embedded_fonts plus the pdf_oxide_font_* accessor family.

Go

import pdfoxide "github.com/yfedoseev/pdf_oxide/go"

doc, _ := pdfoxide.Open("report.pdf")
defer doc.Close()

fonts, _ := doc.Fonts(0) // []pdfoxide.Font
for _, f := range fonts {
    fmt.Printf("%s (%s) embedded=%v subset=%v\n",
        f.Name, f.Encoding, f.IsEmbedded, f.IsSubset)
}

Swift

import PdfOxide

let doc = try Document.open("report.pdf")
let fonts = try doc.embeddedFonts(0) // [Font]
for f in fonts {
    print("\(f.name) (\(f.encoding)) embedded=\(f.embedded) subset=\(f.subset)")
}

C ABI

#include "pdf_oxide.h"

int32_t err = 0;
FfiFontList *fonts = pdf_document_get_embedded_fonts(doc, /*page=*/0, &err);
int32_t n = pdf_oxide_font_count(fonts);
for (int32_t i = 0; i < n; i++) {
    char *name = pdf_oxide_font_get_name(fonts, i, &err);
    int32_t embedded = pdf_oxide_font_is_embedded(fonts, i, &err);
    int32_t subset = pdf_oxide_font_is_subset(fonts, i, &err);
    printf("%s embedded=%d subset=%d\n", name, embedded, subset);
    free_string(name);
}
pdf_oxide_font_list_free(fonts);

C++

auto doc = pdf_oxide::Document::open("report.pdf");
for (const auto& f : doc.embedded_fonts(0)) { // std::vector<Font>
    std::cout << f.name << " (" << f.encoding << ") embedded="
              << f.embedded << " subset=" << f.subset << "\n";
}

Dart

final doc = PdfDocument.open('report.pdf');
for (final f in doc.embeddedFonts(0)) { // List<Font>
  print('${f.name} (${f.encoding}) embedded=${f.embedded} subset=${f.subset}');
}
doc.close();

R

doc <- pdf_open("report.pdf")
for (f in pdf_embedded_fonts(doc, 0)) { # list of Font records
  cat(sprintf("%s (%s) embedded=%s subset=%s\n",
              f$name, f$encoding, f$embedded, f$subset))
}

Julia

doc = open_document("report.pdf")
for f in embedded_fonts(doc, 0) # Vector{Font}
    println("$(f.name) ($(f.encoding)) embedded=$(f.embedded) subset=$(f.subset)")
end

Zig

var doc = try pdf_oxide.Document.open("report.pdf");
const fonts = try doc.embeddedFonts(a, 0); // []Font
defer pdf_oxide.Document.freeFonts(a, fonts);
for (fonts) |f| {
    std.debug.print("{s} ({s}) embedded={} subset={}\n",
        .{ f.name, f.encoding, f.embedded, f.subset });
}

Objective-C

POXDocument *doc = [POXDocument openPath:@"report.pdf" error:&err];
for (POXFont *f in [doc embeddedFonts:0 error:&err]) {
    printf("%s (%s) embedded=%d subset=%d\n",
        f.name.UTF8String, f.encoding.UTF8String, f.embedded, f.subset);
}

Elixir

{:ok, doc} = PdfOxide.open("report.pdf")
for f <- PdfOxide.embedded_fonts(doc, 0) do # list of %Font{}
  IO.puts("#{f.name} (#{f.encoding}) embedded=#{f.embedded} subset=#{f.subset}")
end

Font accessor fields

Field (Go / Swift) Type Description
Name / name string Font resource name (e.g. "ABCDEF+Helvetica")
Type / type string Font subtype
Encoding / encoding string Font encoding
IsEmbedded / embedded bool Whether the font program is embedded
IsSubset / subset bool Whether the font is subsetted
Size (Go) float32 Font size, when available

Binding coverage. embedded_fonts is exposed in Go (doc.Fonts(page)), Swift (doc.embeddedFonts(page)), and the C ABI (pdf_document_get_embedded_fonts). It is compiled out of the WASM target.


fonts_to_json — serialize a page’s fonts

fonts_to_json serializes a whole font list (from embedded_fonts) to a JSON array in a single FFI call. The Go binding uses it internally to materialize []Font; Swift exposes it directly as fontsToJson. The C ABI signature is:

char *pdf_oxide_fonts_to_json(const FfiFontList *fonts, int32_t *error_code);

The returned UTF-8 string is caller-owned (free with free_string). Its schema is:

[{"name": "...", "type": "...", "encoding": "...",
  "isEmbedded": true, "isSubset": false, "size": 0}]

Swift

import PdfOxide

let doc = try Document.open("report.pdf")
let json = try doc.fontsToJson(0) // String of JSON
print(json)

C ABI

#include "pdf_oxide.h"

int32_t err = 0;
FfiFontList *fonts = pdf_document_get_embedded_fonts(doc, /*page=*/0, &err);
char *json = pdf_oxide_fonts_to_json(fonts, &err);
printf("%s\n", json);
free_string(json);
pdf_oxide_font_list_free(fonts);

C++

auto doc = pdf_oxide::Document::open("report.pdf");
std::string json = doc.fonts_to_json(0); // JSON array string
std::cout << json << "\n";

Dart

final doc = PdfDocument.open('report.pdf');
final json = doc.embeddedFontsJson(0); // JSON array string
print(json);
doc.close();

R

doc <- pdf_open("report.pdf")
json <- pdf_fonts_to_json(doc, 0) # JSON array string
cat(json, "\n")

Julia

doc = open_document("report.pdf")
json = fonts_to_json(doc, 0) # JSON array string
println(json)

Zig

var doc = try pdf_oxide.Document.open("report.pdf");
var fl = try doc.fontList(0); // owned FontList handle
defer fl.deinit();
const json = try fl.toJson(a); // JSON array string; caller owns the slice
defer a.free(json);
std.debug.print("{s}\n", .{json});

Objective-C

POXDocument *doc = [POXDocument openPath:@"report.pdf" error:&err];
NSString *json = [doc embeddedFontsJson:0 error:&err]; // JSON array string
printf("%s\n", json.UTF8String);

Elixir

{:ok, doc} = PdfOxide.open("report.pdf")
json = PdfOxide.fonts_to_json(doc, 0) # JSON array string
IO.puts(json)

Binding coverage. fonts_to_json is exposed directly in Swift (doc.fontsToJson(page)) and the C ABI (pdf_oxide_fonts_to_json); the Go binding calls it internally to decode doc.Fonts(page) into typed structs. It is compiled out of the WASM target.


Advanced Examples

Display complete document metadata

use pdf_oxide::PdfDocument;
use pdf_oxide::extractors::xmp::XmpExtractor;

let mut doc = PdfDocument::open("report.pdf")?;

// Basic info
let (major, minor) = doc.version();
println!("PDF Version: {}.{}", major, minor);
println!("Pages: {}", doc.page_count()?);

// XMP metadata
if let Some(xmp) = XmpExtractor::extract(&mut doc)? {
    println!("\nXMP Metadata:");
    println!("  Title:       {:?}", xmp.dc_title);
    println!("  Authors:     {:?}", xmp.dc_creator);
    println!("  Description: {:?}", xmp.dc_description);
    println!("  Keywords:    {:?}", xmp.pdf_keywords);
    println!("  Creator:     {:?}", xmp.xmp_creator_tool);
    println!("  Producer:    {:?}", xmp.pdf_producer);
    println!("  Created:     {:?}", xmp.xmp_create_date);
    println!("  Modified:    {:?}", xmp.xmp_modify_date);
    println!("  Language:    {:?}", xmp.dc_language);
    println!("  Rights:      {:?}", xmp.dc_rights);

    if !xmp.custom.is_empty() {
        println!("\n  Custom properties:");
        for (key, value) in &xmp.custom {
            println!("    {}: {}", key, value);
        }
    }
}

Access raw XMP XML

use pdf_oxide::extractors::xmp::XmpExtractor;

let mut doc = PdfDocument::open("report.pdf")?;
if let Some(xmp) = XmpExtractor::extract(&mut doc)? {
    if let Some(xml) = &xmp.raw_xml {
        std::fs::write("metadata.xml", xml)?;
        println!("Raw XMP saved ({} bytes)", xml.len());
    }
}

Generate page number display strings

use pdf_oxide::api::Pdf;

let mut pdf = Pdf::open("thesis.pdf")?;
let page_count = pdf.page_count()?;

for i in 0..page_count {
    let label = pdf.page_label(i)?;
    println!("Physical page {} -> display label '{}'", i + 1, label);
}
// Example output:
//   Physical page 1 -> display label 'i'
//   Physical page 2 -> display label 'ii'
//   Physical page 3 -> display label 'iii'
//   Physical page 4 -> display label '1'
//   Physical page 5 -> display label '2'

FAQ

Where does get_producer read from? From the /Info.Producer entry in the document Info dictionary. It lives on the DocumentEditor (read/write), and the matching setter persists changes to /Info.Producer when you save. The XMP pdf:Producer value is available separately as pdf_producer on XmpMetadata.

Why does embedded_fonts only return fonts that appear in text? The font list is derived from the page’s rendered text spans, so it reflects fonts actually used to draw glyphs on that page. Subset detection follows the PDF convention of a 6-character tag plus + (e.g. ABCDEF+Helvetica).

What is the JSON schema returned by fonts_to_json? A JSON array of objects with name, type, encoding, isEmbedded, isSubset, and size fields — the same shape the Go binding unmarshals into its Font struct.

Is metadata extraction fast? Yes. PDF Oxide’s extraction core runs at roughly 0.8 ms mean / 9 ms p99 with a 100% pass rate on the benchmark corpus.