Metadata & XMP
PDF Oxide reads document-level metadata from multiple sources: the PDF header (version), the trailer and catalog dictionaries, XMP metadata streams (ISO 16684), and page label definitions. The XmpExtractor parses the Dublin Core, XMP Core, PDF, and XMP Rights namespaces, plus any custom properties.
Use version() and catalog() for basic document properties, XmpExtractor::extract() for rich metadata, and PageLabelExtractor for page numbering schemes.
Quick Example
Python
from pdf_oxide import PdfDocument
doc = PdfDocument("report.pdf")
major, minor = doc.version()
print(f"PDF {major}.{minor}, {doc.page_count()} pages")
Node.js
const { PdfDocument } = require("pdf-oxide");
const doc = new PdfDocument("report.pdf");
const { major, minor } = doc.getVersion();
console.log(`PDF ${major}.${minor}, ${doc.pageCount()} pages`);
doc.close();
Go
import pdfoxide "github.com/yfedoseev/pdf_oxide/go"
doc, _ := pdfoxide.Open("report.pdf")
defer doc.Close()
major, minor, _ := doc.Version()
pages, _ := doc.PageCount()
fmt.Printf("PDF %d.%d, %d pages\n", major, minor, pages)
C#
using PdfOxide.Core;
using var doc = PdfDocument.Open("report.pdf");
var (major, minor) = doc.Version;
Console.WriteLine($"PDF {major}.{minor}, {doc.PageCount} pages");
WASM
const doc = new WasmPdfDocument(bytes);
const version = doc.version();
console.log(`PDF ${version}, ${doc.pageCount()} pages`);
Rust
use pdf_oxide::PdfDocument;
let mut doc = PdfDocument::open("report.pdf")?;
let (major, minor) = doc.version();
println!("PDF {}.{}", major, minor);
println!("Pages: {}", doc.page_count()?);
PHP
use PdfOxide\PdfDocument;
$doc = PdfDocument::open("report.pdf");
$v = $doc->version(); // ['major' => int, 'minor' => int]
echo "PDF {$v['major']}.{$v['minor']}, {$doc->pageCount()} pages\n";
$doc->close();
Ruby
require "pdf_oxide"
PdfOxide::PdfDocument.open("report.pdf") do |doc|
  puts "PDF #{doc.pdf_version}, #{doc.page_count} pages"
end
C++
#include <pdf_oxide/pdf_oxide.hpp>
auto doc = pdf_oxide::Document::open("report.pdf");
auto v = doc.version();
std::cout << "PDF " << static_cast<int>(v.major) << "."
          << static_cast<int>(v.minor) << ", " << doc.page_count() << " pages\n";
Swift
import PdfOxide
let doc = try Document.open("report.pdf")
let v = try doc.version()
print("PDF \(v.major).\(v.minor), \(try doc.pageCount()) pages")
Dart
import 'package:pdf_oxide/pdf_oxide.dart';
final doc = PdfDocument.open('report.pdf');
final v = doc.version;
print('PDF ${v.major}.${v.minor}, ${doc.pageCount} pages');
doc.close();
R
library(pdfoxide)
doc <- pdf_open("report.pdf")
v <- pdf_version(doc)
cat(sprintf("PDF %d.%d, %d pages\n", v$major, v$minor, pdf_page_count(doc)))
Julia
using PdfOxide
doc = open_document("report.pdf")
v = version(doc)
println("PDF $(v.major).$(v.minor), $(page_count(doc)) pages")
Zig
const pdf_oxide = @import("pdf_oxide");
var doc = try pdf_oxide.Document.open("report.pdf");
const v = doc.version();
std.debug.print("PDF {d}.{d}, {d} pages\n", .{ v.major, v.minor, try doc.pageCount() });
Objective-C
#import "POXPdfOxide.h"
NSError *err = nil;
POXDocument *doc = [POXDocument openPath:@"report.pdf" error:&err];
POXVersion v = [doc version];
printf("PDF %d.%d, %ld pages\n", v.major, v.minor, (long)[doc pageCountError:&err]);
Elixir
{:ok, doc} = PdfOxide.open("report.pdf")
%{major: maj, minor: min} = PdfOxide.version(doc)
{:ok, pages} = PdfOxide.page_count(doc)
IO.puts("PDF #{maj}.#{min}, #{pages} pages")
API Reference
version() -> (u8, u8)
Get the PDF version from the file header.
Returns: A tuple of (major, minor), e.g., (1, 7) for PDF 1.7 or (2, 0) for PDF 2.0.
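To make the header lookup concrete, here is a minimal standalone sketch of reading a `%PDF-M.m` marker from raw bytes. It is not PDF Oxide's implementation: `parse_header_version` is a hypothetical helper, and scanning the first 1024 bytes is an assumed leniency for files that carry junk before the header.

```python
import re
from typing import Optional, Tuple

# Hypothetical helper for illustration; not part of the PDF Oxide API.
_HEADER_RE = re.compile(rb"%PDF-(\d)\.(\d)")


def parse_header_version(data: bytes) -> Optional[Tuple[int, int]]:
    """Return (major, minor) from a '%PDF-M.m' header, or None if absent."""
    # Assumption: only the first 1024 bytes are scanned for the marker.
    match = _HEADER_RE.search(data[:1024])
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


print(parse_header_version(b"%PDF-1.7\n%\xe2\xe3\xcf\xd3\n"))  # (1, 7)
print(parse_header_version(b"%PDF-2.0\n"))                     # (2, 0)
print(parse_header_version(b"not a pdf"))                      # None
```

The tuple shape mirrors the (major, minor) pair that version() returns.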
catalog() -> Result<Object>
Get the document catalog dictionary. The catalog is the root of the PDF object hierarchy and contains references to the page tree, outlines, names, and other document-level structures.
Rust
let mut doc = PdfDocument::open("report.pdf")?;
let catalog = doc.catalog()?;
if let Some(dict) = catalog.as_dict() {
    for (key, _) in dict {
        println!("Catalog key: {}", key);
    }
}
trailer() -> &Object
Get the document trailer dictionary. The trailer contains the cross-reference table location, document ID, encryption dictionary reference, and info dictionary reference.
Rust
let doc = PdfDocument::open("report.pdf")?;
let trailer = doc.trailer();
println!("Trailer: {:?}", trailer);
XmpExtractor::extract(doc) -> Result<Option<XmpMetadata>>
Extract XMP (Extensible Metadata Platform) metadata from the document’s metadata stream. XMP provides richer metadata than the traditional Info dictionary, using standard XML namespaces.
| Parameter | Type | Description |
|---|---|---|
| doc | &mut PdfDocument | The PDF document |
Returns: Some(XmpMetadata) if XMP data is present, None otherwise.
XmpMetadata Fields
Dublin Core namespace (dc:)
| Field | Type | Description |
|---|---|---|
| dc_title | Option<String> | Document title |
| dc_creator | Vec<String> | Authors/creators list |
| dc_description | Option<String> | Document description |
| dc_subject | Vec<String> | Subject keywords |
| dc_language | Option<String> | Document language (e.g., "en-US") |
| dc_rights | Option<String> | Copyright statement |
| dc_format | Option<String> | MIME format (e.g., "application/pdf") |
XMP Core namespace (xmp:)
| Field | Type | Description |
|---|---|---|
| xmp_creator_tool | Option<String> | Tool used to create the document |
| xmp_create_date | Option<String> | Creation date (ISO 8601) |
| xmp_modify_date | Option<String> | Last modification date |
| xmp_metadata_date | Option<String> | Metadata modification date |
PDF namespace (pdf:)
| Field | Type | Description |
|---|---|---|
| pdf_producer | Option<String> | PDF producer application |
| pdf_keywords | Option<String> | Keywords string |
| pdf_version | Option<String> | PDF version from XMP (may differ from header) |
| pdf_trapped | Option<String> | Trapping status |
XMP Rights namespace (xmpRights:)
| Field | Type | Description |
|---|---|---|
| xmp_rights_usage_terms | Option<String> | Usage terms |
| xmp_rights_marked | Option<bool> | Whether marked with rights |
| xmp_rights_web_statement | Option<String> | Web statement URL |
Other
| Field | Type | Description |
|---|---|---|
| custom | HashMap<String, String> | Custom properties (namespace:property to value) |
| raw_xml | Option<String> | The original XMP XML packet |
Rust
use pdf_oxide::extractors::xmp::XmpExtractor;
let mut doc = PdfDocument::open("report.pdf")?;
if let Some(xmp) = XmpExtractor::extract(&mut doc)? {
    if let Some(title) = &xmp.dc_title {
        println!("Title: {}", title);
    }
    for creator in &xmp.dc_creator {
        println!("Author: {}", creator);
    }
    if let Some(tool) = &xmp.xmp_creator_tool {
        println!("Created with: {}", tool);
    }
    if let Some(date) = &xmp.xmp_create_date {
        println!("Created: {}", date);
    }
    if let Some(producer) = &xmp.pdf_producer {
        println!("Producer: {}", producer);
    }
}
WASM
const doc = new WasmPdfDocument(bytes);
const xmp = doc.xmpMetadata();
if (xmp) {
  console.log(`Title: ${xmp.dc_title}`);
  console.log(`Authors: ${xmp.dc_creator}`);
  console.log(`Created with: ${xmp.xmp_creator_tool}`);
  console.log(`Created: ${xmp.xmp_create_date}`);
  console.log(`Producer: ${xmp.pdf_producer}`);
}
doc.free();
Python
doc = PdfDocument("report.pdf")
xmp = doc.xmp_metadata()
if xmp:
    print(f"Title: {xmp.get('dc_title')}")
    print(f"Authors: {xmp.get('dc_creator')}")
    print(f"Created with: {xmp.get('xmp_creator_tool')}")
    print(f"Created: {xmp.get('xmp_create_date')}")
    print(f"Producer: {xmp.get('pdf_producer')}")
<!-- Node.js: no equivalent on PdfDocumentImpl — xmp metadata not exposed in js/src/index.ts -->
Go
doc, _ := pdfoxide.Open("report.pdf")
defer doc.Close()
xmp, _ := doc.XmpMetadata() // returns JSON string
fmt.Println(xmp)
C#
using var doc = PdfDocument.Open("report.pdf");
var xmp = doc.GetXmpMetadata(); // returns JSON string
Console.WriteLine(xmp);
C++
auto doc = pdf_oxide::Document::open("report.pdf");
std::string xmp = doc.get_xmp_metadata(); // raw XMP XML packet
std::cout << xmp << "\n";
Swift
let doc = try Document.open("report.pdf")
let xmp = try doc.xmpMetadata() // raw XMP XML packet
print(xmp)
Dart
final doc = PdfDocument.open('report.pdf');
final xmp = doc.getXmpMetadata(); // raw XMP XML packet
print(xmp);
doc.close();
R
doc <- pdf_open("report.pdf")
xmp <- pdf_get_xmp_metadata(doc) # XMP metadata as JSON
cat(xmp, "\n")
Julia
doc = open_document("report.pdf")
xmp = get_xmp_metadata(doc) # XMP metadata string
println(xmp)
Zig
var doc = try pdf_oxide.Document.open("report.pdf");
const xmp = try doc.xmpMetadata(a); // caller owns the slice
defer a.free(xmp);
std.debug.print("{s}\n", .{xmp});
Objective-C
POXDocument *doc = [POXDocument openPath:@"report.pdf" error:&err];
NSString *xmp = [doc xmpMetadataWithError:&err];
printf("%s\n", xmp.UTF8String);
Elixir
{:ok, doc} = PdfOxide.open("report.pdf")
xmp = PdfOxide.xmp_metadata(doc) # XMP metadata as an XML/JSON string
IO.puts(xmp)
Pdf Convenience Methods
The high-level Pdf API provides shortcut methods for common metadata queries.
xmp_metadata() -> Result<Option<XmpMetadata>>
Get the full XMP metadata object.
xmp_title() -> Result<Option<String>>
Get just the document title from XMP.
xmp_creators() -> Result<Vec<String>>
Get the list of creators/authors from XMP.
Rust
use pdf_oxide::api::Pdf;
let mut pdf = Pdf::open("report.pdf")?;
if let Some(title) = pdf.xmp_title()? {
    println!("Title: {}", title);
}
let creators = pdf.xmp_creators()?;
for creator in &creators {
    println!("Author: {}", creator);
}
PageLabelExtractor::extract(doc) -> Result<Vec<PageLabelRange>>
Extract page label definitions from the document. Page labels define how page numbers are displayed (e.g., Roman numerals for front matter, Arabic numerals for body).
| Parameter | Type | Description |
|---|---|---|
| doc | &mut PdfDocument | The PDF document |
Returns: A vector of PageLabelRange definitions.
PageLabelRange Fields
| Field | Type | Description |
|---|---|---|
| start_page | usize | First page index this range applies to |
| style | PageLabelStyle | Numbering style |
| prefix | Option<String> | Label prefix string |
| start_number | u32 | Starting number for this range |
PageLabelStyle Variants
| Variant | Description | Example |
|---|---|---|
| DecimalArabic | Arabic numerals | 1, 2, 3 |
| UppercaseRoman | Uppercase Roman | I, II, III |
| LowercaseRoman | Lowercase Roman | i, ii, iii |
| UppercaseLetters | Uppercase letters | A, B, C |
| LowercaseLetters | Lowercase letters | a, b, c |
| None | No numbering (prefix only) | – |
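The way a range definition turns into a display label can be sketched in a few lines. This is an illustrative standalone model, not PDF Oxide's code: `LabelRange`, `to_roman`, `to_letters`, and `page_label` are hypothetical names that mirror the fields and variants above, and the fallback to a plain 1-based number when no range applies is an assumption. The letter styles follow the PDF convention of A to Z, then AA to ZZ, and so on.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class LabelRange:
    # Hypothetical mirror of PageLabelRange; style is a plain string here.
    start_page: int
    style: Optional[str]  # e.g. "DecimalArabic", or None for prefix-only
    prefix: Optional[str] = None
    start_number: int = 1


def to_roman(n: int) -> str:
    pairs = [(1000, "M"), (900, "CM"), (500, "D"), (400, "CD"), (100, "C"),
             (90, "XC"), (50, "L"), (40, "XL"), (10, "X"), (9, "IX"),
             (5, "V"), (4, "IV"), (1, "I")]
    out = []
    for value, symbol in pairs:
        count, n = divmod(n, value)
        out.append(symbol * count)
    return "".join(out)


def to_letters(n: int) -> str:
    # 1 -> A, 26 -> Z, 27 -> AA, 52 -> ZZ, 53 -> AAA, ...
    return chr(ord("A") + (n - 1) % 26) * ((n - 1) // 26 + 1)


def page_label(ranges: List[LabelRange], page: int) -> str:
    # The active range is the last one starting at or before this page.
    active = None
    for r in sorted(ranges, key=lambda r: r.start_page):
        if r.start_page <= page:
            active = r
    if active is None:
        return str(page + 1)  # assumed fallback when no range applies
    number = active.start_number + (page - active.start_page)
    body = {
        "DecimalArabic": str(number),
        "UppercaseRoman": to_roman(number),
        "LowercaseRoman": to_roman(number).lower(),
        "UppercaseLetters": to_letters(number),
        "LowercaseLetters": to_letters(number).lower(),
    }.get(active.style, "")
    return (active.prefix or "") + body


ranges = [LabelRange(0, "LowercaseRoman"), LabelRange(3, "DecimalArabic")]
print([page_label(ranges, i) for i in range(5)])  # ['i', 'ii', 'iii', '1', '2']
```

Here three front-matter pages use lowercase Roman numerals and the body restarts at 1, which is the typical thesis or book layout.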
Pdf Page Label Convenience Methods
page_labels() -> Result<Vec<PageLabelRange>>
Get all page label range definitions.
page_label(page) -> Result<String>
Get the display label for a specific page index.
Rust
use pdf_oxide::api::Pdf;
let mut pdf = Pdf::open("book.pdf")?;
// Get all label ranges
let ranges = pdf.page_labels()?;
for range in &ranges {
    println!(
        "Pages from {}: {:?} style, prefix={:?}, start={}",
        range.start_page, range.style, range.prefix, range.start_number
    );
}
// Get label for a specific page
let label = pdf.page_label(0)?;
println!("Page 0 label: {}", label); // e.g., "i" or "Cover"
WASM
const doc = new WasmPdfDocument(bytes);
const labels = doc.pageLabels();
for (const range of labels) {
  console.log(`Pages from ${range.start_page}: style=${range.style}, prefix=${range.prefix}`);
}
doc.free();
Python
doc = PdfDocument("book.pdf")
labels = doc.page_labels()
for label_range in labels:  # avoid shadowing the built-in range()
    print(f"Pages from {label_range['start_page']}: style={label_range['style']}, prefix={label_range['prefix']}")
<!-- Node.js: no equivalent on PdfDocumentImpl — pageLabels not exposed on class, only via properties mixin -->
Go
doc, _ := pdfoxide.Open("book.pdf")
defer doc.Close()
labels, _ := doc.PageLabels() // returns JSON string
fmt.Println(labels)
C#
using var doc = PdfDocument.Open("book.pdf");
var labels = doc.GetPageLabels(); // returns JSON string
Console.WriteLine(labels);
C++
auto doc = pdf_oxide::Document::open("book.pdf");
std::string labels = doc.get_page_labels(); // JSON string
std::cout << labels << "\n";
Swift
let doc = try Document.open("book.pdf")
let labels = try doc.pageLabels() // JSON string
print(labels)
Dart
final doc = PdfDocument.open('book.pdf');
final labels = doc.getPageLabels(); // JSON string
print(labels);
doc.close();
R
doc <- pdf_open("book.pdf")
labels <- pdf_get_page_labels(doc) # JSON string
cat(labels, "\n")
Julia
doc = open_document("book.pdf")
labels = get_page_labels(doc) # JSON string
println(labels)
Zig
var doc = try pdf_oxide.Document.open("book.pdf");
const labels = try doc.pageLabels(a); // JSON string; caller owns the slice
defer a.free(labels);
std.debug.print("{s}\n", .{labels});
Objective-C
POXDocument *doc = [POXDocument openPath:@"book.pdf" error:&err];
NSString *labels = [doc pageLabelsWithError:&err];
printf("%s\n", labels.UTF8String);
Elixir
{:ok, doc} = PdfOxide.open("book.pdf")
labels = PdfOxide.page_labels(doc) # JSON string
IO.puts(labels)
get_producer — the document producer
The producer is the tool that generated the PDF (/Info.Producer). The editor surface exposes it as a read/write accessor: read it with get_producer / Producer, write it with the matching setter (which persists to /Info.Producer on save). The XMP equivalent is pdf_producer on XmpMetadata above.
The accessor is backed by the C ABI function document_editor_get_producer:
char *document_editor_get_producer(DocumentEditor *handle, int32_t *error_code);
It returns a caller-owned C string (free with free_string), or null when no producer is set.
Rust
use pdf_oxide::editor::DocumentEditor;
let mut editor = DocumentEditor::open("report.pdf")?;
if let Some(producer) = editor.producer()? {
    println!("Producer: {}", producer);
}
Go
import pdfoxide "github.com/yfedoseev/pdf_oxide/go"
editor, _ := pdfoxide.OpenEditor("report.pdf")
defer editor.Close()
producer, _ := editor.Producer()
fmt.Printf("Producer: %s\n", producer)
C#
using PdfOxide.Core;
using var editor = DocumentEditor.Open("report.pdf");
Console.WriteLine($"Producer: {editor.Producer}");
Swift
import PdfOxide
let editor = try DocumentEditor.open("report.pdf")
let producer = try editor.getProducer()
print("Producer: \(producer)")
PHP
use PdfOxide\DocumentEditor;
$editor = DocumentEditor::open("report.pdf");
echo "Producer: " . $editor->getProducer() . "\n";
C++
auto editor = pdf_oxide::DocumentEditor::open("report.pdf");
std::cout << "Producer: " << editor.get_producer() << "\n";
Dart
final editor = DocumentEditor.open('report.pdf');
print('Producer: ${editor.getProducer()}');
R
editor <- pdf_editor_open("report.pdf")
cat("Producer:", pdf_editor_get_producer(editor), "\n")
Julia
editor = open_editor("report.pdf")
println("Producer: ", get_producer(editor))
Zig
var editor = try pdf_oxide.Document.openEditor("report.pdf");
const producer = try editor.getProducer(a); // caller owns the slice
defer a.free(producer);
std.debug.print("Producer: {s}\n", .{producer});
Objective-C
POXDocumentEditor *editor = [POXDocumentEditor openEditor:@"report.pdf" error:&err];
NSString *producer = [editor producerError:&err];
printf("Producer: %s\n", producer.UTF8String);
Elixir
{:ok, editor} = PdfOxide.open_editor("report.pdf")
producer = PdfOxide.get_producer(editor)
IO.puts("Producer: #{producer}")
Binding coverage. get_producer lives on the editor (DocumentEditor), not the read-only PdfDocument. It is exposed in Rust (editor.producer()), Go (editor.Producer()), C# (editor.Producer property), Swift (editor.getProducer()), and the C ABI (document_editor_get_producer). The setter (set_producer / SetProducer / Producer = ...) persists changes on save. The accessor is compiled out of the WASM target.
embedded_fonts — fonts used on a page
embedded_fonts lists the fonts referenced by a page’s content stream, deriving each font’s name, embedding status, and subset status from the page’s text spans. A subset font is detected from the standard naming convention: a six-letter uppercase tag followed by +, e.g. ABCDEF+Helvetica. The method is backed by the C ABI function pdf_document_get_embedded_fonts plus the pdf_oxide_font_* accessor family.
Go
import pdfoxide "github.com/yfedoseev/pdf_oxide/go"
doc, _ := pdfoxide.Open("report.pdf")
defer doc.Close()
fonts, _ := doc.Fonts(0) // []pdfoxide.Font
for _, f := range fonts {
	fmt.Printf("%s (%s) embedded=%v subset=%v\n",
		f.Name, f.Encoding, f.IsEmbedded, f.IsSubset)
}
Swift
import PdfOxide
let doc = try Document.open("report.pdf")
let fonts = try doc.embeddedFonts(0) // [Font]
for f in fonts {
    print("\(f.name) (\(f.encoding)) embedded=\(f.embedded) subset=\(f.subset)")
}
C ABI
#include "pdf_oxide.h"
int32_t err = 0;
FfiFontList *fonts = pdf_document_get_embedded_fonts(doc, /*page=*/0, &err);
int32_t n = pdf_oxide_font_count(fonts);
for (int32_t i = 0; i < n; i++) {
    char *name = pdf_oxide_font_get_name(fonts, i, &err);
    int32_t embedded = pdf_oxide_font_is_embedded(fonts, i, &err);
    int32_t subset = pdf_oxide_font_is_subset(fonts, i, &err);
    printf("%s embedded=%d subset=%d\n", name, embedded, subset);
    free_string(name);
}
pdf_oxide_font_list_free(fonts);
C++
auto doc = pdf_oxide::Document::open("report.pdf");
for (const auto& f : doc.embedded_fonts(0)) { // std::vector<Font>
    std::cout << f.name << " (" << f.encoding << ") embedded="
              << f.embedded << " subset=" << f.subset << "\n";
}
Dart
final doc = PdfDocument.open('report.pdf');
for (final f in doc.embeddedFonts(0)) { // List<Font>
  print('${f.name} (${f.encoding}) embedded=${f.embedded} subset=${f.subset}');
}
doc.close();
R
doc <- pdf_open("report.pdf")
for (f in pdf_embedded_fonts(doc, 0)) { # list of Font records
  cat(sprintf("%s (%s) embedded=%s subset=%s\n",
              f$name, f$encoding, f$embedded, f$subset))
}
Julia
doc = open_document("report.pdf")
for f in embedded_fonts(doc, 0) # Vector{Font}
    println("$(f.name) ($(f.encoding)) embedded=$(f.embedded) subset=$(f.subset)")
end
Zig
var doc = try pdf_oxide.Document.open("report.pdf");
const fonts = try doc.embeddedFonts(a, 0); // []Font
defer pdf_oxide.Document.freeFonts(a, fonts);
for (fonts) |f| {
    std.debug.print("{s} ({s}) embedded={} subset={}\n",
        .{ f.name, f.encoding, f.embedded, f.subset });
}
Objective-C
POXDocument *doc = [POXDocument openPath:@"report.pdf" error:&err];
for (POXFont *f in [doc embeddedFonts:0 error:&err]) {
    printf("%s (%s) embedded=%d subset=%d\n",
           f.name.UTF8String, f.encoding.UTF8String, f.embedded, f.subset);
}
Elixir
{:ok, doc} = PdfOxide.open("report.pdf")
for f <- PdfOxide.embedded_fonts(doc, 0) do # list of %Font{}
  IO.puts("#{f.name} (#{f.encoding}) embedded=#{f.embedded} subset=#{f.subset}")
end
Font accessor fields
| Field (Go / Swift) | Type | Description |
|---|---|---|
| Name / name | string | Font resource name (e.g. "ABCDEF+Helvetica") |
| Type / type | string | Font subtype |
| Encoding / encoding | string | Font encoding |
| IsEmbedded / embedded | bool | Whether the font program is embedded |
| IsSubset / subset | bool | Whether the font is subsetted |
| Size (Go) | float32 | Font size, when available |
Binding coverage. embedded_fonts is exposed in Go (doc.Fonts(page)), Swift (doc.embeddedFonts(page)), and the C ABI (pdf_document_get_embedded_fonts). It is compiled out of the WASM target.
fonts_to_json — serialize a page’s fonts
fonts_to_json serializes a whole font list (from embedded_fonts) to a JSON array in a single FFI call. The Go binding uses it internally to materialize []Font; Swift exposes it directly as fontsToJson. The C ABI signature is:
char *pdf_oxide_fonts_to_json(const FfiFontList *fonts, int32_t *error_code);
The returned UTF-8 string is caller-owned (free with free_string). Its schema is:
[{"name": "...", "type": "...", "encoding": "...",
"isEmbedded": true, "isSubset": false, "size": 0}]
Swift
import PdfOxide
let doc = try Document.open("report.pdf")
let json = try doc.fontsToJson(0) // String of JSON
print(json)
C ABI
#include "pdf_oxide.h"
int32_t err = 0;
FfiFontList *fonts = pdf_document_get_embedded_fonts(doc, /*page=*/0, &err);
char *json = pdf_oxide_fonts_to_json(fonts, &err);
printf("%s\n", json);
free_string(json);
pdf_oxide_font_list_free(fonts);
C++
auto doc = pdf_oxide::Document::open("report.pdf");
std::string json = doc.fonts_to_json(0); // JSON array string
std::cout << json << "\n";
Dart
final doc = PdfDocument.open('report.pdf');
final json = doc.embeddedFontsJson(0); // JSON array string
print(json);
doc.close();
R
doc <- pdf_open("report.pdf")
json <- pdf_fonts_to_json(doc, 0) # JSON array string
cat(json, "\n")
Julia
doc = open_document("report.pdf")
json = fonts_to_json(doc, 0) # JSON array string
println(json)
Zig
var doc = try pdf_oxide.Document.open("report.pdf");
var fl = try doc.fontList(0); // owned FontList handle
defer fl.deinit();
const json = try fl.toJson(a); // JSON array string; caller owns the slice
defer a.free(json);
std.debug.print("{s}\n", .{json});
Objective-C
POXDocument *doc = [POXDocument openPath:@"report.pdf" error:&err];
NSString *json = [doc embeddedFontsJson:0 error:&err]; // JSON array string
printf("%s\n", json.UTF8String);
Elixir
{:ok, doc} = PdfOxide.open("report.pdf")
json = PdfOxide.fonts_to_json(doc, 0) # JSON array string
IO.puts(json)
Binding coverage. fonts_to_json is exposed directly in Swift (doc.fontsToJson(page)) and the C ABI (pdf_oxide_fonts_to_json); the Go binding calls it internally to decode doc.Fonts(page) into typed structs. It is compiled out of the WASM target.
Advanced Examples
Display complete document metadata
use pdf_oxide::PdfDocument;
use pdf_oxide::extractors::xmp::XmpExtractor;
let mut doc = PdfDocument::open("report.pdf")?;
// Basic info
let (major, minor) = doc.version();
println!("PDF Version: {}.{}", major, minor);
println!("Pages: {}", doc.page_count()?);
// XMP metadata
if let Some(xmp) = XmpExtractor::extract(&mut doc)? {
    println!("\nXMP Metadata:");
    println!("  Title: {:?}", xmp.dc_title);
    println!("  Authors: {:?}", xmp.dc_creator);
    println!("  Description: {:?}", xmp.dc_description);
    println!("  Keywords: {:?}", xmp.pdf_keywords);
    println!("  Creator: {:?}", xmp.xmp_creator_tool);
    println!("  Producer: {:?}", xmp.pdf_producer);
    println!("  Created: {:?}", xmp.xmp_create_date);
    println!("  Modified: {:?}", xmp.xmp_modify_date);
    println!("  Language: {:?}", xmp.dc_language);
    println!("  Rights: {:?}", xmp.dc_rights);
    if !xmp.custom.is_empty() {
        println!("\n  Custom properties:");
        for (key, value) in &xmp.custom {
            println!("    {}: {}", key, value);
        }
    }
}
Access raw XMP XML
use pdf_oxide::extractors::xmp::XmpExtractor;
let mut doc = PdfDocument::open("report.pdf")?;
if let Some(xmp) = XmpExtractor::extract(&mut doc)? {
    if let Some(xml) = &xmp.raw_xml {
        std::fs::write("metadata.xml", xml)?;
        println!("Raw XMP saved ({} bytes)", xml.len());
    }
}
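Once the raw packet is saved, it can be inspected with any XML parser. The sketch below uses only the Python standard library to pull a few properties out of a small hand-written packet. It is an illustration, not how XmpExtractor works internally: `read_xmp` and the `SAMPLE` values are made up, and only element-form properties are handled (XMP also allows simple properties as attributes on rdf:Description, which this sketch ignores).

```python
import xml.etree.ElementTree as ET

# Standard namespace URIs for the prefixes used in this page.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
    "xmp": "http://ns.adobe.com/xap/1.0/",
    "pdf": "http://ns.adobe.com/pdf/1.3/",
}

# Hand-written sample packet; the values are invented for illustration.
SAMPLE = """<x:xmpmeta xmlns:x="adobe:ns:meta/">
 <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description rdf:about=""
      xmlns:dc="http://purl.org/dc/elements/1.1/"
      xmlns:xmp="http://ns.adobe.com/xap/1.0/"
      xmlns:pdf="http://ns.adobe.com/pdf/1.3/">
   <dc:title><rdf:Alt><rdf:li xml:lang="x-default">Annual Report</rdf:li></rdf:Alt></dc:title>
   <dc:creator><rdf:Seq><rdf:li>Ada</rdf:li><rdf:li>Grace</rdf:li></rdf:Seq></dc:creator>
   <xmp:CreatorTool>ExampleWriter</xmp:CreatorTool>
   <pdf:Producer>ExampleProducer</pdf:Producer>
  </rdf:Description>
 </rdf:RDF>
</x:xmpmeta>"""


def read_xmp(xml_text: str) -> dict:
    """Hypothetical helper: map a few XMP elements to XmpMetadata-style keys."""
    root = ET.fromstring(xml_text)

    def items(tag: str) -> list:
        # Array-valued properties wrap each value in an rdf:li element.
        return [li.text or "" for li in root.findall(f".//{tag}//rdf:li", NS)]

    def simple(tag: str):
        node = root.find(f".//{tag}", NS)
        return node.text if node is not None else None

    titles = items("dc:title")
    return {
        "dc_title": titles[0] if titles else None,
        "dc_creator": items("dc:creator"),
        "xmp_creator_tool": simple("xmp:CreatorTool"),
        "pdf_producer": simple("pdf:Producer"),
    }


print(read_xmp(SAMPLE))
```

The returned keys are named after the XmpMetadata fields so the output is easy to compare against what the extractor reports for the same file.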
Generate page number display strings
use pdf_oxide::api::Pdf;
let mut pdf = Pdf::open("thesis.pdf")?;
let page_count = pdf.page_count()?;
for i in 0..page_count {
    let label = pdf.page_label(i)?;
    println!("Physical page {} -> display label '{}'", i + 1, label);
}
// Example output:
// Physical page 1 -> display label 'i'
// Physical page 2 -> display label 'ii'
// Physical page 3 -> display label 'iii'
// Physical page 4 -> display label '1'
// Physical page 5 -> display label '2'
FAQ
Where does get_producer read from?
From the /Info.Producer entry in the document Info dictionary. It lives on the DocumentEditor (read/write), and the matching setter persists changes to /Info.Producer when you save. The XMP pdf:Producer value is available separately as pdf_producer on XmpMetadata.
Why does embedded_fonts only return fonts that appear in text?
The font list is derived from the page’s rendered text spans, so it reflects fonts actually used to draw glyphs on that page. Subset detection follows the PDF convention of a 6-character tag plus + (e.g. ABCDEF+Helvetica).
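The subset naming check is simple enough to sketch with a regular expression. This is a standalone illustration of the convention, not the library's detection code; `split_subset_tag` is a hypothetical helper name.

```python
import re
from typing import Tuple

# Six uppercase ASCII letters, a plus sign, then the base font name.
_SUBSET_TAG = re.compile(r"^[A-Z]{6}\+(.+)$")


def split_subset_tag(font_name: str) -> Tuple[bool, str]:
    """Return (is_subset, base_name) for a font resource name."""
    match = _SUBSET_TAG.match(font_name)
    if match:
        return True, match.group(1)
    return False, font_name


print(split_subset_tag("ABCDEF+Helvetica"))  # (True, 'Helvetica')
print(split_subset_tag("Helvetica"))         # (False, 'Helvetica')
```

A name-based check like this says nothing about whether the font program is actually embedded, which is why embedding and subset status are reported as separate fields.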
What is the JSON schema returned by fonts_to_json?
A JSON array of objects with name, type, encoding, isEmbedded, isSubset, and size fields — the same shape the Go binding unmarshals into its Font struct.
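As a rough consumer-side sketch, the array can be decoded into typed records with the standard library alone. The `Font` dataclass and `parse_fonts` helper are hypothetical, and the sample payload values (font name, subtype, encoding) are invented to match the documented field names rather than taken from real output.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass
class Font:
    # Hypothetical record matching the documented JSON field names.
    name: str
    type: str
    encoding: str
    is_embedded: bool
    is_subset: bool
    size: float


def parse_fonts(payload: str) -> List[Font]:
    """Decode a fonts_to_json-style array into Font records."""
    return [
        Font(
            name=item["name"],
            type=item["type"],
            encoding=item["encoding"],
            is_embedded=item["isEmbedded"],
            is_subset=item["isSubset"],
            size=float(item["size"]),
        )
        for item in json.loads(payload)
    ]


# Invented sample payload for illustration only.
sample = (
    '[{"name": "ABCDEF+Helvetica", "type": "Type1", '
    '"encoding": "WinAnsiEncoding", "isEmbedded": true, '
    '"isSubset": true, "size": 0}]'
)
for font in parse_fonts(sample):
    print(font.name, font.is_embedded, font.is_subset)
```

Mapping the camelCase JSON keys to snake_case attributes keeps the record idiomatic in Python while leaving the wire format untouched.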
Is metadata extraction fast?
Yes. PDF Oxide’s extraction core runs at roughly 0.8 ms mean / 9 ms p99 with a 100% pass rate on the benchmark corpus.
Related Pages
- Text Extraction – Extract text content from pages
- Annotation Extraction – Access bookmarks and annotations
- Form Data Extraction – Extract form field data
- Image Extraction – Embedded images and page-elements accessor