Annotation Extraction
PDF Oxide provides access to all annotation types defined in the PDF specification (ISO 32000-1:2008, Section 12.5), including text notes, hyperlinks, highlights, stamps, ink annotations, and more. The document outline (bookmarks) is also accessible for building navigation structures.
Use get_annotations() on PdfDocument for raw annotation data, or the PdfPage DOM API for a unified AnnotationWrapper interface that supports both reading and writing.
Quick Example
Python
from pdf_oxide import PdfDocument
doc = PdfDocument("annotated.pdf")
page = doc.page(0)
for annot in page.annotations():
    print(f"{annot.subtype}: {annot.contents}")
Node.js
const { PdfDocument } = require("pdf-oxide");
const doc = new PdfDocument("annotated.pdf");
const annotations = doc.getPageAnnotations(0);
for (const annot of annotations) {
  console.log(`${annot.subtype}: ${annot.contents}`);
}
doc.close();
Go
import (
	"fmt"
	pdfoxide "github.com/yfedoseev/pdf_oxide/go"
)
// Errors are discarded here for brevity; check them in real code.
doc, _ := pdfoxide.Open("annotated.pdf")
defer doc.Close()
annotations, _ := doc.Annotations(0)
for _, annot := range annotations {
	fmt.Printf("%s: %s\n", annot.Subtype, annot.Content)
}
<!-- C#: no equivalent on PdfDocument — annotations not exposed on csharp/PdfOxide/Core/PdfDocument.cs -->
WASM
const doc = new WasmPdfDocument(bytes);
const annotations = doc.getAnnotations(0);
for (const annot of annotations) {
  console.log(`${annot.subtype}: ${annot.contents}`);
}
Rust
use pdf_oxide::PdfDocument;
let mut doc = PdfDocument::open("annotated.pdf")?;
let annotations = doc.get_annotations(0)?;
for annot in &annotations {
    println!("{:?}: {:?}", annot.subtype_enum, annot.contents);
}
API Reference
get_annotations(page_index) -> Vec<Annotation>
Extract raw annotations from a specific page. Returns all annotation types present on the page.
| Parameter | Type | Description |
|---|---|---|
| page_index | usize | Zero-based page index |
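Returns: A vector of Annotation objects. In Rust the call is fallible: the examples on this page apply ? to it, so handle the error case before using the vector.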
Annotation Fields
| Field | Type | Description |
|---|---|---|
| annotation_type | String | Always "Annot" |
| subtype | Option<String> | Raw subtype string (e.g., "Text", "Highlight") |
| subtype_enum | AnnotationSubtype | Parsed subtype enum |
| contents | Option<String> | Text contents of the annotation |
| rect | Option<[f64; 4]> | Bounding rectangle [x1, y1, x2, y2] |
| author | Option<String> | Author/creator (/T entry) |
| creation_date | Option<String> | Creation date |
| modification_date | Option<String> | Last modification date |
| subject | Option<String> | Subject of the annotation |
| destination | Option<LinkDestination> | Link destination (for Link annotations) |
| action | Option<LinkAction> | Link action (for Link annotations) |
| color | Option<Vec<f64>> | Annotation color components |
| flags | Option<AnnotationFlags> | Annotation flags (invisible, hidden, print, etc.) |
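The creation_date and modification_date fields are plain strings. A minimal sketch of turning such a string into a datetime follows, assuming the library hands back the raw PDF date form D:YYYYMMDDHHmmSSOHH'mm' defined in ISO 32000-1 (Section 7.9.4) rather than a normalized value, which this page does not state. The helper name parse_pdf_date is hypothetical and not part of PDF Oxide.

```python
import re
from datetime import datetime, timedelta, timezone
from typing import Optional

# Every component after the year is optional in a PDF date string.
_PDF_DATE = re.compile(
    r"^(?:D:)?(?P<year>\d{4})(?P<month>\d{2})?(?P<day>\d{2})?"
    r"(?P<hour>\d{2})?(?P<minute>\d{2})?(?P<second>\d{2})?"
    r"(?:(?P<sign>[+\-Z])(?:(?P<tz_hour>\d{2})'?(?P<tz_minute>\d{2})?'?)?)?$"
)


def parse_pdf_date(raw: Optional[str]) -> Optional[datetime]:
    """Hypothetical helper: parse a raw PDF date string, or return None."""
    if raw is None:
        return None
    match = _PDF_DATE.match(raw.strip())
    if match is None:
        return None
    parts = match.groupdict()
    try:
        tzinfo = None  # no offset given: return a naive datetime
        if parts["sign"] == "Z":
            tzinfo = timezone.utc
        elif parts["sign"] in ("+", "-"):
            offset = timedelta(
                hours=int(parts["tz_hour"] or 0),
                minutes=int(parts["tz_minute"] or 0),
            )
            tzinfo = timezone(-offset if parts["sign"] == "-" else offset)
        # Missing month and day default to 1; missing time parts default to 0.
        return datetime(
            int(parts["year"]),
            int(parts["month"] or 1),
            int(parts["day"] or 1),
            int(parts["hour"] or 0),
            int(parts["minute"] or 0),
            int(parts["second"] or 0),
            tzinfo=tzinfo,
        )
    except ValueError:
        # Out-of-range components such as month 13 are treated as unparseable.
        return None
```

For example, parse_pdf_date("D:20240115103000+01'00'") yields 15 January 2024, 10:30:00 at UTC+01:00. Returning None for malformed input mirrors the Option<String> shape of the field, so callers handle "missing" and "unparseable" the same way.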
AnnotationSubtype Variants
| Variant | Description |
|---|---|
| Text | Sticky note annotation |
| Link | Hyperlink annotation |
| FreeText | Text box annotation |
| Line | Line shape annotation |
| Square | Rectangle shape annotation |
| Circle | Ellipse shape annotation |
| Polygon | Polygon shape annotation |
| PolyLine | Polyline shape annotation |
| Highlight | Text highlight markup |
| Underline | Text underline markup |
| Squiggly | Squiggly underline markup |
| StrikeOut | Strikethrough markup |
| Stamp | Rubber stamp annotation |
| Ink | Freehand drawing annotation |
| Popup | Pop-up note associated with another annotation |
| FileAttachment | Embedded file annotation |
| Sound | Sound annotation |
| Movie | Movie annotation |
| Screen | Screen annotation |
| Widget | Form field widget |
| PrinterMark | Printer’s mark annotation |
| TrapNet | Trap network annotation |
| Watermark | Watermark annotation |
| ThreeDimensional | 3D annotation |
| Redact | Redaction annotation |
| Caret | Caret annotation (insertion point) |
| RichMedia | Rich media annotation |
| Unknown | Unrecognized annotation type |
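When working with the raw subtype string from a binding, it can help to map it onto a closed set of values with an explicit fallback, as the Unknown variant does. The sketch below is a hypothetical Python mirror of a few variants from the table, not a type exported by PDF Oxide. It assumes the raw string matches the PDF name of the subtype. In ISO 32000-1 the 3D annotation subtype is named 3D, and the text markup group consists of Highlight, Underline, Squiggly, and StrikeOut.

```python
from enum import Enum
from typing import Optional


class AnnotationSubtype(Enum):
    """Hypothetical mirror of a subset of the variants listed above."""
    TEXT = "Text"
    LINK = "Link"
    HIGHLIGHT = "Highlight"
    UNDERLINE = "Underline"
    SQUIGGLY = "Squiggly"
    STRIKE_OUT = "StrikeOut"
    THREE_DIMENSIONAL = "3D"  # PDF name of the subtype (assumed raw string)
    UNKNOWN = "Unknown"


# The four text markup subtypes defined by the PDF specification.
TEXT_MARKUP = frozenset({
    AnnotationSubtype.HIGHLIGHT,
    AnnotationSubtype.UNDERLINE,
    AnnotationSubtype.SQUIGGLY,
    AnnotationSubtype.STRIKE_OUT,
})


def parse_subtype(raw: Optional[str]) -> AnnotationSubtype:
    """Map a raw subtype string to the enum, falling back to UNKNOWN."""
    if raw is None:
        return AnnotationSubtype.UNKNOWN
    try:
        return AnnotationSubtype(raw)
    except ValueError:
        return AnnotationSubtype.UNKNOWN


def is_text_markup(subtype: AnnotationSubtype) -> bool:
    """True for Highlight, Underline, Squiggly, and StrikeOut."""
    return subtype in TEXT_MARKUP
```

Falling back to UNKNOWN instead of raising keeps a single unexpected subtype from aborting a scan of the whole page.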
get_outline() -> Option<Vec<OutlineItem>>
Get the document outline (bookmarks) if present. Returns a hierarchical tree of outline items that can be used for document navigation.
Returns:
- Some(Vec<OutlineItem>) – Bookmarks found and parsed
- None – No bookmarks in the document
OutlineItem Fields
| Field | Type | Description |
|---|---|---|
| title | String | Bookmark title text |
| dest | Option<Destination> | Navigation destination |
| children | Vec<OutlineItem> | Nested child bookmarks |
Destination Variants
| Variant | Description |
|---|---|
| PageIndex(usize) | Direct page reference (0-based index) |
| Named(String) | Named destination identifier |
Rust
use pdf_oxide::PdfDocument;
let mut doc = PdfDocument::open("book.pdf")?;
if let Some(outline) = doc.get_outline()? {
    for item in &outline {
        // Top-level bookmarks
        println!("  {}", item.title);
        for child in &item.children {
            // Second-level bookmarks, indented one step further
            println!("    {}", child.title);
        }
    }
} else {
    println!("No bookmarks found.");
}
PdfPage Annotation API (DOM)
The PdfPage object from the DocumentEditor provides a higher-level AnnotationWrapper interface that supports both reading existing annotations and adding new ones.
page.annotations() -> &[AnnotationWrapper]
Get all annotations on the page as wrapped objects.
page.find_annotations_by_type(subtype) -> Vec<&AnnotationWrapper>
Find annotations of a specific type.
page.add_annotation(annotation)
Add a new annotation to the page.
page.remove_annotation(index) -> Option<AnnotationWrapper>
Remove an annotation by index.
page.find_annotations_in_region(rect) -> Vec<&AnnotationWrapper>
Find annotations whose bounding boxes intersect a given region.
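To make "intersect" concrete, here is a minimal sketch of an axis-aligned rectangle overlap test over [x1, y1, x2, y2] tuples. It illustrates the idea only and is not the library's implementation. The names normalize, intersects, and find_in_region are hypothetical, and treating rectangles that merely touch at an edge as intersecting is an assumption.

```python
from typing import List, Sequence, Tuple

Rect = Tuple[float, float, float, float]


def normalize(rect: Sequence[float]) -> Rect:
    """Order the corners so that x1 <= x2 and y1 <= y2."""
    x1, y1, x2, y2 = rect
    return (min(x1, x2), min(y1, y2), max(x1, x2), max(y1, y2))


def intersects(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if the two rectangles overlap or touch on both axes."""
    ax1, ay1, ax2, ay2 = normalize(a)
    bx1, by1, bx2, by2 = normalize(b)
    return ax1 <= bx2 and bx1 <= ax2 and ay1 <= by2 and by1 <= ay2


def find_in_region(rects: Sequence[Sequence[float]], region: Sequence[float]) -> List[int]:
    """Return the indices of the rectangles that intersect the region."""
    return [i for i, rect in enumerate(rects) if intersects(rect, region)]
```

Normalizing first matters because a PDF rectangle may list its corners in either order, so x1 is not guaranteed to be the left edge.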
AnnotationWrapper Methods
| Method | Returns | Description |
|---|---|---|
| id() | AnnotationId | Unique session ID |
| subtype() | AnnotationSubtype | Annotation type |
| rect() | Rect | Bounding rectangle |
| contents() | Option<&str> | Text contents |
| color() | Option<(f32, f32, f32)> | RGB color (0.0–1.0) |
| is_modified() | bool | Whether annotation has been changed |
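Since color() reports RGB components in the 0.0–1.0 range, displaying them in a UI or report usually means scaling to 8-bit channels. A small sketch of that conversion, using a hypothetical helper rgb_to_hex that is not part of PDF Oxide:

```python
from typing import Optional, Tuple


def rgb_to_hex(color: Optional[Tuple[float, float, float]]) -> Optional[str]:
    """Convert 0.0-1.0 RGB components to a '#rrggbb' string."""
    if color is None:
        return None
    # Clamp each component to [0.0, 1.0] before scaling to 0-255.
    channels = [round(max(0.0, min(1.0, c)) * 255) for c in color]
    return "#{:02x}{:02x}{:02x}".format(*channels)
```

A pure yellow highlight, (1.0, 1.0, 0.0), becomes "#ffff00". Clamping guards against slightly out-of-range values in malformed files.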
Python
from pdf_oxide import PdfDocument
doc = PdfDocument("annotated.pdf")
page = doc.page(0)
# List all annotations
for annot in page.annotations():
    print(f"[{annot.subtype}] {annot.contents} at {annot.rect}")
# Find highlights
highlights = [a for a in page.annotations() if a.subtype == "Highlight"]
print(f"Found {len(highlights)} highlights")
Node.js
const { PdfDocument } = require("pdf-oxide");
const doc = new PdfDocument("annotated.pdf");
const annotations = doc.getPageAnnotations(0);
// List all annotations
for (const annot of annotations) {
  console.log(`[${annot.subtype}] ${annot.contents}`);
}
// Find highlights
const highlights = annotations.filter(a => a.subtype === "Highlight");
console.log(`Found ${highlights.length} highlights`);
doc.close();
Go
import (
	"fmt"
	pdfoxide "github.com/yfedoseev/pdf_oxide/go"
)
// Errors are discarded here for brevity; check them in real code.
doc, _ := pdfoxide.Open("annotated.pdf")
defer doc.Close()
annotations, _ := doc.Annotations(0)
// List all annotations
for _, annot := range annotations {
	fmt.Printf("[%s] %s\n", annot.Subtype, annot.Content)
}
// Find highlights
highlights := 0
for _, a := range annotations {
	if a.Subtype == "Highlight" {
		highlights++
	}
}
fmt.Printf("Found %d highlights\n", highlights)
<!-- C#: no equivalent on PdfDocument — annotations not exposed on csharp/PdfOxide/Core/PdfDocument.cs -->
WASM
const doc = new WasmPdfDocument(bytes);
const annotations = doc.getAnnotations(0);
// List all annotations
for (const annot of annotations) {
  console.log(`[${annot.subtype}] ${annot.contents}`);
}
// Find highlights
const highlights = annotations.filter(a => a.subtype === "Highlight");
console.log(`Found ${highlights.length} highlights`);
Rust
use pdf_oxide::editor::{DocumentEditor, EditableDocument};
use pdf_oxide::annotation_types::AnnotationSubtype;
let mut editor = DocumentEditor::open("annotated.pdf")?;
let page = editor.get_page(0)?;
// Find all highlight annotations
let highlights = page.find_annotations_by_type(AnnotationSubtype::Highlight);
for h in &highlights {
    println!("Highlight at {:?}: {:?}", h.rect(), h.contents());
}
Advanced Examples
Build a table of contents from bookmarks
use pdf_oxide::PdfDocument;
use pdf_oxide::outline::Destination;
let mut doc = PdfDocument::open("book.pdf")?;
fn print_toc(items: &[pdf_oxide::outline::OutlineItem], depth: usize) {
    for item in items {
        // Two spaces of indentation per nesting level
        let indent = "  ".repeat(depth);
        let page = match &item.dest {
            // PageIndex is zero-based; show a one-based page number
            Some(Destination::PageIndex(p)) => format!("page {}", p + 1),
            Some(Destination::Named(n)) => format!("dest '{}'", n),
            None => "no dest".to_string(),
        };
        println!("{}{} ({})", indent, item.title, page);
        // Recurse into nested bookmarks
        print_toc(&item.children, depth + 1);
    }
}
if let Some(outline) = doc.get_outline()? {
    println!("Table of Contents:");
    print_toc(&outline, 0);
}
Extract all comments (Text annotations)
use pdf_oxide::PdfDocument;
use pdf_oxide::annotation_types::AnnotationSubtype;
let mut doc = PdfDocument::open("reviewed.pdf")?;
let page_count = doc.page_count()?;
for page_idx in 0..page_count {
    let annotations = doc.get_annotations(page_idx)?;
    let comments: Vec<_> = annotations.iter()
        .filter(|a| a.subtype_enum == AnnotationSubtype::Text)
        .collect();
    if !comments.is_empty() {
        println!("Page {}:", page_idx + 1);
        for c in &comments {
            let author = c.author.as_deref().unwrap_or("Unknown");
            let text = c.contents.as_deref().unwrap_or("");
            println!("  [{}] {}", author, text);
        }
    }
}
Extract all hyperlinks
use pdf_oxide::PdfDocument;
use pdf_oxide::annotation_types::AnnotationSubtype;
let mut doc = PdfDocument::open("report.pdf")?;
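// This reads the first page only; loop over 0..doc.page_count()? to cover every page.
let annotations = doc.get_annotations(0)?;
let links: Vec<_> = annotations.iter()
    .filter(|a| a.subtype_enum == AnnotationSubtype::Link)
    .collect();
for link in &links {
    // Links that carry an action
    if let Some(action) = &link.action {
        println!("Link: {:?}", action);
    }
    // Links that point at a destination inside the document
    if let Some(dest) = &link.destination {
        println!("Internal link: {:?}", dest);
    }
}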
Related Pages
- Form Data Extraction – Extract form fields (Widget annotations)
- Text Extraction – Extract text content from pages
- Metadata & XMP – Read document properties and bookmarks