Annotation Extraction

PDF Oxide provides access to all annotation types defined in the PDF specification (ISO 32000-1:2008, Section 12.5), including text notes, hyperlinks, highlights, stamps, ink annotations, and more. The document outline (bookmarks) is also accessible for building navigation structures.

Use get_annotations() on PdfDocument for raw annotation data, or the PdfPage DOM API for a unified AnnotationWrapper interface that supports both reading and writing.

Quick Example

Python

from pdf_oxide import PdfDocument

doc = PdfDocument("annotated.pdf")
page = doc.page(0)
for annot in page.annotations():
    print(f"{annot.subtype}: {annot.contents}")

Node.js

const { PdfDocument } = require("pdf-oxide");

const doc = new PdfDocument("annotated.pdf");
const annotations = doc.getPageAnnotations(0);
for (const annot of annotations) {
  console.log(`${annot.subtype}: ${annot.contents}`);
}
doc.close();

Go

import (
    "fmt" // needed for fmt.Printf below

    pdfoxide "github.com/yfedoseev/pdf_oxide/go"
)

doc, _ := pdfoxide.Open("annotated.pdf")
defer doc.Close()
annotations, _ := doc.Annotations(0)
for _, annot := range annotations {
    fmt.Printf("%s: %s\n", annot.Subtype, annot.Content)
}

<!-- C#: no equivalent on PdfDocument — annotations not exposed on csharp/PdfOxide/Core/PdfDocument.cs -->

WASM

const doc = new WasmPdfDocument(bytes);
const annotations = doc.getAnnotations(0);
for (const annot of annotations) {
    console.log(`${annot.subtype}: ${annot.contents}`);
}

Rust

use pdf_oxide::PdfDocument;

let mut doc = PdfDocument::open("annotated.pdf")?;
let annotations = doc.get_annotations(0)?;
for annot in &annotations {
    println!("{:?}: {:?}", annot.subtype_enum, annot.contents);
}

API Reference

get_annotations(page_index) -> Vec<Annotation>

Extract raw annotations from a specific page. Returns all annotation types present on the page.

| Parameter | Type | Description |
| --- | --- | --- |
| page_index | usize | Zero-based page index |

Returns: A vector of Annotation objects. The Rust examples on this page apply ? to the call, so the vector arrives wrapped in a Result.

Annotation Fields

| Field | Type | Description |
| --- | --- | --- |
| annotation_type | String | Always "Annot" |
| subtype | Option<String> | Raw subtype string (e.g., "Text", "Highlight") |
| subtype_enum | AnnotationSubtype | Parsed subtype enum |
| contents | Option<String> | Text contents of the annotation |
| rect | Option<[f64; 4]> | Bounding rectangle [x1, y1, x2, y2] |
| author | Option<String> | Author/creator (/T entry) |
| creation_date | Option<String> | Creation date |
| modification_date | Option<String> | Last modification date |
| subject | Option<String> | Subject of the annotation |
| destination | Option<LinkDestination> | Link destination (for Link annotations) |
| action | Option<LinkAction> | Link action (for Link annotations) |
| color | Option<Vec<f64>> | Annotation color components |
| flags | Option<AnnotationFlags> | Annotation flags (invisible, hidden, print, etc.) |
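
The rect and color fields hold raw values from the annotation dictionary, so a little post-processing is usually needed before they are useful. The sketch below is plain Python with no dependency on pdf_oxide; rect_size and color_space are hypothetical helper names. It assumes rect arrives as four numbers and color as a list of 0, 1, 3, or 4 components, which is how ISO 32000-1 defines the /Rect and /C entries.

```python
def rect_size(rect):
    """Return (width, height) of a PDF rectangle [x1, y1, x2, y2].

    The two corners may be stored in either order, so use absolute differences.
    """
    x1, y1, x2, y2 = rect
    return (abs(x2 - x1), abs(y2 - y1))


def color_space(components):
    """Name the color space implied by the number of /C components.

    Returns None when the annotation has no color entry at all.
    """
    if components is None:
        return None
    names = {0: "transparent", 1: "DeviceGray", 3: "DeviceRGB", 4: "DeviceCMYK"}
    return names.get(len(components), "unknown")
```

For example, rect_size([100, 700, 300, 650]) gives a width of 200 and a height of 50 even though the second corner lies below the first.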

AnnotationSubtype Variants

| Variant | Description |
| --- | --- |
| Text | Sticky note annotation |
| Link | Hyperlink annotation |
| FreeText | Text box annotation |
| Line | Line shape annotation |
| Square | Rectangle shape annotation |
| Circle | Ellipse shape annotation |
| Polygon | Polygon shape annotation |
| PolyLine | Polyline shape annotation |
| Highlight | Text highlight markup |
| Underline | Text underline markup |
| Squiggly | Squiggly underline markup |
| StrikeOut | Strikethrough markup |
| Stamp | Rubber stamp annotation |
| Ink | Freehand drawing annotation |
| Popup | Pop-up note associated with another annotation |
| FileAttachment | Embedded file annotation |
| Sound | Sound annotation |
| Movie | Movie annotation |
| Screen | Screen annotation |
| Widget | Form field widget |
| PrinterMark | Printer’s mark annotation |
| TrapNet | Trap network annotation |
| Watermark | Watermark annotation |
| ThreeDimensional | 3D annotation |
| Redact | Redaction annotation |
| Caret | Caret annotation (insertion point) |
| RichMedia | Rich media annotation |
| Unknown | Unrecognized annotation type |
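
A common first step with these variants is to tally what a page contains. The sketch below counts subtype strings and totals the four text-markup subtypes (Highlight, Underline, Squiggly, StrikeOut). It works on plain strings such as the subtype values printed in the Quick Example; summarize_subtypes is a hypothetical helper, and mapping a missing subtype to "Unknown" is a choice made here, not documented library behavior.

```python
from collections import Counter

# The four text-markup subtypes from the variants list.
TEXT_MARKUP = {"Highlight", "Underline", "Squiggly", "StrikeOut"}


def summarize_subtypes(subtypes):
    """Return (counts per subtype, total number of text-markup annotations)."""
    # A missing subtype (None) is counted under "Unknown" by assumption.
    counts = Counter(s if s is not None else "Unknown" for s in subtypes)
    markup_total = sum(n for name, n in counts.items() if name in TEXT_MARKUP)
    return dict(counts), markup_total
```

With the Python binding shown earlier, the input could be built as [a.subtype for a in page.annotations()].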

get_outline() -> Option<Vec<OutlineItem>>

Get the document outline (bookmarks) if present. Returns a hierarchical tree of outline items that can be used for document navigation.

Returns:

  • Some(Vec<OutlineItem>) – Bookmarks found and parsed
  • None – No bookmarks in the document

OutlineItem Fields

| Field | Type | Description |
| --- | --- | --- |
| title | String | Bookmark title text |
| dest | Option<Destination> | Navigation destination |
| children | Vec<OutlineItem> | Nested child bookmarks |

Destination Variants

| Variant | Description |
| --- | --- |
| PageIndex(usize) | Direct page reference (0-based index) |
| Named(String) | Named destination identifier |

Rust

let mut doc = PdfDocument::open("book.pdf")?;

if let Some(outline) = doc.get_outline()? {
    for item in &outline {
        println!("  {}", item.title);
        for child in &item.children {
            println!("    {}", child.title);
        }
    }
} else {
    println!("No bookmarks found.");
}
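
The Rust example above prints only two levels, but outlines can nest to any depth. The following sketch shows a recursive walk that assigns section-style numbers (1, 2, 2.1, 2.2.1) to every bookmark. The OutlineItem dataclass here is a stand-in that mirrors the documented fields so the example runs on its own; it is not the pdf_oxide class, and this page does not say whether the Python binding exposes the outline. number_outline is a hypothetical helper.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class OutlineItem:
    """Stand-in with the documented fields: title, dest, children."""
    title: str
    dest: Optional[object] = None
    children: List["OutlineItem"] = field(default_factory=list)


def number_outline(items, prefix=""):
    """Flatten an outline tree into (number, title) rows in reading order."""
    rows = []
    for i, item in enumerate(items, start=1):
        number = f"{prefix}{i}"
        rows.append((number, item.title))
        # Children inherit the parent's number as a dotted prefix.
        rows.extend(number_outline(item.children, prefix=f"{number}."))
    return rows
```

Returning rows instead of printing keeps the walk reusable: the same list can feed a printed table of contents, a JSON export, or a search index.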

PdfPage Annotation API (DOM)

The PdfPage object from the DocumentEditor provides a higher-level AnnotationWrapper interface that supports both reading existing annotations and adding new ones.

page.annotations() -> &[AnnotationWrapper]

Get all annotations on the page as wrapped objects.

page.find_annotations_by_type(subtype) -> Vec<&AnnotationWrapper>

Find annotations of a specific type.

page.add_annotation(annotation)

Add a new annotation to the page.

page.remove_annotation(index) -> Option<AnnotationWrapper>

Remove an annotation by index.

page.find_annotations_in_region(rect) -> Vec<&AnnotationWrapper>

Find annotations whose bounding boxes intersect a given region.
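
To make "intersect" concrete, here is a minimal sketch of an axis-aligned rectangle overlap test in plain Python. It illustrates the general technique and is not the library's implementation; in particular, treating rectangles that merely touch along an edge as intersecting is an assumption, and pdf_oxide may decide that case differently. normalize and rects_intersect are hypothetical names.

```python
def normalize(rect):
    """Reorder (x1, y1, x2, y2) so the first corner is the lower-left one."""
    x1, y1, x2, y2 = rect
    return (min(x1, x2), min(y1, y2), max(x1, x2), max(y1, y2))


def rects_intersect(a, b):
    """True if two rectangles overlap or touch (edges count, by assumption)."""
    ax1, ay1, ax2, ay2 = normalize(a)
    bx1, by1, bx2, by2 = normalize(b)
    # They overlap when their extents overlap on both the x and y axes.
    return ax1 <= bx2 and bx1 <= ax2 and ay1 <= by2 and by1 <= ay2
```

Normalizing first matters because a PDF rectangle may list its corners in either order.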

AnnotationWrapper Methods

| Method | Returns | Description |
| --- | --- | --- |
| id() | AnnotationId | Unique session ID |
| subtype() | AnnotationSubtype | Annotation type |
| rect() | Rect | Bounding rectangle |
| contents() | Option<&str> | Text contents |
| color() | Option<(f32, f32, f32)> | RGB color (0.0–1.0) |
| is_modified() | bool | Whether annotation has been changed |
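
Because color() reports RGB components in the 0.0–1.0 range, displaying a color in a UI or report usually means scaling to 0–255 first. A small sketch of that conversion follows, assuming the value has already been read into a three-element tuple; rgb_to_hex is a hypothetical helper, and clamping out-of-range values is a defensive choice made here.

```python
def rgb_to_hex(rgb):
    """Convert an (r, g, b) tuple of 0.0-1.0 floats to a '#rrggbb' string."""
    def channel(value):
        # Clamp to the valid range, then scale to an 8-bit integer.
        clamped = min(1.0, max(0.0, value))
        return int(round(clamped * 255))

    r, g, b = (channel(v) for v in rgb)
    return f"#{r:02x}{g:02x}{b:02x}"
```

A typical yellow highlight of (1.0, 1.0, 0.0) becomes "#ffff00".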

Python

doc = PdfDocument("annotated.pdf")
page = doc.page(0)

# List all annotations
for annot in page.annotations():
    print(f"[{annot.subtype}] {annot.contents} at {annot.rect}")

# Find highlights
highlights = [a for a in page.annotations() if a.subtype == "Highlight"]
print(f"Found {len(highlights)} highlights")

Node.js

const doc = new PdfDocument("annotated.pdf");
const annotations = doc.getPageAnnotations(0);

// List all annotations
for (const annot of annotations) {
  console.log(`[${annot.subtype}] ${annot.contents}`);
}

// Find highlights
const highlights = annotations.filter(a => a.subtype === "Highlight");
console.log(`Found ${highlights.length} highlights`);
doc.close();

Go

doc, _ := pdfoxide.Open("annotated.pdf")
defer doc.Close()
annotations, _ := doc.Annotations(0)

// List all annotations
for _, annot := range annotations {
    fmt.Printf("[%s] %s\n", annot.Subtype, annot.Content)
}

// Find highlights
highlights := 0
for _, a := range annotations {
    if a.Subtype == "Highlight" {
        highlights++
    }
}
fmt.Printf("Found %d highlights\n", highlights)

<!-- C#: no equivalent on PdfDocument — annotations not exposed on csharp/PdfOxide/Core/PdfDocument.cs -->

WASM

const doc = new WasmPdfDocument(bytes);
const annotations = doc.getAnnotations(0);

// List all annotations
for (const annot of annotations) {
    console.log(`[${annot.subtype}] ${annot.contents}`);
}

// Find highlights
const highlights = annotations.filter(a => a.subtype === "Highlight");
console.log(`Found ${highlights.length} highlights`);

Rust

use pdf_oxide::editor::{DocumentEditor, EditableDocument};
use pdf_oxide::annotation_types::AnnotationSubtype;

let mut editor = DocumentEditor::open("annotated.pdf")?;
let page = editor.get_page(0)?;

// Find all highlight annotations
let highlights = page.find_annotations_by_type(AnnotationSubtype::Highlight);
for h in &highlights {
    println!("Highlight at {:?}: {:?}", h.rect(), h.contents());
}

Advanced Examples

Build a table of contents from bookmarks

use pdf_oxide::PdfDocument;
use pdf_oxide::outline::Destination;

let mut doc = PdfDocument::open("book.pdf")?;

fn print_toc(items: &[pdf_oxide::outline::OutlineItem], depth: usize) {
    for item in items {
        let indent = "  ".repeat(depth);
        let page = match &item.dest {
            Some(Destination::PageIndex(p)) => format!("page {}", p + 1),
            Some(Destination::Named(n)) => format!("dest '{}'", n),
            None => "no dest".to_string(),
        };
        println!("{}{} ({})", indent, item.title, page);
        print_toc(&item.children, depth + 1);
    }
}

if let Some(outline) = doc.get_outline()? {
    println!("Table of Contents:");
    print_toc(&outline, 0);
}

Extract all comments (Text annotations)

use pdf_oxide::PdfDocument;
use pdf_oxide::annotation_types::AnnotationSubtype;

let mut doc = PdfDocument::open("reviewed.pdf")?;
let page_count = doc.page_count()?;

for page_idx in 0..page_count {
    let annotations = doc.get_annotations(page_idx)?;
    let comments: Vec<_> = annotations.iter()
        .filter(|a| a.subtype_enum == AnnotationSubtype::Text)
        .collect();

    if !comments.is_empty() {
        println!("Page {}:", page_idx + 1);
        for c in &comments {
            let author = c.author.as_deref().unwrap_or("Unknown");
            let text = c.contents.as_deref().unwrap_or("");
            println!("  [{}] {}", author, text);
        }
    }
}
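
Review comments are often easier to read in chronological order, which requires turning creation_date into a real timestamp. The sketch below parses the PDF date format from ISO 32000-1 (D:YYYYMMDDHHmmSSOHH'mm', where everything after the year is optional). It assumes the field holds the raw PDF date string; this page does not say whether pdf_oxide normalizes dates, so check a sample value first. parse_pdf_date is a hypothetical helper, and accepting a missing D: prefix or a lowercase z is leniency added here.

```python
import re
from datetime import datetime, timedelta, timezone

# D:YYYYMMDDHHmmSSOHH'mm' with every field after the year optional.
_PDF_DATE = re.compile(
    r"^(?:D:)?(\d{4})(\d{2})?(\d{2})?(\d{2})?(\d{2})?(\d{2})?"
    r"(?:([Zz+-])(?:(\d{2})'?(?:(\d{2})'?)?)?)?$"
)


def parse_pdf_date(raw):
    """Parse a PDF date string; return None if it does not fit the format."""
    match = _PDF_DATE.match(raw.strip())
    if match is None:
        return None
    defaults = (0, 1, 1, 0, 0, 0)  # missing month/day become 1, times become 0
    parts = [int(g) if g is not None else d
             for g, d in zip(match.groups()[:6], defaults)]
    sign, tz_hours, tz_minutes = match.group(7), match.group(8), match.group(9)
    try:
        if sign is None:
            tz = None  # no offset given: return a naive datetime
        elif sign in "Zz":
            tz = timezone.utc
        else:
            offset = timedelta(hours=int(tz_hours or 0), minutes=int(tz_minutes or 0))
            tz = timezone(-offset if sign == "-" else offset)
        return datetime(*parts, tzinfo=tz)
    except ValueError:
        return None  # out-of-range month, day, hour, or offset
```

Strings without a time-zone offset yield naive datetimes, while strings with one yield aware datetimes. Python refuses to compare the two kinds, so pick one convention before sorting a mixed set of comments.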

Extract links from a page

use pdf_oxide::PdfDocument;
use pdf_oxide::annotation_types::AnnotationSubtype;

let mut doc = PdfDocument::open("report.pdf")?;
let annotations = doc.get_annotations(0)?;

let links: Vec<_> = annotations.iter()
    .filter(|a| a.subtype_enum == AnnotationSubtype::Link)
    .collect();

for link in &links {
    if let Some(ref action) = link.action {
        println!("Link: {:?}", action);
    }
    if let Some(ref dest) = link.destination {
        println!("Internal link: {:?}", dest);
    }
}