What is the fastest Python PDF library?

PDF Oxide is the fastest Python PDF library, with 0.8ms mean text extraction time — 5.8× faster than PyMuPDF (4.6ms) and 15× faster than pypdf (12.1ms). Benchmarked on 3,830 real-world PDFs with 100% pass rate.

Is PDF Oxide free for commercial use?

Yes. PDF Oxide is MIT licensed — free for all uses including commercial products, SaaS, and proprietary software. No license fees, no sales calls, no AGPL restrictions.

Can PDF Oxide handle scanned PDFs with OCR?

Yes. PDF Oxide includes built-in OCR via PaddleOCR and ONNX Runtime. No Tesseract installation needed — just pip install pdf_oxide and use extract_text_ocr(). Supports PP-OCRv3, v4, and v5 models.

Does PDF Oxide support XFA forms?

Yes. PDF Oxide is the only Python PDF library that can detect, analyze, and extract data from XFA forms (XML Forms Architecture). PyMuPDF, pypdf, pdfplumber, and pdfminer cannot read XFA form data.

How does PDF Oxide compare to PyMuPDF?

PDF Oxide is 5.8× faster than PyMuPDF (0.8ms vs 4.6ms mean), has a 100% pass rate vs 99.3%, and is MIT licensed vs PyMuPDF's AGPL-3.0. PDF Oxide also has built-in Markdown/HTML output and XFA form support that PyMuPDF lacks.

Can PDF Oxide convert PDF to Markdown?

Yes. PDF Oxide has built-in PDF to Markdown conversion with heading detection, table preservation, and list formatting — ideal for LLM and RAG pipelines. No separate package needed, unlike PyMuPDF which requires pymupdf4llm (69× slower).

Annotationsextraktion

PDF Oxide bietet Zugriff auf alle in der PDF-Spezifikation (ISO 32000-1:2008, Abschnitt 12.5) definierten Annotationstypen, darunter Textnotizen, Hyperlinks, Hervorhebungen, Stempel, Freihandannotationen und mehr. Auch die Dokumentgliederung (Lesezeichen) ist zugänglich, um Navigationsstrukturen aufzubauen.

Verwenden Sie get_annotations() auf PdfDocument für rohe Annotationsdaten oder die DOM-API von PdfPage für ein einheitliches AnnotationWrapper-Interface, das sowohl Lesen als auch Schreiben unterstützt.

Binding-Abdeckung. Die Annotationsextraktion ist in Python (doc.get_annotations(page)), Rust (doc.get_annotations(page)), WASM (doc.getAnnotations(page)) und Go (doc.Annotations(page)) verfügbar. Die öffentliche API von C# stellt noch keinen GetAnnotations-Wrapper bereit — die native FFI-Methode (PdfDocumentGetPageAnnotations) existiert, ist aber nicht umhüllt. Für die Annotationsextraktion aus C# verwenden Sie entweder das Rust CLI (pdf-oxide annotations doc.pdf) oder rufen Sie PdfPageGetAnnotationsCount / pdf_get_annotations_by_type direkt via P/Invoke auf, bis ein öffentlicher Wrapper verfügbar ist.

Schnellbeispiel

Python

from pdf_oxide import PdfDocument

doc = PdfDocument("annotated.pdf")
page = doc.page(0)
for annot in page.annotations():
    print(f"{annot.subtype}: {annot.contents}")

Node.js

const { PdfDocument } = require("pdf-oxide");

const doc = new PdfDocument("annotated.pdf");
const annotations = doc.getPageAnnotations(0);
for (const annot of annotations) {
  console.log(`${annot.subtype}: ${annot.contents}`);
}
doc.close();

import pdfoxide "github.com/yfedoseev/pdf_oxide/go"

doc, _ := pdfoxide.Open("annotated.pdf")
defer doc.Close()
annotations, _ := doc.Annotations(0)
for _, annot := range annotations {
    fmt.Printf("%s: %s\n", annot.Subtype, annot.Content)
}

WASM

const doc = new WasmPdfDocument(bytes);
const annotations = doc.getAnnotations(0);
for (const annot of annotations) {
    console.log(`${annot.subtype}: ${annot.contents}`);
}

Rust

use pdf_oxide::PdfDocument;

let mut doc = PdfDocument::open("annotated.pdf")?;
let annotations = doc.get_annotations(0)?;
for annot in &annotations {
    println!("{:?}: {:?}", annot.subtype_enum, annot.contents);
}

Java

import fyi.oxide.pdf.*;
import fyi.oxide.pdf.annotation.Annotation;
import java.nio.file.Path;

try (PdfDocument doc = PdfDocument.open(Path.of("annotated.pdf"))) {
    for (Annotation annot : doc.page(0).annotations()) {
        System.out.println(annot.type() + ": " + annot.contents().orElse(""));
    }
}

C++

#include <pdf_oxide/pdf_oxide.hpp>

auto doc = pdf_oxide::Document::open("annotated.pdf");
for (const auto& annot : doc.page_annotations(0)) {
    std::cout << annot.subtype << ": " << annot.content << "\n";
}

Swift

import PdfOxide

let doc = try Document.open("annotated.pdf")
for annot in try doc.pageAnnotations(0) {
    print("\(annot.subtype): \(annot.content)")
}

Kotlin

import fyi.oxide.pdf.*

PdfDocument.open(java.nio.file.Path.of("annotated.pdf")).use { doc ->
    for (annot in doc.page(0).annotations()) {
        println("${annot.type()}: ${annot.contents().orElse("")}")
    }
}

Dart

import 'package:pdf_oxide/pdf_oxide.dart';

final doc = PdfDocument.open('annotated.pdf');
for (final annot in doc.pageAnnotations(0)) {
  print('${annot.subtype}: ${annot.content}');
}
doc.close();

library(pdfoxide)

doc <- pdf_open("annotated.pdf")
for (annot in pdf_page_annotations(doc, 0)) {
  cat(sprintf("%s: %s\n", annot$subtype, annot$content))
}

Julia

using PdfOxide

doc = open_document("annotated.pdf")
for annot in page_annotations(doc, 0)
    println("$(annot.subtype): $(annot.content)")
end

Zig

const pdf_oxide = @import("pdf_oxide");
const a = std.heap.page_allocator;

var doc = try pdf_oxide.Document.open("annotated.pdf");
defer doc.deinit();
const annotations = try doc.pageAnnotations(a, 0);
defer pdf_oxide.Document.freeAnnotations(a, annotations);
for (annotations) |annot| {
    std.debug.print("{s}: {s}\n", .{ annot.subtype, annot.content });
}

Scala

import fyi.oxide.pdf.{PdfDocument, annotationsSeq, contentsOption}
import scala.util.Using

Using.resource(PdfDocument.open("annotated.pdf")) { doc =>
  for (annot <- doc.page(0).annotationsSeq) {
    println(s"${annot.`type`()}: ${annot.contentsOption.getOrElse("")}")
  }
}

Clojure

(require '[pdf-oxide.core :as pdf])

(with-open [d (pdf/open "annotated.pdf")]
  (doseq [annot (pdf/annotations (pdf/page d 0))]
    (println (str (.type annot) ": " (.orElse (.contents annot) "")))))

Objective-C

#import "POXPdfOxide.h"
NSError *err = nil;

POXDocument *doc = [POXDocument openPath:@"annotated.pdf" error:&err];
for (POXAnnotation *annot in [doc pageAnnotations:0 error:&err]) {
    NSLog(@"%@: %@", annot.subtype, annot.content);
}

Elixir

{:ok, doc} = PdfOxide.open("annotated.pdf")
{:ok, annots} = PdfOxide.page_annotations(doc, 0)
Enum.each(annots, fn a -> IO.puts("#{a.subtype}: #{a.content}") end)

API-Referenz

`get_annotations(page_index) -> Vec<Annotation>`

Extrahiert rohe Annotationen von einer bestimmten Seite. Gibt alle auf der Seite vorhandenen Annotationstypen zurück.

Parameter	Typ	Beschreibung
`page_index`	`usize`	Nullbasierter Seitenindex

Rückgabe: Ein Vektor von Annotation-Objekten.

Annotationsfelder

Feld	Typ	Beschreibung
`annotation_type`	`String`	Immer `"Annot"`
`subtype`	`Option<String>`	Roher Subtyp-String (z. B. `"Text"`, `"Highlight"`)
`subtype_enum`	`AnnotationSubtype`	Geparste Subtyp-Enumeration
`contents`	`Option<String>`	Textinhalt der Annotation
`rect`	`Option<[f64; 4]>`	Begrenzungsrechteck [x1, y1, x2, y2]
`author`	`Option<String>`	Autor/Ersteller (`/T`-Eintrag)
`creation_date`	`Option<String>`	Erstellungsdatum
`modification_date`	`Option<String>`	Datum der letzten Änderung
`subject`	`Option<String>`	Betreff der Annotation
`destination`	`Option<LinkDestination>`	Linkziel (für Link-Annotationen)
`action`	`Option<LinkAction>`	Linkaktion (für Link-Annotationen)
`color`	`Option<Vec<f64>>`	Farbkomponenten der Annotation
`flags`	`Option<AnnotationFlags>`	Annotationsflags (invisible, hidden, print usw.)

AnnotationSubtype-Varianten

Variante	Beschreibung
`Text`	Haftnotiz-Annotation
`Link`	Hyperlink-Annotation
`FreeText`	Textfeld-Annotation
`Line`	Linien-Annotation
`Square`	Rechteck-Annotation
`Circle`	Ellipsen-Annotation
`Polygon`	Polygon-Annotation
`PolyLine`	Polylinie-Annotation
`Highlight`	Texthervorhebungs-Markup
`Underline`	Unterstreichungs-Markup
`Squiggly`	Wellenlinien-Unterstreichung
`StrikeOut`	Durchstreichungs-Markup
`Stamp`	Stempel-Annotation
`Ink`	Freihand-Annotation
`Popup`	Pop-up-Notiz, verknüpft mit einer anderen Annotation
`FileAttachment`	Eingebettete-Datei-Annotation
`Sound`	Ton-Annotation
`Movie`	Film-Annotation
`Screen`	Bildschirm-Annotation
`Widget`	Formularfeld-Widget
`PrinterMark`	Druckermarken-Annotation
`TrapNet`	Überfüllungsnetz-Annotation
`Watermark`	Wasserzeichen-Annotation
`ThreeDimensional`	3D-Annotation
`Redact`	Schwärzungs-Annotation
`Caret`	Caret-Annotation (Einfügepunkt)
`RichMedia`	Rich-Media-Annotation
`Unknown`	Unbekannter Annotationstyp

`get_outline() -> Option<Vec<OutlineItem>>`

Ruft die Dokumentgliederung (Lesezeichen) ab, sofern vorhanden. Gibt einen hierarchischen Baum von Gliederungselementen zurück, der zur Dokumentnavigation verwendet werden kann.

Rückgabe:

Some(Vec<OutlineItem>) – Lesezeichen gefunden und geparst
None – Keine Lesezeichen im Dokument

OutlineItem-Felder

Feld	Typ	Beschreibung
`title`	`String`	Titeltext des Lesezeichens
`dest`	`Option<Destination>`	Navigationsziel
`children`	`Vec<OutlineItem>`	Verschachtelte untergeordnete Lesezeichen

Destination-Varianten

Variante	Beschreibung
`PageIndex(usize)`	Direkte Seitenreferenz (nullbasierter Index)
`Named(String)`	Bezeichner für ein benanntes Ziel

Rust

let mut doc = PdfDocument::open("book.pdf")?;

if let Some(outline) = doc.get_outline()? {
    for item in &outline {
        println!("  {}", item.title);
        for child in &item.children {
            println!("    {}", child.title);
        }
    }
} else {
    println!("No bookmarks found.");
}

C++

#include <pdf_oxide/pdf_oxide.hpp>

auto doc = pdf_oxide::Document::open("book.pdf");
std::string outline = doc.get_outline(); // JSON tree of bookmarks
std::cout << outline << "\n";

Swift

import PdfOxide

let doc = try Document.open("book.pdf")
let outline = try doc.outline() // JSON tree of bookmarks
print(outline)

Dart

import 'package:pdf_oxide/pdf_oxide.dart';

final doc = PdfDocument.open('book.pdf');
final outline = doc.getOutline(); // JSON tree of bookmarks
print(outline);
doc.close();

library(pdfoxide)

doc <- pdf_open("book.pdf")
outline <- pdf_get_outline(doc)  # JSON tree of bookmarks
cat(outline, "\n")

Julia

using PdfOxide

doc = open_document("book.pdf")
outline = get_outline(doc) # JSON tree of bookmarks
println(outline)

Zig

const pdf_oxide = @import("pdf_oxide");
const a = std.heap.page_allocator;

var doc = try pdf_oxide.Document.open("book.pdf");
defer doc.deinit();
const outline = try doc.outline(a); // JSON tree of bookmarks; caller owns it
defer a.free(outline);
std.debug.print("{s}\n", .{outline});

Objective-C

#import "POXPdfOxide.h"
NSError *err = nil;

POXDocument *doc = [POXDocument openPath:@"book.pdf" error:&err];
NSString *outline = [doc outlineWithError:&err]; // JSON tree of bookmarks
NSLog(@"%@", outline);

Elixir

{:ok, doc} = PdfOxide.open("book.pdf")
{:ok, outline} = PdfOxide.outline(doc) # JSON tree of bookmarks
IO.puts(outline)

PdfPage-Annotations-API (DOM)

Das PdfPage-Objekt des DocumentEditor bietet ein übergeordnetes AnnotationWrapper-Interface, das sowohl das Lesen vorhandener Annotationen als auch das Hinzufügen neuer unterstützt.

`page.annotations() -> &[AnnotationWrapper]`

Gibt alle Annotationen auf der Seite als umhüllte Objekte zurück.

`page.find_annotations_by_type(subtype) -> Vec<&AnnotationWrapper>`

Findet Annotationen eines bestimmten Typs.

`page.add_annotation(annotation)`

Fügt eine neue Annotation zur Seite hinzu.

`page.remove_annotation(index) -> Option<AnnotationWrapper>`

Entfernt eine Annotation anhand des Index.

`page.find_annotations_in_region(rect) -> Vec<&AnnotationWrapper>`

Findet Annotationen, deren Begrenzungsrahmen einen bestimmten Bereich schneiden.

AnnotationWrapper-Methoden

Methode	Rückgabe	Beschreibung
`id()`	`AnnotationId`	Eindeutige Sitzungs-ID
`subtype()`	`AnnotationSubtype`	Annotationstyp
`rect()`	`Rect`	Begrenzungsrechteck
`contents()`	`Option<&str>`	Textinhalt
`color()`	`Option<(f32, f32, f32)>`	RGB-Farbe (0.0–1.0)
`is_modified()`	`bool`	Ob die Annotation geändert wurde

Python

doc = PdfDocument("annotated.pdf")
page = doc.page(0)

# List all annotations
for annot in page.annotations():
    print(f"[{annot.subtype}] {annot.contents} at {annot.rect}")

# Find highlights
highlights = [a for a in page.annotations() if a.subtype == "Highlight"]
print(f"Found {len(highlights)} highlights")

Node.js

const doc = new PdfDocument("annotated.pdf");
const annotations = doc.getPageAnnotations(0);

// List all annotations
for (const annot of annotations) {
  console.log(`[${annot.subtype}] ${annot.contents}`);
}

// Find highlights
const highlights = annotations.filter(a => a.subtype === "Highlight");
console.log(`Found ${highlights.length} highlights`);
doc.close();

doc, _ := pdfoxide.Open("annotated.pdf")
defer doc.Close()
annotations, _ := doc.Annotations(0)

// List all annotations
for _, annot := range annotations {
    fmt.Printf("[%s] %s\n", annot.Subtype, annot.Content)
}

// Find highlights
highlights := 0
for _, a := range annotations {
    if a.Subtype == "Highlight" {
        highlights++
    }
}
fmt.Printf("Found %d highlights\n", highlights)

WASM

const doc = new WasmPdfDocument(bytes);
const annotations = doc.getAnnotations(0);

// List all annotations
for (const annot of annotations) {
    console.log(`[${annot.subtype}] ${annot.contents}`);
}

// Find highlights
const highlights = annotations.filter(a => a.subtype === "Highlight");
console.log(`Found ${highlights.length} highlights`);

Rust

use pdf_oxide::editor::{DocumentEditor, EditableDocument};
use pdf_oxide::annotation_types::AnnotationSubtype;

let mut editor = DocumentEditor::open("annotated.pdf")?;
let page = editor.get_page(0)?;

// Find all highlight annotations
let highlights = page.find_annotations_by_type(AnnotationSubtype::Highlight);
for h in &highlights {
    println!("Highlight at {:?}: {:?}", h.rect(), h.contents());
}

Java

import fyi.oxide.pdf.*;
import fyi.oxide.pdf.annotation.Annotation;
import fyi.oxide.pdf.annotation.AnnotationType;
import java.nio.file.Path;

try (PdfDocument doc = PdfDocument.open(Path.of("annotated.pdf"))) {
    var annotations = doc.page(0).annotations();

    // List all annotations
    for (Annotation annot : annotations) {
        System.out.println("[" + annot.type() + "] " + annot.contents().orElse(""));
    }

    // Find highlights
    long highlights = annotations.stream()
            .filter(a -> a.type() == AnnotationType.HIGHLIGHT).count();
    System.out.println("Found " + highlights + " highlights");
}

C++

#include <pdf_oxide/pdf_oxide.hpp>

auto doc = pdf_oxide::Document::open("annotated.pdf");
auto annotations = doc.page_annotations(0);

// List all annotations
for (const auto& annot : annotations) {
    std::cout << "[" << annot.subtype << "] " << annot.content << "\n";
}

// Find highlights
int highlights = 0;
for (const auto& a : annotations) {
    if (a.subtype == "Highlight") highlights++;
}
std::cout << "Found " << highlights << " highlights\n";

Swift

import PdfOxide

let doc = try Document.open("annotated.pdf")
let annotations = try doc.pageAnnotations(0)

// List all annotations
for annot in annotations {
    print("[\(annot.subtype)] \(annot.content)")
}

// Find highlights
let highlights = annotations.filter { $0.subtype == "Highlight" }
print("Found \(highlights.count) highlights")

Kotlin

import fyi.oxide.pdf.*
import fyi.oxide.pdf.annotation.AnnotationType

PdfDocument.open(java.nio.file.Path.of("annotated.pdf")).use { doc ->
    val annotations = doc.page(0).annotations()

    // List all annotations
    for (annot in annotations) {
        println("[${annot.type()}] ${annot.contents().orElse("")}")
    }

    // Find highlights
    val highlights = annotations.count { it.type() == AnnotationType.HIGHLIGHT }
    println("Found $highlights highlights")
}

Dart

import 'package:pdf_oxide/pdf_oxide.dart';

final doc = PdfDocument.open('annotated.pdf');
final annotations = doc.pageAnnotations(0);

// List all annotations
for (final annot in annotations) {
  print('[${annot.subtype}] ${annot.content}');
}

// Find highlights
final highlights = annotations.where((a) => a.subtype == 'Highlight');
print('Found ${highlights.length} highlights');
doc.close();

library(pdfoxide)

doc <- pdf_open("annotated.pdf")
annotations <- pdf_page_annotations(doc, 0)

# List all annotations
for (annot in annotations) {
  cat(sprintf("[%s] %s\n", annot$subtype, annot$content))
}

# Find highlights
highlights <- Filter(function(a) a$subtype == "Highlight", annotations)
cat(sprintf("Found %d highlights\n", length(highlights)))

Julia

using PdfOxide

doc = open_document("annotated.pdf")
annotations = page_annotations(doc, 0)

# List all annotations
for annot in annotations
    println("[$(annot.subtype)] $(annot.content)")
end

# Find highlights
highlights = filter(a -> a.subtype == "Highlight", annotations)
println("Found $(length(highlights)) highlights")

Zig

const pdf_oxide = @import("pdf_oxide");
const a = std.heap.page_allocator;

var doc = try pdf_oxide.Document.open("annotated.pdf");
defer doc.deinit();
const annotations = try doc.pageAnnotations(a, 0);
defer pdf_oxide.Document.freeAnnotations(a, annotations);

// List all annotations
for (annotations) |annot| {
    std.debug.print("[{s}] {s}\n", .{ annot.subtype, annot.content });
}

// Find highlights
var highlights: usize = 0;
for (annotations) |annot| {
    if (std.mem.eql(u8, annot.subtype, "Highlight")) highlights += 1;
}
std.debug.print("Found {d} highlights\n", .{highlights});

Scala

import fyi.oxide.pdf.{PdfDocument, annotationsSeq, contentsOption}
import fyi.oxide.pdf.annotation.AnnotationType
import scala.util.Using

Using.resource(PdfDocument.open("annotated.pdf")) { doc =>
  val annotations = doc.page(0).annotationsSeq

  // List all annotations
  for (annot <- annotations) {
    println(s"[${annot.`type`()}] ${annot.contentsOption.getOrElse("")}")
  }

  // Find highlights
  val highlights = annotations.count(_.`type`() == AnnotationType.HIGHLIGHT)
  println(s"Found $highlights highlights")
}

Clojure

(require '[pdf-oxide.core :as pdf])
(import 'fyi.oxide.pdf.annotation.AnnotationType)

(with-open [d (pdf/open "annotated.pdf")]
  (let [annotations (pdf/annotations (pdf/page d 0))]
    ;; List all annotations
    (doseq [annot annotations]
      (println (str "[" (.type annot) "] " (.orElse (.contents annot) ""))))
    ;; Find highlights
    (let [highlights (count (filter #(= (.type %) AnnotationType/HIGHLIGHT) annotations))]
      (println (str "Found " highlights " highlights")))))

Objective-C

#import "POXPdfOxide.h"
NSError *err = nil;

POXDocument *doc = [POXDocument openPath:@"annotated.pdf" error:&err];
NSArray<POXAnnotation*> *annotations = [doc pageAnnotations:0 error:&err];

// List all annotations
for (POXAnnotation *annot in annotations) {
    NSLog(@"[%@] %@", annot.subtype, annot.content);
}

// Find highlights
NSPredicate *p = [NSPredicate predicateWithFormat:@"subtype == %@", @"Highlight"];
NSUInteger highlights = [annotations filteredArrayUsingPredicate:p].count;
NSLog(@"Found %lu highlights", (unsigned long)highlights);

Elixir

{:ok, doc} = PdfOxide.open("annotated.pdf")
{:ok, annotations} = PdfOxide.page_annotations(doc, 0)

# List all annotations
Enum.each(annotations, fn a -> IO.puts("[#{a.subtype}] #{a.content}") end)

# Find highlights
highlights = Enum.count(annotations, &(&1.subtype == "Highlight"))
IO.puts("Found #{highlights} highlights")

`annotations_to_json` — Annotationen einer Seite serialisieren

annotations_to_json serialisiert eine vollständige Annotationsliste in einem einzigen FFI-Aufruf zu einem JSON-Array. Das Go-Binding verwendet dies intern, um []Annotation zu materialisieren; Swift exponiert es direkt als annotationsToJson. Die C-ABI-Signatur lautet:

char *pdf_oxide_annotations_to_json(const FfiAnnotationList *annotations, int32_t *error_code);

Der zurückgegebene UTF-8-String gehört dem Aufrufer (freigeben mit free_string). Sein Schema entspricht der Go-Struct Annotation — Felder: type, subtype, content, x, y, width, height, author, borderWidth, color, creationDate, modificationDate, linkURI, textIconName, isHidden, isPrintable, isReadOnly, isMarkedDeleted.

Swift

import PdfOxide

let doc = try Document.open("annotated.pdf")
let json = try doc.annotationsToJson(0) // String of JSON
print(json)

C ABI

#include "pdf_oxide.h"

int32_t err = 0;
FfiAnnotationList *list = pdf_document_get_page_annotations(doc, /*page=*/0, &err);
char *json = pdf_oxide_annotations_to_json(list, &err);
printf("%s\n", json);
free_string(json);
pdf_oxide_annotation_list_free(list);

C++

#include <pdf_oxide/pdf_oxide.hpp>

auto doc = pdf_oxide::Document::open("annotated.pdf");
std::string json = doc.annotations_to_json(0); // JSON string
std::cout << json << "\n";

Dart

import 'package:pdf_oxide/pdf_oxide.dart';

final doc = PdfDocument.open('annotated.pdf');
final json = doc.annotationsToJson(0); // JSON string
print(json);
doc.close();

library(pdfoxide)

doc <- pdf_open("annotated.pdf")
json <- pdf_annotations_to_json(doc, 0)  # JSON string
cat(json, "\n")

Julia

using PdfOxide

doc = open_document("annotated.pdf")
json = annotations_to_json(doc, 0) # JSON string
println(json)

Zig

const pdf_oxide = @import("pdf_oxide");
const a = std.heap.page_allocator;

var doc = try pdf_oxide.Document.open("annotated.pdf");
defer doc.deinit();
var list = try doc.annotationList(0);
defer list.deinit();
const json = try list.toJson(a); // caller owns the slice
defer a.free(json);
std.debug.print("{s}\n", .{json});

Objective-C

#import "POXPdfOxide.h"
NSError *err = nil;

POXDocument *doc = [POXDocument openPath:@"annotated.pdf" error:&err];
NSString *json = [doc annotationsJson:0 error:&err]; // JSON string
NSLog(@"%@", json);

Elixir

{:ok, doc} = PdfOxide.open("annotated.pdf")
{:ok, json} = PdfOxide.annotations_to_json(doc, 0) # JSON string
IO.puts(json)

Binding-Abdeckung. annotations_to_json ist direkt in Swift (doc.annotationsToJson(page)), C++ (doc.annotations_to_json(page)), Dart (doc.annotationsToJson(page)), R (pdf_annotations_to_json(doc, page)), Julia (annotations_to_json(doc, page)), Zig (doc.annotationList(page).toJson(...)), Objective-C ([doc annotationsJson:page error:]), Elixir (PdfOxide.annotations_to_json(doc, page)) und im C ABI (pdf_oxide_annotations_to_json) verfügbar. Das Go-Binding ruft es intern auf, um doc.Annotations(page) in typisierte Structs zu dekodieren. Im WASM-Ziel wird es aus der Kompilierung ausgeschlossen.

`annotation_extras` — erweiterte Annotationsattribute

annotation_extras liest die erweiterten Attribute einer einzelnen Annotation, die nicht Teil der Kernansicht Annotation sind: Farbe, Erstellungs-/Änderungszeitstempel, die vier Sichtbarkeitsflags (hidden, marked-deleted, printable, read-only), der URI einer Link-Annotation, der Ikonenname einer Text-Annotation sowie die Viereckspunkte einer Hervorhebungs-/Markup-Annotation.

In Swift werden diese zusammen als AnnotationExtras-Struct über annotationExtras(page, index:) zurückgegeben. In Go sind dieselben Felder direkt in die Annotation-Struct integriert (Color, CreationDate, ModificationDate, LinkURI, TextIconName, IsHidden, IsPrintable, IsReadOnly, IsMarkedDeleted). Intern rufen beide die C-ABI-Accessor-Familie pdf_oxide_annotation_get_* / pdf_oxide_*_annotation_get_* auf.

Swift

import PdfOxide

let doc = try Document.open("annotated.pdf")
let extras = try doc.annotationExtras(0, index: 0) // AnnotationExtras

print("color=\(extras.color) created=\(extras.creationDate)")
print("hidden=\(extras.hidden) printable=\(extras.printable) readOnly=\(extras.readOnly)")
if !extras.uri.isEmpty { print("link URI: \(extras.uri)") }
if !extras.iconName.isEmpty { print("icon: \(extras.iconName)") }
for q in extras.quadPoints {
    print("quad: (\(q.x1),\(q.y1)) (\(q.x2),\(q.y2)) (\(q.x3),\(q.y3)) (\(q.x4),\(q.y4))")
}

import pdfoxide "github.com/yfedoseev/pdf_oxide/go"

doc, _ := pdfoxide.Open("annotated.pdf")
defer doc.Close()
annotations, _ := doc.Annotations(0)

a := annotations[0]
fmt.Printf("color=%d created=%d modified=%d\n", a.Color, a.CreationDate, a.ModificationDate)
fmt.Printf("hidden=%v printable=%v readOnly=%v deleted=%v\n",
    a.IsHidden, a.IsPrintable, a.IsReadOnly, a.IsMarkedDeleted)
if a.LinkURI != "" {
    fmt.Printf("link URI: %s\n", a.LinkURI)
}
if a.TextIconName != "" {
    fmt.Printf("icon: %s\n", a.TextIconName)
}

C++

#include <pdf_oxide/pdf_oxide.hpp>

auto doc = pdf_oxide::Document::open("annotated.pdf");

std::cout << "color=" << doc.annotation_get_color(0, 0)
          << " created=" << doc.annotation_get_creation_date(0, 0) << "\n";
std::cout << "hidden=" << doc.annotation_is_hidden(0, 0)
          << " printable=" << doc.annotation_is_printable(0, 0)
          << " readOnly=" << doc.annotation_is_read_only(0, 0) << "\n";
auto uri = doc.link_annotation_get_uri(0, 0);
if (!uri.empty()) std::cout << "link URI: " << uri << "\n";
auto icon = doc.text_annotation_get_icon_name(0, 0);
if (!icon.empty()) std::cout << "icon: " << icon << "\n";

Dart

import 'package:pdf_oxide/pdf_oxide.dart';

final doc = PdfDocument.open('annotated.pdf');
final a = doc.pageAnnotationDetails(0)[0]; // AnnotationDetails

print('color=${a.color} created=${a.creationDate} modified=${a.modificationDate}');
print('hidden=${a.hidden} printable=${a.printable} readOnly=${a.readOnly}');
if (a.linkUri.isNotEmpty) print('link URI: ${a.linkUri}');
if (a.iconName.isNotEmpty) print('icon: ${a.iconName}');
doc.close();

library(pdfoxide)

doc <- pdf_open("annotated.pdf")

cat(sprintf("color=%d created=%d\n",
    pdf_annotation_get_color(doc, 0, 0),
    pdf_annotation_get_creation_date(doc, 0, 0)))
cat(sprintf("hidden=%s printable=%s readOnly=%s\n",
    pdf_annotation_is_hidden(doc, 0, 0),
    pdf_annotation_is_printable(doc, 0, 0),
    pdf_annotation_is_read_only(doc, 0, 0)))
uri <- pdf_link_annotation_get_uri(doc, 0, 0)
if (nzchar(uri)) cat(sprintf("link URI: %s\n", uri))
icon <- pdf_text_annotation_get_icon_name(doc, 0, 0)
if (nzchar(icon)) cat(sprintf("icon: %s\n", icon))

Julia

using PdfOxide

doc = open_document("annotated.pdf")

println("color=$(annotation_get_color(doc, 0, 0)) created=$(annotation_creation_date(doc, 0, 0))")
println("hidden=$(annotation_is_hidden(doc, 0, 0)) printable=$(annotation_is_printable(doc, 0, 0)) readOnly=$(annotation_is_read_only(doc, 0, 0))")
uri = link_annotation_uri(doc, 0, 0)
isempty(uri) || println("link URI: $uri")
icon = text_annotation_icon_name(doc, 0, 0)
isempty(icon) || println("icon: $icon")

Zig

const pdf_oxide = @import("pdf_oxide");
const a = std.heap.page_allocator;

var doc = try pdf_oxide.Document.open("annotated.pdf");
defer doc.deinit();
var list = try doc.annotationList(0);
defer list.deinit();

std.debug.print("color={d} created={d}\n", .{ try list.getColor(0), try list.getCreationDate(0) });
std.debug.print("hidden={} printable={} readOnly={}\n", .{ try list.isHidden(0), try list.isPrintable(0), try list.isReadOnly(0) });
const uri = try list.linkUri(a, 0);
defer a.free(uri);
if (uri.len != 0) std.debug.print("link URI: {s}\n", .{uri});
const icon = try list.textIconName(a, 0);
defer a.free(icon);
if (icon.len != 0) std.debug.print("icon: {s}\n", .{icon});

Objective-C

#import "POXPdfOxide.h"
NSError *err = nil;

POXDocument *doc = [POXDocument openPath:@"annotated.pdf" error:&err];
POXAnnotation *a = [doc pageAnnotations:0 error:&err].firstObject;

NSLog(@"color=%u created=%lld modified=%lld", a.color, a.creationDate, a.modificationDate);
NSLog(@"hidden=%d printable=%d readOnly=%d", a.hidden, a.printable, a.readOnly);
if (a.linkUri.length) NSLog(@"link URI: %@", a.linkUri);
if (a.iconName.length) NSLog(@"icon: %@", a.iconName);

Elixir

{:ok, doc} = PdfOxide.open("annotated.pdf")

{:ok, color} = PdfOxide.annotation_color(doc, 0, 0)
{:ok, created} = PdfOxide.annotation_creation_date(doc, 0, 0)
IO.puts("color=#{color} created=#{created}")

{:ok, hidden} = PdfOxide.annotation_hidden?(doc, 0, 0)
{:ok, printable} = PdfOxide.annotation_printable?(doc, 0, 0)
{:ok, read_only} = PdfOxide.annotation_read_only?(doc, 0, 0)
IO.puts("hidden=#{hidden} printable=#{printable} readOnly=#{read_only}")

{:ok, uri} = PdfOxide.link_annotation_uri(doc, 0, 0)
if uri != "", do: IO.puts("link URI: #{uri}")
{:ok, icon} = PdfOxide.text_annotation_icon_name(doc, 0, 0)
if icon != "", do: IO.puts("icon: #{icon}")

AnnotationExtras-Felder (Swift)

Feld	Typ	Beschreibung
`color`	`UInt32`	Gepackte Annotationsfarbe
`creationDate`	`Int64`	Erstellungszeitstempel
`modificationDate`	`Int64`	Änderungszeitstempel
`hidden`	`Bool`	hidden-Flag
`markedDeleted`	`Bool`	marked-deleted-Flag
`printable`	`Bool`	print-Flag
`readOnly`	`Bool`	read-only-Flag
`uri`	`String`	URI der Link-Annotation (leer, falls nicht vorhanden)
`iconName`	`String`	Ikonenname der Text-Annotation (leer, falls nicht vorhanden)
`quadPoints`	`[QuadPoint]`	Vierecke für Hervorhebungen/Markup (je 4 Ecken)

Binding-Abdeckung. annotation_extras ist als dedizierte AnnotationExtras-Struct in Swift (doc.annotationExtras(page, index:)) und über die C ABI-Accessor-Familie pdf_oxide_annotation_get_* verfügbar. Dieselbe indexbasierte Accessor-Familie ist auch in C++ (doc.annotation_get_*), R (pdf_annotation_get_*), Julia (annotation_*), Zig (AnnotationList.getColor/isHidden/...) und Elixir (PdfOxide.annotation_color/...) umhüllt. In Go, Dart (doc.pageAnnotationDetails(page)) und Objective-C (inline in POXAnnotation) sind dieselben Attribute direkt in jede Annotation eingebettet. Die Accessoren werden im WASM-Ziel aus der Kompilierung ausgeschlossen.

Erweiterte Beispiele

Inhaltsverzeichnis aus Lesezeichen aufbauen

use pdf_oxide::PdfDocument;
use pdf_oxide::outline::Destination;

let mut doc = PdfDocument::open("book.pdf")?;

fn print_toc(items: &[pdf_oxide::outline::OutlineItem], depth: usize) {
    for item in items {
        let indent = "  ".repeat(depth);
        let page = match &item.dest {
            Some(Destination::PageIndex(p)) => format!("page {}", p + 1),
            Some(Destination::Named(n)) => format!("dest '{}'", n),
            None => "no dest".to_string(),
        };
        println!("{}{} ({})", indent, item.title, page);
        print_toc(&item.children, depth + 1);
    }
}

if let Some(outline) = doc.get_outline()? {
    println!("Table of Contents:");
    print_toc(&outline, 0);
}

Alle Kommentare (Text-Annotationen) extrahieren

use pdf_oxide::PdfDocument;
use pdf_oxide::annotation_types::AnnotationSubtype;

let mut doc = PdfDocument::open("reviewed.pdf")?;
let page_count = doc.page_count()?;

for page_idx in 0..page_count {
    let annotations = doc.get_annotations(page_idx)?;
    let comments: Vec<_> = annotations.iter()
        .filter(|a| a.subtype_enum == AnnotationSubtype::Text)
        .collect();

    if !comments.is_empty() {
        println!("Page {}:", page_idx + 1);
        for c in &comments {
            let author = c.author.as_deref().unwrap_or("Unknown");
            let text = c.contents.as_deref().unwrap_or("");
            println!("  [{}] {}", author, text);
        }
    }
}

Alle Hyperlinks extrahieren

use pdf_oxide::PdfDocument;
use pdf_oxide::annotation_types::AnnotationSubtype;

let mut doc = PdfDocument::open("report.pdf")?;
let annotations = doc.get_annotations(0)?;

let links: Vec<_> = annotations.iter()
    .filter(|a| a.subtype_enum == AnnotationSubtype::Link)
    .collect();

for link in &links {
    if let Some(ref action) = link.action {
        println!("Link: {:?}", action);
    }
    if let Some(ref dest) = link.destination {
        println!("Internal link: {:?}", dest);
    }
}

Häufig gestellte Fragen

Was ist der Unterschied zwischen get_annotations und annotation_extras? get_annotations gibt die Kernansicht der Annotation zurück (Subtyp, Inhalt, Rechteck, Autor, Datum, Farbe, Flags). annotation_extras ergänzt die erweiterten Attribute — gepackte Farbe, Zeitstempel, die vier Sichtbarkeitsflags, Link-URI, Ikonenname der Text-Annotation und Viereckspunkte der Hervorhebung. In Go sind diese in einer einzigen Annotation zusammengeführt; in Swift sind sie eine separate AnnotationExtras-Struct.

Welches JSON-Schema erzeugt annotations_to_json? Ein JSON-Array, das der Go-Struct Annotation entspricht: Felder type, subtype, content, x, y, width, height, author, borderWidth, color, creationDate, modificationDate, linkURI, textIconName, isHidden, isPrintable, isReadOnly, isMarkedDeleted.

Warum sind Link-URIs und Ikonennamen manchmal leer? Diese Felder gelten nur für bestimmte Subtypen — uri für Link-Annotationen und iconName für Text-Annotationen (Haftnotizen). Für andere Subtypen werden leere Strings zurückgegeben.

Ist die Annotationsextraktion schnell? Ja. Der Extraktionskern von PDF Oxide läuft auf dem Benchmark-Korpus mit einem Mittelwert von etwa 0,8 ms und einem p99-Wert von 9 ms bei einer Bestehensrate von 100 %.

Annotationsextraktion

Schnellbeispiel

API-Referenz

get_annotations(page_index) -> Vec<Annotation>

Annotationsfelder

AnnotationSubtype-Varianten

get_outline() -> Option<Vec<OutlineItem>>

OutlineItem-Felder

Destination-Varianten

PdfPage-Annotations-API (DOM)

page.annotations() -> &[AnnotationWrapper]

page.find_annotations_by_type(subtype) -> Vec<&AnnotationWrapper>

page.add_annotation(annotation)

page.remove_annotation(index) -> Option<AnnotationWrapper>

page.find_annotations_in_region(rect) -> Vec<&AnnotationWrapper>

AnnotationWrapper-Methoden

annotations_to_json — Annotationen einer Seite serialisieren

annotation_extras — erweiterte Annotationsattribute

AnnotationExtras-Felder (Swift)

Erweiterte Beispiele

Inhaltsverzeichnis aus Lesezeichen aufbauen

Alle Kommentare (Text-Annotationen) extrahieren

Alle Hyperlinks extrahieren

Häufig gestellte Fragen

Verwandte Seiten

`get_annotations(page_index) -> Vec<Annotation>`

`get_outline() -> Option<Vec<OutlineItem>>`

`page.annotations() -> &[AnnotationWrapper]`

`page.find_annotations_by_type(subtype) -> Vec<&AnnotationWrapper>`

`page.add_annotation(annotation)`

`page.remove_annotation(index) -> Option<AnnotationWrapper>`

`page.find_annotations_in_region(rect) -> Vec<&AnnotationWrapper>`

`annotations_to_json` — Annotationen einer Seite serialisieren

`annotation_extras` — erweiterte Annotationsattribute