What is the fastest Python PDF library?

PDF Oxide is the fastest Python PDF library, with 0.8ms mean text extraction time — 5.8× faster than PyMuPDF (4.6ms) and 15× faster than pypdf (12.1ms). Benchmarked on 3,830 real-world PDFs with 100% pass rate.

Is PDF Oxide free for commercial use?

Yes. PDF Oxide is MIT licensed — free for all uses including commercial products, SaaS, and proprietary software. No license fees, no sales calls, no AGPL restrictions.

Can PDF Oxide handle scanned PDFs with OCR?

Yes. PDF Oxide includes built-in OCR via PaddleOCR and ONNX Runtime. No Tesseract installation needed — just pip install pdf_oxide and use extract_text_ocr(). Supports PP-OCRv3, v4, and v5 models.

Does PDF Oxide support XFA forms?

Yes. PDF Oxide is the only Python PDF library that can detect, analyze, and extract data from XFA forms (XML Forms Architecture). PyMuPDF, pypdf, pdfplumber, and pdfminer cannot read XFA form data.

How does PDF Oxide compare to PyMuPDF?

PDF Oxide is 5.8× faster than PyMuPDF (0.8ms vs 4.6ms mean), has a 100% pass rate vs 99.3%, and is MIT licensed vs PyMuPDF's AGPL-3.0. PDF Oxide also has built-in Markdown/HTML output and XFA form support that PyMuPDF lacks.

Can PDF Oxide convert PDF to Markdown?

Yes. PDF Oxide has built-in PDF to Markdown conversion with heading detection, table preservation, and list formatting — ideal for LLM and RAG pipelines. No separate package needed, unlike PyMuPDF which requires pymupdf4llm (69× slower).

Extração de Anotações

O PDF Oxide fornece acesso a todos os tipos de anotação definidos na especificação PDF (ISO 32000-1:2008, Seção 12.5), incluindo notas de texto, hiperlinks, destaques, carimbos, anotações de tinta e muito mais. O sumário do documento (favoritos) também está acessível para a construção de estruturas de navegação.

Use get_annotations() em PdfDocument para obter dados brutos de anotações, ou a API DOM de PdfPage para uma interface unificada AnnotationWrapper que suporta leitura e escrita.

Cobertura de bindings. A extração de anotações está disponível em Python (doc.get_annotations(page)), Rust (doc.get_annotations(page)), WASM (doc.getAnnotations(page)) e Go (doc.Annotations(page)). A API pública do C# ainda não expõe um wrapper GetAnnotations — o método FFI nativo (PdfDocumentGetPageAnnotations) existe, mas não está encapsulado. Para extração de anotações em C#, use o Rust CLI (pdf-oxide annotations doc.pdf) ou chame diretamente PdfPageGetAnnotationsCount / pdf_get_annotations_by_type via P/Invoke até que um wrapper público seja disponibilizado.

Exemplo Rápido

Python

from pdf_oxide import PdfDocument

doc = PdfDocument("annotated.pdf")
page = doc.page(0)
for annot in page.annotations():
    print(f"{annot.subtype}: {annot.contents}")

Node.js

const { PdfDocument } = require("pdf-oxide");

const doc = new PdfDocument("annotated.pdf");
const annotations = doc.getPageAnnotations(0);
for (const annot of annotations) {
  console.log(`${annot.subtype}: ${annot.contents}`);
}
doc.close();

import pdfoxide "github.com/yfedoseev/pdf_oxide/go"

doc, _ := pdfoxide.Open("annotated.pdf")
defer doc.Close()
annotations, _ := doc.Annotations(0)
for _, annot := range annotations {
    fmt.Printf("%s: %s\n", annot.Subtype, annot.Content)
}

WASM

const doc = new WasmPdfDocument(bytes);
const annotations = doc.getAnnotations(0);
for (const annot of annotations) {
    console.log(`${annot.subtype}: ${annot.contents}`);
}

Rust

use pdf_oxide::PdfDocument;

let mut doc = PdfDocument::open("annotated.pdf")?;
let annotations = doc.get_annotations(0)?;
for annot in &annotations {
    println!("{:?}: {:?}", annot.subtype_enum, annot.contents);
}

Java

import fyi.oxide.pdf.*;
import fyi.oxide.pdf.annotation.Annotation;
import java.nio.file.Path;

try (PdfDocument doc = PdfDocument.open(Path.of("annotated.pdf"))) {
    for (Annotation annot : doc.page(0).annotations()) {
        System.out.println(annot.type() + ": " + annot.contents().orElse(""));
    }
}

C++

#include <pdf_oxide/pdf_oxide.hpp>

auto doc = pdf_oxide::Document::open("annotated.pdf");
for (const auto& annot : doc.page_annotations(0)) {
    std::cout << annot.subtype << ": " << annot.content << "\n";
}

Swift

import PdfOxide

let doc = try Document.open("annotated.pdf")
for annot in try doc.pageAnnotations(0) {
    print("\(annot.subtype): \(annot.content)")
}

Kotlin

import fyi.oxide.pdf.*

PdfDocument.open(java.nio.file.Path.of("annotated.pdf")).use { doc ->
    for (annot in doc.page(0).annotations()) {
        println("${annot.type()}: ${annot.contents().orElse("")}")
    }
}

Dart

import 'package:pdf_oxide/pdf_oxide.dart';

final doc = PdfDocument.open('annotated.pdf');
for (final annot in doc.pageAnnotations(0)) {
  print('${annot.subtype}: ${annot.content}');
}
doc.close();

library(pdfoxide)

doc <- pdf_open("annotated.pdf")
for (annot in pdf_page_annotations(doc, 0)) {
  cat(sprintf("%s: %s\n", annot$subtype, annot$content))
}

Julia

using PdfOxide

doc = open_document("annotated.pdf")
for annot in page_annotations(doc, 0)
    println("$(annot.subtype): $(annot.content)")
end

Zig

const pdf_oxide = @import("pdf_oxide");
const a = std.heap.page_allocator;

var doc = try pdf_oxide.Document.open("annotated.pdf");
defer doc.deinit();
const annotations = try doc.pageAnnotations(a, 0);
defer pdf_oxide.Document.freeAnnotations(a, annotations);
for (annotations) |annot| {
    std.debug.print("{s}: {s}\n", .{ annot.subtype, annot.content });
}

Scala

import fyi.oxide.pdf.{PdfDocument, annotationsSeq, contentsOption}
import scala.util.Using

Using.resource(PdfDocument.open("annotated.pdf")) { doc =>
  for (annot <- doc.page(0).annotationsSeq) {
    println(s"${annot.`type`()}: ${annot.contentsOption.getOrElse("")}")
  }
}

Clojure

(require '[pdf-oxide.core :as pdf])

(with-open [d (pdf/open "annotated.pdf")]
  (doseq [annot (pdf/annotations (pdf/page d 0))]
    (println (str (.type annot) ": " (.orElse (.contents annot) "")))))

Objective-C

#import "POXPdfOxide.h"
NSError *err = nil;

POXDocument *doc = [POXDocument openPath:@"annotated.pdf" error:&err];
for (POXAnnotation *annot in [doc pageAnnotations:0 error:&err]) {
    NSLog(@"%@: %@", annot.subtype, annot.content);
}

Elixir

{:ok, doc} = PdfOxide.open("annotated.pdf")
{:ok, annots} = PdfOxide.page_annotations(doc, 0)
Enum.each(annots, fn a -> IO.puts("#{a.subtype}: #{a.content}") end)

Referência da API

`get_annotations(page_index) -> Vec<Annotation>`

Extrai anotações brutas de uma página específica. Retorna todos os tipos de anotação presentes na página.

Parâmetro	Tipo	Descrição
`page_index`	`usize`	Índice de página base zero

Retorna: Um vetor de objetos Annotation.

Campos da Anotação

Campo	Tipo	Descrição
`annotation_type`	`String`	Sempre `"Annot"`
`subtype`	`Option<String>`	String de subtipo bruto (ex.: `"Text"`, `"Highlight"`)
`subtype_enum`	`AnnotationSubtype`	Enum de subtipo analisado
`contents`	`Option<String>`	Conteúdo de texto da anotação
`rect`	`Option<[f64; 4]>`	Retângulo delimitador [x1, y1, x2, y2]
`author`	`Option<String>`	Autor/criador (entrada `/T`)
`creation_date`	`Option<String>`	Data de criação
`modification_date`	`Option<String>`	Data da última modificação
`subject`	`Option<String>`	Assunto da anotação
`destination`	`Option<LinkDestination>`	Destino do link (para anotações Link)
`action`	`Option<LinkAction>`	Ação do link (para anotações Link)
`color`	`Option<Vec<f64>>`	Componentes de cor da anotação
`flags`	`Option<AnnotationFlags>`	Sinalizadores da anotação (invisible, hidden, print, etc.)

Variantes de AnnotationSubtype

Variante	Descrição
`Text`	Anotação de nota adesiva
`Link`	Anotação de hiperlink
`FreeText`	Anotação de caixa de texto
`Line`	Anotação de linha
`Square`	Anotação de retângulo
`Circle`	Anotação de elipse
`Polygon`	Anotação de polígono
`PolyLine`	Anotação de polilinha
`Highlight`	Marcação de destaque de texto
`Underline`	Marcação de sublinhado de texto
`Squiggly`	Marcação de sublinhado ondulado
`StrikeOut`	Marcação de tachado
`Stamp`	Anotação de carimbo
`Ink`	Anotação de desenho à mão livre
`Popup`	Nota pop-up associada a outra anotação
`FileAttachment`	Anotação de arquivo incorporado
`Sound`	Anotação de som
`Movie`	Anotação de vídeo
`Screen`	Anotação de tela
`Widget`	Widget de campo de formulário
`PrinterMark`	Anotação de marca de impressora
`TrapNet`	Anotação de rede de trap
`Watermark`	Anotação de marca d’água
`ThreeDimensional`	Anotação 3D
`Redact`	Anotação de redação
`Caret`	Anotação de cursor (ponto de inserção)
`RichMedia`	Anotação de mídia avançada
`Unknown`	Tipo de anotação não reconhecido

`get_outline() -> Option<Vec<OutlineItem>>`

Obtém o sumário do documento (favoritos), se presente. Retorna uma árvore hierárquica de itens de sumário que pode ser usada para navegação no documento.

Retorna:

Some(Vec<OutlineItem>) – Favoritos encontrados e analisados
None – Sem favoritos no documento

Campos de OutlineItem

Campo	Tipo	Descrição
`title`	`String`	Texto do título do favorito
`dest`	`Option<Destination>`	Destino de navegação
`children`	`Vec<OutlineItem>`	Favoritos filhos aninhados

Variantes de Destination

Variante	Descrição
`PageIndex(usize)`	Referência direta de página (índice base zero)
`Named(String)`	Identificador de destino nomeado

Rust

let mut doc = PdfDocument::open("book.pdf")?;

if let Some(outline) = doc.get_outline()? {
    for item in &outline {
        println!("  {}", item.title);
        for child in &item.children {
            println!("    {}", child.title);
        }
    }
} else {
    println!("No bookmarks found.");
}

C++

#include <pdf_oxide/pdf_oxide.hpp>

auto doc = pdf_oxide::Document::open("book.pdf");
std::string outline = doc.get_outline(); // JSON tree of bookmarks
std::cout << outline << "\n";

Swift

import PdfOxide

let doc = try Document.open("book.pdf")
let outline = try doc.outline() // JSON tree of bookmarks
print(outline)

Dart

import 'package:pdf_oxide/pdf_oxide.dart';

final doc = PdfDocument.open('book.pdf');
final outline = doc.getOutline(); // JSON tree of bookmarks
print(outline);
doc.close();

library(pdfoxide)

doc <- pdf_open("book.pdf")
outline <- pdf_get_outline(doc)  # JSON tree of bookmarks
cat(outline, "\n")

Julia

using PdfOxide

doc = open_document("book.pdf")
outline = get_outline(doc) # JSON tree of bookmarks
println(outline)

Zig

const pdf_oxide = @import("pdf_oxide");
const a = std.heap.page_allocator;

var doc = try pdf_oxide.Document.open("book.pdf");
defer doc.deinit();
const outline = try doc.outline(a); // JSON tree of bookmarks; caller owns it
defer a.free(outline);
std.debug.print("{s}\n", .{outline});

Objective-C

#import "POXPdfOxide.h"
NSError *err = nil;

POXDocument *doc = [POXDocument openPath:@"book.pdf" error:&err];
NSString *outline = [doc outlineWithError:&err]; // JSON tree of bookmarks
NSLog(@"%@", outline);

Elixir

{:ok, doc} = PdfOxide.open("book.pdf")
{:ok, outline} = PdfOxide.outline(doc) # JSON tree of bookmarks
IO.puts(outline)

API de Anotação do PdfPage (DOM)

O objeto PdfPage do DocumentEditor fornece uma interface AnnotationWrapper de nível mais alto que suporta tanto a leitura de anotações existentes quanto a adição de novas.

`page.annotations() -> &[AnnotationWrapper]`

Obtém todas as anotações da página como objetos encapsulados.

`page.find_annotations_by_type(subtype) -> Vec<&AnnotationWrapper>`

Encontra anotações de um tipo específico.

`page.add_annotation(annotation)`

Adiciona uma nova anotação à página.

`page.remove_annotation(index) -> Option<AnnotationWrapper>`

Remove uma anotação pelo índice.

`page.find_annotations_in_region(rect) -> Vec<&AnnotationWrapper>`

Encontra anotações cujas caixas delimitadoras se intersectam com uma região especificada.

Métodos de AnnotationWrapper

Método	Retorna	Descrição
`id()`	`AnnotationId`	ID único de sessão
`subtype()`	`AnnotationSubtype`	Tipo de anotação
`rect()`	`Rect`	Retângulo delimitador
`contents()`	`Option<&str>`	Conteúdo de texto
`color()`	`Option<(f32, f32, f32)>`	Cor RGB (0.0–1.0)
`is_modified()`	`bool`	Se a anotação foi alterada

Python

doc = PdfDocument("annotated.pdf")
page = doc.page(0)

# List all annotations
for annot in page.annotations():
    print(f"[{annot.subtype}] {annot.contents} at {annot.rect}")

# Find highlights
highlights = [a for a in page.annotations() if a.subtype == "Highlight"]
print(f"Found {len(highlights)} highlights")

Node.js

const doc = new PdfDocument("annotated.pdf");
const annotations = doc.getPageAnnotations(0);

// List all annotations
for (const annot of annotations) {
  console.log(`[${annot.subtype}] ${annot.contents}`);
}

// Find highlights
const highlights = annotations.filter(a => a.subtype === "Highlight");
console.log(`Found ${highlights.length} highlights`);
doc.close();

doc, _ := pdfoxide.Open("annotated.pdf")
defer doc.Close()
annotations, _ := doc.Annotations(0)

// List all annotations
for _, annot := range annotations {
    fmt.Printf("[%s] %s\n", annot.Subtype, annot.Content)
}

// Find highlights
highlights := 0
for _, a := range annotations {
    if a.Subtype == "Highlight" {
        highlights++
    }
}
fmt.Printf("Found %d highlights\n", highlights)

WASM

const doc = new WasmPdfDocument(bytes);
const annotations = doc.getAnnotations(0);

// List all annotations
for (const annot of annotations) {
    console.log(`[${annot.subtype}] ${annot.contents}`);
}

// Find highlights
const highlights = annotations.filter(a => a.subtype === "Highlight");
console.log(`Found ${highlights.length} highlights`);

Rust

use pdf_oxide::editor::{DocumentEditor, EditableDocument};
use pdf_oxide::annotation_types::AnnotationSubtype;

let mut editor = DocumentEditor::open("annotated.pdf")?;
let page = editor.get_page(0)?;

// Find all highlight annotations
let highlights = page.find_annotations_by_type(AnnotationSubtype::Highlight);
for h in &highlights {
    println!("Highlight at {:?}: {:?}", h.rect(), h.contents());
}

Java

import fyi.oxide.pdf.*;
import fyi.oxide.pdf.annotation.Annotation;
import fyi.oxide.pdf.annotation.AnnotationType;
import java.nio.file.Path;

try (PdfDocument doc = PdfDocument.open(Path.of("annotated.pdf"))) {
    var annotations = doc.page(0).annotations();

    // List all annotations
    for (Annotation annot : annotations) {
        System.out.println("[" + annot.type() + "] " + annot.contents().orElse(""));
    }

    // Find highlights
    long highlights = annotations.stream()
            .filter(a -> a.type() == AnnotationType.HIGHLIGHT).count();
    System.out.println("Found " + highlights + " highlights");
}

C++

#include <pdf_oxide/pdf_oxide.hpp>

auto doc = pdf_oxide::Document::open("annotated.pdf");
auto annotations = doc.page_annotations(0);

// List all annotations
for (const auto& annot : annotations) {
    std::cout << "[" << annot.subtype << "] " << annot.content << "\n";
}

// Find highlights
int highlights = 0;
for (const auto& a : annotations) {
    if (a.subtype == "Highlight") highlights++;
}
std::cout << "Found " << highlights << " highlights\n";

Swift

import PdfOxide

let doc = try Document.open("annotated.pdf")
let annotations = try doc.pageAnnotations(0)

// List all annotations
for annot in annotations {
    print("[\(annot.subtype)] \(annot.content)")
}

// Find highlights
let highlights = annotations.filter { $0.subtype == "Highlight" }
print("Found \(highlights.count) highlights")

Kotlin

import fyi.oxide.pdf.*
import fyi.oxide.pdf.annotation.AnnotationType

PdfDocument.open(java.nio.file.Path.of("annotated.pdf")).use { doc ->
    val annotations = doc.page(0).annotations()

    // List all annotations
    for (annot in annotations) {
        println("[${annot.type()}] ${annot.contents().orElse("")}")
    }

    // Find highlights
    val highlights = annotations.count { it.type() == AnnotationType.HIGHLIGHT }
    println("Found $highlights highlights")
}

Dart

import 'package:pdf_oxide/pdf_oxide.dart';

final doc = PdfDocument.open('annotated.pdf');
final annotations = doc.pageAnnotations(0);

// List all annotations
for (final annot in annotations) {
  print('[${annot.subtype}] ${annot.content}');
}

// Find highlights
final highlights = annotations.where((a) => a.subtype == 'Highlight');
print('Found ${highlights.length} highlights');
doc.close();

library(pdfoxide)

doc <- pdf_open("annotated.pdf")
annotations <- pdf_page_annotations(doc, 0)

# List all annotations
for (annot in annotations) {
  cat(sprintf("[%s] %s\n", annot$subtype, annot$content))
}

# Find highlights
highlights <- Filter(function(a) a$subtype == "Highlight", annotations)
cat(sprintf("Found %d highlights\n", length(highlights)))

Julia

using PdfOxide

doc = open_document("annotated.pdf")
annotations = page_annotations(doc, 0)

# List all annotations
for annot in annotations
    println("[$(annot.subtype)] $(annot.content)")
end

# Find highlights
highlights = filter(a -> a.subtype == "Highlight", annotations)
println("Found $(length(highlights)) highlights")

Zig

const pdf_oxide = @import("pdf_oxide");
const a = std.heap.page_allocator;

var doc = try pdf_oxide.Document.open("annotated.pdf");
defer doc.deinit();
const annotations = try doc.pageAnnotations(a, 0);
defer pdf_oxide.Document.freeAnnotations(a, annotations);

// List all annotations
for (annotations) |annot| {
    std.debug.print("[{s}] {s}\n", .{ annot.subtype, annot.content });
}

// Find highlights
var highlights: usize = 0;
for (annotations) |annot| {
    if (std.mem.eql(u8, annot.subtype, "Highlight")) highlights += 1;
}
std.debug.print("Found {d} highlights\n", .{highlights});

Scala

import fyi.oxide.pdf.{PdfDocument, annotationsSeq, contentsOption}
import fyi.oxide.pdf.annotation.AnnotationType
import scala.util.Using

Using.resource(PdfDocument.open("annotated.pdf")) { doc =>
  val annotations = doc.page(0).annotationsSeq

  // List all annotations
  for (annot <- annotations) {
    println(s"[${annot.`type`()}] ${annot.contentsOption.getOrElse("")}")
  }

  // Find highlights
  val highlights = annotations.count(_.`type`() == AnnotationType.HIGHLIGHT)
  println(s"Found $highlights highlights")
}

Clojure

(require '[pdf-oxide.core :as pdf])
(import 'fyi.oxide.pdf.annotation.AnnotationType)

(with-open [d (pdf/open "annotated.pdf")]
  (let [annotations (pdf/annotations (pdf/page d 0))]
    ;; List all annotations
    (doseq [annot annotations]
      (println (str "[" (.type annot) "] " (.orElse (.contents annot) ""))))
    ;; Find highlights
    (let [highlights (count (filter #(= (.type %) AnnotationType/HIGHLIGHT) annotations))]
      (println (str "Found " highlights " highlights")))))

Objective-C

#import "POXPdfOxide.h"
NSError *err = nil;

POXDocument *doc = [POXDocument openPath:@"annotated.pdf" error:&err];
NSArray<POXAnnotation*> *annotations = [doc pageAnnotations:0 error:&err];

// List all annotations
for (POXAnnotation *annot in annotations) {
    NSLog(@"[%@] %@", annot.subtype, annot.content);
}

// Find highlights
NSPredicate *p = [NSPredicate predicateWithFormat:@"subtype == %@", @"Highlight"];
NSUInteger highlights = [annotations filteredArrayUsingPredicate:p].count;
NSLog(@"Found %lu highlights", (unsigned long)highlights);

Elixir

{:ok, doc} = PdfOxide.open("annotated.pdf")
{:ok, annotations} = PdfOxide.page_annotations(doc, 0)

# List all annotations
Enum.each(annotations, fn a -> IO.puts("[#{a.subtype}] #{a.content}") end)

# Find highlights
highlights = Enum.count(annotations, &(&1.subtype == "Highlight"))
IO.puts("Found #{highlights} highlights")

`annotations_to_json` — serializar anotações de uma página

annotations_to_json serializa uma lista inteira de anotações em um array JSON em uma única chamada FFI. O binding Go usa isso internamente para materializar []Annotation; o Swift expõe diretamente como annotationsToJson. A assinatura C ABI é:

char *pdf_oxide_annotations_to_json(const FfiAnnotationList *annotations, int32_t *error_code);

A string UTF-8 retornada pertence ao chamador (libere com free_string). Seu esquema corresponde à struct Annotation do Go — type, subtype, content, x, y, width, height, author, borderWidth, color, creationDate, modificationDate, linkURI, textIconName, isHidden, isPrintable, isReadOnly, isMarkedDeleted.

Swift

import PdfOxide

let doc = try Document.open("annotated.pdf")
let json = try doc.annotationsToJson(0) // String of JSON
print(json)

C ABI

#include "pdf_oxide.h"

int32_t err = 0;
FfiAnnotationList *list = pdf_document_get_page_annotations(doc, /*page=*/0, &err);
char *json = pdf_oxide_annotations_to_json(list, &err);
printf("%s\n", json);
free_string(json);
pdf_oxide_annotation_list_free(list);

C++

#include <pdf_oxide/pdf_oxide.hpp>

auto doc = pdf_oxide::Document::open("annotated.pdf");
std::string json = doc.annotations_to_json(0); // JSON string
std::cout << json << "\n";

Dart

import 'package:pdf_oxide/pdf_oxide.dart';

final doc = PdfDocument.open('annotated.pdf');
final json = doc.annotationsToJson(0); // JSON string
print(json);
doc.close();

library(pdfoxide)

doc <- pdf_open("annotated.pdf")
json <- pdf_annotations_to_json(doc, 0)  # JSON string
cat(json, "\n")

Julia

using PdfOxide

doc = open_document("annotated.pdf")
json = annotations_to_json(doc, 0) # JSON string
println(json)

Zig

const pdf_oxide = @import("pdf_oxide");
const a = std.heap.page_allocator;

var doc = try pdf_oxide.Document.open("annotated.pdf");
defer doc.deinit();
var list = try doc.annotationList(0);
defer list.deinit();
const json = try list.toJson(a); // caller owns the slice
defer a.free(json);
std.debug.print("{s}\n", .{json});

Objective-C

#import "POXPdfOxide.h"
NSError *err = nil;

POXDocument *doc = [POXDocument openPath:@"annotated.pdf" error:&err];
NSString *json = [doc annotationsJson:0 error:&err]; // JSON string
NSLog(@"%@", json);

Elixir

{:ok, doc} = PdfOxide.open("annotated.pdf")
{:ok, json} = PdfOxide.annotations_to_json(doc, 0) # JSON string
IO.puts(json)

Cobertura de bindings. annotations_to_json é exposto diretamente em Swift (doc.annotationsToJson(page)), C++ (doc.annotations_to_json(page)), Dart (doc.annotationsToJson(page)), R (pdf_annotations_to_json(doc, page)), Julia (annotations_to_json(doc, page)), Zig (doc.annotationList(page).toJson(...)), Objective-C ([doc annotationsJson:page error:]), Elixir (PdfOxide.annotations_to_json(doc, page)) e o C ABI (pdf_oxide_annotations_to_json); o binding Go chama internamente para decodificar doc.Annotations(page) em structs tipadas. É excluído da compilação no alvo WASM.

`annotation_extras` — atributos estendidos de anotação

annotation_extras lê os atributos estendidos de uma única anotação que não fazem parte da visão principal de Annotation: cor, timestamps de criação/modificação, os quatro flags de visibilidade (hidden, marked-deleted, printable, read-only), o URI de uma anotação de link, o nome do ícone de uma anotação de texto e os pontos de quadrilátero de uma anotação de destaque/marcação.

No Swift, estes são retornados juntos como uma struct AnnotationExtras via annotationExtras(page, index:). No Go, os mesmos campos são incorporados diretamente na struct Annotation (Color, CreationDate, ModificationDate, LinkURI, TextIconName, IsHidden, IsPrintable, IsReadOnly, IsMarkedDeleted). Internamente, ambos chamam a família de acessores C ABI pdf_oxide_annotation_get_* / pdf_oxide_*_annotation_get_*.

Swift

import PdfOxide

let doc = try Document.open("annotated.pdf")
let extras = try doc.annotationExtras(0, index: 0) // AnnotationExtras

print("color=\(extras.color) created=\(extras.creationDate)")
print("hidden=\(extras.hidden) printable=\(extras.printable) readOnly=\(extras.readOnly)")
if !extras.uri.isEmpty { print("link URI: \(extras.uri)") }
if !extras.iconName.isEmpty { print("icon: \(extras.iconName)") }
for q in extras.quadPoints {
    print("quad: (\(q.x1),\(q.y1)) (\(q.x2),\(q.y2)) (\(q.x3),\(q.y3)) (\(q.x4),\(q.y4))")
}

import pdfoxide "github.com/yfedoseev/pdf_oxide/go"

doc, _ := pdfoxide.Open("annotated.pdf")
defer doc.Close()
annotations, _ := doc.Annotations(0)

a := annotations[0]
fmt.Printf("color=%d created=%d modified=%d\n", a.Color, a.CreationDate, a.ModificationDate)
fmt.Printf("hidden=%v printable=%v readOnly=%v deleted=%v\n",
    a.IsHidden, a.IsPrintable, a.IsReadOnly, a.IsMarkedDeleted)
if a.LinkURI != "" {
    fmt.Printf("link URI: %s\n", a.LinkURI)
}
if a.TextIconName != "" {
    fmt.Printf("icon: %s\n", a.TextIconName)
}

C++

#include <pdf_oxide/pdf_oxide.hpp>

auto doc = pdf_oxide::Document::open("annotated.pdf");

std::cout << "color=" << doc.annotation_get_color(0, 0)
          << " created=" << doc.annotation_get_creation_date(0, 0) << "\n";
std::cout << "hidden=" << doc.annotation_is_hidden(0, 0)
          << " printable=" << doc.annotation_is_printable(0, 0)
          << " readOnly=" << doc.annotation_is_read_only(0, 0) << "\n";
auto uri = doc.link_annotation_get_uri(0, 0);
if (!uri.empty()) std::cout << "link URI: " << uri << "\n";
auto icon = doc.text_annotation_get_icon_name(0, 0);
if (!icon.empty()) std::cout << "icon: " << icon << "\n";

Dart

import 'package:pdf_oxide/pdf_oxide.dart';

final doc = PdfDocument.open('annotated.pdf');
final a = doc.pageAnnotationDetails(0)[0]; // AnnotationDetails

print('color=${a.color} created=${a.creationDate} modified=${a.modificationDate}');
print('hidden=${a.hidden} printable=${a.printable} readOnly=${a.readOnly}');
if (a.linkUri.isNotEmpty) print('link URI: ${a.linkUri}');
if (a.iconName.isNotEmpty) print('icon: ${a.iconName}');
doc.close();

library(pdfoxide)

doc <- pdf_open("annotated.pdf")

cat(sprintf("color=%d created=%d\n",
    pdf_annotation_get_color(doc, 0, 0),
    pdf_annotation_get_creation_date(doc, 0, 0)))
cat(sprintf("hidden=%s printable=%s readOnly=%s\n",
    pdf_annotation_is_hidden(doc, 0, 0),
    pdf_annotation_is_printable(doc, 0, 0),
    pdf_annotation_is_read_only(doc, 0, 0)))
uri <- pdf_link_annotation_get_uri(doc, 0, 0)
if (nzchar(uri)) cat(sprintf("link URI: %s\n", uri))
icon <- pdf_text_annotation_get_icon_name(doc, 0, 0)
if (nzchar(icon)) cat(sprintf("icon: %s\n", icon))

Julia

using PdfOxide

doc = open_document("annotated.pdf")

println("color=$(annotation_get_color(doc, 0, 0)) created=$(annotation_creation_date(doc, 0, 0))")
println("hidden=$(annotation_is_hidden(doc, 0, 0)) printable=$(annotation_is_printable(doc, 0, 0)) readOnly=$(annotation_is_read_only(doc, 0, 0))")
uri = link_annotation_uri(doc, 0, 0)
isempty(uri) || println("link URI: $uri")
icon = text_annotation_icon_name(doc, 0, 0)
isempty(icon) || println("icon: $icon")

Zig

const pdf_oxide = @import("pdf_oxide");
const a = std.heap.page_allocator;

var doc = try pdf_oxide.Document.open("annotated.pdf");
defer doc.deinit();
var list = try doc.annotationList(0);
defer list.deinit();

std.debug.print("color={d} created={d}\n", .{ try list.getColor(0), try list.getCreationDate(0) });
std.debug.print("hidden={} printable={} readOnly={}\n", .{ try list.isHidden(0), try list.isPrintable(0), try list.isReadOnly(0) });
const uri = try list.linkUri(a, 0);
defer a.free(uri);
if (uri.len != 0) std.debug.print("link URI: {s}\n", .{uri});
const icon = try list.textIconName(a, 0);
defer a.free(icon);
if (icon.len != 0) std.debug.print("icon: {s}\n", .{icon});

Objective-C

#import "POXPdfOxide.h"
NSError *err = nil;

POXDocument *doc = [POXDocument openPath:@"annotated.pdf" error:&err];
POXAnnotation *a = [doc pageAnnotations:0 error:&err].firstObject;

NSLog(@"color=%u created=%lld modified=%lld", a.color, a.creationDate, a.modificationDate);
NSLog(@"hidden=%d printable=%d readOnly=%d", a.hidden, a.printable, a.readOnly);
if (a.linkUri.length) NSLog(@"link URI: %@", a.linkUri);
if (a.iconName.length) NSLog(@"icon: %@", a.iconName);

Elixir

{:ok, doc} = PdfOxide.open("annotated.pdf")

{:ok, color} = PdfOxide.annotation_color(doc, 0, 0)
{:ok, created} = PdfOxide.annotation_creation_date(doc, 0, 0)
IO.puts("color=#{color} created=#{created}")

{:ok, hidden} = PdfOxide.annotation_hidden?(doc, 0, 0)
{:ok, printable} = PdfOxide.annotation_printable?(doc, 0, 0)
{:ok, read_only} = PdfOxide.annotation_read_only?(doc, 0, 0)
IO.puts("hidden=#{hidden} printable=#{printable} readOnly=#{read_only}")

{:ok, uri} = PdfOxide.link_annotation_uri(doc, 0, 0)
if uri != "", do: IO.puts("link URI: #{uri}")
{:ok, icon} = PdfOxide.text_annotation_icon_name(doc, 0, 0)
if icon != "", do: IO.puts("icon: #{icon}")

Campos de AnnotationExtras (Swift)

Campo	Tipo	Descrição
`color`	`UInt32`	Cor compactada da anotação
`creationDate`	`Int64`	Timestamp de criação
`modificationDate`	`Int64`	Timestamp de modificação
`hidden`	`Bool`	Flag hidden
`markedDeleted`	`Bool`	Flag marked-deleted
`printable`	`Bool`	Flag print
`readOnly`	`Bool`	Flag read-only
`uri`	`String`	URI da anotação de link (vazio se não houver)
`iconName`	`String`	Nome do ícone da anotação de texto (vazio se não houver)
`quadPoints`	`[QuadPoint]`	Quadriláteros de destaque/marcação (4 cantos cada)

Cobertura de bindings. annotation_extras é exposto como uma struct AnnotationExtras dedicada no Swift (doc.annotationExtras(page, index:)) e via família de acessores C ABI pdf_oxide_annotation_get_*. A mesma família de acessores por índice é encapsulada em C++ (doc.annotation_get_*), R (pdf_annotation_get_*), Julia (annotation_*), Zig (AnnotationList.getColor/isHidden/...) e Elixir (PdfOxide.annotation_color/...). No Go, Dart (doc.pageAnnotationDetails(page)) e Objective-C (inline em POXAnnotation), os mesmos atributos são materializados inline em cada anotação. Os acessores são excluídos da compilação no alvo WASM.

Exemplos Avançados

Construir um sumário a partir dos favoritos

use pdf_oxide::PdfDocument;
use pdf_oxide::outline::Destination;

let mut doc = PdfDocument::open("book.pdf")?;

fn print_toc(items: &[pdf_oxide::outline::OutlineItem], depth: usize) {
    for item in items {
        let indent = "  ".repeat(depth);
        let page = match &item.dest {
            Some(Destination::PageIndex(p)) => format!("page {}", p + 1),
            Some(Destination::Named(n)) => format!("dest '{}'", n),
            None => "no dest".to_string(),
        };
        println!("{}{} ({})", indent, item.title, page);
        print_toc(&item.children, depth + 1);
    }
}

if let Some(outline) = doc.get_outline()? {
    println!("Table of Contents:");
    print_toc(&outline, 0);
}

Extrair todos os comentários (anotações Text)

use pdf_oxide::PdfDocument;
use pdf_oxide::annotation_types::AnnotationSubtype;

let mut doc = PdfDocument::open("reviewed.pdf")?;
let page_count = doc.page_count()?;

for page_idx in 0..page_count {
    let annotations = doc.get_annotations(page_idx)?;
    let comments: Vec<_> = annotations.iter()
        .filter(|a| a.subtype_enum == AnnotationSubtype::Text)
        .collect();

    if !comments.is_empty() {
        println!("Page {}:", page_idx + 1);
        for c in &comments {
            let author = c.author.as_deref().unwrap_or("Unknown");
            let text = c.contents.as_deref().unwrap_or("");
            println!("  [{}] {}", author, text);
        }
    }
}

Extrair todos os hiperlinks

use pdf_oxide::PdfDocument;
use pdf_oxide::annotation_types::AnnotationSubtype;

let mut doc = PdfDocument::open("report.pdf")?;
let annotations = doc.get_annotations(0)?;

let links: Vec<_> = annotations.iter()
    .filter(|a| a.subtype_enum == AnnotationSubtype::Link)
    .collect();

for link in &links {
    if let Some(ref action) = link.action {
        println!("Link: {:?}", action);
    }
    if let Some(ref dest) = link.destination {
        println!("Internal link: {:?}", dest);
    }
}

Perguntas Frequentes

Qual é a diferença entre get_annotations e annotation_extras? get_annotations retorna a visão principal da anotação (subtipo, conteúdo, retângulo, autor, datas, cor, flags). annotation_extras adiciona os atributos estendidos — cor compactada, timestamps, os quatro flags de visibilidade, URI do link, nome do ícone da anotação de texto e pontos de quadrilátero do destaque. No Go estes são mesclados em um único Annotation; no Swift são uma struct AnnotationExtras separada.

Qual esquema JSON annotations_to_json produz? Um array JSON correspondente à struct Annotation do Go: type, subtype, content, x, y, width, height, author, borderWidth, color, creationDate, modificationDate, linkURI, textIconName, isHidden, isPrintable, isReadOnly, isMarkedDeleted.

Por que os URIs de link e nomes de ícone às vezes estão vazios? Esses campos se aplicam apenas a subtipos específicos — uri para anotações Link e iconName para anotações Text (nota adesiva). Para outros subtipos eles retornam strings vazias.

A extração de anotações é rápida? Sim. O núcleo de extração do PDF Oxide roda a aproximadamente 0,8 ms de média e 9 ms no p99 com 100% de taxa de aprovação no corpus de benchmark.

Páginas Relacionadas

Extração de Dados de Formulário – Extraia campos de formulário (anotações Widget)
Extração de Texto – Extraia conteúdo de texto das páginas
Metadados e XMP – Leia propriedades do documento e favoritos
Extração de Imagens – Imagens incorporadas e acessor de elementos de página

Extração de Anotações

Exemplo Rápido

Referência da API

get_annotations(page_index) -> Vec<Annotation>

Campos da Anotação

Variantes de AnnotationSubtype

get_outline() -> Option<Vec<OutlineItem>>

Campos de OutlineItem

Variantes de Destination

API de Anotação do PdfPage (DOM)

page.annotations() -> &[AnnotationWrapper]

page.find_annotations_by_type(subtype) -> Vec<&AnnotationWrapper>

page.add_annotation(annotation)

page.remove_annotation(index) -> Option<AnnotationWrapper>

page.find_annotations_in_region(rect) -> Vec<&AnnotationWrapper>

Métodos de AnnotationWrapper

annotations_to_json — serializar anotações de uma página

annotation_extras — atributos estendidos de anotação

Campos de AnnotationExtras (Swift)

Exemplos Avançados

Construir um sumário a partir dos favoritos

Extrair todos os comentários (anotações Text)

Extrair todos os hiperlinks

Perguntas Frequentes

Páginas Relacionadas

`get_annotations(page_index) -> Vec<Annotation>`

`get_outline() -> Option<Vec<OutlineItem>>`

`page.annotations() -> &[AnnotationWrapper]`

`page.find_annotations_by_type(subtype) -> Vec<&AnnotationWrapper>`

`page.add_annotation(annotation)`

`page.remove_annotation(index) -> Option<AnnotationWrapper>`

`page.find_annotations_in_region(rect) -> Vec<&AnnotationWrapper>`

`annotations_to_json` — serializar anotações de uma página

`annotation_extras` — atributos estendidos de anotação