What is the fastest Python PDF library?

PDF Oxide is the fastest Python PDF library, with 0.8ms mean text extraction time — 5.8× faster than PyMuPDF (4.6ms) and 15× faster than pypdf (12.1ms). Benchmarked on 3,830 real-world PDFs with 100% pass rate.

Is PDF Oxide free for commercial use?

Yes. PDF Oxide is MIT licensed — free for all uses including commercial products, SaaS, and proprietary software. No license fees, no sales calls, no AGPL restrictions.

Can PDF Oxide handle scanned PDFs with OCR?

Yes. PDF Oxide includes built-in OCR via PaddleOCR and ONNX Runtime. No Tesseract installation needed — just pip install pdf_oxide and use extract_text_ocr(). Supports PP-OCRv3, v4, and v5 models.

Does PDF Oxide support XFA forms?

Yes. PDF Oxide is the only Python PDF library that can detect, analyze, and extract data from XFA forms (XML Forms Architecture). PyMuPDF, pypdf, pdfplumber, and pdfminer cannot read XFA form data.

How does PDF Oxide compare to PyMuPDF?

PDF Oxide is 5.8× faster than PyMuPDF (0.8ms vs 4.6ms mean), has a 100% pass rate vs 99.3%, and is MIT licensed vs PyMuPDF's AGPL-3.0. PDF Oxide also has built-in Markdown/HTML output and XFA form support that PyMuPDF lacks.

Can PDF Oxide convert PDF to Markdown?

Yes. PDF Oxide has built-in PDF to Markdown conversion with heading detection, table preservation, and list formatting — ideal for LLM and RAG pipelines. No separate package needed, unlike PyMuPDF which requires pymupdf4llm (69× slower).

注释提取

PDF Oxide 提供对 PDF 规范（ISO 32000-1:2008，第 12.5 节）中定义的所有注释类型的访问，包括文本便签、超链接、高亮、印章、墨迹注释等。文档大纲（书签）同样可以访问，用于构建导航结构。

使用 PdfDocument 上的 get_annotations() 获取原始注释数据，或使用 PdfPage DOM API 获得同时支持读写的统一 AnnotationWrapper 接口。

绑定支持情况。 注释提取在 Python（doc.get_annotations(page)）、Rust（doc.get_annotations(page)）、WASM（doc.getAnnotations(page)）和 Go（doc.Annotations(page)）中可用。C# 的公共 API 尚未公开 GetAnnotations 包装器——底层 FFI 方法（PdfDocumentGetPageAnnotations）已存在，但未被封装。在 C# 中提取注释，可使用 Rust CLI（pdf-oxide annotations doc.pdf），或在公共包装器发布前通过 P/Invoke 直接调用 PdfPageGetAnnotationsCount / pdf_get_annotations_by_type。

快速示例

Python

from pdf_oxide import PdfDocument

doc = PdfDocument("annotated.pdf")
page = doc.page(0)
for annot in page.annotations():
    print(f"{annot.subtype}: {annot.contents}")

Node.js

const { PdfDocument } = require("pdf-oxide");

const doc = new PdfDocument("annotated.pdf");
const annotations = doc.getPageAnnotations(0);
for (const annot of annotations) {
  console.log(`${annot.subtype}: ${annot.contents}`);
}
doc.close();

import pdfoxide "github.com/yfedoseev/pdf_oxide/go"

doc, _ := pdfoxide.Open("annotated.pdf")
defer doc.Close()
annotations, _ := doc.Annotations(0)
for _, annot := range annotations {
    fmt.Printf("%s: %s\n", annot.Subtype, annot.Content)
}

WASM

const doc = new WasmPdfDocument(bytes);
const annotations = doc.getAnnotations(0);
for (const annot of annotations) {
    console.log(`${annot.subtype}: ${annot.contents}`);
}

Rust

use pdf_oxide::PdfDocument;

let mut doc = PdfDocument::open("annotated.pdf")?;
let annotations = doc.get_annotations(0)?;
for annot in &annotations {
    println!("{:?}: {:?}", annot.subtype_enum, annot.contents);
}

Java

import fyi.oxide.pdf.*;
import fyi.oxide.pdf.annotation.Annotation;
import java.nio.file.Path;

try (PdfDocument doc = PdfDocument.open(Path.of("annotated.pdf"))) {
    for (Annotation annot : doc.page(0).annotations()) {
        System.out.println(annot.type() + ": " + annot.contents().orElse(""));
    }
}

C++

#include <pdf_oxide/pdf_oxide.hpp>

auto doc = pdf_oxide::Document::open("annotated.pdf");
for (const auto& annot : doc.page_annotations(0)) {
    std::cout << annot.subtype << ": " << annot.content << "\n";
}

Swift

import PdfOxide

let doc = try Document.open("annotated.pdf")
for annot in try doc.pageAnnotations(0) {
    print("\(annot.subtype): \(annot.content)")
}

Kotlin

import fyi.oxide.pdf.*

PdfDocument.open(java.nio.file.Path.of("annotated.pdf")).use { doc ->
    for (annot in doc.page(0).annotations()) {
        println("${annot.type()}: ${annot.contents().orElse("")}")
    }
}

Dart

import 'package:pdf_oxide/pdf_oxide.dart';

final doc = PdfDocument.open('annotated.pdf');
for (final annot in doc.pageAnnotations(0)) {
  print('${annot.subtype}: ${annot.content}');
}
doc.close();

library(pdfoxide)

doc <- pdf_open("annotated.pdf")
for (annot in pdf_page_annotations(doc, 0)) {
  cat(sprintf("%s: %s\n", annot$subtype, annot$content))
}

Julia

using PdfOxide

doc = open_document("annotated.pdf")
for annot in page_annotations(doc, 0)
    println("$(annot.subtype): $(annot.content)")
end

Zig

const pdf_oxide = @import("pdf_oxide");
const a = std.heap.page_allocator;

var doc = try pdf_oxide.Document.open("annotated.pdf");
defer doc.deinit();
const annotations = try doc.pageAnnotations(a, 0);
defer pdf_oxide.Document.freeAnnotations(a, annotations);
for (annotations) |annot| {
    std.debug.print("{s}: {s}\n", .{ annot.subtype, annot.content });
}

Scala

import fyi.oxide.pdf.{PdfDocument, annotationsSeq, contentsOption}
import scala.util.Using

Using.resource(PdfDocument.open("annotated.pdf")) { doc =>
  for (annot <- doc.page(0).annotationsSeq) {
    println(s"${annot.`type`()}: ${annot.contentsOption.getOrElse("")}")
  }
}

Clojure

(require '[pdf-oxide.core :as pdf])

(with-open [d (pdf/open "annotated.pdf")]
  (doseq [annot (pdf/annotations (pdf/page d 0))]
    (println (str (.type annot) ": " (.orElse (.contents annot) "")))))

Objective-C

#import "POXPdfOxide.h"
NSError *err = nil;

POXDocument *doc = [POXDocument openPath:@"annotated.pdf" error:&err];
for (POXAnnotation *annot in [doc pageAnnotations:0 error:&err]) {
    NSLog(@"%@: %@", annot.subtype, annot.content);
}

Elixir

{:ok, doc} = PdfOxide.open("annotated.pdf")
{:ok, annots} = PdfOxide.page_annotations(doc, 0)
Enum.each(annots, fn a -> IO.puts("#{a.subtype}: #{a.content}") end)

API 参考

`get_annotations(page_index) -> Vec<Annotation>`

从指定页面提取原始注释，返回该页面上所有类型的注释。

参数	类型	说明
`page_index`	`usize`	从零开始的页面索引

返回值： Annotation 对象的向量。

注释字段

字段	类型	说明
`annotation_type`	`String`	始终为 `"Annot"`
`subtype`	`Option<String>`	原始子类型字符串（如 `"Text"`、`"Highlight"`）
`subtype_enum`	`AnnotationSubtype`	解析后的子类型枚举
`contents`	`Option<String>`	注释的文本内容
`rect`	`Option<[f64; 4]>`	边界矩形 [x1, y1, x2, y2]
`author`	`Option<String>`	作者/创建者（`/T` 条目）
`creation_date`	`Option<String>`	创建日期
`modification_date`	`Option<String>`	最后修改日期
`subject`	`Option<String>`	注释主题
`destination`	`Option<LinkDestination>`	链接目标（用于链接注释）
`action`	`Option<LinkAction>`	链接动作（用于链接注释）
`color`	`Option<Vec<f64>>`	注释颜色分量
`flags`	`Option<AnnotationFlags>`	注释标志（invisible、hidden、print 等）

AnnotationSubtype 枚举值

枚举值	说明
`Text`	便签注释
`Link`	超链接注释
`FreeText`	文本框注释
`Line`	直线注释
`Square`	矩形注释
`Circle`	椭圆注释
`Polygon`	多边形注释
`PolyLine`	折线注释
`Highlight`	文本高亮标记
`Underline`	文本下划线标记
`Squiggly`	波浪下划线标记
`StrikeOut`	删除线标记
`Stamp`	印章注释
`Ink`	手绘注释
`Popup`	与其他注释关联的弹出便签
`FileAttachment`	嵌入文件注释
`Sound`	声音注释
`Movie`	视频注释
`Screen`	屏幕注释
`Widget`	表单字段控件
`PrinterMark`	印刷标记注释
`TrapNet`	陷印网络注释
`Watermark`	水印注释
`ThreeDimensional`	3D 注释
`Redact`	编辑删除注释
`Caret`	插入符注释（插入点）
`RichMedia`	富媒体注释
`Unknown`	无法识别的注释类型

`get_outline() -> Option<Vec<OutlineItem>>`

获取文档大纲（书签）（如果存在）。返回可用于文档导航的大纲项层次树。

返回值：

Some(Vec<OutlineItem>) – 找到并解析了书签
None – 文档中没有书签

OutlineItem 字段

字段	类型	说明
`title`	`String`	书签标题文字
`dest`	`Option<Destination>`	导航目标
`children`	`Vec<OutlineItem>`	嵌套子书签

Destination 枚举值

枚举值	说明
`PageIndex(usize)`	直接页面引用（从零开始的索引）
`Named(String)`	命名目标标识符

Rust

let mut doc = PdfDocument::open("book.pdf")?;

if let Some(outline) = doc.get_outline()? {
    for item in &outline {
        println!("  {}", item.title);
        for child in &item.children {
            println!("    {}", child.title);
        }
    }
} else {
    println!("No bookmarks found.");
}

C++

#include <pdf_oxide/pdf_oxide.hpp>

auto doc = pdf_oxide::Document::open("book.pdf");
std::string outline = doc.get_outline(); // JSON tree of bookmarks
std::cout << outline << "\n";

Swift

import PdfOxide

let doc = try Document.open("book.pdf")
let outline = try doc.outline() // JSON tree of bookmarks
print(outline)

Dart

import 'package:pdf_oxide/pdf_oxide.dart';

final doc = PdfDocument.open('book.pdf');
final outline = doc.getOutline(); // JSON tree of bookmarks
print(outline);
doc.close();

library(pdfoxide)

doc <- pdf_open("book.pdf")
outline <- pdf_get_outline(doc)  # JSON tree of bookmarks
cat(outline, "\n")

Julia

using PdfOxide

doc = open_document("book.pdf")
outline = get_outline(doc) # JSON tree of bookmarks
println(outline)

Zig

const pdf_oxide = @import("pdf_oxide");
const a = std.heap.page_allocator;

var doc = try pdf_oxide.Document.open("book.pdf");
defer doc.deinit();
const outline = try doc.outline(a); // JSON tree of bookmarks; caller owns it
defer a.free(outline);
std.debug.print("{s}\n", .{outline});

Objective-C

#import "POXPdfOxide.h"
NSError *err = nil;

POXDocument *doc = [POXDocument openPath:@"book.pdf" error:&err];
NSString *outline = [doc outlineWithError:&err]; // JSON tree of bookmarks
NSLog(@"%@", outline);

Elixir

{:ok, doc} = PdfOxide.open("book.pdf")
{:ok, outline} = PdfOxide.outline(doc) # JSON tree of bookmarks
IO.puts(outline)

PdfPage 注释 API（DOM）

DocumentEditor 中的 PdfPage 对象提供更高层次的 AnnotationWrapper 接口，支持读取现有注释和添加新注释。

`page.annotations() -> &[AnnotationWrapper]`

以包装对象的形式获取页面上的所有注释。

`page.find_annotations_by_type(subtype) -> Vec<&AnnotationWrapper>`

查找指定类型的注释。

`page.add_annotation(annotation)`

向页面添加新注释。

`page.remove_annotation(index) -> Option<AnnotationWrapper>`

按索引删除注释。

`page.find_annotations_in_region(rect) -> Vec<&AnnotationWrapper>`

查找边界框与指定区域相交的注释。

AnnotationWrapper 方法

方法	返回类型	说明
`id()`	`AnnotationId`	会话唯一 ID
`subtype()`	`AnnotationSubtype`	注释类型
`rect()`	`Rect`	边界矩形
`contents()`	`Option<&str>`	文本内容
`color()`	`Option<(f32, f32, f32)>`	RGB 颜色（0.0–1.0）
`is_modified()`	`bool`	注释是否已被修改

Python

doc = PdfDocument("annotated.pdf")
page = doc.page(0)

# List all annotations
for annot in page.annotations():
    print(f"[{annot.subtype}] {annot.contents} at {annot.rect}")

# Find highlights
highlights = [a for a in page.annotations() if a.subtype == "Highlight"]
print(f"Found {len(highlights)} highlights")

Node.js

const doc = new PdfDocument("annotated.pdf");
const annotations = doc.getPageAnnotations(0);

// List all annotations
for (const annot of annotations) {
  console.log(`[${annot.subtype}] ${annot.contents}`);
}

// Find highlights
const highlights = annotations.filter(a => a.subtype === "Highlight");
console.log(`Found ${highlights.length} highlights`);
doc.close();

doc, _ := pdfoxide.Open("annotated.pdf")
defer doc.Close()
annotations, _ := doc.Annotations(0)

// List all annotations
for _, annot := range annotations {
    fmt.Printf("[%s] %s\n", annot.Subtype, annot.Content)
}

// Find highlights
highlights := 0
for _, a := range annotations {
    if a.Subtype == "Highlight" {
        highlights++
    }
}
fmt.Printf("Found %d highlights\n", highlights)

WASM

const doc = new WasmPdfDocument(bytes);
const annotations = doc.getAnnotations(0);

// List all annotations
for (const annot of annotations) {
    console.log(`[${annot.subtype}] ${annot.contents}`);
}

// Find highlights
const highlights = annotations.filter(a => a.subtype === "Highlight");
console.log(`Found ${highlights.length} highlights`);

Rust

use pdf_oxide::editor::{DocumentEditor, EditableDocument};
use pdf_oxide::annotation_types::AnnotationSubtype;

let mut editor = DocumentEditor::open("annotated.pdf")?;
let page = editor.get_page(0)?;

// Find all highlight annotations
let highlights = page.find_annotations_by_type(AnnotationSubtype::Highlight);
for h in &highlights {
    println!("Highlight at {:?}: {:?}", h.rect(), h.contents());
}

Java

import fyi.oxide.pdf.*;
import fyi.oxide.pdf.annotation.Annotation;
import fyi.oxide.pdf.annotation.AnnotationType;
import java.nio.file.Path;

try (PdfDocument doc = PdfDocument.open(Path.of("annotated.pdf"))) {
    var annotations = doc.page(0).annotations();

    // List all annotations
    for (Annotation annot : annotations) {
        System.out.println("[" + annot.type() + "] " + annot.contents().orElse(""));
    }

    // Find highlights
    long highlights = annotations.stream()
            .filter(a -> a.type() == AnnotationType.HIGHLIGHT).count();
    System.out.println("Found " + highlights + " highlights");
}

C++

#include <pdf_oxide/pdf_oxide.hpp>

auto doc = pdf_oxide::Document::open("annotated.pdf");
auto annotations = doc.page_annotations(0);

// List all annotations
for (const auto& annot : annotations) {
    std::cout << "[" << annot.subtype << "] " << annot.content << "\n";
}

// Find highlights
int highlights = 0;
for (const auto& a : annotations) {
    if (a.subtype == "Highlight") highlights++;
}
std::cout << "Found " << highlights << " highlights\n";

Swift

import PdfOxide

let doc = try Document.open("annotated.pdf")
let annotations = try doc.pageAnnotations(0)

// List all annotations
for annot in annotations {
    print("[\(annot.subtype)] \(annot.content)")
}

// Find highlights
let highlights = annotations.filter { $0.subtype == "Highlight" }
print("Found \(highlights.count) highlights")

Kotlin

import fyi.oxide.pdf.*
import fyi.oxide.pdf.annotation.AnnotationType

PdfDocument.open(java.nio.file.Path.of("annotated.pdf")).use { doc ->
    val annotations = doc.page(0).annotations()

    // List all annotations
    for (annot in annotations) {
        println("[${annot.type()}] ${annot.contents().orElse("")}")
    }

    // Find highlights
    val highlights = annotations.count { it.type() == AnnotationType.HIGHLIGHT }
    println("Found $highlights highlights")
}

Dart

import 'package:pdf_oxide/pdf_oxide.dart';

final doc = PdfDocument.open('annotated.pdf');
final annotations = doc.pageAnnotations(0);

// List all annotations
for (final annot in annotations) {
  print('[${annot.subtype}] ${annot.content}');
}

// Find highlights
final highlights = annotations.where((a) => a.subtype == 'Highlight');
print('Found ${highlights.length} highlights');
doc.close();

library(pdfoxide)

doc <- pdf_open("annotated.pdf")
annotations <- pdf_page_annotations(doc, 0)

# List all annotations
for (annot in annotations) {
  cat(sprintf("[%s] %s\n", annot$subtype, annot$content))
}

# Find highlights
highlights <- Filter(function(a) a$subtype == "Highlight", annotations)
cat(sprintf("Found %d highlights\n", length(highlights)))

Julia

using PdfOxide

doc = open_document("annotated.pdf")
annotations = page_annotations(doc, 0)

# List all annotations
for annot in annotations
    println("[$(annot.subtype)] $(annot.content)")
end

# Find highlights
highlights = filter(a -> a.subtype == "Highlight", annotations)
println("Found $(length(highlights)) highlights")

Zig

const pdf_oxide = @import("pdf_oxide");
const a = std.heap.page_allocator;

var doc = try pdf_oxide.Document.open("annotated.pdf");
defer doc.deinit();
const annotations = try doc.pageAnnotations(a, 0);
defer pdf_oxide.Document.freeAnnotations(a, annotations);

// List all annotations
for (annotations) |annot| {
    std.debug.print("[{s}] {s}\n", .{ annot.subtype, annot.content });
}

// Find highlights
var highlights: usize = 0;
for (annotations) |annot| {
    if (std.mem.eql(u8, annot.subtype, "Highlight")) highlights += 1;
}
std.debug.print("Found {d} highlights\n", .{highlights});

Scala

import fyi.oxide.pdf.{PdfDocument, annotationsSeq, contentsOption}
import fyi.oxide.pdf.annotation.AnnotationType
import scala.util.Using

Using.resource(PdfDocument.open("annotated.pdf")) { doc =>
  val annotations = doc.page(0).annotationsSeq

  // List all annotations
  for (annot <- annotations) {
    println(s"[${annot.`type`()}] ${annot.contentsOption.getOrElse("")}")
  }

  // Find highlights
  val highlights = annotations.count(_.`type`() == AnnotationType.HIGHLIGHT)
  println(s"Found $highlights highlights")
}

Clojure

(require '[pdf-oxide.core :as pdf])
(import 'fyi.oxide.pdf.annotation.AnnotationType)

(with-open [d (pdf/open "annotated.pdf")]
  (let [annotations (pdf/annotations (pdf/page d 0))]
    ;; List all annotations
    (doseq [annot annotations]
      (println (str "[" (.type annot) "] " (.orElse (.contents annot) ""))))
    ;; Find highlights
    (let [highlights (count (filter #(= (.type %) AnnotationType/HIGHLIGHT) annotations))]
      (println (str "Found " highlights " highlights")))))

Objective-C

#import "POXPdfOxide.h"
NSError *err = nil;

POXDocument *doc = [POXDocument openPath:@"annotated.pdf" error:&err];
NSArray<POXAnnotation*> *annotations = [doc pageAnnotations:0 error:&err];

// List all annotations
for (POXAnnotation *annot in annotations) {
    NSLog(@"[%@] %@", annot.subtype, annot.content);
}

// Find highlights
NSPredicate *p = [NSPredicate predicateWithFormat:@"subtype == %@", @"Highlight"];
NSUInteger highlights = [annotations filteredArrayUsingPredicate:p].count;
NSLog(@"Found %lu highlights", (unsigned long)highlights);

Elixir

{:ok, doc} = PdfOxide.open("annotated.pdf")
{:ok, annotations} = PdfOxide.page_annotations(doc, 0)

# List all annotations
Enum.each(annotations, fn a -> IO.puts("[#{a.subtype}] #{a.content}") end)

# Find highlights
highlights = Enum.count(annotations, &(&1.subtype == "Highlight"))
IO.puts("Found #{highlights} highlights")

`annotations_to_json` — 序列化页面注释

annotations_to_json 通过单次 FFI 调用将整个注释列表序列化为 JSON 数组。Go 绑定在内部使用它将注释转换为 []Annotation；Swift 将其直接公开为 annotationsToJson。C ABI 签名如下：

char *pdf_oxide_annotations_to_json(const FfiAnnotationList *annotations, int32_t *error_code);

返回的 UTF-8 字符串由调用方持有（使用 free_string 释放）。其模式与 Go 的 Annotation 结构体一致——字段包括：type、subtype、content、x、y、width、height、author、borderWidth、color、creationDate、modificationDate、linkURI、textIconName、isHidden、isPrintable、isReadOnly、isMarkedDeleted。

Swift

import PdfOxide

let doc = try Document.open("annotated.pdf")
let json = try doc.annotationsToJson(0) // String of JSON
print(json)

C ABI

#include "pdf_oxide.h"

int32_t err = 0;
FfiAnnotationList *list = pdf_document_get_page_annotations(doc, /*page=*/0, &err);
char *json = pdf_oxide_annotations_to_json(list, &err);
printf("%s\n", json);
free_string(json);
pdf_oxide_annotation_list_free(list);

C++

#include <pdf_oxide/pdf_oxide.hpp>

auto doc = pdf_oxide::Document::open("annotated.pdf");
std::string json = doc.annotations_to_json(0); // JSON string
std::cout << json << "\n";

Dart

import 'package:pdf_oxide/pdf_oxide.dart';

final doc = PdfDocument.open('annotated.pdf');
final json = doc.annotationsToJson(0); // JSON string
print(json);
doc.close();

library(pdfoxide)

doc <- pdf_open("annotated.pdf")
json <- pdf_annotations_to_json(doc, 0)  # JSON string
cat(json, "\n")

Julia

using PdfOxide

doc = open_document("annotated.pdf")
json = annotations_to_json(doc, 0) # JSON string
println(json)

Zig

const pdf_oxide = @import("pdf_oxide");
const a = std.heap.page_allocator;

var doc = try pdf_oxide.Document.open("annotated.pdf");
defer doc.deinit();
var list = try doc.annotationList(0);
defer list.deinit();
const json = try list.toJson(a); // caller owns the slice
defer a.free(json);
std.debug.print("{s}\n", .{json});

Objective-C

#import "POXPdfOxide.h"
NSError *err = nil;

POXDocument *doc = [POXDocument openPath:@"annotated.pdf" error:&err];
NSString *json = [doc annotationsJson:0 error:&err]; // JSON string
NSLog(@"%@", json);

Elixir

{:ok, doc} = PdfOxide.open("annotated.pdf")
{:ok, json} = PdfOxide.annotations_to_json(doc, 0) # JSON string
IO.puts(json)

绑定支持情况。 annotations_to_json 在 Swift（doc.annotationsToJson(page)）、C++（doc.annotations_to_json(page)）、Dart（doc.annotationsToJson(page)）、R（pdf_annotations_to_json(doc, page)）、Julia（annotations_to_json(doc, page)）、Zig（doc.annotationList(page).toJson(...)）、Objective-C（[doc annotationsJson:page error:]）、Elixir（PdfOxide.annotations_to_json(doc, page)）以及 C ABI（pdf_oxide_annotations_to_json）中直接公开。Go 绑定在内部调用它，将 doc.Annotations(page) 解码为类型化结构体。该函数在 WASM 目标中不会被编译。

`annotation_extras` — 扩展注释属性

annotation_extras 读取单个注释中不属于核心 Annotation 视图的扩展属性：颜色、创建/修改时间戳、四个可见性标志（hidden、marked-deleted、printable、read-only）、链接注释的 URI、文本注释的图标名称，以及高亮/标记注释的四边形点。

在 Swift 中，这些属性通过 annotationExtras(page, index:) 以 AnnotationExtras 结构体的形式返回。在 Go 中，相同字段直接合并到 Annotation 结构体中（Color、CreationDate、ModificationDate、LinkURI、TextIconName、IsHidden、IsPrintable、IsReadOnly、IsMarkedDeleted）。底层两者都调用 pdf_oxide_annotation_get_* / pdf_oxide_*_annotation_get_* C ABI 访问器族。

Swift

import PdfOxide

let doc = try Document.open("annotated.pdf")
let extras = try doc.annotationExtras(0, index: 0) // AnnotationExtras

print("color=\(extras.color) created=\(extras.creationDate)")
print("hidden=\(extras.hidden) printable=\(extras.printable) readOnly=\(extras.readOnly)")
if !extras.uri.isEmpty { print("link URI: \(extras.uri)") }
if !extras.iconName.isEmpty { print("icon: \(extras.iconName)") }
for q in extras.quadPoints {
    print("quad: (\(q.x1),\(q.y1)) (\(q.x2),\(q.y2)) (\(q.x3),\(q.y3)) (\(q.x4),\(q.y4))")
}

import pdfoxide "github.com/yfedoseev/pdf_oxide/go"

doc, _ := pdfoxide.Open("annotated.pdf")
defer doc.Close()
annotations, _ := doc.Annotations(0)

a := annotations[0]
fmt.Printf("color=%d created=%d modified=%d\n", a.Color, a.CreationDate, a.ModificationDate)
fmt.Printf("hidden=%v printable=%v readOnly=%v deleted=%v\n",
    a.IsHidden, a.IsPrintable, a.IsReadOnly, a.IsMarkedDeleted)
if a.LinkURI != "" {
    fmt.Printf("link URI: %s\n", a.LinkURI)
}
if a.TextIconName != "" {
    fmt.Printf("icon: %s\n", a.TextIconName)
}

C++

#include <pdf_oxide/pdf_oxide.hpp>

auto doc = pdf_oxide::Document::open("annotated.pdf");

std::cout << "color=" << doc.annotation_get_color(0, 0)
          << " created=" << doc.annotation_get_creation_date(0, 0) << "\n";
std::cout << "hidden=" << doc.annotation_is_hidden(0, 0)
          << " printable=" << doc.annotation_is_printable(0, 0)
          << " readOnly=" << doc.annotation_is_read_only(0, 0) << "\n";
auto uri = doc.link_annotation_get_uri(0, 0);
if (!uri.empty()) std::cout << "link URI: " << uri << "\n";
auto icon = doc.text_annotation_get_icon_name(0, 0);
if (!icon.empty()) std::cout << "icon: " << icon << "\n";

Dart

import 'package:pdf_oxide/pdf_oxide.dart';

final doc = PdfDocument.open('annotated.pdf');
final a = doc.pageAnnotationDetails(0)[0]; // AnnotationDetails

print('color=${a.color} created=${a.creationDate} modified=${a.modificationDate}');
print('hidden=${a.hidden} printable=${a.printable} readOnly=${a.readOnly}');
if (a.linkUri.isNotEmpty) print('link URI: ${a.linkUri}');
if (a.iconName.isNotEmpty) print('icon: ${a.iconName}');
doc.close();

library(pdfoxide)

doc <- pdf_open("annotated.pdf")

cat(sprintf("color=%d created=%d\n",
    pdf_annotation_get_color(doc, 0, 0),
    pdf_annotation_get_creation_date(doc, 0, 0)))
cat(sprintf("hidden=%s printable=%s readOnly=%s\n",
    pdf_annotation_is_hidden(doc, 0, 0),
    pdf_annotation_is_printable(doc, 0, 0),
    pdf_annotation_is_read_only(doc, 0, 0)))
uri <- pdf_link_annotation_get_uri(doc, 0, 0)
if (nzchar(uri)) cat(sprintf("link URI: %s\n", uri))
icon <- pdf_text_annotation_get_icon_name(doc, 0, 0)
if (nzchar(icon)) cat(sprintf("icon: %s\n", icon))

Julia

using PdfOxide

doc = open_document("annotated.pdf")

println("color=$(annotation_get_color(doc, 0, 0)) created=$(annotation_creation_date(doc, 0, 0))")
println("hidden=$(annotation_is_hidden(doc, 0, 0)) printable=$(annotation_is_printable(doc, 0, 0)) readOnly=$(annotation_is_read_only(doc, 0, 0))")
uri = link_annotation_uri(doc, 0, 0)
isempty(uri) || println("link URI: $uri")
icon = text_annotation_icon_name(doc, 0, 0)
isempty(icon) || println("icon: $icon")

Zig

const pdf_oxide = @import("pdf_oxide");
const a = std.heap.page_allocator;

var doc = try pdf_oxide.Document.open("annotated.pdf");
defer doc.deinit();
var list = try doc.annotationList(0);
defer list.deinit();

std.debug.print("color={d} created={d}\n", .{ try list.getColor(0), try list.getCreationDate(0) });
std.debug.print("hidden={} printable={} readOnly={}\n", .{ try list.isHidden(0), try list.isPrintable(0), try list.isReadOnly(0) });
const uri = try list.linkUri(a, 0);
defer a.free(uri);
if (uri.len != 0) std.debug.print("link URI: {s}\n", .{uri});
const icon = try list.textIconName(a, 0);
defer a.free(icon);
if (icon.len != 0) std.debug.print("icon: {s}\n", .{icon});

Objective-C

#import "POXPdfOxide.h"
NSError *err = nil;

POXDocument *doc = [POXDocument openPath:@"annotated.pdf" error:&err];
POXAnnotation *a = [doc pageAnnotations:0 error:&err].firstObject;

NSLog(@"color=%u created=%lld modified=%lld", a.color, a.creationDate, a.modificationDate);
NSLog(@"hidden=%d printable=%d readOnly=%d", a.hidden, a.printable, a.readOnly);
if (a.linkUri.length) NSLog(@"link URI: %@", a.linkUri);
if (a.iconName.length) NSLog(@"icon: %@", a.iconName);

Elixir

{:ok, doc} = PdfOxide.open("annotated.pdf")

{:ok, color} = PdfOxide.annotation_color(doc, 0, 0)
{:ok, created} = PdfOxide.annotation_creation_date(doc, 0, 0)
IO.puts("color=#{color} created=#{created}")

{:ok, hidden} = PdfOxide.annotation_hidden?(doc, 0, 0)
{:ok, printable} = PdfOxide.annotation_printable?(doc, 0, 0)
{:ok, read_only} = PdfOxide.annotation_read_only?(doc, 0, 0)
IO.puts("hidden=#{hidden} printable=#{printable} readOnly=#{read_only}")

{:ok, uri} = PdfOxide.link_annotation_uri(doc, 0, 0)
if uri != "", do: IO.puts("link URI: #{uri}")
{:ok, icon} = PdfOxide.text_annotation_icon_name(doc, 0, 0)
if icon != "", do: IO.puts("icon: #{icon}")

AnnotationExtras 字段（Swift）

字段	类型	说明
`color`	`UInt32`	打包的注释颜色
`creationDate`	`Int64`	创建时间戳
`modificationDate`	`Int64`	修改时间戳
`hidden`	`Bool`	hidden 标志
`markedDeleted`	`Bool`	marked-deleted 标志
`printable`	`Bool`	print 标志
`readOnly`	`Bool`	read-only 标志
`uri`	`String`	链接注释 URI（无则为空字符串）
`iconName`	`String`	文本注释图标名称（无则为空字符串）
`quadPoints`	`[QuadPoint]`	高亮/标记注释的四边形（每个包含 4 个角点）

绑定支持情况。 annotation_extras 在 Swift（doc.annotationExtras(page, index:)）中以专用的 AnnotationExtras 结构体公开，并通过 pdf_oxide_annotation_get_* C ABI 访问器族公开。相同的按索引访问器族还封装在 C++（doc.annotation_get_*）、R（pdf_annotation_get_*）、Julia（annotation_*）、Zig（AnnotationList.getColor/isHidden/...）和 Elixir（PdfOxide.annotation_color/...）中。在 Go、Dart（doc.pageAnnotationDetails(page)）和 Objective-C（内联在 POXAnnotation 上）中，相同属性直接内联在每个注释对象上。访问器在 WASM 目标中不会被编译。

进阶示例

从书签构建目录

use pdf_oxide::PdfDocument;
use pdf_oxide::outline::Destination;

let mut doc = PdfDocument::open("book.pdf")?;

fn print_toc(items: &[pdf_oxide::outline::OutlineItem], depth: usize) {
    for item in items {
        let indent = "  ".repeat(depth);
        let page = match &item.dest {
            Some(Destination::PageIndex(p)) => format!("page {}", p + 1),
            Some(Destination::Named(n)) => format!("dest '{}'", n),
            None => "no dest".to_string(),
        };
        println!("{}{} ({})", indent, item.title, page);
        print_toc(&item.children, depth + 1);
    }
}

if let Some(outline) = doc.get_outline()? {
    println!("Table of Contents:");
    print_toc(&outline, 0);
}

提取所有批注（Text 注释）

use pdf_oxide::PdfDocument;
use pdf_oxide::annotation_types::AnnotationSubtype;

let mut doc = PdfDocument::open("reviewed.pdf")?;
let page_count = doc.page_count()?;

for page_idx in 0..page_count {
    let annotations = doc.get_annotations(page_idx)?;
    let comments: Vec<_> = annotations.iter()
        .filter(|a| a.subtype_enum == AnnotationSubtype::Text)
        .collect();

    if !comments.is_empty() {
        println!("Page {}:", page_idx + 1);
        for c in &comments {
            let author = c.author.as_deref().unwrap_or("Unknown");
            let text = c.contents.as_deref().unwrap_or("");
            println!("  [{}] {}", author, text);
        }
    }
}

提取所有超链接

use pdf_oxide::PdfDocument;
use pdf_oxide::annotation_types::AnnotationSubtype;

let mut doc = PdfDocument::open("report.pdf")?;
let annotations = doc.get_annotations(0)?;

let links: Vec<_> = annotations.iter()
    .filter(|a| a.subtype_enum == AnnotationSubtype::Link)
    .collect();

for link in &links {
    if let Some(ref action) = link.action {
        println!("Link: {:?}", action);
    }
    if let Some(ref dest) = link.destination {
        println!("Internal link: {:?}", dest);
    }
}

常见问题

get_annotations 和 annotation_extras 有什么区别？ get_annotations 返回核心注释视图（子类型、内容、矩形、作者、日期、颜色、标志）。annotation_extras 在此基础上增加了扩展属性——打包颜色、时间戳、四个可见性标志、链接 URI、文本注释图标名称以及高亮四边形点。在 Go 中这些属性合并到同一个 Annotation 中；在 Swift 中则是一个独立的 AnnotationExtras 结构体。

annotations_to_json 生成的 JSON 格式是什么？ 一个与 Go 的 Annotation 结构体对应的 JSON 数组，字段包括：type、subtype、content、x、y、width、height、author、borderWidth、color、creationDate、modificationDate、linkURI、textIconName、isHidden、isPrintable、isReadOnly、isMarkedDeleted。

为什么链接 URI 和图标名称有时为空？ 这些字段只适用于特定子类型——uri 对应 Link 注释，iconName 对应 Text（便签）注释。对于其他子类型，它们返回空字符串。

注释提取的性能如何？ 性能非常出色。PDF Oxide 的提取核心在基准测试语料库上的平均耗时约为 0.8 ms，p99 为 9 ms，通过率为 100%。

注释提取

快速示例

API 参考

get_annotations(page_index) -> Vec<Annotation>

注释字段

AnnotationSubtype 枚举值

get_outline() -> Option<Vec<OutlineItem>>

OutlineItem 字段

Destination 枚举值

PdfPage 注释 API（DOM）

page.annotations() -> &[AnnotationWrapper]

page.find_annotations_by_type(subtype) -> Vec<&AnnotationWrapper>

page.add_annotation(annotation)

page.remove_annotation(index) -> Option<AnnotationWrapper>

page.find_annotations_in_region(rect) -> Vec<&AnnotationWrapper>

AnnotationWrapper 方法

annotations_to_json — 序列化页面注释

annotation_extras — 扩展注释属性

AnnotationExtras 字段（Swift）

进阶示例

从书签构建目录

提取所有批注（Text 注释）

提取所有超链接

常见问题

相关页面

`get_annotations(page_index) -> Vec<Annotation>`

`get_outline() -> Option<Vec<OutlineItem>>`

`page.annotations() -> &[AnnotationWrapper]`

`page.find_annotations_by_type(subtype) -> Vec<&AnnotationWrapper>`

`page.add_annotation(annotation)`

`page.remove_annotation(index) -> Option<AnnotationWrapper>`

`page.find_annotations_in_region(rect) -> Vec<&AnnotationWrapper>`

`annotations_to_json` — 序列化页面注释

`annotation_extras` — 扩展注释属性