What is the fastest Python PDF library?

PDF Oxide is the fastest Python PDF library, with 0.8ms mean text extraction time — 5.8× faster than PyMuPDF (4.6ms) and 15× faster than pypdf (12.1ms). Benchmarked on 3,830 real-world PDFs with 100% pass rate.

Is PDF Oxide free for commercial use?

Yes. PDF Oxide is MIT licensed — free for all uses including commercial products, SaaS, and proprietary software. No license fees, no sales calls, no AGPL restrictions.

Can PDF Oxide handle scanned PDFs with OCR?

Yes. PDF Oxide includes built-in OCR via PaddleOCR and ONNX Runtime. No Tesseract installation needed — just pip install pdf_oxide and use extract_text_ocr(). Supports PP-OCRv3, v4, and v5 models.

Does PDF Oxide support XFA forms?

Yes. PDF Oxide is the only Python PDF library that can detect, analyze, and extract data from XFA forms (XML Forms Architecture). PyMuPDF, pypdf, pdfplumber, and pdfminer cannot read XFA form data.

How does PDF Oxide compare to PyMuPDF?

PDF Oxide is 5.8× faster than PyMuPDF (0.8ms vs 4.6ms mean), has a 100% pass rate vs 99.3%, and is MIT licensed vs PyMuPDF's AGPL-3.0. PDF Oxide also has built-in Markdown/HTML output and XFA form support that PyMuPDF lacks.

Can PDF Oxide convert PDF to Markdown?

Yes. PDF Oxide has built-in PDF to Markdown conversion with heading detection, table preservation, and list formatting — ideal for LLM and RAG pipelines. No separate package needed, unlike PyMuPDF which requires pymupdf4llm (69× slower).

Annotation Extraction

PDF Oxide provides access to every annotation type defined in the PDF specification (ISO 32000-1:2008, Section 12.5), including text notes, hyperlinks, highlights, stamps, and ink annotations. It also exposes the document outline (bookmarks), which you can use to build navigation structures.

For raw annotation data, use get_annotations() on PdfDocument. Alternatively, use the PdfPage DOM API, whose unified AnnotationWrapper interface supports both reading and writing.

Binding coverage. Annotation extraction is available in Python (doc.get_annotations(page)), Rust (doc.get_annotations(page)), WASM (doc.getAnnotations(page)), and Go (doc.Annotations(page)). The C# public API does not yet expose a GetAnnotations wrapper: a native FFI method (PdfDocumentGetPageAnnotations) exists but is not wrapped. To extract annotations from C#, use the Rust CLI (pdf-oxide annotations doc.pdf), or call PdfPageGetAnnotationsCount / pdf_get_annotations_by_type directly through P/Invoke until a public wrapper is available.

Quick Example

Python

from pdf_oxide import PdfDocument

doc = PdfDocument("annotated.pdf")
page = doc.page(0)
for annot in page.annotations():
    print(f"{annot.subtype}: {annot.contents}")

Node.js

const { PdfDocument } = require("pdf-oxide");

const doc = new PdfDocument("annotated.pdf");
const annotations = doc.getPageAnnotations(0);
for (const annot of annotations) {
  console.log(`${annot.subtype}: ${annot.contents}`);
}
doc.close();

Go

import pdfoxide "github.com/yfedoseev/pdf_oxide/go"

doc, _ := pdfoxide.Open("annotated.pdf")
defer doc.Close()
annotations, _ := doc.Annotations(0)
for _, annot := range annotations {
    fmt.Printf("%s: %s\n", annot.Subtype, annot.Content)
}

WASM

const doc = new WasmPdfDocument(bytes);
const annotations = doc.getAnnotations(0);
for (const annot of annotations) {
    console.log(`${annot.subtype}: ${annot.contents}`);
}

Rust

use pdf_oxide::PdfDocument;

let mut doc = PdfDocument::open("annotated.pdf")?;
let annotations = doc.get_annotations(0)?;
for annot in &annotations {
    println!("{:?}: {:?}", annot.subtype_enum, annot.contents);
}

Java

import fyi.oxide.pdf.*;
import fyi.oxide.pdf.annotation.Annotation;
import java.nio.file.Path;

try (PdfDocument doc = PdfDocument.open(Path.of("annotated.pdf"))) {
    for (Annotation annot : doc.page(0).annotations()) {
        System.out.println(annot.type() + ": " + annot.contents().orElse(""));
    }
}

C++

#include <pdf_oxide/pdf_oxide.hpp>

auto doc = pdf_oxide::Document::open("annotated.pdf");
for (const auto& annot : doc.page_annotations(0)) {
    std::cout << annot.subtype << ": " << annot.content << "\n";
}

Swift

import PdfOxide

let doc = try Document.open("annotated.pdf")
for annot in try doc.pageAnnotations(0) {
    print("\(annot.subtype): \(annot.content)")
}

Kotlin

import fyi.oxide.pdf.*

PdfDocument.open(java.nio.file.Path.of("annotated.pdf")).use { doc ->
    for (annot in doc.page(0).annotations()) {
        println("${annot.type()}: ${annot.contents().orElse("")}")
    }
}

Dart

import 'package:pdf_oxide/pdf_oxide.dart';

final doc = PdfDocument.open('annotated.pdf');
for (final annot in doc.pageAnnotations(0)) {
  print('${annot.subtype}: ${annot.content}');
}
doc.close();

R

library(pdfoxide)

doc <- pdf_open("annotated.pdf")
for (annot in pdf_page_annotations(doc, 0)) {
  cat(sprintf("%s: %s\n", annot$subtype, annot$content))
}

Julia

using PdfOxide

doc = open_document("annotated.pdf")
for annot in page_annotations(doc, 0)
    println("$(annot.subtype): $(annot.content)")
end

Zig

const pdf_oxide = @import("pdf_oxide");
const a = std.heap.page_allocator;

var doc = try pdf_oxide.Document.open("annotated.pdf");
defer doc.deinit();
const annotations = try doc.pageAnnotations(a, 0);
defer pdf_oxide.Document.freeAnnotations(a, annotations);
for (annotations) |annot| {
    std.debug.print("{s}: {s}\n", .{ annot.subtype, annot.content });
}

Scala

import fyi.oxide.pdf.{PdfDocument, annotationsSeq, contentsOption}
import scala.util.Using

Using.resource(PdfDocument.open("annotated.pdf")) { doc =>
  for (annot <- doc.page(0).annotationsSeq) {
    println(s"${annot.`type`()}: ${annot.contentsOption.getOrElse("")}")
  }
}

Clojure

(require '[pdf-oxide.core :as pdf])

(with-open [d (pdf/open "annotated.pdf")]
  (doseq [annot (pdf/annotations (pdf/page d 0))]
    (println (str (.type annot) ": " (.orElse (.contents annot) "")))))

Objective-C

#import "POXPdfOxide.h"
NSError *err = nil;

POXDocument *doc = [POXDocument openPath:@"annotated.pdf" error:&err];
for (POXAnnotation *annot in [doc pageAnnotations:0 error:&err]) {
    NSLog(@"%@: %@", annot.subtype, annot.content);
}

Elixir

{:ok, doc} = PdfOxide.open("annotated.pdf")
{:ok, annots} = PdfOxide.page_annotations(doc, 0)
Enum.each(annots, fn a -> IO.puts("#{a.subtype}: #{a.content}") end)

API Reference

`get_annotations(page_index) -> Vec<Annotation>`

Extracts the raw annotations from a specific page. Returns every annotation type present on the page.

Parameter	Type	Description
`page_index`	`usize`	Zero-based page index

Returns: a vector of Annotation objects.

Annotation fields

Field	Type	Description
`annotation_type`	`String`	Always `"Annot"`
`subtype`	`Option<String>`	Raw subtype string (e.g. `"Text"`, `"Highlight"`)
`subtype_enum`	`AnnotationSubtype`	Parsed subtype enum
`contents`	`Option<String>`	Text content of the annotation
`rect`	`Option<[f64; 4]>`	Bounding rectangle [x1, y1, x2, y2]
`author`	`Option<String>`	Author (the `/T` entry)
`creation_date`	`Option<String>`	Creation date
`modification_date`	`Option<String>`	Last modification date
`subject`	`Option<String>`	Subject of the annotation
`destination`	`Option<LinkDestination>`	Link destination (Link annotations)
`action`	`Option<LinkAction>`	Link action (Link annotations)
`color`	`Option<Vec<f64>>`	Color components of the annotation
`flags`	`Option<AnnotationFlags>`	Annotation flags (invisible, hidden, print, etc.)
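
The flags entry corresponds to the annotation flags of ISO 32000-1 (Table 165), where each flag is one bit of the integer /F value. This page does not show how the AnnotationFlags type exposes those bits, so the following is only a minimal Python sketch that decodes a raw /F integer. The AnnotFlag class and the will_print helper are hypothetical names for illustration, not part of PDF Oxide.

```python
from enum import IntFlag


class AnnotFlag(IntFlag):
    """Annotation flag bits as laid out in ISO 32000-1, Table 165 (hypothetical helper)."""
    INVISIBLE = 1 << 0
    HIDDEN = 1 << 1
    PRINT = 1 << 2
    NO_ZOOM = 1 << 3
    NO_ROTATE = 1 << 4
    NO_VIEW = 1 << 5
    READ_ONLY = 1 << 6
    LOCKED = 1 << 7
    TOGGLE_NO_VIEW = 1 << 8
    LOCKED_CONTENTS = 1 << 9


def will_print(raw_flags: int) -> bool:
    """An annotation is printed when Print is set and Hidden is not."""
    flags = AnnotFlag(raw_flags)
    return AnnotFlag.PRINT in flags and AnnotFlag.HIDDEN not in flags


# 68 = 4 (Print) + 64 (ReadOnly)
flags = AnnotFlag(68)
print(AnnotFlag.READ_ONLY in flags, will_print(68))
```

Keeping the raw integer and testing bits on demand avoids losing flags that a higher-level view might not surface.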

AnnotationSubtype variants

Variant	Description
`Text`	Sticky-note annotation
`Link`	Hyperlink annotation
`FreeText`	Text box annotation
`Line`	Line annotation
`Square`	Rectangle annotation
`Circle`	Ellipse annotation
`Polygon`	Polygon annotation
`PolyLine`	Polyline annotation
`Highlight`	Text highlight markup
`Underline`	Text underline markup
`Squiggly`	Squiggly-underline markup
`StrikeOut`	Strikeout markup
`Stamp`	Rubber stamp annotation
`Ink`	Freehand drawing annotation
`Popup`	Popup note associated with another annotation
`FileAttachment`	Embedded file annotation
`Sound`	Sound annotation
`Movie`	Movie annotation
`Screen`	Screen annotation
`Widget`	Form field widget
`PrinterMark`	Printer's mark annotation
`TrapNet`	Trap network annotation
`Watermark`	Watermark annotation
`ThreeDimensional`	3D annotation
`Redact`	Redaction annotation
`Caret`	Caret annotation (insertion point)
`RichMedia`	Rich media annotation
`Unknown`	Unknown annotation type
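
Bindings that expose the subtype as a plain string (as the Python and Node.js samples on this page do when they compare against "Highlight") can apply the same "parse, else Unknown" idea on the caller side. The sketch below is a self-contained illustration covering only a handful of the variants above. Subtype, TEXT_MARKUP, and parse_subtype are hypothetical names, not PDF Oxide API.

```python
from enum import Enum
from typing import Optional


class Subtype(Enum):
    """A small, illustrative subset of the variants listed above."""
    TEXT = "Text"
    LINK = "Link"
    HIGHLIGHT = "Highlight"
    UNDERLINE = "Underline"
    SQUIGGLY = "Squiggly"
    STRIKE_OUT = "StrikeOut"
    UNKNOWN = "Unknown"


# The four text-markup subtypes, grouped for convenient filtering.
TEXT_MARKUP = {Subtype.HIGHLIGHT, Subtype.UNDERLINE, Subtype.SQUIGGLY, Subtype.STRIKE_OUT}


def parse_subtype(raw: Optional[str]) -> Subtype:
    """Map a raw subtype string to the enum, falling back to UNKNOWN."""
    if raw is None:
        return Subtype.UNKNOWN
    try:
        return Subtype(raw)
    except ValueError:
        return Subtype.UNKNOWN


raw_subtypes = ["Highlight", "Text", "Projection", None]
markup_count = sum(parse_subtype(s) in TEXT_MARKUP for s in raw_subtypes)
print(markup_count)
```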

`get_outline() -> Option<Vec<OutlineItem>>`

Retrieves the document outline (bookmarks), if one exists. Returns a hierarchical tree of outline items that can be used for document navigation.

Returns:

Some(Vec<OutlineItem>): bookmarks were found and parsed
None: the document has no bookmarks

OutlineItem fields

Field	Type	Description
`title`	`String`	Title text of the bookmark
`dest`	`Option<Destination>`	Navigation target
`children`	`Vec<OutlineItem>`	Nested child bookmarks

Destination variants

Variant	Description
`PageIndex(usize)`	Direct page reference (zero-based index)
`Named(String)`	Identifier of a named destination

Rust

let mut doc = PdfDocument::open("book.pdf")?;

if let Some(outline) = doc.get_outline()? {
    for item in &outline {
        println!("  {}", item.title);
        for child in &item.children {
            println!("    {}", child.title);
        }
    }
} else {
    println!("No bookmarks found.");
}

C++

#include <pdf_oxide/pdf_oxide.hpp>

auto doc = pdf_oxide::Document::open("book.pdf");
std::string outline = doc.get_outline(); // JSON tree of bookmarks
std::cout << outline << "\n";

Swift

import PdfOxide

let doc = try Document.open("book.pdf")
let outline = try doc.outline() // JSON tree of bookmarks
print(outline)

Dart

import 'package:pdf_oxide/pdf_oxide.dart';

final doc = PdfDocument.open('book.pdf');
final outline = doc.getOutline(); // JSON tree of bookmarks
print(outline);
doc.close();

R

library(pdfoxide)

doc <- pdf_open("book.pdf")
outline <- pdf_get_outline(doc)  # JSON tree of bookmarks
cat(outline, "\n")

Julia

using PdfOxide

doc = open_document("book.pdf")
outline = get_outline(doc) # JSON tree of bookmarks
println(outline)

Zig

const pdf_oxide = @import("pdf_oxide");
const a = std.heap.page_allocator;

var doc = try pdf_oxide.Document.open("book.pdf");
defer doc.deinit();
const outline = try doc.outline(a); // JSON tree of bookmarks; caller owns it
defer a.free(outline);
std.debug.print("{s}\n", .{outline});

Objective-C

#import "POXPdfOxide.h"
NSError *err = nil;

POXDocument *doc = [POXDocument openPath:@"book.pdf" error:&err];
NSString *outline = [doc outlineWithError:&err]; // JSON tree of bookmarks
NSLog(@"%@", outline);

Elixir

{:ok, doc} = PdfOxide.open("book.pdf")
{:ok, outline} = PdfOxide.outline(doc) # JSON tree of bookmarks
IO.puts(outline)
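
Several of the bindings above return the outline as a JSON tree of bookmarks rather than typed objects. The exact JSON layout is not documented on this page, so the following Python sketch assumes keys named title, dest, and children (mirroring the OutlineItem fields) and a dest object holding either page_index or named. Those key names, the sample data, and the helper functions are all assumptions for illustration.

```python
import json


def flatten_outline(items, depth=0):
    """Yield (depth, title, dest) for every bookmark, parents before children."""
    for item in items:
        yield depth, item["title"], item.get("dest")
        yield from flatten_outline(item.get("children", []), depth + 1)


def describe(dest):
    """Render a destination; the dict shape here is an assumed encoding."""
    if dest is None:
        return "no destination"
    if "page_index" in dest:
        return f"page {dest['page_index'] + 1}"  # zero-based index to 1-based page
    return f"named destination '{dest['named']}'"


# Hypothetical outline JSON; real output may use different key names.
SAMPLE_OUTLINE = json.loads("""[
  {"title": "Chapter 1", "dest": {"page_index": 0}, "children": [
    {"title": "Section 1.1", "dest": {"page_index": 2}, "children": []}
  ]},
  {"title": "Appendix", "dest": {"named": "appendix-a"}, "children": []}
]""")

toc = [
    f"{'  ' * depth}{title} ({describe(dest)})"
    for depth, title, dest in flatten_outline(SAMPLE_OUTLINE)
]
print("\n".join(toc))
```

Flattening to (depth, title, dest) tuples keeps the traversal separate from rendering, so the same walk can feed a printed table of contents or a navigation index.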

PdfPage Annotation API (DOM)

The PdfPage object of DocumentEditor provides a higher-level AnnotationWrapper interface that supports both reading existing annotations and adding new ones.

`page.annotations() -> &[AnnotationWrapper]`

Returns all annotations on the page as wrapped objects.

`page.find_annotations_by_type(subtype) -> Vec<&AnnotationWrapper>`

Finds annotations of a specific type.

`page.add_annotation(annotation)`

Adds a new annotation to the page.

`page.remove_annotation(index) -> Option<AnnotationWrapper>`

Removes an annotation by index.

`page.find_annotations_in_region(rect) -> Vec<&AnnotationWrapper>`

Finds annotations whose bounding box intersects the given region.
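
To make "intersects" concrete, here is a minimal Python sketch of an axis-aligned rectangle overlap test on [x1, y1, x2, y2] rectangles. It illustrates the geometric idea only and is not the library's implementation. In particular, whether rectangles that merely touch along an edge count as intersecting is an assumption here (this sketch says no), and Rect, normalized, and intersects are hypothetical names.

```python
from typing import NamedTuple


class Rect(NamedTuple):
    x1: float
    y1: float
    x2: float
    y2: float


def normalized(r: Rect) -> Rect:
    """Return the rectangle with (x1, y1) as the lower-left corner."""
    return Rect(min(r.x1, r.x2), min(r.y1, r.y2), max(r.x1, r.x2), max(r.y1, r.y2))


def intersects(a: Rect, b: Rect) -> bool:
    """True when the two rectangles share interior area (strict overlap)."""
    a, b = normalized(a), normalized(b)
    return a.x1 < b.x2 and b.x1 < a.x2 and a.y1 < b.y2 and b.y1 < a.y2


region = Rect(0, 0, 100, 100)
rects = {
    "highlight": Rect(90, 90, 150, 120),  # overlaps the region's top-right corner
    "note": Rect(200, 200, 220, 220),     # entirely outside
    "edge": Rect(100, 0, 150, 50),        # only touches the right edge
}
hits = [name for name, r in rects.items() if intersects(r, region)]
print(hits)
```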

AnnotationWrapper methods

Method	Returns	Description
`id()`	`AnnotationId`	Session-specific unique ID
`subtype()`	`AnnotationSubtype`	Annotation type
`rect()`	`Rect`	Bounding rectangle
`contents()`	`Option<&str>`	Text content
`color()`	`Option<(f32, f32, f32)>`	RGB color (0.0 to 1.0)
`is_modified()`	`bool`	Whether the annotation has been modified
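
Because color() reports RGB components in the 0.0 to 1.0 range, a common follow-up step is converting them to an 8-bit hex string for display. The helper below is a small Python sketch of that conversion. rgb_to_hex is a hypothetical name, and clamping out-of-range components is a design choice of this sketch rather than documented library behavior.

```python
from typing import Optional, Tuple


def rgb_to_hex(color: Optional[Tuple[float, float, float]]) -> Optional[str]:
    """Convert (r, g, b) floats in 0.0..1.0 to '#rrggbb'; None stays None."""
    if color is None:
        return None
    # Clamp each component, then scale to the 0..255 range.
    r, g, b = (round(max(0.0, min(1.0, c)) * 255) for c in color)
    return f"#{r:02x}{g:02x}{b:02x}"


print(rgb_to_hex((1.0, 1.0, 0.0)))  # a typical highlight yellow
```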

Python

doc = PdfDocument("annotated.pdf")
page = doc.page(0)

# List all annotations
for annot in page.annotations():
    print(f"[{annot.subtype}] {annot.contents} at {annot.rect}")

# Find highlights
highlights = [a for a in page.annotations() if a.subtype == "Highlight"]
print(f"Found {len(highlights)} highlights")

Node.js

const doc = new PdfDocument("annotated.pdf");
const annotations = doc.getPageAnnotations(0);

// List all annotations
for (const annot of annotations) {
  console.log(`[${annot.subtype}] ${annot.contents}`);
}

// Find highlights
const highlights = annotations.filter(a => a.subtype === "Highlight");
console.log(`Found ${highlights.length} highlights`);
doc.close();

Go

doc, _ := pdfoxide.Open("annotated.pdf")
defer doc.Close()
annotations, _ := doc.Annotations(0)

// List all annotations
for _, annot := range annotations {
    fmt.Printf("[%s] %s\n", annot.Subtype, annot.Content)
}

// Find highlights
highlights := 0
for _, a := range annotations {
    if a.Subtype == "Highlight" {
        highlights++
    }
}
fmt.Printf("Found %d highlights\n", highlights)

WASM

const doc = new WasmPdfDocument(bytes);
const annotations = doc.getAnnotations(0);

// List all annotations
for (const annot of annotations) {
    console.log(`[${annot.subtype}] ${annot.contents}`);
}

// Find highlights
const highlights = annotations.filter(a => a.subtype === "Highlight");
console.log(`Found ${highlights.length} highlights`);

Rust

use pdf_oxide::editor::{DocumentEditor, EditableDocument};
use pdf_oxide::annotation_types::AnnotationSubtype;

let mut editor = DocumentEditor::open("annotated.pdf")?;
let page = editor.get_page(0)?;

// Find all highlight annotations
let highlights = page.find_annotations_by_type(AnnotationSubtype::Highlight);
for h in &highlights {
    println!("Highlight at {:?}: {:?}", h.rect(), h.contents());
}

Java

import fyi.oxide.pdf.*;
import fyi.oxide.pdf.annotation.Annotation;
import fyi.oxide.pdf.annotation.AnnotationType;
import java.nio.file.Path;

try (PdfDocument doc = PdfDocument.open(Path.of("annotated.pdf"))) {
    var annotations = doc.page(0).annotations();

    // List all annotations
    for (Annotation annot : annotations) {
        System.out.println("[" + annot.type() + "] " + annot.contents().orElse(""));
    }

    // Find highlights
    long highlights = annotations.stream()
            .filter(a -> a.type() == AnnotationType.HIGHLIGHT).count();
    System.out.println("Found " + highlights + " highlights");
}

C++

#include <pdf_oxide/pdf_oxide.hpp>

auto doc = pdf_oxide::Document::open("annotated.pdf");
auto annotations = doc.page_annotations(0);

// List all annotations
for (const auto& annot : annotations) {
    std::cout << "[" << annot.subtype << "] " << annot.content << "\n";
}

// Find highlights
int highlights = 0;
for (const auto& a : annotations) {
    if (a.subtype == "Highlight") highlights++;
}
std::cout << "Found " << highlights << " highlights\n";

Swift

import PdfOxide

let doc = try Document.open("annotated.pdf")
let annotations = try doc.pageAnnotations(0)

// List all annotations
for annot in annotations {
    print("[\(annot.subtype)] \(annot.content)")
}

// Find highlights
let highlights = annotations.filter { $0.subtype == "Highlight" }
print("Found \(highlights.count) highlights")

Kotlin

import fyi.oxide.pdf.*
import fyi.oxide.pdf.annotation.AnnotationType

PdfDocument.open(java.nio.file.Path.of("annotated.pdf")).use { doc ->
    val annotations = doc.page(0).annotations()

    // List all annotations
    for (annot in annotations) {
        println("[${annot.type()}] ${annot.contents().orElse("")}")
    }

    // Find highlights
    val highlights = annotations.count { it.type() == AnnotationType.HIGHLIGHT }
    println("Found $highlights highlights")
}

Dart

import 'package:pdf_oxide/pdf_oxide.dart';

final doc = PdfDocument.open('annotated.pdf');
final annotations = doc.pageAnnotations(0);

// List all annotations
for (final annot in annotations) {
  print('[${annot.subtype}] ${annot.content}');
}

// Find highlights
final highlights = annotations.where((a) => a.subtype == 'Highlight');
print('Found ${highlights.length} highlights');
doc.close();

R

library(pdfoxide)

doc <- pdf_open("annotated.pdf")
annotations <- pdf_page_annotations(doc, 0)

# List all annotations
for (annot in annotations) {
  cat(sprintf("[%s] %s\n", annot$subtype, annot$content))
}

# Find highlights
highlights <- Filter(function(a) a$subtype == "Highlight", annotations)
cat(sprintf("Found %d highlights\n", length(highlights)))

Julia

using PdfOxide

doc = open_document("annotated.pdf")
annotations = page_annotations(doc, 0)

# List all annotations
for annot in annotations
    println("[$(annot.subtype)] $(annot.content)")
end

# Find highlights
highlights = filter(a -> a.subtype == "Highlight", annotations)
println("Found $(length(highlights)) highlights")

Zig

const pdf_oxide = @import("pdf_oxide");
const a = std.heap.page_allocator;

var doc = try pdf_oxide.Document.open("annotated.pdf");
defer doc.deinit();
const annotations = try doc.pageAnnotations(a, 0);
defer pdf_oxide.Document.freeAnnotations(a, annotations);

// List all annotations
for (annotations) |annot| {
    std.debug.print("[{s}] {s}\n", .{ annot.subtype, annot.content });
}

// Find highlights
var highlights: usize = 0;
for (annotations) |annot| {
    if (std.mem.eql(u8, annot.subtype, "Highlight")) highlights += 1;
}
std.debug.print("Found {d} highlights\n", .{highlights});

Scala

import fyi.oxide.pdf.{PdfDocument, annotationsSeq, contentsOption}
import fyi.oxide.pdf.annotation.AnnotationType
import scala.util.Using

Using.resource(PdfDocument.open("annotated.pdf")) { doc =>
  val annotations = doc.page(0).annotationsSeq

  // List all annotations
  for (annot <- annotations) {
    println(s"[${annot.`type`()}] ${annot.contentsOption.getOrElse("")}")
  }

  // Find highlights
  val highlights = annotations.count(_.`type`() == AnnotationType.HIGHLIGHT)
  println(s"Found $highlights highlights")
}

Clojure

(require '[pdf-oxide.core :as pdf])
(import 'fyi.oxide.pdf.annotation.AnnotationType)

(with-open [d (pdf/open "annotated.pdf")]
  (let [annotations (pdf/annotations (pdf/page d 0))]
    ;; List all annotations
    (doseq [annot annotations]
      (println (str "[" (.type annot) "] " (.orElse (.contents annot) ""))))
    ;; Find highlights
    (let [highlights (count (filter #(= (.type %) AnnotationType/HIGHLIGHT) annotations))]
      (println (str "Found " highlights " highlights")))))

Objective-C

#import "POXPdfOxide.h"
NSError *err = nil;

POXDocument *doc = [POXDocument openPath:@"annotated.pdf" error:&err];
NSArray<POXAnnotation*> *annotations = [doc pageAnnotations:0 error:&err];

// List all annotations
for (POXAnnotation *annot in annotations) {
    NSLog(@"[%@] %@", annot.subtype, annot.content);
}

// Find highlights
NSPredicate *p = [NSPredicate predicateWithFormat:@"subtype == %@", @"Highlight"];
NSUInteger highlights = [annotations filteredArrayUsingPredicate:p].count;
NSLog(@"Found %lu highlights", (unsigned long)highlights);

Elixir

{:ok, doc} = PdfOxide.open("annotated.pdf")
{:ok, annotations} = PdfOxide.page_annotations(doc, 0)

# List all annotations
Enum.each(annotations, fn a -> IO.puts("[#{a.subtype}] #{a.content}") end)

# Find highlights
highlights = Enum.count(annotations, &(&1.subtype == "Highlight"))
IO.puts("Found #{highlights} highlights")

`annotations_to_json`: serialize a page's annotations

annotations_to_json serializes the entire annotation list into a JSON array in a single FFI call. The Go binding uses it internally to convert the result into []Annotation for its DOM, and Swift exposes it directly as annotationsToJson. The C ABI signature is:

char *pdf_oxide_annotations_to_json(const FfiAnnotationList *annotations, int32_t *error_code);

The returned UTF-8 string is owned by the caller (release it with free_string). The schema corresponds to Go's Annotation struct. Fields: type, subtype, content, x, y, width, height, author, borderWidth, color, creationDate, modificationDate, linkURI, textIconName, isHidden, isPrintable, isReadOnly, isMarkedDeleted.
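
Once the JSON string is in hand, any language with a JSON parser can consume it. The following Python sketch decodes such an array into typed records using a few of the field names listed above. The sample payload is invented for illustration, and the Annotation dataclass and parse_annotations function are hypothetical helpers, not part of any PDF Oxide binding. Defaulting missing keys to empty values is also an assumption of this sketch.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass
class Annotation:
    subtype: str
    content: str
    author: str
    link_uri: str
    is_hidden: bool


def parse_annotations(payload: str) -> List[Annotation]:
    """Decode a JSON array of annotation objects into typed records."""
    return [
        Annotation(
            subtype=item.get("subtype", ""),
            content=item.get("content", ""),
            author=item.get("author", ""),
            link_uri=item.get("linkURI", ""),
            is_hidden=bool(item.get("isHidden", False)),
        )
        for item in json.loads(payload)
    ]


# Hypothetical payload: key names follow the schema above, values are made up.
SAMPLE = """[
  {"subtype": "Text", "content": "Check this figure", "author": "Reviewer", "isHidden": false},
  {"subtype": "Link", "content": "", "linkURI": "https://example.com"}
]"""

annotations = parse_annotations(SAMPLE)
links = [a.link_uri for a in annotations if a.subtype == "Link"]
print(links)
```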

Swift

import PdfOxide

let doc = try Document.open("annotated.pdf")
let json = try doc.annotationsToJson(0) // String of JSON
print(json)

C ABI

#include "pdf_oxide.h"

int32_t err = 0;
FfiAnnotationList *list = pdf_document_get_page_annotations(doc, /*page=*/0, &err);
char *json = pdf_oxide_annotations_to_json(list, &err);
printf("%s\n", json);
free_string(json);
pdf_oxide_annotation_list_free(list);

C++

#include <pdf_oxide/pdf_oxide.hpp>

auto doc = pdf_oxide::Document::open("annotated.pdf");
std::string json = doc.annotations_to_json(0); // JSON string
std::cout << json << "\n";

Dart

import 'package:pdf_oxide/pdf_oxide.dart';

final doc = PdfDocument.open('annotated.pdf');
final json = doc.annotationsToJson(0); // JSON string
print(json);
doc.close();

R

library(pdfoxide)

doc <- pdf_open("annotated.pdf")
json <- pdf_annotations_to_json(doc, 0)  # JSON string
cat(json, "\n")

Julia

using PdfOxide

doc = open_document("annotated.pdf")
json = annotations_to_json(doc, 0) # JSON string
println(json)

Zig

const pdf_oxide = @import("pdf_oxide");
const a = std.heap.page_allocator;

var doc = try pdf_oxide.Document.open("annotated.pdf");
defer doc.deinit();
var list = try doc.annotationList(0);
defer list.deinit();
const json = try list.toJson(a); // caller owns the slice
defer a.free(json);
std.debug.print("{s}\n", .{json});

Objective-C

#import "POXPdfOxide.h"
NSError *err = nil;

POXDocument *doc = [POXDocument openPath:@"annotated.pdf" error:&err];
NSString *json = [doc annotationsJson:0 error:&err]; // JSON string
NSLog(@"%@", json);

Elixir

{:ok, doc} = PdfOxide.open("annotated.pdf")
{:ok, json} = PdfOxide.annotations_to_json(doc, 0) # JSON string
IO.puts(json)

Binding coverage. annotations_to_json is exposed directly in Swift (doc.annotationsToJson(page)), C++ (doc.annotations_to_json(page)), Dart (doc.annotationsToJson(page)), R (pdf_annotations_to_json(doc, page)), Julia (annotations_to_json(doc, page)), Zig (doc.annotationList(page).toJson(...)), Objective-C ([doc annotationsJson:page error:]), Elixir (PdfOxide.annotations_to_json(doc, page)), and the C ABI (pdf_oxide_annotations_to_json). The Go binding calls it internally to decode doc.Annotations(page) into typed structs. It is not compiled for the WASM target.

`annotation_extras`: extended annotation attributes

annotation_extras reads extended attributes of a single annotation that are not part of the core Annotation view. The readable attributes are: color, creation and modification timestamps, four display flags (hidden, marked-deleted, printable, read-only), the URI of a link annotation, the icon name of a text annotation, and the quad points of highlight and markup annotations.

In Swift, these are returned as an AnnotationExtras struct via annotationExtras(page, index:). In Go, the same fields are built directly into the Annotation struct (Color, CreationDate, ModificationDate, LinkURI, TextIconName, IsHidden, IsPrintable, IsReadOnly, IsMarkedDeleted). Internally, both call the pdf_oxide_annotation_get_* / pdf_oxide_*_annotation_get_* family of C ABI accessors.

Swift

import PdfOxide

let doc = try Document.open("annotated.pdf")
let extras = try doc.annotationExtras(0, index: 0) // AnnotationExtras

print("color=\(extras.color) created=\(extras.creationDate)")
print("hidden=\(extras.hidden) printable=\(extras.printable) readOnly=\(extras.readOnly)")
if !extras.uri.isEmpty { print("link URI: \(extras.uri)") }
if !extras.iconName.isEmpty { print("icon: \(extras.iconName)") }
for q in extras.quadPoints {
    print("quad: (\(q.x1),\(q.y1)) (\(q.x2),\(q.y2)) (\(q.x3),\(q.y3)) (\(q.x4),\(q.y4))")
}

Go

import pdfoxide "github.com/yfedoseev/pdf_oxide/go"

doc, _ := pdfoxide.Open("annotated.pdf")
defer doc.Close()
annotations, _ := doc.Annotations(0)

a := annotations[0]
fmt.Printf("color=%d created=%d modified=%d\n", a.Color, a.CreationDate, a.ModificationDate)
fmt.Printf("hidden=%v printable=%v readOnly=%v deleted=%v\n",
    a.IsHidden, a.IsPrintable, a.IsReadOnly, a.IsMarkedDeleted)
if a.LinkURI != "" {
    fmt.Printf("link URI: %s\n", a.LinkURI)
}
if a.TextIconName != "" {
    fmt.Printf("icon: %s\n", a.TextIconName)
}

C++

#include <pdf_oxide/pdf_oxide.hpp>

auto doc = pdf_oxide::Document::open("annotated.pdf");

std::cout << "color=" << doc.annotation_get_color(0, 0)
          << " created=" << doc.annotation_get_creation_date(0, 0) << "\n";
std::cout << "hidden=" << doc.annotation_is_hidden(0, 0)
          << " printable=" << doc.annotation_is_printable(0, 0)
          << " readOnly=" << doc.annotation_is_read_only(0, 0) << "\n";
auto uri = doc.link_annotation_get_uri(0, 0);
if (!uri.empty()) std::cout << "link URI: " << uri << "\n";
auto icon = doc.text_annotation_get_icon_name(0, 0);
if (!icon.empty()) std::cout << "icon: " << icon << "\n";

Dart

import 'package:pdf_oxide/pdf_oxide.dart';

final doc = PdfDocument.open('annotated.pdf');
final a = doc.pageAnnotationDetails(0)[0]; // AnnotationDetails

print('color=${a.color} created=${a.creationDate} modified=${a.modificationDate}');
print('hidden=${a.hidden} printable=${a.printable} readOnly=${a.readOnly}');
if (a.linkUri.isNotEmpty) print('link URI: ${a.linkUri}');
if (a.iconName.isNotEmpty) print('icon: ${a.iconName}');
doc.close();

R

library(pdfoxide)

doc <- pdf_open("annotated.pdf")

cat(sprintf("color=%d created=%d\n",
    pdf_annotation_get_color(doc, 0, 0),
    pdf_annotation_get_creation_date(doc, 0, 0)))
cat(sprintf("hidden=%s printable=%s readOnly=%s\n",
    pdf_annotation_is_hidden(doc, 0, 0),
    pdf_annotation_is_printable(doc, 0, 0),
    pdf_annotation_is_read_only(doc, 0, 0)))
uri <- pdf_link_annotation_get_uri(doc, 0, 0)
if (nzchar(uri)) cat(sprintf("link URI: %s\n", uri))
icon <- pdf_text_annotation_get_icon_name(doc, 0, 0)
if (nzchar(icon)) cat(sprintf("icon: %s\n", icon))

Julia

using PdfOxide

doc = open_document("annotated.pdf")

println("color=$(annotation_get_color(doc, 0, 0)) created=$(annotation_creation_date(doc, 0, 0))")
println("hidden=$(annotation_is_hidden(doc, 0, 0)) printable=$(annotation_is_printable(doc, 0, 0)) readOnly=$(annotation_is_read_only(doc, 0, 0))")
uri = link_annotation_uri(doc, 0, 0)
isempty(uri) || println("link URI: $uri")
icon = text_annotation_icon_name(doc, 0, 0)
isempty(icon) || println("icon: $icon")

Zig

const pdf_oxide = @import("pdf_oxide");
const a = std.heap.page_allocator;

var doc = try pdf_oxide.Document.open("annotated.pdf");
defer doc.deinit();
var list = try doc.annotationList(0);
defer list.deinit();

std.debug.print("color={d} created={d}\n", .{ try list.getColor(0), try list.getCreationDate(0) });
std.debug.print("hidden={} printable={} readOnly={}\n", .{ try list.isHidden(0), try list.isPrintable(0), try list.isReadOnly(0) });
const uri = try list.linkUri(a, 0);
defer a.free(uri);
if (uri.len != 0) std.debug.print("link URI: {s}\n", .{uri});
const icon = try list.textIconName(a, 0);
defer a.free(icon);
if (icon.len != 0) std.debug.print("icon: {s}\n", .{icon});

Objective-C

#import "POXPdfOxide.h"
NSError *err = nil;

POXDocument *doc = [POXDocument openPath:@"annotated.pdf" error:&err];
POXAnnotation *a = [doc pageAnnotations:0 error:&err].firstObject;

NSLog(@"color=%u created=%lld modified=%lld", a.color, a.creationDate, a.modificationDate);
NSLog(@"hidden=%d printable=%d readOnly=%d", a.hidden, a.printable, a.readOnly);
if (a.linkUri.length) NSLog(@"link URI: %@", a.linkUri);
if (a.iconName.length) NSLog(@"icon: %@", a.iconName);

Elixir

{:ok, doc} = PdfOxide.open("annotated.pdf")

{:ok, color} = PdfOxide.annotation_color(doc, 0, 0)
{:ok, created} = PdfOxide.annotation_creation_date(doc, 0, 0)
IO.puts("color=#{color} created=#{created}")

{:ok, hidden} = PdfOxide.annotation_hidden?(doc, 0, 0)
{:ok, printable} = PdfOxide.annotation_printable?(doc, 0, 0)
{:ok, read_only} = PdfOxide.annotation_read_only?(doc, 0, 0)
IO.puts("hidden=#{hidden} printable=#{printable} readOnly=#{read_only}")

{:ok, uri} = PdfOxide.link_annotation_uri(doc, 0, 0)
if uri != "", do: IO.puts("link URI: #{uri}")
{:ok, icon} = PdfOxide.text_annotation_icon_name(doc, 0, 0)
if icon != "", do: IO.puts("icon: #{icon}")

AnnotationExtras fields (Swift)

Field	Type	Description
`color`	`UInt32`	Packed annotation color
`creationDate`	`Int64`	Creation timestamp
`modificationDate`	`Int64`	Modification timestamp
`hidden`	`Bool`	hidden flag
`markedDeleted`	`Bool`	marked-deleted flag
`printable`	`Bool`	print flag
`readOnly`	`Bool`	read-only flag
`uri`	`String`	URI of a link annotation (empty string if none)
`iconName`	`String`	Icon name of a text annotation (empty string if none)
`quadPoints`	`[QuadPoint]`	Quadrilaterals of highlight and markup annotations (four corners each)
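
A frequent use of quadPoints is computing the area a multi-line highlight covers. The Python sketch below derives one enclosing rectangle from a list of quads, using the x1..y4 corner names that appear in the Swift sample above. The QuadPoint dataclass and bounding_box function are hypothetical stand-ins written for this illustration, and the sample coordinates are made up.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class QuadPoint:
    """Four corners of one highlighted quadrilateral (stand-in type)."""
    x1: float
    y1: float
    x2: float
    y2: float
    x3: float
    y3: float
    x4: float
    y4: float


def bounding_box(quads: List[QuadPoint]) -> Optional[Tuple[float, float, float, float]]:
    """Return (min_x, min_y, max_x, max_y) over all corners, or None if empty."""
    xs = [x for q in quads for x in (q.x1, q.x2, q.x3, q.x4)]
    ys = [y for q in quads for y in (q.y1, q.y2, q.y3, q.y4)]
    if not xs:
        return None
    return (min(xs), min(ys), max(xs), max(ys))


# Two made-up quads, e.g. a highlight that spans two lines of text.
quads = [
    QuadPoint(10, 700, 200, 700, 10, 688, 200, 688),
    QuadPoint(10, 686, 120, 686, 10, 674, 120, 674),
]
box = bounding_box(quads)
print(box)
```

Taking the min and max over all corners makes the result independent of the order in which a producer wrote the four corners of each quad.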

Binding coverage. annotation_extras is exposed in Swift (doc.annotationExtras(page, index:)) as a dedicated AnnotationExtras struct, and through the pdf_oxide_annotation_get_* family of C ABI accessors. The same per-index accessor family is also wrapped in C++ (doc.annotation_get_*), R (pdf_annotation_get_*), Julia (annotation_*), Zig (AnnotationList.getColor/isHidden/...), and Elixir (PdfOxide.annotation_color/...). In Go, Dart (doc.pageAnnotationDetails(page)), and Objective-C (inline on POXAnnotation), the same attributes are expanded inline on each annotation. It is not compiled for the WASM target.

Advanced Examples

Build a table of contents from bookmarks

use pdf_oxide::PdfDocument;
use pdf_oxide::outline::Destination;

let mut doc = PdfDocument::open("book.pdf")?;

fn print_toc(items: &[pdf_oxide::outline::OutlineItem], depth: usize) {
    for item in items {
        let indent = "  ".repeat(depth);
        let page = match &item.dest {
            Some(Destination::PageIndex(p)) => format!("page {}", p + 1),
            Some(Destination::Named(n)) => format!("dest '{}'", n),
            None => "no dest".to_string(),
        };
        println!("{}{} ({})", indent, item.title, page);
        print_toc(&item.children, depth + 1);
    }
}

if let Some(outline) = doc.get_outline()? {
    println!("Table of Contents:");
    print_toc(&outline, 0);
}

Extract all comments (Text annotations)

use pdf_oxide::PdfDocument;
use pdf_oxide::annotation_types::AnnotationSubtype;

let mut doc = PdfDocument::open("reviewed.pdf")?;
let page_count = doc.page_count()?;

for page_idx in 0..page_count {
    let annotations = doc.get_annotations(page_idx)?;
    let comments: Vec<_> = annotations.iter()
        .filter(|a| a.subtype_enum == AnnotationSubtype::Text)
        .collect();

    if !comments.is_empty() {
        println!("Page {}:", page_idx + 1);
        for c in &comments {
            let author = c.author.as_deref().unwrap_or("Unknown");
            let text = c.contents.as_deref().unwrap_or("");
            println!("  [{}] {}", author, text);
        }
    }
}

Extract all hyperlinks

use pdf_oxide::PdfDocument;
use pdf_oxide::annotation_types::AnnotationSubtype;

let mut doc = PdfDocument::open("report.pdf")?;
let annotations = doc.get_annotations(0)?;

let links: Vec<_> = annotations.iter()
    .filter(|a| a.subtype_enum == AnnotationSubtype::Link)
    .collect();

for link in &links {
    if let Some(ref action) = link.action {
        println!("Link: {:?}", action);
    }
    if let Some(ref dest) = link.destination {
        println!("Internal link: {:?}", dest);
    }
}

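Frequently Asked Questions

What is the difference between get_annotations and annotation_extras? get_annotations returns the core annotation view (subtype, contents, rectangle, author, dates, color, flags). annotation_extras adds extended attributes on top of that (packed color, timestamps, four display flags, link URI, text-annotation icon name, highlight quad points). In Go these are merged into a single Annotation; in Swift they live in a separate AnnotationExtras struct.

What JSON schema does annotations_to_json produce? A JSON array whose elements correspond to Go's Annotation struct. Fields: type, subtype, content, x, y, width, height, author, borderWidth, color, creationDate, modificationDate, linkURI, textIconName, isHidden, isPrintable, isReadOnly, isMarkedDeleted.

Why are the link URI and icon name sometimes empty? These fields apply only to specific subtypes: uri is specific to Link annotations and iconName to Text (sticky-note) annotations. For every other subtype an empty string is returned.

Is annotation extraction fast? Yes. On the benchmark corpus, PDF Oxide's extraction core averages 0.8 ms (p99 9 ms) with a 100% pass rate.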
