What is the fastest Python PDF library?

PDF Oxide is the fastest Python PDF library, with 0.8ms mean text extraction time — 5.8× faster than PyMuPDF (4.6ms) and 15× faster than pypdf (12.1ms). Benchmarked on 3,830 real-world PDFs with 100% pass rate.

Is PDF Oxide free for commercial use?

Yes. PDF Oxide is MIT licensed — free for all uses including commercial products, SaaS, and proprietary software. No license fees, no sales calls, no AGPL restrictions.

Can PDF Oxide handle scanned PDFs with OCR?

Yes. PDF Oxide includes built-in OCR via PaddleOCR and ONNX Runtime. No Tesseract installation needed — just pip install pdf_oxide and use extract_text_ocr(). Supports PP-OCRv3, v4, and v5 models.

Does PDF Oxide support XFA forms?

Yes. PDF Oxide is the only Python PDF library that can detect, analyze, and extract data from XFA forms (XML Forms Architecture). PyMuPDF, pypdf, pdfplumber, and pdfminer cannot read XFA form data.

How does PDF Oxide compare to PyMuPDF?

PDF Oxide is 5.8× faster than PyMuPDF (0.8ms vs 4.6ms mean), has a 100% pass rate vs 99.3%, and is MIT licensed vs PyMuPDF's AGPL-3.0. PDF Oxide also has built-in Markdown/HTML output and XFA form support that PyMuPDF lacks.

Can PDF Oxide convert PDF to Markdown?

Yes. PDF Oxide has built-in PDF to Markdown conversion with heading detection, table preservation, and list formatting — ideal for LLM and RAG pipelines. No separate package needed, unlike PyMuPDF which requires pymupdf4llm (69× slower).

Annotation extraction

PDF Oxide provides access to every annotation type defined in the PDF specification (ISO 32000-1:2008, section 12.5), including text notes, hyperlinks, highlights, stamps, and ink annotations. It also exposes the document outline (bookmarks), which you can use to build navigation structures.

To get raw annotation data, call get_annotations() on PdfDocument. Alternatively, use the PdfPage DOM API, which provides a unified AnnotationWrapper interface that supports both reading and writing.

Binding support. Annotation extraction is available in Python (doc.get_annotations(page)), Rust (doc.get_annotations(page)), WASM (doc.getAnnotations(page)), and Go (doc.Annotations(page)). The C# public API does not yet expose a GetAnnotations wrapper: the native FFI method (PdfDocumentGetPageAnnotations) exists but is not wrapped. To extract annotations from C#, use the Rust CLI (pdf-oxide annotations doc.pdf), or call PdfPageGetAnnotationsCount / pdf_get_annotations_by_type directly via P/Invoke until a public wrapper is available.

Quick examples

Python

from pdf_oxide import PdfDocument

doc = PdfDocument("annotated.pdf")
page = doc.page(0)
for annot in page.annotations():
    print(f"{annot.subtype}: {annot.contents}")

Node.js

const { PdfDocument } = require("pdf-oxide");

const doc = new PdfDocument("annotated.pdf");
const annotations = doc.getPageAnnotations(0);
for (const annot of annotations) {
  console.log(`${annot.subtype}: ${annot.contents}`);
}
doc.close();

Go

import (
    "fmt"

    pdfoxide "github.com/yfedoseev/pdf_oxide/go"
)

doc, err := pdfoxide.Open("annotated.pdf")
if err != nil {
    panic(err)
}
defer doc.Close()
annotations, _ := doc.Annotations(0) // error ignored for brevity
for _, annot := range annotations {
    fmt.Printf("%s: %s\n", annot.Subtype, annot.Content)
}

WASM

const doc = new WasmPdfDocument(bytes);
const annotations = doc.getAnnotations(0);
for (const annot of annotations) {
    console.log(`${annot.subtype}: ${annot.contents}`);
}

Rust

use pdf_oxide::PdfDocument;

let mut doc = PdfDocument::open("annotated.pdf")?;
let annotations = doc.get_annotations(0)?;
for annot in &annotations {
    println!("{:?}: {:?}", annot.subtype_enum, annot.contents);
}

Java

import fyi.oxide.pdf.*;
import fyi.oxide.pdf.annotation.Annotation;
import java.nio.file.Path;

try (PdfDocument doc = PdfDocument.open(Path.of("annotated.pdf"))) {
    for (Annotation annot : doc.page(0).annotations()) {
        System.out.println(annot.type() + ": " + annot.contents().orElse(""));
    }
}

C++

#include <iostream>
#include <pdf_oxide/pdf_oxide.hpp>

auto doc = pdf_oxide::Document::open("annotated.pdf");
for (const auto& annot : doc.page_annotations(0)) {
    std::cout << annot.subtype << ": " << annot.content << "\n";
}

Swift

import PdfOxide

let doc = try Document.open("annotated.pdf")
for annot in try doc.pageAnnotations(0) {
    print("\(annot.subtype): \(annot.content)")
}

Kotlin

import fyi.oxide.pdf.*

PdfDocument.open(java.nio.file.Path.of("annotated.pdf")).use { doc ->
    for (annot in doc.page(0).annotations()) {
        println("${annot.type()}: ${annot.contents().orElse("")}")
    }
}

Dart

import 'package:pdf_oxide/pdf_oxide.dart';

final doc = PdfDocument.open('annotated.pdf');
for (final annot in doc.pageAnnotations(0)) {
  print('${annot.subtype}: ${annot.content}');
}
doc.close();

R

library(pdfoxide)

doc <- pdf_open("annotated.pdf")
for (annot in pdf_page_annotations(doc, 0)) {
  cat(sprintf("%s: %s\n", annot$subtype, annot$content))
}

Julia

using PdfOxide

doc = open_document("annotated.pdf")
for annot in page_annotations(doc, 0)
    println("$(annot.subtype): $(annot.content)")
end

Zig

const std = @import("std");
const pdf_oxide = @import("pdf_oxide");
const a = std.heap.page_allocator;

var doc = try pdf_oxide.Document.open("annotated.pdf");
defer doc.deinit();
const annotations = try doc.pageAnnotations(a, 0);
defer pdf_oxide.Document.freeAnnotations(a, annotations);
for (annotations) |annot| {
    std.debug.print("{s}: {s}\n", .{ annot.subtype, annot.content });
}

Scala

import fyi.oxide.pdf.{PdfDocument, annotationsSeq, contentsOption}
import scala.util.Using

Using.resource(PdfDocument.open("annotated.pdf")) { doc =>
  for (annot <- doc.page(0).annotationsSeq) {
    println(s"${annot.`type`()}: ${annot.contentsOption.getOrElse("")}")
  }
}

Clojure

(require '[pdf-oxide.core :as pdf])

(with-open [d (pdf/open "annotated.pdf")]
  (doseq [annot (pdf/annotations (pdf/page d 0))]
    (println (str (.type annot) ": " (.orElse (.contents annot) "")))))

Objective-C

#import "POXPdfOxide.h"
NSError *err = nil;

POXDocument *doc = [POXDocument openPath:@"annotated.pdf" error:&err];
for (POXAnnotation *annot in [doc pageAnnotations:0 error:&err]) {
    NSLog(@"%@: %@", annot.subtype, annot.content);
}

Elixir

{:ok, doc} = PdfOxide.open("annotated.pdf")
{:ok, annots} = PdfOxide.page_annotations(doc, 0)
Enum.each(annots, fn a -> IO.puts("#{a.subtype}: #{a.content}") end)

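API reference

`get_annotations(page_index) -> Vec<Annotation>`

Extracts the raw annotations from a specific page and returns every annotation type present on that page.

Parameter	Type	Description
`page_index`	`usize`	Zero-based page index

Returns: a vector of Annotation objects. In Rust the call is fallible, which is why the examples propagate errors with `?`.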

Annotation fields

Field	Type	Description
`annotation_type`	`String`	Always `"Annot"`
`subtype`	`Option<String>`	Raw subtype string (e.g. `"Text"`, `"Highlight"`)
`subtype_enum`	`AnnotationSubtype`	Parsed subtype enum
`contents`	`Option<String>`	Text contents of the annotation
`rect`	`Option<[f64; 4]>`	Bounding rectangle [x1, y1, x2, y2]
`author`	`Option<String>`	Author/creator (`/T` entry)
`creation_date`	`Option<String>`	Creation date
`modification_date`	`Option<String>`	Last modification date
`subject`	`Option<String>`	Annotation subject
`destination`	`Option<LinkDestination>`	Link destination (Link annotations)
`action`	`Option<LinkAction>`	Link action (Link annotations)
`color`	`Option<Vec<f64>>`	Annotation color components
`flags`	`Option<AnnotationFlags>`	Annotation flags (invisible, hidden, print, etc.)
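The `creation_date` and `modification_date` fields are plain strings. If they hold the raw PDF date format from ISO 32000-1 section 7.9.4 (`D:YYYYMMDDHHmmSSOHH'mm'`, where everything after the year is optional), you can convert one into a `datetime` with a minimal Python sketch like the one below. That format is an assumption; this page does not confirm it. `parse_pdf_date` is a hypothetical helper, not part of PDF Oxide.

```python
import re
from datetime import datetime, timedelta, timezone

# Hypothetical helper, not a PDF Oxide API. Pattern follows ISO 32000-1 7.9.4:
# D:YYYY[MM[DD[HH[mm[SS]]]]][O[HH['mm']]] where O is Z, + or -.
_PDF_DATE = re.compile(
    r"^(?:D:)?(\d{4})(\d{2})?(\d{2})?(\d{2})?(\d{2})?(\d{2})?"
    r"(?:([Zz+\-])(?:(\d{2})'?(?:(\d{2})'?)?)?)?$"
)


def parse_pdf_date(value):
    """Return a datetime, or None if the string is not a valid PDF date."""
    m = _PDF_DATE.match(value.strip())
    if m is None:
        return None
    year, month, day, hour, minute, second, sign, tz_h, tz_m = m.groups()
    try:
        # Omitted components default to the start of the period.
        dt = datetime(int(year), int(month or 1), int(day or 1),
                      int(hour or 0), int(minute or 0), int(second or 0))
        if sign is None:
            return dt  # no offset given: leave the datetime naive
        if sign in "Zz":
            return dt.replace(tzinfo=timezone.utc)
        offset = timedelta(hours=int(tz_h or 0), minutes=int(tz_m or 0))
        return dt.replace(tzinfo=timezone(-offset if sign == "-" else offset))
    except ValueError:
        return None  # out-of-range component, e.g. month 13
```

Strings without a UTC offset come back as naive datetimes. Whether to treat those as local time or UTC is up to the caller.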

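AnnotationSubtype variants

Variant	Description
`Text`	Sticky note annotation
`Link`	Hyperlink annotation
`FreeText`	Text box annotation
`Line`	Straight line annotation
`Square`	Rectangle annotation
`Circle`	Ellipse annotation
`Polygon`	Polygon annotation
`PolyLine`	Polyline annotation
`Highlight`	Text highlight markup
`Underline`	Text underline markup
`Squiggly`	Squiggly underline markup
`StrikeOut`	Strikeout markup
`Stamp`	Rubber stamp annotation
`Ink`	Freehand drawing annotation
`Popup`	Pop-up note attached to another annotation
`FileAttachment`	File attachment annotation
`Sound`	Sound annotation
`Movie`	Movie annotation
`Screen`	Screen annotation
`Widget`	Form field widget
`PrinterMark`	Printer's mark annotation
`TrapNet`	Trap network annotation
`Watermark`	Watermark annotation
`ThreeDimensional`	3D annotation
`Redact`	Redaction annotation
`Caret`	Caret annotation (insertion point)
`RichMedia`	Rich media annotation
`Unknown`	Unrecognized annotation type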
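Bindings such as Node.js, Go, and C++ expose the subtype as a plain string, so comparisons against names like `"Highlight"` depend on exact spelling. The enum names above follow the PDF `/Subtype` names. The one exception is the raw name `3D`, which is not a valid identifier and appears here as `ThreeDimensional`. The Python sketch below normalizes a raw subtype string on that basis. It assumes the mapping is a direct name match plus this single rename; this page does not confirm that. `normalize_subtype` is hypothetical, not a PDF Oxide function.

```python
# Hypothetical helper, not a PDF Oxide API. The set mirrors the
# AnnotationSubtype table above; "3D" is the raw PDF name for ThreeDimensional.
KNOWN_SUBTYPES = {
    "Text", "Link", "FreeText", "Line", "Square", "Circle", "Polygon",
    "PolyLine", "Highlight", "Underline", "Squiggly", "StrikeOut", "Stamp",
    "Ink", "Popup", "FileAttachment", "Sound", "Movie", "Screen", "Widget",
    "PrinterMark", "TrapNet", "Watermark", "Redact", "Caret", "RichMedia",
}
RENAMED = {"3D": "ThreeDimensional"}


def normalize_subtype(raw):
    """Map a raw /Subtype string (or None) to an AnnotationSubtype-style name."""
    if raw is None:
        return "Unknown"
    name = raw.lstrip("/")  # tolerate a leading slash from PDF name objects
    if name in RENAMED:
        return RENAMED[name]
    return name if name in KNOWN_SUBTYPES else "Unknown"
```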

`get_outline() -> Option<Vec<OutlineItem>>`

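Retrieves the document outline (bookmarks), if present, as a hierarchical tree of outline items that you can use for document navigation.

Returns:

Some(Vec<OutlineItem>) – the bookmarks were found and parsed
None – the document has no bookmarks

OutlineItem fields

Field	Type	Description
`title`	`String`	Bookmark title text
`dest`	`Option<Destination>`	Navigation destination
`children`	`Vec<OutlineItem>`	Nested child bookmarks

Destination variants

Variant	Description
`PageIndex(usize)`	Direct page reference (zero-based index)
`Named(String)`	Named destination identifier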

Rust

use pdf_oxide::PdfDocument;

let mut doc = PdfDocument::open("book.pdf")?;

if let Some(outline) = doc.get_outline()? {
    for item in &outline {
        println!("  {}", item.title);
        for child in &item.children {
            println!("    {}", child.title);
        }
    }
} else {
    println!("No bookmarks found.");
}

C++

#include <iostream>
#include <string>
#include <pdf_oxide/pdf_oxide.hpp>

auto doc = pdf_oxide::Document::open("book.pdf");
std::string outline = doc.get_outline(); // JSON tree of bookmarks
std::cout << outline << "\n";

Swift

import PdfOxide

let doc = try Document.open("book.pdf")
let outline = try doc.outline() // JSON tree of bookmarks
print(outline)

Dart

import 'package:pdf_oxide/pdf_oxide.dart';

final doc = PdfDocument.open('book.pdf');
final outline = doc.getOutline(); // JSON tree of bookmarks
print(outline);
doc.close();

R

library(pdfoxide)

doc <- pdf_open("book.pdf")
outline <- pdf_get_outline(doc)  # JSON tree of bookmarks
cat(outline, "\n")

Julia

using PdfOxide

doc = open_document("book.pdf")
outline = get_outline(doc) # JSON tree of bookmarks
println(outline)

Zig

const std = @import("std");
const pdf_oxide = @import("pdf_oxide");
const a = std.heap.page_allocator;

var doc = try pdf_oxide.Document.open("book.pdf");
defer doc.deinit();
const outline = try doc.outline(a); // JSON tree of bookmarks; caller owns it
defer a.free(outline);
std.debug.print("{s}\n", .{outline});

Objective-C

#import "POXPdfOxide.h"
NSError *err = nil;

POXDocument *doc = [POXDocument openPath:@"book.pdf" error:&err];
NSString *outline = [doc outlineWithError:&err]; // JSON tree of bookmarks
NSLog(@"%@", outline);

Elixir

{:ok, doc} = PdfOxide.open("book.pdf")
{:ok, outline} = PdfOxide.outline(doc) # JSON tree of bookmarks
IO.puts(outline)
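Most of the bindings above return the outline as a JSON string rather than typed objects. This section does not document the JSON layout. The following Python sketch therefore assumes that each node carries `title`, an optional zero-based `page`, and a `children` list, mirroring the Rust `OutlineItem` fields; adjust the keys to match what your binding actually emits. `flatten_outline` is a hypothetical helper.

```python
import json


def flatten_outline(outline_json):
    """Return (depth, title, page) tuples in document order.

    Assumed JSON shape (not confirmed by this page): a list of nodes with
    "title", an optional zero-based "page", and a "children" list.
    """
    nodes = json.loads(outline_json) or []
    result = []
    # Explicit stack of (depth, node); children are pushed in reverse so
    # they are popped, and therefore emitted, in their original order.
    stack = [(0, node) for node in reversed(nodes)]
    while stack:
        depth, node = stack.pop()
        result.append((depth, node.get("title", ""), node.get("page")))
        for child in reversed(node.get("children") or []):
            stack.append((depth + 1, child))
    return result
```

The sketch uses an explicit stack instead of recursion so that deeply nested outlines cannot hit Python's recursion limit. Children still come out in document order.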

PdfPage annotation API (DOM)

The PdfPage objects provided by DocumentEditor expose a high-level AnnotationWrapper interface that supports both reading existing annotations and adding new ones.

`page.annotations() -> &[AnnotationWrapper]`

Returns all annotations on the page as wrapped objects.

`page.find_annotations_by_type(subtype) -> Vec<&AnnotationWrapper>`

Finds annotations of a specific type.

`page.add_annotation(annotation)`

Adds a new annotation to the page.

`page.remove_annotation(index) -> Option<AnnotationWrapper>`

Removes the annotation at the given index.

`page.find_annotations_in_region(rect) -> Vec<&AnnotationWrapper>`

Finds annotations whose bounding box intersects the given region.
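`find_annotations_in_region` keeps annotations whose bounding box intersects the query rectangle. The PDF specification allows a rectangle to be given by any two diagonally opposite corners (ISO 32000-1 section 7.9.5), so an intersection test should normalize first. The Python sketch below illustrates the idea with a hypothetical `Rect` type and `find_in_region` helper. It treats rectangles that merely share an edge as non-intersecting, which may differ from PDF Oxide's own rule.

```python
from typing import NamedTuple


class Rect(NamedTuple):
    """Hypothetical rectangle type; PDF Oxide's own Rect may differ."""
    x1: float
    y1: float
    x2: float
    y2: float

    def normalized(self):
        # PDF allows any two opposite corners, so sort each axis first.
        return Rect(min(self.x1, self.x2), min(self.y1, self.y2),
                    max(self.x1, self.x2), max(self.y1, self.y2))


def intersects(a, b):
    """True if the two rectangles overlap with non-zero area."""
    a, b = a.normalized(), b.normalized()
    return a.x1 < b.x2 and b.x1 < a.x2 and a.y1 < b.y2 and b.y1 < a.y2


def find_in_region(annotations, region):
    """Return the names of (name, rect) pairs whose rect intersects region."""
    return [name for name, rect in annotations if intersects(rect, region)]
```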

AnnotationWrapper methods

Method	Returns	Description
`id()`	`AnnotationId`	Session-unique ID
`subtype()`	`AnnotationSubtype`	Annotation type
`rect()`	`Rect`	Bounding rectangle
`contents()`	`Option<&str>`	Text contents
`color()`	`Option<(f32, f32, f32)>`	RGB color (components in 0.0–1.0)
`is_modified()`	`bool`	Whether the annotation has been modified

Python

from pdf_oxide import PdfDocument

doc = PdfDocument("annotated.pdf")
page = doc.page(0)

# List all annotations
for annot in page.annotations():
    print(f"[{annot.subtype}] {annot.contents} at {annot.rect}")

# Find highlights
highlights = [a for a in page.annotations() if a.subtype == "Highlight"]
print(f"Found {len(highlights)} highlights")

Node.js

const { PdfDocument } = require("pdf-oxide");

const doc = new PdfDocument("annotated.pdf");
const annotations = doc.getPageAnnotations(0);

// List all annotations
for (const annot of annotations) {
  console.log(`[${annot.subtype}] ${annot.contents}`);
}

// Find highlights
const highlights = annotations.filter(a => a.subtype === "Highlight");
console.log(`Found ${highlights.length} highlights`);
doc.close();

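Go

import (
    "fmt"

    pdfoxide "github.com/yfedoseev/pdf_oxide/go"
)

doc, err := pdfoxide.Open("annotated.pdf")
if err != nil {
    panic(err)
}
defer doc.Close()
annotations, _ := doc.Annotations(0) // error ignored for brevity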

// List all annotations
for _, annot := range annotations {
    fmt.Printf("[%s] %s\n", annot.Subtype, annot.Content)
}

// Find highlights
highlights := 0
for _, a := range annotations {
    if a.Subtype == "Highlight" {
        highlights++
    }
}
fmt.Printf("Found %d highlights\n", highlights)

WASM

const doc = new WasmPdfDocument(bytes);
const annotations = doc.getAnnotations(0);

// List all annotations
for (const annot of annotations) {
    console.log(`[${annot.subtype}] ${annot.contents}`);
}

// Find highlights
const highlights = annotations.filter(a => a.subtype === "Highlight");
console.log(`Found ${highlights.length} highlights`);

Rust

use pdf_oxide::editor::{DocumentEditor, EditableDocument};
use pdf_oxide::annotation_types::AnnotationSubtype;

let mut editor = DocumentEditor::open("annotated.pdf")?;
let page = editor.get_page(0)?;

// Find all highlight annotations
let highlights = page.find_annotations_by_type(AnnotationSubtype::Highlight);
for h in &highlights {
    println!("Highlight at {:?}: {:?}", h.rect(), h.contents());
}

Java

import fyi.oxide.pdf.*;
import fyi.oxide.pdf.annotation.Annotation;
import fyi.oxide.pdf.annotation.AnnotationType;
import java.nio.file.Path;

try (PdfDocument doc = PdfDocument.open(Path.of("annotated.pdf"))) {
    var annotations = doc.page(0).annotations();

    // List all annotations
    for (Annotation annot : annotations) {
        System.out.println("[" + annot.type() + "] " + annot.contents().orElse(""));
    }

    // Find highlights
    long highlights = annotations.stream()
            .filter(a -> a.type() == AnnotationType.HIGHLIGHT).count();
    System.out.println("Found " + highlights + " highlights");
}

C++

#include <iostream>
#include <pdf_oxide/pdf_oxide.hpp>

auto doc = pdf_oxide::Document::open("annotated.pdf");
auto annotations = doc.page_annotations(0);

// List all annotations
for (const auto& annot : annotations) {
    std::cout << "[" << annot.subtype << "] " << annot.content << "\n";
}

// Find highlights
int highlights = 0;
for (const auto& a : annotations) {
    if (a.subtype == "Highlight") highlights++;
}
std::cout << "Found " << highlights << " highlights\n";

Swift

import PdfOxide

let doc = try Document.open("annotated.pdf")
let annotations = try doc.pageAnnotations(0)

// List all annotations
for annot in annotations {
    print("[\(annot.subtype)] \(annot.content)")
}

// Find highlights
let highlights = annotations.filter { $0.subtype == "Highlight" }
print("Found \(highlights.count) highlights")

Kotlin

import fyi.oxide.pdf.*
import fyi.oxide.pdf.annotation.AnnotationType

PdfDocument.open(java.nio.file.Path.of("annotated.pdf")).use { doc ->
    val annotations = doc.page(0).annotations()

    // List all annotations
    for (annot in annotations) {
        println("[${annot.type()}] ${annot.contents().orElse("")}")
    }

    // Find highlights
    val highlights = annotations.count { it.type() == AnnotationType.HIGHLIGHT }
    println("Found $highlights highlights")
}

Dart

import 'package:pdf_oxide/pdf_oxide.dart';

final doc = PdfDocument.open('annotated.pdf');
final annotations = doc.pageAnnotations(0);

// List all annotations
for (final annot in annotations) {
  print('[${annot.subtype}] ${annot.content}');
}

// Find highlights
final highlights = annotations.where((a) => a.subtype == 'Highlight');
print('Found ${highlights.length} highlights');
doc.close();

R

library(pdfoxide)

doc <- pdf_open("annotated.pdf")
annotations <- pdf_page_annotations(doc, 0)

# List all annotations
for (annot in annotations) {
  cat(sprintf("[%s] %s\n", annot$subtype, annot$content))
}

# Find highlights
highlights <- Filter(function(a) a$subtype == "Highlight", annotations)
cat(sprintf("Found %d highlights\n", length(highlights)))

Julia

using PdfOxide

doc = open_document("annotated.pdf")
annotations = page_annotations(doc, 0)

# List all annotations
for annot in annotations
    println("[$(annot.subtype)] $(annot.content)")
end

# Find highlights
highlights = filter(a -> a.subtype == "Highlight", annotations)
println("Found $(length(highlights)) highlights")

Zig

const std = @import("std");
const pdf_oxide = @import("pdf_oxide");
const a = std.heap.page_allocator;

var doc = try pdf_oxide.Document.open("annotated.pdf");
defer doc.deinit();
const annotations = try doc.pageAnnotations(a, 0);
defer pdf_oxide.Document.freeAnnotations(a, annotations);

// List all annotations
for (annotations) |annot| {
    std.debug.print("[{s}] {s}\n", .{ annot.subtype, annot.content });
}

// Find highlights
var highlights: usize = 0;
for (annotations) |annot| {
    if (std.mem.eql(u8, annot.subtype, "Highlight")) highlights += 1;
}
std.debug.print("Found {d} highlights\n", .{highlights});

Scala

import fyi.oxide.pdf.{PdfDocument, annotationsSeq, contentsOption}
import fyi.oxide.pdf.annotation.AnnotationType
import scala.util.Using

Using.resource(PdfDocument.open("annotated.pdf")) { doc =>
  val annotations = doc.page(0).annotationsSeq

  // List all annotations
  for (annot <- annotations) {
    println(s"[${annot.`type`()}] ${annot.contentsOption.getOrElse("")}")
  }

  // Find highlights
  val highlights = annotations.count(_.`type`() == AnnotationType.HIGHLIGHT)
  println(s"Found $highlights highlights")
}

Clojure

(require '[pdf-oxide.core :as pdf])
(import 'fyi.oxide.pdf.annotation.AnnotationType)

(with-open [d (pdf/open "annotated.pdf")]
  (let [annotations (pdf/annotations (pdf/page d 0))]
    ;; List all annotations
    (doseq [annot annotations]
      (println (str "[" (.type annot) "] " (.orElse (.contents annot) ""))))
    ;; Find highlights
    (let [highlights (count (filter #(= (.type %) AnnotationType/HIGHLIGHT) annotations))]
      (println (str "Found " highlights " highlights")))))

Objective-C

#import "POXPdfOxide.h"
NSError *err = nil;

POXDocument *doc = [POXDocument openPath:@"annotated.pdf" error:&err];
NSArray<POXAnnotation*> *annotations = [doc pageAnnotations:0 error:&err];

// List all annotations
for (POXAnnotation *annot in annotations) {
    NSLog(@"[%@] %@", annot.subtype, annot.content);
}

// Find highlights
NSPredicate *p = [NSPredicate predicateWithFormat:@"subtype == %@", @"Highlight"];
NSUInteger highlights = [annotations filteredArrayUsingPredicate:p].count;
NSLog(@"Found %lu highlights", (unsigned long)highlights);

Elixir

{:ok, doc} = PdfOxide.open("annotated.pdf")
{:ok, annotations} = PdfOxide.page_annotations(doc, 0)

# List all annotations
Enum.each(annotations, fn a -> IO.puts("[#{a.subtype}] #{a.content}") end)

# Find highlights
highlights = Enum.count(annotations, &(&1.subtype == "Highlight"))
IO.puts("Found #{highlights} highlights")

`annotations_to_json`: serializing page annotations

annotations_to_json serializes an entire annotation list to a JSON array in a single FFI call. The Go binding uses it internally to build []Annotation, and Swift exposes it directly as annotationsToJson. The C ABI signature is:

char *pdf_oxide_annotations_to_json(const FfiAnnotationList *annotations, int32_t *error_code);

The caller owns the returned UTF-8 string and must release it with free_string. The schema matches Go's Annotation struct, with these fields: type, subtype, content, x, y, width, height, author, borderWidth, color, creationDate, modificationDate, linkURI, textIconName, isHidden, isPrintable, isReadOnly, isMarkedDeleted.
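Because the schema is a flat JSON array, any JSON library can consume it. The Python sketch below decodes the output into a hypothetical `AnnotationRecord` dataclass that keeps a subset of the listed field names and ignores the rest, then counts annotations per subtype. The field names come from the schema above. The value types (numbers for `x`/`y`/`width`/`height`, a boolean for `isHidden`) are assumptions.

```python
import json
from collections import Counter
from dataclasses import dataclass, fields


@dataclass
class AnnotationRecord:
    """Hypothetical mirror of part of the annotations_to_json schema."""
    subtype: str = ""
    content: str = ""
    x: float = 0.0
    y: float = 0.0
    width: float = 0.0
    height: float = 0.0
    author: str = ""
    linkURI: str = ""
    isHidden: bool = False


def load_annotations(json_text):
    """Decode the JSON array, keeping only the fields AnnotationRecord knows."""
    known = {f.name for f in fields(AnnotationRecord)}
    items = json.loads(json_text) or []
    return [AnnotationRecord(**{k: v for k, v in item.items() if k in known})
            for item in items]


def count_by_subtype(records):
    """Count annotations per subtype string."""
    return Counter(r.subtype for r in records)
```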

Swift

import PdfOxide

let doc = try Document.open("annotated.pdf")
let json = try doc.annotationsToJson(0) // String of JSON
print(json)

C ABI

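#include <stdio.h>
#include "pdf_oxide.h"

/* `doc` is an already-open document handle; opening it is not shown here. */
int32_t err = 0;
FfiAnnotationList *list = pdf_document_get_page_annotations(doc, /*page=*/0, &err);
if (list != NULL) {
    char *json = pdf_oxide_annotations_to_json(list, &err);
    if (json != NULL) {
        printf("%s\n", json);
        free_string(json); /* the caller owns the returned string */
    }
    pdf_oxide_annotation_list_free(list);
}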

C++

#include <iostream>
#include <string>
#include <pdf_oxide/pdf_oxide.hpp>

auto doc = pdf_oxide::Document::open("annotated.pdf");
std::string json = doc.annotations_to_json(0); // JSON string
std::cout << json << "\n";

Dart

import 'package:pdf_oxide/pdf_oxide.dart';

final doc = PdfDocument.open('annotated.pdf');
final json = doc.annotationsToJson(0); // JSON string
print(json);
doc.close();

R

library(pdfoxide)

doc <- pdf_open("annotated.pdf")
json <- pdf_annotations_to_json(doc, 0)  # JSON string
cat(json, "\n")

Julia

using PdfOxide

doc = open_document("annotated.pdf")
json = annotations_to_json(doc, 0) # JSON string
println(json)

Zig

const std = @import("std");
const pdf_oxide = @import("pdf_oxide");
const a = std.heap.page_allocator;

var doc = try pdf_oxide.Document.open("annotated.pdf");
defer doc.deinit();
var list = try doc.annotationList(0);
defer list.deinit();
const json = try list.toJson(a); // caller owns the slice
defer a.free(json);
std.debug.print("{s}\n", .{json});

Objective-C

#import "POXPdfOxide.h"
NSError *err = nil;

POXDocument *doc = [POXDocument openPath:@"annotated.pdf" error:&err];
NSString *json = [doc annotationsJson:0 error:&err]; // JSON string
NSLog(@"%@", json);

Elixir

{:ok, doc} = PdfOxide.open("annotated.pdf")
{:ok, json} = PdfOxide.annotations_to_json(doc, 0) # JSON string
IO.puts(json)

Binding support. annotations_to_json is exposed directly in Swift (doc.annotationsToJson(page)), C++ (doc.annotations_to_json(page)), Dart (doc.annotationsToJson(page)), R (pdf_annotations_to_json(doc, page)), Julia (annotations_to_json(doc, page)), Zig (doc.annotationList(page).toJson(...)), Objective-C ([doc annotationsJson:page error:]), Elixir (PdfOxide.annotations_to_json(doc, page)), and the C ABI (pdf_oxide_annotations_to_json). The Go binding calls it internally to decode doc.Annotations(page) into typed structs. It is compiled out of the WASM target.

`annotation_extras`: extended annotation properties

annotation_extras reads extended properties of a single annotation that the core Annotation view does not include. These are the color, the creation and modification timestamps, the four visibility flags (hidden, marked-deleted, printable, read-only), the URI of a Link annotation, the icon name of a Text annotation, and the quad points of a highlight or other markup annotation.

In Swift these come back as an AnnotationExtras struct from annotationExtras(page, index:). In Go the same fields are merged directly into the Annotation struct (Color, CreationDate, ModificationDate, LinkURI, TextIconName, IsHidden, IsPrintable, IsReadOnly, IsMarkedDeleted). Under the hood, both call the pdf_oxide_annotation_get_* / pdf_oxide_*_annotation_get_* family of C ABI accessors.
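The hidden, printable, and read-only properties correspond to the Hidden, Print, and ReadOnly bits of the annotation's `/F` flags entry (ISO 32000-1 section 12.5.3). Marked-deleted has no `/F` bit and is not covered here. None of the accessors shown in this section return a raw `/F` integer directly. If you do have one, you can decode it with a minimal Python sketch like this; `decode_flags` is a hypothetical helper.

```python
# Bit values from ISO 32000-1 section 12.5.3 (bit position 1 is the
# least significant bit). Hypothetical helper, not a PDF Oxide API.
ANNOTATION_FLAGS = {
    "Invisible": 1 << 0,
    "Hidden": 1 << 1,
    "Print": 1 << 2,
    "NoZoom": 1 << 3,
    "NoRotate": 1 << 4,
    "NoView": 1 << 5,
    "ReadOnly": 1 << 6,
    "Locked": 1 << 7,
    "ToggleNoView": 1 << 8,
    "LockedContents": 1 << 9,
}


def decode_flags(f):
    """Return the names of the set flags, in bit order, for a raw /F value."""
    return [name for name, bit in ANNOTATION_FLAGS.items() if f & bit]
```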

Swift

import PdfOxide

let doc = try Document.open("annotated.pdf")
let extras = try doc.annotationExtras(0, index: 0) // AnnotationExtras

print("color=\(extras.color) created=\(extras.creationDate)")
print("hidden=\(extras.hidden) printable=\(extras.printable) readOnly=\(extras.readOnly)")
if !extras.uri.isEmpty { print("link URI: \(extras.uri)") }
if !extras.iconName.isEmpty { print("icon: \(extras.iconName)") }
for q in extras.quadPoints {
    print("quad: (\(q.x1),\(q.y1)) (\(q.x2),\(q.y2)) (\(q.x3),\(q.y3)) (\(q.x4),\(q.y4))")
}

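Go

import (
    "fmt"

    pdfoxide "github.com/yfedoseev/pdf_oxide/go"
)

doc, err := pdfoxide.Open("annotated.pdf")
if err != nil {
    panic(err)
}
defer doc.Close()
annotations, _ := doc.Annotations(0) // error ignored for brevity
if len(annotations) == 0 {
    return // no annotations on this page
}

a := annotations[0]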
fmt.Printf("color=%d created=%d modified=%d\n", a.Color, a.CreationDate, a.ModificationDate)
fmt.Printf("hidden=%v printable=%v readOnly=%v deleted=%v\n",
    a.IsHidden, a.IsPrintable, a.IsReadOnly, a.IsMarkedDeleted)
if a.LinkURI != "" {
    fmt.Printf("link URI: %s\n", a.LinkURI)
}
if a.TextIconName != "" {
    fmt.Printf("icon: %s\n", a.TextIconName)
}

C++

#include <iostream>
#include <pdf_oxide/pdf_oxide.hpp>

auto doc = pdf_oxide::Document::open("annotated.pdf");

std::cout << "color=" << doc.annotation_get_color(0, 0)
          << " created=" << doc.annotation_get_creation_date(0, 0) << "\n";
std::cout << "hidden=" << doc.annotation_is_hidden(0, 0)
          << " printable=" << doc.annotation_is_printable(0, 0)
          << " readOnly=" << doc.annotation_is_read_only(0, 0) << "\n";
auto uri = doc.link_annotation_get_uri(0, 0);
if (!uri.empty()) std::cout << "link URI: " << uri << "\n";
auto icon = doc.text_annotation_get_icon_name(0, 0);
if (!icon.empty()) std::cout << "icon: " << icon << "\n";

Dart

import 'package:pdf_oxide/pdf_oxide.dart';

final doc = PdfDocument.open('annotated.pdf');
final a = doc.pageAnnotationDetails(0)[0]; // AnnotationDetails

print('color=${a.color} created=${a.creationDate} modified=${a.modificationDate}');
print('hidden=${a.hidden} printable=${a.printable} readOnly=${a.readOnly}');
if (a.linkUri.isNotEmpty) print('link URI: ${a.linkUri}');
if (a.iconName.isNotEmpty) print('icon: ${a.iconName}');
doc.close();

R

library(pdfoxide)

doc <- pdf_open("annotated.pdf")

cat(sprintf("color=%d created=%d\n",
    pdf_annotation_get_color(doc, 0, 0),
    pdf_annotation_get_creation_date(doc, 0, 0)))
cat(sprintf("hidden=%s printable=%s readOnly=%s\n",
    pdf_annotation_is_hidden(doc, 0, 0),
    pdf_annotation_is_printable(doc, 0, 0),
    pdf_annotation_is_read_only(doc, 0, 0)))
uri <- pdf_link_annotation_get_uri(doc, 0, 0)
if (nzchar(uri)) cat(sprintf("link URI: %s\n", uri))
icon <- pdf_text_annotation_get_icon_name(doc, 0, 0)
if (nzchar(icon)) cat(sprintf("icon: %s\n", icon))

Julia

using PdfOxide

doc = open_document("annotated.pdf")

println("color=$(annotation_get_color(doc, 0, 0)) created=$(annotation_creation_date(doc, 0, 0))")
println("hidden=$(annotation_is_hidden(doc, 0, 0)) printable=$(annotation_is_printable(doc, 0, 0)) readOnly=$(annotation_is_read_only(doc, 0, 0))")
uri = link_annotation_uri(doc, 0, 0)
isempty(uri) || println("link URI: $uri")
icon = text_annotation_icon_name(doc, 0, 0)
isempty(icon) || println("icon: $icon")

Zig

const std = @import("std");
const pdf_oxide = @import("pdf_oxide");
const a = std.heap.page_allocator;

var doc = try pdf_oxide.Document.open("annotated.pdf");
defer doc.deinit();
var list = try doc.annotationList(0);
defer list.deinit();

std.debug.print("color={d} created={d}\n", .{ try list.getColor(0), try list.getCreationDate(0) });
std.debug.print("hidden={} printable={} readOnly={}\n", .{ try list.isHidden(0), try list.isPrintable(0), try list.isReadOnly(0) });
const uri = try list.linkUri(a, 0);
defer a.free(uri);
if (uri.len != 0) std.debug.print("link URI: {s}\n", .{uri});
const icon = try list.textIconName(a, 0);
defer a.free(icon);
if (icon.len != 0) std.debug.print("icon: {s}\n", .{icon});

Objective-C

#import "POXPdfOxide.h"
NSError *err = nil;

POXDocument *doc = [POXDocument openPath:@"annotated.pdf" error:&err];
POXAnnotation *a = [doc pageAnnotations:0 error:&err].firstObject;

NSLog(@"color=%u created=%lld modified=%lld", a.color, a.creationDate, a.modificationDate);
NSLog(@"hidden=%d printable=%d readOnly=%d", a.hidden, a.printable, a.readOnly);
if (a.linkUri.length) NSLog(@"link URI: %@", a.linkUri);
if (a.iconName.length) NSLog(@"icon: %@", a.iconName);

Elixir

{:ok, doc} = PdfOxide.open("annotated.pdf")

{:ok, color} = PdfOxide.annotation_color(doc, 0, 0)
{:ok, created} = PdfOxide.annotation_creation_date(doc, 0, 0)
IO.puts("color=#{color} created=#{created}")

{:ok, hidden} = PdfOxide.annotation_hidden?(doc, 0, 0)
{:ok, printable} = PdfOxide.annotation_printable?(doc, 0, 0)
{:ok, read_only} = PdfOxide.annotation_read_only?(doc, 0, 0)
IO.puts("hidden=#{hidden} printable=#{printable} readOnly=#{read_only}")

{:ok, uri} = PdfOxide.link_annotation_uri(doc, 0, 0)
if uri != "", do: IO.puts("link URI: #{uri}")
{:ok, icon} = PdfOxide.text_annotation_icon_name(doc, 0, 0)
if icon != "", do: IO.puts("icon: #{icon}")

AnnotationExtras fields (Swift)

Field	Type	Description
`color`	`UInt32`	Packed annotation color
`creationDate`	`Int64`	Creation timestamp
`modificationDate`	`Int64`	Modification timestamp
`hidden`	`Bool`	Hidden flag
`markedDeleted`	`Bool`	Marked-deleted flag
`printable`	`Bool`	Print flag
`readOnly`	`Bool`	Read-only flag
`uri`	`String`	Link annotation URI (empty string if absent)
`iconName`	`String`	Text annotation icon name (empty string if absent)
`quadPoints`	`[QuadPoint]`	Highlight/markup quadrilaterals (four corners each)
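Each quad point describes one quadrilateral of marked-up text, so a highlight spanning several lines carries several quads. In the PDF file they are stored as a flat `/QuadPoints` array of eight numbers per quad (ISO 32000-1 section 12.5.6.10). The Python sketch below groups such an array and computes the union bounding box. `QuadPoint` mirrors the Swift corner fields, and both helpers are hypothetical. Taking the min/max over all corners keeps the result independent of corner order, which is not consistent across PDF producers.

```python
from typing import NamedTuple, Optional


class QuadPoint(NamedTuple):
    """Hypothetical mirror of the Swift QuadPoint: four (x, y) corners."""
    x1: float
    y1: float
    x2: float
    y2: float
    x3: float
    y3: float
    x4: float
    y4: float


def quads_from_flat(values):
    """Group a flat /QuadPoints array (8 numbers per quad) into QuadPoints."""
    if len(values) % 8 != 0:
        raise ValueError("QuadPoints length must be a multiple of 8")
    return [QuadPoint(*values[i:i + 8]) for i in range(0, len(values), 8)]


def bounding_box(quads) -> Optional[tuple]:
    """Union box (x_min, y_min, x_max, y_max) over every corner, or None."""
    if not quads:
        return None
    xs = [v for q in quads for v in (q.x1, q.x2, q.x3, q.x4)]
    ys = [v for q in quads for v in (q.y1, q.y2, q.y3, q.y4)]
    return (min(xs), min(ys), max(xs), max(ys))
```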

Binding support. Swift exposes annotation_extras as a dedicated AnnotationExtras struct (doc.annotationExtras(page, index:)), and the pdf_oxide_annotation_get_* family of C ABI accessors also exposes it. The same per-index accessor family is wrapped in C++ (doc.annotation_get_*), R (pdf_annotation_get_*), Julia (annotation_*), Zig (AnnotationList.getColor/isHidden/...), and Elixir (PdfOxide.annotation_color/...). In Go, Dart (doc.pageAnnotationDetails(page)), and Objective-C (inline on POXAnnotation), the same properties are included directly on each annotation object. The accessors are compiled out of the WASM target.

Advanced examples

Building a table of contents from bookmarks

use pdf_oxide::PdfDocument;
use pdf_oxide::outline::Destination;

let mut doc = PdfDocument::open("book.pdf")?;

fn print_toc(items: &[pdf_oxide::outline::OutlineItem], depth: usize) {
    for item in items {
        let indent = "  ".repeat(depth);
        let page = match &item.dest {
            Some(Destination::PageIndex(p)) => format!("page {}", p + 1),
            Some(Destination::Named(n)) => format!("dest '{}'", n),
            None => "no dest".to_string(),
        };
        println!("{}{} ({})", indent, item.title, page);
        print_toc(&item.children, depth + 1);
    }
}

if let Some(outline) = doc.get_outline()? {
    println!("Table of Contents:");
    print_toc(&outline, 0);
}

Extracting all comments (Text annotations)

use pdf_oxide::PdfDocument;
use pdf_oxide::annotation_types::AnnotationSubtype;

let mut doc = PdfDocument::open("reviewed.pdf")?;
let page_count = doc.page_count()?;

for page_idx in 0..page_count {
    let annotations = doc.get_annotations(page_idx)?;
    let comments: Vec<_> = annotations.iter()
        .filter(|a| a.subtype_enum == AnnotationSubtype::Text)
        .collect();

    if !comments.is_empty() {
        println!("Page {}:", page_idx + 1);
        for c in &comments {
            let author = c.author.as_deref().unwrap_or("Unknown");
            let text = c.contents.as_deref().unwrap_or("");
            println!("  [{}] {}", author, text);
        }
    }
}

Extracting all hyperlinks

use pdf_oxide::PdfDocument;
use pdf_oxide::annotation_types::AnnotationSubtype;

let mut doc = PdfDocument::open("report.pdf")?;
let annotations = doc.get_annotations(0)?;

let links: Vec<_> = annotations.iter()
    .filter(|a| a.subtype_enum == AnnotationSubtype::Link)
    .collect();

for link in &links {
    if let Some(ref action) = link.action {
        println!("Link: {:?}", action);
    }
    if let Some(ref dest) = link.destination {
        println!("Internal link: {:?}", dest);
    }
}

Frequently asked questions

What is the difference between get_annotations and annotation_extras? get_annotations returns the core annotation view: subtype, contents, rectangle, author, dates, color, and flags. annotation_extras adds extended properties: the packed color, timestamps, the four visibility flags, the link URI, the Text annotation icon name, and highlight quad points. In Go these are merged into a single Annotation; in Swift they are a separate AnnotationExtras struct.

What JSON schema does annotations_to_json produce? A JSON array matching Go's Annotation struct, with these fields: type, subtype, content, x, y, width, height, author, borderWidth, color, creationDate, modificationDate, linkURI, textIconName, isHidden, isPrintable, isReadOnly, isMarkedDeleted.

Why are the link URI and icon name sometimes empty? These fields apply only to specific subtypes: uri to Link annotations, and iconName to Text (sticky note) annotations. For every other subtype they are returned as empty strings.

Is annotation extraction fast? Yes. On the benchmark corpus, PDF Oxide's extraction core averages about 0.8 ms with a p99 of 9 ms and a 100% pass rate.
