What is the fastest Python PDF library?

PDF Oxide is the fastest Python PDF library, with 0.8ms mean text extraction time — 5.8× faster than PyMuPDF (4.6ms) and 15× faster than pypdf (12.1ms). Benchmarked on 3,830 real-world PDFs with 100% pass rate.

Is PDF Oxide free for commercial use?

Yes. PDF Oxide is MIT licensed — free for all uses including commercial products, SaaS, and proprietary software. No license fees, no sales calls, no AGPL restrictions.

Can PDF Oxide handle scanned PDFs with OCR?

Yes. PDF Oxide includes built-in OCR via PaddleOCR and ONNX Runtime. No Tesseract installation needed — just pip install pdf_oxide and use extract_text_ocr(). Supports PP-OCRv3, v4, and v5 models.

Does PDF Oxide support XFA forms?

Yes. PDF Oxide is the only Python PDF library that can detect, analyze, and extract data from XFA forms (XML Forms Architecture). PyMuPDF, pypdf, pdfplumber, and pdfminer cannot read XFA form data.

How does PDF Oxide compare to PyMuPDF?

PDF Oxide is 5.8× faster than PyMuPDF (0.8ms vs 4.6ms mean), has a 100% pass rate vs 99.3%, and is MIT licensed vs PyMuPDF's AGPL-3.0. PDF Oxide also has built-in Markdown/HTML output and XFA form support that PyMuPDF lacks.

Can PDF Oxide convert PDF to Markdown?

Yes. PDF Oxide has built-in PDF to Markdown conversion with heading detection, table preservation, and list formatting — ideal for LLM and RAG pipelines. No separate package needed, unlike PyMuPDF which requires pymupdf4llm (69× slower).

Видобування анотацій

PDF Oxide забезпечує доступ до всіх типів анотацій, визначених у специфікації PDF (ISO 32000-1:2008, розділ 12.5), включаючи текстові нотатки, гіперпосилання, виділення, штампи, рукописні анотації тощо. Зміст документа (закладки) також доступний для побудови навігаційних структур.

Для отримання «сирих» даних анотацій використовуйте get_annotations() на PdfDocument, або DOM API PdfPage, що надає уніфікований інтерфейс AnnotationWrapper з підтримкою читання та запису.

Підтримка в байндингах. Видобування анотацій доступне у Python (doc.get_annotations(page)), Rust (doc.get_annotations(page)), WASM (doc.getAnnotations(page)) та Go (doc.Annotations(page)). Публічний API C# поки не надає обгортку GetAnnotations — нативний FFI-метод (PdfDocumentGetPageAnnotations) існує, але не обгорнутий. Для видобування анотацій із C# використовуйте Rust CLI (pdf-oxide annotations doc.pdf) або викликайте PdfPageGetAnnotationsCount / pdf_get_annotations_by_type напряму через P/Invoke до появи публічної обгортки.

Швидкий приклад

Python

from pdf_oxide import PdfDocument

doc = PdfDocument("annotated.pdf")
page = doc.page(0)
for annot in page.annotations():
    print(f"{annot.subtype}: {annot.contents}")

Node.js

const { PdfDocument } = require("pdf-oxide");

const doc = new PdfDocument("annotated.pdf");
const annotations = doc.getPageAnnotations(0);
for (const annot of annotations) {
  console.log(`${annot.subtype}: ${annot.contents}`);
}
doc.close();

import pdfoxide "github.com/yfedoseev/pdf_oxide/go"

doc, _ := pdfoxide.Open("annotated.pdf")
defer doc.Close()
annotations, _ := doc.Annotations(0)
for _, annot := range annotations {
    fmt.Printf("%s: %s\n", annot.Subtype, annot.Content)
}

WASM

const doc = new WasmPdfDocument(bytes);
const annotations = doc.getAnnotations(0);
for (const annot of annotations) {
    console.log(`${annot.subtype}: ${annot.contents}`);
}

Rust

use pdf_oxide::PdfDocument;

let mut doc = PdfDocument::open("annotated.pdf")?;
let annotations = doc.get_annotations(0)?;
for annot in &annotations {
    println!("{:?}: {:?}", annot.subtype_enum, annot.contents);
}

Java

import fyi.oxide.pdf.*;
import fyi.oxide.pdf.annotation.Annotation;
import java.nio.file.Path;

try (PdfDocument doc = PdfDocument.open(Path.of("annotated.pdf"))) {
    for (Annotation annot : doc.page(0).annotations()) {
        System.out.println(annot.type() + ": " + annot.contents().orElse(""));
    }
}

C++

#include <pdf_oxide/pdf_oxide.hpp>

auto doc = pdf_oxide::Document::open("annotated.pdf");
for (const auto& annot : doc.page_annotations(0)) {
    std::cout << annot.subtype << ": " << annot.content << "\n";
}

Swift

import PdfOxide

let doc = try Document.open("annotated.pdf")
for annot in try doc.pageAnnotations(0) {
    print("\(annot.subtype): \(annot.content)")
}

Kotlin

import fyi.oxide.pdf.*

PdfDocument.open(java.nio.file.Path.of("annotated.pdf")).use { doc ->
    for (annot in doc.page(0).annotations()) {
        println("${annot.type()}: ${annot.contents().orElse("")}")
    }
}

Dart

import 'package:pdf_oxide/pdf_oxide.dart';

final doc = PdfDocument.open('annotated.pdf');
for (final annot in doc.pageAnnotations(0)) {
  print('${annot.subtype}: ${annot.content}');
}
doc.close();

library(pdfoxide)

doc <- pdf_open("annotated.pdf")
for (annot in pdf_page_annotations(doc, 0)) {
  cat(sprintf("%s: %s\n", annot$subtype, annot$content))
}

Julia

using PdfOxide

doc = open_document("annotated.pdf")
for annot in page_annotations(doc, 0)
    println("$(annot.subtype): $(annot.content)")
end

Zig

const pdf_oxide = @import("pdf_oxide");
const a = std.heap.page_allocator;

var doc = try pdf_oxide.Document.open("annotated.pdf");
defer doc.deinit();
const annotations = try doc.pageAnnotations(a, 0);
defer pdf_oxide.Document.freeAnnotations(a, annotations);
for (annotations) |annot| {
    std.debug.print("{s}: {s}\n", .{ annot.subtype, annot.content });
}

Scala

import fyi.oxide.pdf.{PdfDocument, annotationsSeq, contentsOption}
import scala.util.Using

Using.resource(PdfDocument.open("annotated.pdf")) { doc =>
  for (annot <- doc.page(0).annotationsSeq) {
    println(s"${annot.`type`()}: ${annot.contentsOption.getOrElse("")}")
  }
}

Clojure

(require '[pdf-oxide.core :as pdf])

(with-open [d (pdf/open "annotated.pdf")]
  (doseq [annot (pdf/annotations (pdf/page d 0))]
    (println (str (.type annot) ": " (.orElse (.contents annot) "")))))

Objective-C

#import "POXPdfOxide.h"
NSError *err = nil;

POXDocument *doc = [POXDocument openPath:@"annotated.pdf" error:&err];
for (POXAnnotation *annot in [doc pageAnnotations:0 error:&err]) {
    NSLog(@"%@: %@", annot.subtype, annot.content);
}

Elixir

{:ok, doc} = PdfOxide.open("annotated.pdf")
{:ok, annots} = PdfOxide.page_annotations(doc, 0)
Enum.each(annots, fn a -> IO.puts("#{a.subtype}: #{a.content}") end)

Довідник API

`get_annotations(page_index) -> Vec<Annotation>`

Видобуває «сирі» анотації з певної сторінки. Повертає всі типи анотацій, наявних на сторінці.

Параметр	Тип	Опис
`page_index`	`usize`	Індекс сторінки від нуля

Повертає: вектор об’єктів Annotation.

Поля анотації

Поле	Тип	Опис
`annotation_type`	`String`	Завжди `"Annot"`
`subtype`	`Option<String>`	Рядок підтипу в «сирому» вигляді (наприклад, `"Text"`, `"Highlight"`)
`subtype_enum`	`AnnotationSubtype`	Розібране перечислення підтипу
`contents`	`Option<String>`	Текстовий вміст анотації
`rect`	`Option<[f64; 4]>`	Обмежувальний прямокутник [x1, y1, x2, y2]
`author`	`Option<String>`	Автор/творець (запис `/T`)
`creation_date`	`Option<String>`	Дата створення
`modification_date`	`Option<String>`	Дата останньої зміни
`subject`	`Option<String>`	Тема анотації
`destination`	`Option<LinkDestination>`	Ціль посилання (для анотацій Link)
`action`	`Option<LinkAction>`	Дія посилання (для анотацій Link)
`color`	`Option<Vec<f64>>`	Компоненти кольору анотації
`flags`	`Option<AnnotationFlags>`	Прапорці анотації (invisible, hidden, print тощо)

Варіанти AnnotationSubtype

Варіант	Опис
`Text`	Анотація-стікер
`Link`	Гіперпосилання
`FreeText`	Текстове поле
`Line`	Лінія
`Square`	Прямокутник
`Circle`	Еліпс
`Polygon`	Багатокутник
`PolyLine`	Ламана лінія
`Highlight`	Виділення тексту
`Underline`	Підкреслення тексту
`Squiggly`	Хвилясте підкреслення
`StrikeOut`	Закреслення
`Stamp`	Штамп
`Ink`	Рукописний малюнок
`Popup`	Спливаюча нотатка, прив’язана до іншої анотації
`FileAttachment`	Прикріплений файл
`Sound`	Звукова анотація
`Movie`	Відеоанотація
`Screen`	Екранна анотація
`Widget`	Віджет поля форми
`PrinterMark`	Мітка принтера
`TrapNet`	Мережа переперекриття
`Watermark`	Водяний знак
`ThreeDimensional`	3D-анотація
`Redact`	Редакція (замазування)
`Caret`	Курсорна анотація (точка вставки)
`RichMedia`	Мультимедійна анотація
`Unknown`	Невідомий тип анотації

`get_outline() -> Option<Vec<OutlineItem>>`

Отримує зміст документа (закладки), якщо він існує. Повертає ієрархічне дерево елементів змісту для навігації документом.

Повертає:

Some(Vec<OutlineItem>) – закладки знайдено й розібрано
None – у документі немає закладок

Поля OutlineItem

Поле	Тип	Опис
`title`	`String`	Текст заголовка закладки
`dest`	`Option<Destination>`	Ціль навігації
`children`	`Vec<OutlineItem>`	Вкладені дочірні закладки

Варіанти Destination

Варіант	Опис
`PageIndex(usize)`	Пряме посилання на сторінку (індекс від нуля)
`Named(String)`	Ідентифікатор іменованої цілі

Rust

let mut doc = PdfDocument::open("book.pdf")?;

if let Some(outline) = doc.get_outline()? {
    for item in &outline {
        println!("  {}", item.title);
        for child in &item.children {
            println!("    {}", child.title);
        }
    }
} else {
    println!("No bookmarks found.");
}

C++

#include <pdf_oxide/pdf_oxide.hpp>

auto doc = pdf_oxide::Document::open("book.pdf");
std::string outline = doc.get_outline(); // JSON tree of bookmarks
std::cout << outline << "\n";

Swift

import PdfOxide

let doc = try Document.open("book.pdf")
let outline = try doc.outline() // JSON tree of bookmarks
print(outline)

Dart

import 'package:pdf_oxide/pdf_oxide.dart';

final doc = PdfDocument.open('book.pdf');
final outline = doc.getOutline(); // JSON tree of bookmarks
print(outline);
doc.close();

library(pdfoxide)

doc <- pdf_open("book.pdf")
outline <- pdf_get_outline(doc)  # JSON tree of bookmarks
cat(outline, "\n")

Julia

using PdfOxide

doc = open_document("book.pdf")
outline = get_outline(doc) # JSON tree of bookmarks
println(outline)

Zig

const pdf_oxide = @import("pdf_oxide");
const a = std.heap.page_allocator;

var doc = try pdf_oxide.Document.open("book.pdf");
defer doc.deinit();
const outline = try doc.outline(a); // JSON tree of bookmarks; caller owns it
defer a.free(outline);
std.debug.print("{s}\n", .{outline});

Objective-C

#import "POXPdfOxide.h"
NSError *err = nil;

POXDocument *doc = [POXDocument openPath:@"book.pdf" error:&err];
NSString *outline = [doc outlineWithError:&err]; // JSON tree of bookmarks
NSLog(@"%@", outline);

Elixir

{:ok, doc} = PdfOxide.open("book.pdf")
{:ok, outline} = PdfOxide.outline(doc) # JSON tree of bookmarks
IO.puts(outline)

API анотацій PdfPage (DOM)

Об’єкт PdfPage із DocumentEditor надає більш високорівневий інтерфейс AnnotationWrapper, що підтримує як читання наявних анотацій, так і додавання нових.

`page.annotations() -> &[AnnotationWrapper]`

Отримати всі анотації сторінки як обгорнуті об’єкти.

`page.find_annotations_by_type(subtype) -> Vec<&AnnotationWrapper>`

Знайти анотації певного типу.

`page.add_annotation(annotation)`

Додати нову анотацію до сторінки.

`page.remove_annotation(index) -> Option<AnnotationWrapper>`

Видалити анотацію за індексом.

`page.find_annotations_in_region(rect) -> Vec<&AnnotationWrapper>`

Знайти анотації, обмежувальні прямокутники яких перетинаються із заданою областю.

Методи AnnotationWrapper

Метод	Повертає	Опис
`id()`	`AnnotationId`	Унікальний ID сесії
`subtype()`	`AnnotationSubtype`	Тип анотації
`rect()`	`Rect`	Обмежувальний прямокутник
`contents()`	`Option<&str>`	Текстовий вміст
`color()`	`Option<(f32, f32, f32)>`	RGB-колір (0.0–1.0)
`is_modified()`	`bool`	Чи була анотація змінена

Python

doc = PdfDocument("annotated.pdf")
page = doc.page(0)

# List all annotations
for annot in page.annotations():
    print(f"[{annot.subtype}] {annot.contents} at {annot.rect}")

# Find highlights
highlights = [a for a in page.annotations() if a.subtype == "Highlight"]
print(f"Found {len(highlights)} highlights")

Node.js

const doc = new PdfDocument("annotated.pdf");
const annotations = doc.getPageAnnotations(0);

// List all annotations
for (const annot of annotations) {
  console.log(`[${annot.subtype}] ${annot.contents}`);
}

// Find highlights
const highlights = annotations.filter(a => a.subtype === "Highlight");
console.log(`Found ${highlights.length} highlights`);
doc.close();

doc, _ := pdfoxide.Open("annotated.pdf")
defer doc.Close()
annotations, _ := doc.Annotations(0)

// List all annotations
for _, annot := range annotations {
    fmt.Printf("[%s] %s\n", annot.Subtype, annot.Content)
}

// Find highlights
highlights := 0
for _, a := range annotations {
    if a.Subtype == "Highlight" {
        highlights++
    }
}
fmt.Printf("Found %d highlights\n", highlights)

WASM

const doc = new WasmPdfDocument(bytes);
const annotations = doc.getAnnotations(0);

// List all annotations
for (const annot of annotations) {
    console.log(`[${annot.subtype}] ${annot.contents}`);
}

// Find highlights
const highlights = annotations.filter(a => a.subtype === "Highlight");
console.log(`Found ${highlights.length} highlights`);

Rust

use pdf_oxide::editor::{DocumentEditor, EditableDocument};
use pdf_oxide::annotation_types::AnnotationSubtype;

let mut editor = DocumentEditor::open("annotated.pdf")?;
let page = editor.get_page(0)?;

// Find all highlight annotations
let highlights = page.find_annotations_by_type(AnnotationSubtype::Highlight);
for h in &highlights {
    println!("Highlight at {:?}: {:?}", h.rect(), h.contents());
}

Java

import fyi.oxide.pdf.*;
import fyi.oxide.pdf.annotation.Annotation;
import fyi.oxide.pdf.annotation.AnnotationType;
import java.nio.file.Path;

try (PdfDocument doc = PdfDocument.open(Path.of("annotated.pdf"))) {
    var annotations = doc.page(0).annotations();

    // List all annotations
    for (Annotation annot : annotations) {
        System.out.println("[" + annot.type() + "] " + annot.contents().orElse(""));
    }

    // Find highlights
    long highlights = annotations.stream()
            .filter(a -> a.type() == AnnotationType.HIGHLIGHT).count();
    System.out.println("Found " + highlights + " highlights");
}

C++

#include <pdf_oxide/pdf_oxide.hpp>

auto doc = pdf_oxide::Document::open("annotated.pdf");
auto annotations = doc.page_annotations(0);

// List all annotations
for (const auto& annot : annotations) {
    std::cout << "[" << annot.subtype << "] " << annot.content << "\n";
}

// Find highlights
int highlights = 0;
for (const auto& a : annotations) {
    if (a.subtype == "Highlight") highlights++;
}
std::cout << "Found " << highlights << " highlights\n";

Swift

import PdfOxide

let doc = try Document.open("annotated.pdf")
let annotations = try doc.pageAnnotations(0)

// List all annotations
for annot in annotations {
    print("[\(annot.subtype)] \(annot.content)")
}

// Find highlights
let highlights = annotations.filter { $0.subtype == "Highlight" }
print("Found \(highlights.count) highlights")

Kotlin

import fyi.oxide.pdf.*
import fyi.oxide.pdf.annotation.AnnotationType

PdfDocument.open(java.nio.file.Path.of("annotated.pdf")).use { doc ->
    val annotations = doc.page(0).annotations()

    // List all annotations
    for (annot in annotations) {
        println("[${annot.type()}] ${annot.contents().orElse("")}")
    }

    // Find highlights
    val highlights = annotations.count { it.type() == AnnotationType.HIGHLIGHT }
    println("Found $highlights highlights")
}

Dart

import 'package:pdf_oxide/pdf_oxide.dart';

final doc = PdfDocument.open('annotated.pdf');
final annotations = doc.pageAnnotations(0);

// List all annotations
for (final annot in annotations) {
  print('[${annot.subtype}] ${annot.content}');
}

// Find highlights
final highlights = annotations.where((a) => a.subtype == 'Highlight');
print('Found ${highlights.length} highlights');
doc.close();

library(pdfoxide)

doc <- pdf_open("annotated.pdf")
annotations <- pdf_page_annotations(doc, 0)

# List all annotations
for (annot in annotations) {
  cat(sprintf("[%s] %s\n", annot$subtype, annot$content))
}

# Find highlights
highlights <- Filter(function(a) a$subtype == "Highlight", annotations)
cat(sprintf("Found %d highlights\n", length(highlights)))

Julia

using PdfOxide

doc = open_document("annotated.pdf")
annotations = page_annotations(doc, 0)

# List all annotations
for annot in annotations
    println("[$(annot.subtype)] $(annot.content)")
end

# Find highlights
highlights = filter(a -> a.subtype == "Highlight", annotations)
println("Found $(length(highlights)) highlights")

Zig

const pdf_oxide = @import("pdf_oxide");
const a = std.heap.page_allocator;

var doc = try pdf_oxide.Document.open("annotated.pdf");
defer doc.deinit();
const annotations = try doc.pageAnnotations(a, 0);
defer pdf_oxide.Document.freeAnnotations(a, annotations);

// List all annotations
for (annotations) |annot| {
    std.debug.print("[{s}] {s}\n", .{ annot.subtype, annot.content });
}

// Find highlights
var highlights: usize = 0;
for (annotations) |annot| {
    if (std.mem.eql(u8, annot.subtype, "Highlight")) highlights += 1;
}
std.debug.print("Found {d} highlights\n", .{highlights});

Scala

import fyi.oxide.pdf.{PdfDocument, annotationsSeq, contentsOption}
import fyi.oxide.pdf.annotation.AnnotationType
import scala.util.Using

Using.resource(PdfDocument.open("annotated.pdf")) { doc =>
  val annotations = doc.page(0).annotationsSeq

  // List all annotations
  for (annot <- annotations) {
    println(s"[${annot.`type`()}] ${annot.contentsOption.getOrElse("")}")
  }

  // Find highlights
  val highlights = annotations.count(_.`type`() == AnnotationType.HIGHLIGHT)
  println(s"Found $highlights highlights")
}

Clojure

(require '[pdf-oxide.core :as pdf])
(import 'fyi.oxide.pdf.annotation.AnnotationType)

(with-open [d (pdf/open "annotated.pdf")]
  (let [annotations (pdf/annotations (pdf/page d 0))]
    ;; List all annotations
    (doseq [annot annotations]
      (println (str "[" (.type annot) "] " (.orElse (.contents annot) ""))))
    ;; Find highlights
    (let [highlights (count (filter #(= (.type %) AnnotationType/HIGHLIGHT) annotations))]
      (println (str "Found " highlights " highlights")))))

Objective-C

#import "POXPdfOxide.h"
NSError *err = nil;

POXDocument *doc = [POXDocument openPath:@"annotated.pdf" error:&err];
NSArray<POXAnnotation*> *annotations = [doc pageAnnotations:0 error:&err];

// List all annotations
for (POXAnnotation *annot in annotations) {
    NSLog(@"[%@] %@", annot.subtype, annot.content);
}

// Find highlights
NSPredicate *p = [NSPredicate predicateWithFormat:@"subtype == %@", @"Highlight"];
NSUInteger highlights = [annotations filteredArrayUsingPredicate:p].count;
NSLog(@"Found %lu highlights", (unsigned long)highlights);

Elixir

{:ok, doc} = PdfOxide.open("annotated.pdf")
{:ok, annotations} = PdfOxide.page_annotations(doc, 0)

# List all annotations
Enum.each(annotations, fn a -> IO.puts("[#{a.subtype}] #{a.content}") end)

# Find highlights
highlights = Enum.count(annotations, &(&1.subtype == "Highlight"))
IO.puts("Found #{highlights} highlights")

`annotations_to_json` — серіалізація анотацій сторінки

annotations_to_json серіалізує весь список анотацій у JSON-масив за один FFI-виклик. Байндинг Go використовує це всередині для матеріалізації []Annotation; Swift надає його напряму як annotationsToJson. Підпис C ABI:

char *pdf_oxide_annotations_to_json(const FfiAnnotationList *annotations, int32_t *error_code);

Повернутий рядок UTF-8 належить виклику (звільнити через free_string). Його схема відповідає структурі Annotation у Go — поля: type, subtype, content, x, y, width, height, author, borderWidth, color, creationDate, modificationDate, linkURI, textIconName, isHidden, isPrintable, isReadOnly, isMarkedDeleted.

Swift

import PdfOxide

let doc = try Document.open("annotated.pdf")
let json = try doc.annotationsToJson(0) // String of JSON
print(json)

C ABI

#include "pdf_oxide.h"

int32_t err = 0;
FfiAnnotationList *list = pdf_document_get_page_annotations(doc, /*page=*/0, &err);
char *json = pdf_oxide_annotations_to_json(list, &err);
printf("%s\n", json);
free_string(json);
pdf_oxide_annotation_list_free(list);

C++

#include <pdf_oxide/pdf_oxide.hpp>

auto doc = pdf_oxide::Document::open("annotated.pdf");
std::string json = doc.annotations_to_json(0); // JSON string
std::cout << json << "\n";

Dart

import 'package:pdf_oxide/pdf_oxide.dart';

final doc = PdfDocument.open('annotated.pdf');
final json = doc.annotationsToJson(0); // JSON string
print(json);
doc.close();

library(pdfoxide)

doc <- pdf_open("annotated.pdf")
json <- pdf_annotations_to_json(doc, 0)  # JSON string
cat(json, "\n")

Julia

using PdfOxide

doc = open_document("annotated.pdf")
json = annotations_to_json(doc, 0) # JSON string
println(json)

Zig

const pdf_oxide = @import("pdf_oxide");
const a = std.heap.page_allocator;

var doc = try pdf_oxide.Document.open("annotated.pdf");
defer doc.deinit();
var list = try doc.annotationList(0);
defer list.deinit();
const json = try list.toJson(a); // caller owns the slice
defer a.free(json);
std.debug.print("{s}\n", .{json});

Objective-C

#import "POXPdfOxide.h"
NSError *err = nil;

POXDocument *doc = [POXDocument openPath:@"annotated.pdf" error:&err];
NSString *json = [doc annotationsJson:0 error:&err]; // JSON string
NSLog(@"%@", json);

Elixir

{:ok, doc} = PdfOxide.open("annotated.pdf")
{:ok, json} = PdfOxide.annotations_to_json(doc, 0) # JSON string
IO.puts(json)

Підтримка в байндингах. annotations_to_json доступна безпосередньо у Swift (doc.annotationsToJson(page)), C++ (doc.annotations_to_json(page)), Dart (doc.annotationsToJson(page)), R (pdf_annotations_to_json(doc, page)), Julia (annotations_to_json(doc, page)), Zig (doc.annotationList(page).toJson(...)), Objective-C ([doc annotationsJson:page error:]), Elixir (PdfOxide.annotations_to_json(doc, page)) та C ABI (pdf_oxide_annotations_to_json). Байндинг Go викликає її всередині для декодування doc.Annotations(page) у типізовані структури. Для WASM-таргету функція виключається з компіляції.

`annotation_extras` — розширені атрибути анотації

annotation_extras зчитує розширені атрибути окремої анотації, які не входять до базового представлення Annotation: колір, мітки часу створення/зміни, чотири прапорці видимості (hidden, marked-deleted, printable, read-only), URI посилання для анотацій Link, назву іконки для текстових анотацій та координати чотирикутників виділення/розмітки.

У Swift вони повертаються разом як структура AnnotationExtras через annotationExtras(page, index:). У Go ті самі поля вбудовані безпосередньо в структуру Annotation (Color, CreationDate, ModificationDate, LinkURI, TextIconName, IsHidden, IsPrintable, IsReadOnly, IsMarkedDeleted). Під капотом обидва варіанти використовують сімейство C ABI-аксесорів pdf_oxide_annotation_get_* / pdf_oxide_*_annotation_get_*.

Swift

import PdfOxide

let doc = try Document.open("annotated.pdf")
let extras = try doc.annotationExtras(0, index: 0) // AnnotationExtras

print("color=\(extras.color) created=\(extras.creationDate)")
print("hidden=\(extras.hidden) printable=\(extras.printable) readOnly=\(extras.readOnly)")
if !extras.uri.isEmpty { print("link URI: \(extras.uri)") }
if !extras.iconName.isEmpty { print("icon: \(extras.iconName)") }
for q in extras.quadPoints {
    print("quad: (\(q.x1),\(q.y1)) (\(q.x2),\(q.y2)) (\(q.x3),\(q.y3)) (\(q.x4),\(q.y4))")
}

import pdfoxide "github.com/yfedoseev/pdf_oxide/go"

doc, _ := pdfoxide.Open("annotated.pdf")
defer doc.Close()
annotations, _ := doc.Annotations(0)

a := annotations[0]
fmt.Printf("color=%d created=%d modified=%d\n", a.Color, a.CreationDate, a.ModificationDate)
fmt.Printf("hidden=%v printable=%v readOnly=%v deleted=%v\n",
    a.IsHidden, a.IsPrintable, a.IsReadOnly, a.IsMarkedDeleted)
if a.LinkURI != "" {
    fmt.Printf("link URI: %s\n", a.LinkURI)
}
if a.TextIconName != "" {
    fmt.Printf("icon: %s\n", a.TextIconName)
}

C++

#include <pdf_oxide/pdf_oxide.hpp>

auto doc = pdf_oxide::Document::open("annotated.pdf");

std::cout << "color=" << doc.annotation_get_color(0, 0)
          << " created=" << doc.annotation_get_creation_date(0, 0) << "\n";
std::cout << "hidden=" << doc.annotation_is_hidden(0, 0)
          << " printable=" << doc.annotation_is_printable(0, 0)
          << " readOnly=" << doc.annotation_is_read_only(0, 0) << "\n";
auto uri = doc.link_annotation_get_uri(0, 0);
if (!uri.empty()) std::cout << "link URI: " << uri << "\n";
auto icon = doc.text_annotation_get_icon_name(0, 0);
if (!icon.empty()) std::cout << "icon: " << icon << "\n";

Dart

import 'package:pdf_oxide/pdf_oxide.dart';

final doc = PdfDocument.open('annotated.pdf');
final a = doc.pageAnnotationDetails(0)[0]; // AnnotationDetails

print('color=${a.color} created=${a.creationDate} modified=${a.modificationDate}');
print('hidden=${a.hidden} printable=${a.printable} readOnly=${a.readOnly}');
if (a.linkUri.isNotEmpty) print('link URI: ${a.linkUri}');
if (a.iconName.isNotEmpty) print('icon: ${a.iconName}');
doc.close();

library(pdfoxide)

doc <- pdf_open("annotated.pdf")

cat(sprintf("color=%d created=%d\n",
    pdf_annotation_get_color(doc, 0, 0),
    pdf_annotation_get_creation_date(doc, 0, 0)))
cat(sprintf("hidden=%s printable=%s readOnly=%s\n",
    pdf_annotation_is_hidden(doc, 0, 0),
    pdf_annotation_is_printable(doc, 0, 0),
    pdf_annotation_is_read_only(doc, 0, 0)))
uri <- pdf_link_annotation_get_uri(doc, 0, 0)
if (nzchar(uri)) cat(sprintf("link URI: %s\n", uri))
icon <- pdf_text_annotation_get_icon_name(doc, 0, 0)
if (nzchar(icon)) cat(sprintf("icon: %s\n", icon))

Julia

using PdfOxide

doc = open_document("annotated.pdf")

println("color=$(annotation_get_color(doc, 0, 0)) created=$(annotation_creation_date(doc, 0, 0))")
println("hidden=$(annotation_is_hidden(doc, 0, 0)) printable=$(annotation_is_printable(doc, 0, 0)) readOnly=$(annotation_is_read_only(doc, 0, 0))")
uri = link_annotation_uri(doc, 0, 0)
isempty(uri) || println("link URI: $uri")
icon = text_annotation_icon_name(doc, 0, 0)
isempty(icon) || println("icon: $icon")

Zig

const pdf_oxide = @import("pdf_oxide");
const a = std.heap.page_allocator;

var doc = try pdf_oxide.Document.open("annotated.pdf");
defer doc.deinit();
var list = try doc.annotationList(0);
defer list.deinit();

std.debug.print("color={d} created={d}\n", .{ try list.getColor(0), try list.getCreationDate(0) });
std.debug.print("hidden={} printable={} readOnly={}\n", .{ try list.isHidden(0), try list.isPrintable(0), try list.isReadOnly(0) });
const uri = try list.linkUri(a, 0);
defer a.free(uri);
if (uri.len != 0) std.debug.print("link URI: {s}\n", .{uri});
const icon = try list.textIconName(a, 0);
defer a.free(icon);
if (icon.len != 0) std.debug.print("icon: {s}\n", .{icon});

Objective-C

#import "POXPdfOxide.h"
NSError *err = nil;

POXDocument *doc = [POXDocument openPath:@"annotated.pdf" error:&err];
POXAnnotation *a = [doc pageAnnotations:0 error:&err].firstObject;

NSLog(@"color=%u created=%lld modified=%lld", a.color, a.creationDate, a.modificationDate);
NSLog(@"hidden=%d printable=%d readOnly=%d", a.hidden, a.printable, a.readOnly);
if (a.linkUri.length) NSLog(@"link URI: %@", a.linkUri);
if (a.iconName.length) NSLog(@"icon: %@", a.iconName);

Elixir

{:ok, doc} = PdfOxide.open("annotated.pdf")

{:ok, color} = PdfOxide.annotation_color(doc, 0, 0)
{:ok, created} = PdfOxide.annotation_creation_date(doc, 0, 0)
IO.puts("color=#{color} created=#{created}")

{:ok, hidden} = PdfOxide.annotation_hidden?(doc, 0, 0)
{:ok, printable} = PdfOxide.annotation_printable?(doc, 0, 0)
{:ok, read_only} = PdfOxide.annotation_read_only?(doc, 0, 0)
IO.puts("hidden=#{hidden} printable=#{printable} readOnly=#{read_only}")

{:ok, uri} = PdfOxide.link_annotation_uri(doc, 0, 0)
if uri != "", do: IO.puts("link URI: #{uri}")
{:ok, icon} = PdfOxide.text_annotation_icon_name(doc, 0, 0)
if icon != "", do: IO.puts("icon: #{icon}")

Поля AnnotationExtras (Swift)

Поле	Тип	Опис
`color`	`UInt32`	Запакований колір анотації
`creationDate`	`Int64`	Мітка часу створення
`modificationDate`	`Int64`	Мітка часу зміни
`hidden`	`Bool`	Прапорець hidden
`markedDeleted`	`Bool`	Прапорець marked-deleted
`printable`	`Bool`	Прапорець print
`readOnly`	`Bool`	Прапорець read-only
`uri`	`String`	URI посилання (порожній рядок, якщо відсутній)
`iconName`	`String`	Назва іконки текстової анотації (порожній рядок, якщо відсутня)
`quadPoints`	`[QuadPoint]`	Чотирикутники виділення/розмітки (по 4 кути кожен)

Підтримка в байндингах. annotation_extras надається як спеціальна структура AnnotationExtras у Swift (doc.annotationExtras(page, index:)) та через сімейство C ABI-аксесорів pdf_oxide_annotation_get_*. Те саме сімейство за індексом обгорнуте у C++ (doc.annotation_get_*), R (pdf_annotation_get_*), Julia (annotation_*), Zig (AnnotationList.getColor/isHidden/...) та Elixir (PdfOxide.annotation_color/...). У Go, Dart (doc.pageAnnotationDetails(page)) та Objective-C (вбудовано в POXAnnotation) ті самі атрибути розміщені безпосередньо в кожній анотації. Для WASM-таргету аксесори виключаються з компіляції.

Розширені приклади

Побудова змісту із закладок

use pdf_oxide::PdfDocument;
use pdf_oxide::outline::Destination;

let mut doc = PdfDocument::open("book.pdf")?;

fn print_toc(items: &[pdf_oxide::outline::OutlineItem], depth: usize) {
    for item in items {
        let indent = "  ".repeat(depth);
        let page = match &item.dest {
            Some(Destination::PageIndex(p)) => format!("page {}", p + 1),
            Some(Destination::Named(n)) => format!("dest '{}'", n),
            None => "no dest".to_string(),
        };
        println!("{}{} ({})", indent, item.title, page);
        print_toc(&item.children, depth + 1);
    }
}

if let Some(outline) = doc.get_outline()? {
    println!("Table of Contents:");
    print_toc(&outline, 0);
}

Видобування всіх коментарів (анотації типу Text)

use pdf_oxide::PdfDocument;
use pdf_oxide::annotation_types::AnnotationSubtype;

let mut doc = PdfDocument::open("reviewed.pdf")?;
let page_count = doc.page_count()?;

for page_idx in 0..page_count {
    let annotations = doc.get_annotations(page_idx)?;
    let comments: Vec<_> = annotations.iter()
        .filter(|a| a.subtype_enum == AnnotationSubtype::Text)
        .collect();

    if !comments.is_empty() {
        println!("Page {}:", page_idx + 1);
        for c in &comments {
            let author = c.author.as_deref().unwrap_or("Unknown");
            let text = c.contents.as_deref().unwrap_or("");
            println!("  [{}] {}", author, text);
        }
    }
}

Видобування всіх гіперпосилань

use pdf_oxide::PdfDocument;
use pdf_oxide::annotation_types::AnnotationSubtype;

let mut doc = PdfDocument::open("report.pdf")?;
let annotations = doc.get_annotations(0)?;

let links: Vec<_> = annotations.iter()
    .filter(|a| a.subtype_enum == AnnotationSubtype::Link)
    .collect();

for link in &links {
    if let Some(ref action) = link.action {
        println!("Link: {:?}", action);
    }
    if let Some(ref dest) = link.destination {
        println!("Internal link: {:?}", dest);
    }
}

Часті запитання

У чому різниця між get_annotations та annotation_extras? get_annotations повертає базове представлення анотації (підтип, вміст, прямокутник, автор, дати, колір, прапорці). annotation_extras додає розширені атрибути — запакований колір, мітки часу, чотири прапорці видимості, URI посилання, назву іконки текстової анотації та координати чотирикутників виділення. У Go вони об’єднані в одну Annotation; у Swift це окрема структура AnnotationExtras.

Яка JSON-схема генерується annotations_to_json? JSON-масив, що відповідає структурі Annotation у Go. Поля: type, subtype, content, x, y, width, height, author, borderWidth, color, creationDate, modificationDate, linkURI, textIconName, isHidden, isPrintable, isReadOnly, isMarkedDeleted.

Чому URI посилань та назви іконок іноді порожні? Ці поля застосовні лише до певних підтипів: uri — для анотацій Link, iconName — для анотацій Text (стікерів). Для інших підтипів вони повертаються як порожні рядки.

Чи швидко працює видобування анотацій? Так. Ядро видобування PDF Oxide показує на тестовому корпусі середній час близько 0,8 мс і p99 у 9 мс при 100% успішності.

Пов’язані сторінки

Видобування даних форм – видобування полів форм (анотації Widget)
Видобування тексту – видобування текстового вмісту зі сторінок
Метадані та XMP – читання властивостей документа та закладок
Видобування зображень – вбудовані зображення та аксесор елементів сторінки

Видобування анотацій

Швидкий приклад

Довідник API

get_annotations(page_index) -> Vec<Annotation>

Поля анотації

Варіанти AnnotationSubtype

get_outline() -> Option<Vec<OutlineItem>>

Поля OutlineItem

Варіанти Destination

API анотацій PdfPage (DOM)

page.annotations() -> &[AnnotationWrapper]

page.find_annotations_by_type(subtype) -> Vec<&AnnotationWrapper>

page.add_annotation(annotation)

page.remove_annotation(index) -> Option<AnnotationWrapper>

page.find_annotations_in_region(rect) -> Vec<&AnnotationWrapper>

Методи AnnotationWrapper

annotations_to_json — серіалізація анотацій сторінки

annotation_extras — розширені атрибути анотації

Поля AnnotationExtras (Swift)

Розширені приклади

Побудова змісту із закладок

Видобування всіх коментарів (анотації типу Text)

Видобування всіх гіперпосилань

Часті запитання

Пов’язані сторінки

`get_annotations(page_index) -> Vec<Annotation>`

`get_outline() -> Option<Vec<OutlineItem>>`

`page.annotations() -> &[AnnotationWrapper]`

`page.find_annotations_by_type(subtype) -> Vec<&AnnotationWrapper>`

`page.add_annotation(annotation)`

`page.remove_annotation(index) -> Option<AnnotationWrapper>`

`page.find_annotations_in_region(rect) -> Vec<&AnnotationWrapper>`

`annotations_to_json` — серіалізація анотацій сторінки

`annotation_extras` — розширені атрибути анотації