What is the fastest Python PDF library?

PDF Oxide is the fastest Python PDF library, with 0.8ms mean text extraction time — 5.8× faster than PyMuPDF (4.6ms) and 15× faster than pypdf (12.1ms). Benchmarked on 3,830 real-world PDFs with 100% pass rate.

Is PDF Oxide free for commercial use?

Yes. PDF Oxide is MIT licensed — free for all uses including commercial products, SaaS, and proprietary software. No license fees, no sales calls, no AGPL restrictions.

Can PDF Oxide handle scanned PDFs with OCR?

Yes. PDF Oxide includes built-in OCR via PaddleOCR and ONNX Runtime. No Tesseract installation needed — just pip install pdf_oxide and use extract_text_ocr(). Supports PP-OCRv3, v4, and v5 models.

Does PDF Oxide support XFA forms?

Yes. PDF Oxide is the only Python PDF library that can detect, analyze, and extract data from XFA forms (XML Forms Architecture). PyMuPDF, pypdf, pdfplumber, and pdfminer cannot read XFA form data.

How does PDF Oxide compare to PyMuPDF?

PDF Oxide is 5.8× faster than PyMuPDF (0.8ms vs 4.6ms mean), has a 100% pass rate vs 99.3%, and is MIT licensed vs PyMuPDF's AGPL-3.0. PDF Oxide also has built-in Markdown/HTML output and XFA form support that PyMuPDF lacks.

Can PDF Oxide convert PDF to Markdown?

Yes. PDF Oxide has built-in PDF to Markdown conversion with heading detection, table preservation, and list formatting — ideal for LLM and RAG pipelines. No separate package needed, unlike PyMuPDF which requires pymupdf4llm (69× slower).

Kotlin API Reference

PDF Oxide ships idiomatic Kotlin/JVM bindings (Android-ready) as a thin facade over the mature fyi.oxide:pdf-oxide Java binding, which owns the single JNI native bridge (the pdf_oxide_jni crate). The Kotlin module adds no native code. It re-exports the Java types (PdfDocument, Pdf, PdfPage, DocumentEditor, PdfSigner, PdfValidator, AutoExtractor, and the geometry / text / table / search value types) and layers Kotlin sugar on top: Optional<T> to T? extensions and use { } on the AutoCloseable handles.

// build.gradle.kts
dependencies {
    implementation("fyi.oxide:pdf-oxide-kotlin:0.3.69")
}

The JNI native library (libpdf_oxide_jni) is not bundled — load it via System.loadLibrary("pdf_oxide_jni") (ship the .so/.dylib on your java.library.path, or in jniLibs/<abi>/ on Android), or point the Java NativeLoader at it with -Dfyi.oxide.pdf.lib.path=<path>.
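Because the native library is not bundled, a missing or misplaced binary surfaces as an UnsatisfiedLinkError at load time. The following is a minimal sketch of failing early with a readable message. tryLoadNative is a hypothetical helper written for this illustration, not part of PDF Oxide; it relies only on the standard System.loadLibrary call.

```kotlin
// Hypothetical helper (not a PDF Oxide API): try to load a JNI library by its
// short name and report failure instead of crashing on first native call.
fun tryLoadNative(libName: String): Boolean =
    try {
        System.loadLibrary(libName)
        true
    } catch (e: UnsatisfiedLinkError) {
        System.err.println("Could not load '$libName': ${e.message}")
        false
    }

fun main() {
    if (!tryLoadNative("pdf_oxide_jni")) {
        // Show where the JVM searched, to help fix the deployment.
        println("java.library.path = ${System.getProperty("java.library.path")}")
    }
}
```

Calling such a check once at application start keeps the failure close to its cause rather than at the first PdfDocument.open call.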

For the Java API, see the Java API Reference. For the Rust API, see the Rust API Reference. For type details, see Types & Enums.

import fyi.oxide.pdf.Pdf
import fyi.oxide.pdf.PdfDocument
import fyi.oxide.pdf.producerOrNull

Pdf.fromMarkdown("# Hello\n\nbody\n").use { pdf ->
    PdfDocument.open(pdf.save()).use { doc ->
        println(doc.pageCount())
        println(doc.extractText(0))
        println(doc.toMarkdown())
        println(doc.page(0).words().map { it.text() })
        println(doc.producerOrNull() ?: "(no producer)")   // Optional -> nullable
    }
}

All handles (PdfDocument, Pdf, DocumentEditor) implement AutoCloseable, so the Kotlin use { } block closes native memory deterministically. Errors raise PdfException (and its subclasses); see Exceptions.
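To illustrate what deterministic closing means here, the sketch below uses a stand-in handle rather than a real PDF Oxide type. FakeHandle is hypothetical and only mimics the AutoCloseable contract; the point is that use { } calls close() even when the block throws.

```kotlin
// Stand-in for a native-backed handle; NOT a PDF Oxide class.
class FakeHandle : AutoCloseable {
    var isOpen = true
        private set

    // Idempotent: closing an already closed handle changes nothing.
    override fun close() {
        isOpen = false
    }
}

fun main() {
    val handle = FakeHandle()
    try {
        handle.use { error("simulated failure inside the block") }
    } catch (e: IllegalStateException) {
        println("caught: ${e.message}")          // caught: simulated failure inside the block
    }
    println("open after use: ${handle.isOpen}")  // open after use: false
}
```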

PdfDocument

The primary read-only entry point to a PDF: open, extract, convert, render, search, and inspect form fields. Instances own native memory and must be closed, preferably with a use { } block.

import fyi.oxide.pdf.PdfDocument

Factory Methods

PdfDocument.open(path: Path): PdfDocument

Open a PDF from a filesystem path.

PdfDocument.open(path: String): PdfDocument

Open a PDF from a path string.

PdfDocument.open(bytes: ByteArray): PdfDocument

Open a PDF from in-memory bytes (e.g. downloaded from S3 or received over HTTP).

PdfDocument.open(path: Path, password: String): PdfDocument

Open an encrypted PDF from a path with the user or owner password.

PdfDocument.open(path: String, password: String): PdfDocument

Open an encrypted PDF from a path string with a password.

PdfDocument.open(bytes: ByteArray, password: String): PdfDocument

Open an encrypted PDF from bytes with a password.

PdfDocument.open(stream: InputStream): PdfDocument

Open a PDF by reading all bytes from an InputStream.

Static One-Shots

PdfDocument.extractText(path: String): String
PdfDocument.extractText(path: Path): String

Open, extract all text, and close in a single call — for the simple case where you do not need a live handle.

Authentication

doc.authenticate(password: String): Boolean
doc.authenticate(password: ByteArray): Boolean

Authenticate an encrypted document after opening. Returns true if the password matched.

Document Info

doc.pageCount(): Int

Number of pages in the document.

doc.producer(): Optional<String>
doc.creator(): Optional<String>

Document /Producer and /Creator metadata. Use the Kotlin producerOrNull() / creatorOrNull() extensions for null-based access.

val doc.isOpen: Boolean

Whether the native handle is still open (Kotlin property over the Java isOpen() getter).

Text Extraction

doc.extractText(pageIndex: Int): String

Extract plain text from a single zero-indexed page.

doc.extractTextAuto(pageIndex: Int): String

Extract text with automatic strategy selection (falls back to OCR for scanned pages when the OCR feature is available).

doc.extractStructured(page: Int): String

Extract a structured (JSON) representation of the page’s text and layout.

Conversion

doc.toMarkdown(): String
doc.toMarkdown(pageIndex: Int): String

Convert the whole document or a single page to Markdown.

doc.toHtml(): String
doc.toHtml(pageIndex: Int): String

Convert the whole document or a single page to HTML.

Search

doc.search(query: String): List<SearchMatch>

Search the document for a literal string. Returns per-page matches with bounding boxes.

doc.search(query: String, caseInsensitive: Boolean, regex: Boolean, maxResults: Int): List<SearchMatch>

Search with options for case-insensitive matching, treating the query as a regular expression, and a result cap (maxResults = 0 means no cap).
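As a rough illustration of what the three parameters mean, the sketch below applies the same flags to a plain string with Kotlin's Regex class. findAll is a hypothetical function, not the library's implementation, and the regex dialect PDF Oxide uses is not stated here, so treat the matching rules as an assumption.

```kotlin
// Hypothetical illustration over a plain string (not the PDF Oxide search engine).
fun findAll(
    text: String,
    query: String,
    caseInsensitive: Boolean,
    regex: Boolean,
    maxResults: Int,
): List<String> {
    // A literal query is escaped so that characters like '.' match themselves.
    val pattern = if (regex) query else Regex.escape(query)
    val options = if (caseInsensitive) setOf(RegexOption.IGNORE_CASE) else emptySet()
    val hits = Regex(pattern, options).findAll(text).map { it.value }
    // maxResults = 0 means "no cap", mirroring the documented convention.
    return (if (maxResults > 0) hits.take(maxResults) else hits).toList()
}

fun main() {
    val text = "Report report REPORT"
    println(findAll(text, "report", caseInsensitive = true, regex = false, maxResults = 0))
    // [Report, report, REPORT]
    println(findAll(text, "report", caseInsensitive = true, regex = false, maxResults = 2))
    // [Report, report]
}
```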

Forms

doc.formFields(): List<FormField>

Get all AcroForm fields with their type, value, widget bounds, and page index. See FormField.

Rendering

doc.render(pageIndex: Int): ByteArray
doc.render(pageIndex: Int, dpi: Int): ByteArray

Render a page to PNG image bytes at the default DPI or a specified DPI.

Page Access

doc.page(index: Int): PdfPage

Get a lazy PdfPage handle for the given zero-based index.

doc.pages(): List<PdfPage>

Get all pages as a list.

doc.pagesStream(): Stream<PdfPage>

Get all pages as a Java Stream for fluent processing.

Lifecycle

doc.close()

Free native memory. Idempotent — a second call is a no-op. Prefer use { }.

PdfPage

A lazy page handle returned by PdfDocument.page(), pages(), or pagesStream(). All accessors dispatch to the parent document on access.

PdfDocument.open(bytes).use { doc ->
    val page = doc.page(0)
    val words = page.words()
    val tables = page.tables()
}

Geometry

page.parent(): PdfDocument
page.index(): Int
page.mediaBox(): BBox
page.cropBox(): BBox
page.width(): Double
page.height(): Double
page.rotation(): Int

Parent document, zero-based index, MediaBox / CropBox rectangles, dimensions in PDF points, and page rotation in degrees.
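Since dimensions are reported in PDF points and render() takes a DPI, it can help to estimate the pixel size of a rendered page. The sketch below uses the standard 72 points per inch; pointsToPixels is a hypothetical helper, and the renderer's exact rounding (and any UserUnit scaling) is not documented here, so the result is an estimate.

```kotlin
import kotlin.math.roundToInt

// PDF user space defaults to 72 points per inch, so pixels = points * dpi / 72.
// Hypothetical helper; the library's own rounding may differ by a pixel.
fun pointsToPixels(points: Double, dpi: Int): Int = (points * dpi / 72.0).roundToInt()

fun main() {
    // US Letter is 612 x 792 points.
    println("${pointsToPixels(612.0, 150)} x ${pointsToPixels(792.0, 150)}")  // 1275 x 1650
}
```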

Content Extraction

page.text(): String

Extract all text on the page.

page.text(region: BBox): String

Extract text within a bounding-box region.

page.words(): List<TextWord>
page.lines(): List<TextLine>
page.chars(): List<TextChar>

Structured text at word, line, and character granularity.

page.images(): List<ExtractedImage>
page.tables(): List<Table>
page.annotations(): List<Annotation>

Extracted images, detected tables, and page annotations.

Pdf

Create PDFs from source formats, split by bookmarks, and serialize. Implements AutoCloseable.

import fyi.oxide.pdf.Pdf

Factory Methods

Pdf.fromMarkdown(markdown: String): Pdf

Create a PDF from Markdown content.

Pdf.fromHtml(html: String): Pdf

Create a PDF from HTML content.

Pdf.fromImages(images: List<ByteArray>): Pdf

Create a multi-page PDF from a list of image byte arrays, one page per image.

Splitting

pdf.planSplitByBookmarks(opts: SplitByBookmarksOptions): List<BookmarkSegment>

Plan a split by outline bookmarks without producing output — returns the segments (title, page range, filename) that would be created.

pdf.splitByBookmarks(opts: SplitByBookmarksOptions): List<ByteArray>

Split into multiple PDFs by bookmark level. Returns one byte array per segment.

Pdf.planSplitByBookmarksCount(sourcePdf: ByteArray, level: Int): Int

Static helper: count how many segments a bookmark split at the given level would produce.

Pdf.splitByBookmarksFromBytes(sourcePdf: ByteArray, level: Int): Array<ByteArray>

Static helper: split source PDF bytes by bookmark level directly.

Saving

pdf.save(): ByteArray

Serialize the PDF to bytes.

pdf.saveTo(out: Path)

Write the PDF to a file.

val pdf.isOpen: Boolean
pdf.close()

Lifecycle (Kotlin isOpen property and close()). Prefer use { }.

DocumentEditor

Mutating editor for redaction, form filling, metadata scrubbing, and incremental saves. Implements AutoCloseable. Setter methods return this for fluent chaining.

import fyi.oxide.pdf.DocumentEditor

Factory Methods

DocumentEditor.open(path: Path): DocumentEditor
DocumentEditor.open(path: String): DocumentEditor
DocumentEditor.open(bytes: ByteArray): DocumentEditor

Open a document for editing from a path or in-memory bytes.

Form Filling

editor.setFormField(name: String, value: String): DocumentEditor

Set a text/choice field value by fully qualified name.

editor.setFormField(name: String, checked: Boolean): DocumentEditor

Set a checkbox/radio field state by name.

Redaction

editor.addRedaction(pageIndex: Int, region: BBox): DocumentEditor

Queue a redaction over a rectangular region on a page.

editor.redactionCount(pageIndex: Int): Int
editor.redactionCount(): Int

Number of queued redactions on a page, or across the whole document.

editor.applyRedactionsDestructive(): RedactResult

Permanently apply all queued redactions, removing underlying content. Returns a RedactResult with the count applied and oracle-verification status.

Metadata

editor.scrubMetadata(): DocumentEditor

Strip document metadata (Info dictionary, XMP) for privacy.

Saving

editor.save(): ByteArray
editor.saveTo(out: Path)

Serialize the edited document with a full rewrite.

editor.saveIncremental(): ByteArray
editor.saveIncrementalTo(out: Path)

Serialize using an incremental update (appends changes, preserving the original bytes).
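One way to sanity-check the "preserving the original bytes" property is to confirm that the incremental output begins with the input bytes. The sketch below shows only the byte-prefix comparison on made-up data; startsWithBytes is a hypothetical helper, and whether a given output passes is for you to verify against your own documents.

```kotlin
// Hypothetical helper: true when this array begins with all bytes of `prefix`.
fun ByteArray.startsWithBytes(prefix: ByteArray): Boolean =
    size >= prefix.size && prefix.indices.all { this[it] == prefix[it] }

fun main() {
    // Made-up byte strings standing in for an original file and two outputs.
    val original = "%PDF-1.7 original".toByteArray()
    val appended = original + " appended update".toByteArray()
    println(appended.startsWithBytes(original))                   // true
    println("rewritten".toByteArray().startsWithBytes(original))  // false
}
```

With real data, the receiver would be the result of editor.saveIncremental() and the prefix the bytes passed to DocumentEditor.open.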

val editor.isOpen: Boolean
editor.close()

Lifecycle. Prefer use { }.

AutoExtractor

Adaptive extraction pipeline that classifies pages (text-layer vs. scanned), applies OCR where needed, and emits text / Markdown / HTML with confidence scores.

import fyi.oxide.pdf.AutoExtractor

Factory Methods

AutoExtractor.of(doc: PdfDocument): AutoExtractor
AutoExtractor.of(doc: PdfDocument, config: AutoExtractConfig): AutoExtractor

Create an extractor over a document, optionally with a custom AutoExtractConfig.

AutoExtractor.fast(doc: PdfDocument): AutoExtractor
AutoExtractor.balanced(doc: PdfDocument): AutoExtractor
AutoExtractor.highFidelity(doc: PdfDocument): AutoExtractor

Preset configurations trading speed for fidelity.

Extraction

extractor.extractText(): String
extractor.extractTextForPage(pageIndex: Int): String

Plain-text extraction for the whole document or a single page.

extractor.extractDocument(): AutoResult
extractor.extractPage(pageIndex: Int): AutoResult

Full adaptive extraction returning an AutoResult (text, optional Markdown/HTML, reason, confidence, OCR flag, regions).

extractor.extractAutoDocument(): AutoResult
extractor.extractAutoPage(pageIndex: Int): AutoResult

Auto-mode variants of the document- and page-level extraction.

extractor.extractDocumentJson(): String
extractor.extractPageJson(pageIndex: Int): String

Extraction serialized as a JSON string.

Classification

extractor.classifyDocument(): ClassifyResult
extractor.classifyPage(pageIndex: Int): ClassifyResult

Classify the document or a page, returning a ClassifyResult (per-page class plus lists of pages needing OCR, containing charts, or encrypted).

extractor.classifyPageKind(pageIndex: Int): PageClass
extractor.classifyDocumentKinds(): List<PageClass>

Get the PageClass (TEXT_LAYER / SCANNED / MIXED) for a page or all pages.

Accessors

extractor.document(): PdfDocument
extractor.config(): AutoExtractConfig

The wrapped document and the active configuration.

MarkdownConverter

Stateless, thread-safe converter from PdfDocument to Markdown or HTML.

import fyi.oxide.pdf.MarkdownConverter

MarkdownConverter.toMarkdown(doc: PdfDocument): String
MarkdownConverter.toMarkdown(doc: PdfDocument, pageIndex: Int): String
MarkdownConverter.toHtml(doc: PdfDocument): String
MarkdownConverter.toHtml(doc: PdfDocument, pageIndex: Int): String

Convert the whole document or a single page to Markdown / HTML.

PdfSigner

Digitally sign and verify PDFs with PKCS#12 keystores (PAdES B-B / B-T / B-LT levels).

import fyi.oxide.pdf.PdfSigner

PdfSigner.fromPkcs12(keystore: Path, password: String): PdfSigner
PdfSigner.fromPkcs12(keystoreBytes: ByteArray, password: String): PdfSigner

Load a signer from a PKCS#12 keystore on disk or in memory.

signer.sign(pdf: ByteArray, opts: SignOptions): ByteArray

Sign PDF bytes with the given SignOptions (level, reason, location, contact, TSA URL). Returns the signed PDF.

signer.verify(pdf: ByteArray): Boolean

Verify all signatures in a PDF. Returns true if every signature is cryptographically valid.

PdfSigner.classifyLevel(pdf: ByteArray): SignatureLevel

Static helper: detect the PAdES conformance level of an existing signed PDF.

PdfValidator

Stateless, thread-safe validation against PDF/A, PDF/X, and PDF/UA conformance levels.

import fyi.oxide.pdf.PdfValidator

PdfValidator.isPdfA(doc: PdfDocument, level: PdfALevel): Boolean
PdfValidator.isPdfUa(doc: PdfDocument, level: PdfUaLevel): Boolean

Quick boolean conformance checks.

PdfValidator.validatePdfA(doc: PdfDocument, level: PdfALevel): ValidationResult
PdfValidator.validatePdfX(doc: PdfDocument, level: PdfXLevel): ValidationResult
PdfValidator.validatePdfUa(doc: PdfDocument, level: PdfUaLevel): ValidationResult

Full validation returning a ValidationResult with the list of violations.

PdfPolicy

Global security-policy controls governing which cryptographic algorithms are permitted.

import fyi.oxide.pdf.PdfPolicy

PdfPolicy.current(): PolicyMode
PdfPolicy.set(mode: PolicyMode)
PdfPolicy.compat(): PolicyMode
PdfPolicy.strict(): PolicyMode
PdfPolicy.fipsStrict(): PolicyMode

Read or set the active PolicyMode, and obtain the built-in compat / strict / FIPS-strict modes.

Kotlin Extensions

The Kotlin facade’s only added surface: Optional<T> to T? converters and the generic orNull() helper. Import from fyi.oxide.pdf.

fun <T : Any> Optional<T>.orNull(): T?

Generic: empty Optional becomes null.
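For readers curious how such a converter can be written, here is a minimal sketch. It is named orNullSketch to make clear it is an illustration and not necessarily how the library implements orNull().

```kotlin
import java.util.Optional

// Illustrative equivalent of the documented orNull(): an empty Optional maps to null.
fun <T : Any> Optional<T>.orNullSketch(): T? = orElse(null)

fun main() {
    val present: Optional<String> = Optional.of("PDF Oxide")
    val absent: Optional<String> = Optional.empty()
    println(present.orNullSketch() ?: "(none)")  // PDF Oxide
    println(absent.orNullSketch() ?: "(none)")   // (none)
}
```

Returning T? lets callers use ?., ?:, and let instead of Optional's own combinators. The Kotlin standard library offers a comparable getOrNull() in kotlin.jvm.optionals.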

fun PdfDocument.producerOrNull(): String?
fun PdfDocument.creatorOrNull(): String?

Document /Producer and /Creator, or null if absent.

fun FormField.valueOrNull(): String?
fun FormField.bboxOrNull(): BBox?

Form-field value and widget bounding box, or null.

fun Annotation.contentsOrNull(): String?
fun Annotation.uriOrNull(): String?

Annotation /Contents and link target URI, or null.

fun AutoResult.markdownOrNull(): String?
fun AutoResult.htmlOrNull(): String?

Markdown / HTML rendering of an auto-extraction, or null if not produced.

fun ValidationViolation.pageIndexOrNull(): Int?

Page index a violation applies to, or null for document-level rules.

Geometry Types

BBox

Axis-aligned bounding box in PDF points.

BBox(x0: Double, y0: Double, x1: Double, y1: Double)

Accessor	Type	Description
`x0()`, `y0()`, `x1()`, `y1()`	`Double`	Corner coordinates
`width()`	`Double`	`x1 - x0`
`height()`	`Double`	`y1 - y0`

Color

8-bit RGBA color with named constants Color.BLACK, Color.WHITE, Color.TRANSPARENT.

Color(r: Int, g: Int, b: Int, a: Int)
Color(r: Int, g: Int, b: Int)            // a = 255

Accessors: r(): Int, g(): Int, b(): Int, a(): Int.

Point

Point(x: Double, y: Double)

Accessors: x(): Double, y(): Double.

Rect

Position-plus-size rectangle.

Rect(x: Double, y: Double, width: Double, height: Double)

Accessors: x(), y(), width(), height() (all Double), and toBBox(): BBox.
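The relationship between the two rectangle representations can be sketched with stand-in types. BBoxSketch and RectSketch are hypothetical, and the conversion shown (x1 = x + width, y1 = y + height) is an assumption about how toBBox() behaves, consistent with the documented width() = x1 - x0.

```kotlin
// Stand-ins for illustration only; NOT the PDF Oxide BBox and Rect classes.
data class BBoxSketch(val x0: Double, val y0: Double, val x1: Double, val y1: Double) {
    val width: Double get() = x1 - x0
    val height: Double get() = y1 - y0
}

data class RectSketch(val x: Double, val y: Double, val width: Double, val height: Double) {
    // Assumed conversion: the far corner is the origin corner plus the size.
    fun toBBox() = BBoxSketch(x, y, x + width, y + height)
}

fun main() {
    val box = RectSketch(72.0, 700.0, 200.0, 20.0).toBBox()
    println(box)  // BBoxSketch(x0=72.0, y0=700.0, x1=272.0, y1=720.0)
    println("${box.width} x ${box.height}")  // 200.0 x 20.0
}
```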

Text Types

TextChar

A single extracted character.

TextChar(codepoint: Int, bbox: BBox, confidence: Float)

Accessors: codepoint(): Int, bbox(): BBox, confidence(): Float, asString(): String.
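A codepoint is an Int rather than a Char because characters outside the Basic Multilingual Plane need two UTF-16 code units. The sketch below shows the standard JVM conversion; codepointToString is a hypothetical helper, and it is only assumed that asString() behaves similarly.

```kotlin
// Convert a Unicode code point to a String using the standard JVM API.
// Hypothetical helper; assumed (not confirmed) to match TextChar.asString().
fun codepointToString(codepoint: Int): String = String(Character.toChars(codepoint))

fun main() {
    println(codepointToString(0x41))            // A
    println(codepointToString(0x1F600).length)  // 2 (a surrogate pair)
}
```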

TextWord

TextWord(text: String, bbox: BBox, confidence: Float)

Accessors: text(): String, bbox(): BBox, confidence(): Float.

TextLine

TextLine(text: String, bbox: BBox, words: List<TextWord>)

Accessors: text(): String, bbox(): BBox, words(): List<TextWord>.

TextSpan

A run of identically-styled text.

TextSpan(text: String, bbox: BBox, style: TextStyle)

Accessors: text(): String, bbox(): BBox, style(): TextStyle.

TextStyle

TextStyle(font: String?, size: Double, color: Color, bold: Boolean, italic: Boolean)

Accessors: font(): String?, size(): Double, color(): Color, bold(): Boolean, italic(): Boolean.

Table Types

Table

Table(bbox: BBox, rows: Int, cols: Int, cells: List<TableCell>)

Accessors: bbox(): BBox, rows(): Int, cols(): Int, cells(): List<TableCell>.

TableCell

TableCell(text: String, bbox: BBox, row: Int, col: Int, rowSpan: Int, colSpan: Int)

Accessors: text(): String, bbox(): BBox, row(): Int, col(): Int, rowSpan(): Int, colSpan(): Int.
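A flat cell list is often easier to consume as a row-by-column grid. The sketch below shows one way to build it with a stand-in cell type. CellSketch and toGrid are hypothetical, row and column indices are assumed to be zero-based, and spans are ignored (text is placed only in the anchor cell).

```kotlin
// Stand-in carrying only the fields this sketch needs; NOT the PDF Oxide TableCell.
data class CellSketch(val text: String, val row: Int, val col: Int)

// Build a rows x cols grid of cell text; positions without a cell stay empty.
fun toGrid(rows: Int, cols: Int, cells: List<CellSketch>): List<List<String>> {
    val grid = List(rows) { MutableList(cols) { "" } }
    for (cell in cells) {
        // Skip cells whose indices fall outside the declared table size.
        if (cell.row in 0 until rows && cell.col in 0 until cols) {
            grid[cell.row][cell.col] = cell.text
        }
    }
    return grid
}

fun main() {
    val cells = listOf(
        CellSketch("Name", 0, 0), CellSketch("Qty", 0, 1),
        CellSketch("Widget", 1, 0), CellSketch("3", 1, 1),
    )
    toGrid(2, 2, cells).forEach { println(it.joinToString(" | ")) }
}
```

With the real types, the inputs would come from table.rows(), table.cols(), and table.cells().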

Search Types

SearchMatch

SearchMatch(pageIndex: Int, bbox: BBox, text: String)

Accessors: pageIndex(): Int, bbox(): BBox, text(): String.
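A common follow-up to a search is counting hits per page. The sketch below does this over a stand-in match type; MatchSketch and countPerPage are hypothetical and keep only the fields needed for the illustration.

```kotlin
// Stand-in with the two fields used here; NOT the PDF Oxide SearchMatch.
data class MatchSketch(val pageIndex: Int, val text: String)

// Count matches per page, ordered by page index.
fun countPerPage(matches: List<MatchSketch>): Map<Int, Int> =
    matches.groupingBy { it.pageIndex }.eachCount().toSortedMap()

fun main() {
    val matches = listOf(
        MatchSketch(0, "Report"), MatchSketch(3, "report"), MatchSketch(0, "Report"),
    )
    println(countPerPage(matches))  // {0=2, 3=1}
}
```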

SearchResult

SearchResult(query: String, matches: List<SearchMatch>)

Accessors: query(): String, matches(): List<SearchMatch>, count(): Int, isEmpty(): Boolean.

SearchOptions

Immutable options built via a fluent builder. SearchOptions.DEFAULT is the default instance. Note: not currently wired to PdfDocument.search() — use the caseInsensitive/regex/maxResults overload above instead.

SearchOptions.builder()
    .withCaseSensitive(true)
    .withWholeWord(true)
    .withRegex(false)
    .withMaxResults(50)
    .build()

Accessors: caseSensitive(): Boolean, wholeWord(): Boolean, regex(): Boolean, maxResults(): Optional<Int>. Builder methods: withCaseSensitive(Boolean), withWholeWord(Boolean), withRegex(Boolean), withMaxResults(Int) / withMaxResults(Int?), build().

Form Types

FormField

FormField(name: String, type: FormFieldType, value: String?, bbox: BBox?, pageIndex: Int)

Accessors: name(): String, type(): FormFieldType, value(): Optional<String>, bbox(): Optional<BBox>, pageIndex(): Int. Use valueOrNull() / bboxOrNull() for null-based access.

Annotation Types

Annotation

Annotation(type: AnnotationType, pageIndex: Int, bbox: BBox, contents: String?, uri: String?)

Accessors: type(): AnnotationType, pageIndex(): Int, bbox(): BBox, contents(): Optional<String>, uri(): Optional<String>. Use contentsOrNull() / uriOrNull() for null-based access.

Image Types

ExtractedImage

ExtractedImage(bytes: ByteArray, format: ImageFormat, bbox: BBox, width: Int, height: Int)

Accessors: bytes(): ByteArray, format(): ImageFormat, bbox(): BBox, width(): Int, height(): Int.

Auto-Extraction Types

AutoResult

Result of an adaptive extraction.

result.text(): String
result.markdown(): Optional<String>
result.html(): Optional<String>
result.reason(): ExtractReason
result.confidence(): Double
result.ocrUsed(): Boolean
result.regions(): List<RegionResult>
result.pagesNeedingOcr(): List<Int>

Use markdownOrNull() / htmlOrNull() for null-based access to the rendered output.

RegionResult

Per-region extraction detail within an AutoResult.

region.pageIndex(): Int
region.bbox(): BBox
region.text(): String
region.reason(): ExtractReason
region.confidence(): Double
region.ocrUsed(): Boolean
region.table(): Optional<Table>

ClassifyResult

result.pages(): List<PageClass>
result.pagesNeedingOcr(): List<Int>
result.pagesWithChart(): List<Int>
result.pagesEncrypted(): List<Int>

AutoExtractConfig

Immutable configuration built via a fluent builder; AutoExtractConfig.DEFAULT is the default. Convert an existing config back to a builder with toBuilder().

AutoExtractConfig.builder()
    .withMode(ExtractMode.AUTO)
    .withForceOcrPages(listOf(2, 5))
    .withMinOcrConfidence(0.6)
    .withOcrLanguages("eng", "deu")
    .withPasswords("secret")
    .withTopMarginFraction(0.05)
    .withBottomMarginFraction(0.05)
    .withAllowSingleColumnTables(true)
    .withOcrInlineImages(false)
    .withCancelToken("token-id")
    .build()

Accessors return Optional<...> for each field: mode(), forceOcrPages(), minOcrConfidence(), ocrLanguages(), passwords(), topMarginFraction(), bottomMarginFraction(), allowSingleColumnTables(), ocrInlineImages(), cancelToken(). Builder setters accept both boxed-nullable and primitive overloads (e.g. withMinOcrConfidence(Double?) and withTopMarginFraction(double)), plus the withOcrLanguages(vararg String) / withPasswords(vararg String) varargs forms.

Compliance Types

ValidationResult

ValidationResult(valid: Boolean, violations: List<ValidationViolation>)

Accessors: valid(): Boolean, violations(): List<ValidationViolation>.

ValidationViolation

ValidationViolation(ruleId: String, description: String, pageIndex: Int?)

Accessors: ruleId(): String, description(): String, pageIndex(): Optional<Int>. Use pageIndexOrNull() for null-based access.

Metadata Types

DocumentInfo

DocumentInfo(/* title, author, subject, keywords, creator, producer, creationDate, modificationDate */)

Accessors all return Optional<String>: title(), author(), subject(), keywords(), creator(), producer(), creationDate(), modificationDate().

XmpMetadata

Raw XMP packet. XmpMetadata.EMPTY is the empty instance.

XmpMetadata(xml: String)

Accessors: xml(): String, isEmpty(): Boolean.

Security & Redaction Types

SecurityPolicy

Immutable policy built via a fluent builder.

SecurityPolicy.builder()
    .withMode(PolicyMode.STRICT)
    .allow("algorithm-id")
    .deny("algorithm-id")
    .build()

Accessors: mode(): PolicyMode, additionalAllow(): List<String>, additionalDeny(): List<String>. Builder methods: withMode(PolicyMode), allow(String), deny(String), build().

RedactResult

RedactResult(regionsApplied: Int, oracleVerified: Boolean)

Accessors: regionsApplied(): Int, oracleVerified(): Boolean.

Signature Types

SignOptions

Immutable signing options built via a fluent builder.

SignOptions.builder()
    .withLevel(SignatureLevel.B_T)
    .withReason("Approved")
    .withLocation("HQ")
    .withContactInfo("ops@example.com")
    .withTsaUrl("https://freetsa.org/tsr")
    .build()

Accessors: level(): SignatureLevel, reason(): Optional<String>, location(): Optional<String>, contactInfo(): Optional<String>, tsaUrl(): Optional<String>. Builder methods: withLevel, withReason, withLocation, withContactInfo, withTsaUrl, build().

Split Types

BookmarkSegment

BookmarkSegment(title: String, firstPage: Int, lastPage: Int, filename: String)

Accessors: title(): String, firstPage(): Int, lastPage(): Int, filename(): String.

SplitByBookmarksOptions

Immutable options built via a fluent builder.

SplitByBookmarksOptions.builder()
    .withLevel(1)
    .withFilenamePrefix("chapter-")
    .build()

Accessors: level(): Int, filenamePrefix(): Optional<String>. Builder methods: withLevel(Int), withFilenamePrefix(String?), build().

Enums

Enum	Values
`FormFieldType`	`TEXT`, `CHECKBOX`, `RADIO`, `CHOICE`
`AnnotationType`	`HIGHLIGHT`, `TEXT`, `LINK`, `STAMP`, `UNDERLINE`, `STRIKEOUT`, `SQUIGGLY`, `FREE_TEXT`, `LINE`, `SQUARE`, `CIRCLE`, `FILE_ATTACHMENT`
`ImageFormat`	`JPEG`, `PNG`, `CCITT`, `RAW`
`ExtractMode`	`TEXT_ONLY`, `AUTO`
`ExtractReason`	`OK`, `SCANNED_NO_TEXT_LAYER`, `GLYPH_MAPPING_MISSING`, `ENCRYPTED_NO_EXTRACT_PERMISSION`, `IMAGE_TABLE_NO_STRUCTURE`, `CHART_NOT_TRANSCRIBED`, `OCR_REQUESTED_BUT_UNAVAILABLE`, `OCR_LOW_CONFIDENCE`, `EMPTY`
`PageClass`	`TEXT_LAYER`, `SCANNED`, `MIXED`
`PixelFormat`	`RGBA_8888`, `RGB_888`, `GRAY_8`, `PNG`
`PolicyMode`	`COMPAT`, `STRICT`
`SignatureLevel`	`B_B`, `B_T`, `B_LT`
`PdfALevel`	`A_1B`, `A_1A`, `A_2B`, `A_2A`, `A_2U`, `A_3B`, `A_3A`, `A_3U`, `A_4`, `A_4E`, `A_4F`
`PdfXLevel`	`X_1A_2001`, `X_1A_2003`, `X_3_2002`, `X_3_2003`, `X_4`, `X_4P`, `X_5G`, `X_5N`, `X_5PG`, `X_6`, `X_6P`, `X_6N`
`PdfUaLevel`	`UA_1`, `UA_2` (each exposes `code(): Int`)
`PdfErrorKind`	`PARSE`, `ENCRYPTED`, `PERMISSION`, `IO`, `OCR_UNAVAILABLE`, `SIGNATURE`, `INVALID_STATE`, `UNSUPPORTED`, `OTHER`

Exceptions

All failures raise PdfException (an unchecked exception) or one of its kind-specific subclasses. The kind() accessor returns a PdfErrorKind.

import fyi.oxide.pdf.PdfDocument
import fyi.oxide.pdf.exception.PdfException

try {
    PdfDocument.open(bytes).use { doc ->
        println(doc.extractText(0))
    }
} catch (e: PdfException) {
    println("PDF error [${e.kind()}]: ${e.message}")
}

PdfException(message: String)
PdfException(kind: PdfErrorKind, message: String)
PdfException(kind: PdfErrorKind, message: String, cause: Throwable)

e.kind(): PdfErrorKind

Exception	Cause
`PdfParseException`	Malformed or corrupt PDF
`PdfEncryptedException`	Encrypted document opened without a valid password
`PdfPermissionException`	Operation blocked by document permissions
`PdfIoException`	Underlying I/O failure
`PdfOcrUnavailableException`	OCR requested but the `ocr` feature is not built in
`PdfSignatureException`	Signing or signature-verification failure
`PdfInvalidStateException`	Operation invalid for the current handle state
`PdfUnsupportedException`	Unsupported feature or format
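Because kind() returns an enum, callers can branch on it with an exhaustive when instead of catching each subclass. The sketch below uses a stand-in enum that mirrors the documented PdfErrorKind values. ErrorKindSketch and hintFor are hypothetical, and the hint texts are illustrative.

```kotlin
// Stand-in mirroring the documented PdfErrorKind values; NOT the library's enum.
enum class ErrorKindSketch {
    PARSE, ENCRYPTED, PERMISSION, IO, OCR_UNAVAILABLE,
    SIGNATURE, INVALID_STATE, UNSUPPORTED, OTHER
}

// Hypothetical mapping from an error kind to a user-facing hint.
// The `when` is exhaustive, so adding an enum value forces an update here.
fun hintFor(kind: ErrorKindSketch): String = when (kind) {
    ErrorKindSketch.PARSE -> "The file looks malformed or corrupt."
    ErrorKindSketch.ENCRYPTED -> "A valid password is required to open this document."
    ErrorKindSketch.PERMISSION -> "The document's permissions block this operation."
    ErrorKindSketch.IO -> "The file could not be read or written."
    ErrorKindSketch.OCR_UNAVAILABLE -> "OCR was requested but is not built in."
    ErrorKindSketch.SIGNATURE -> "Signing or signature verification failed."
    ErrorKindSketch.INVALID_STATE -> "The operation is not valid for the handle's current state."
    ErrorKindSketch.UNSUPPORTED -> "This feature or format is not supported."
    ErrorKindSketch.OTHER -> "An unclassified PDF error occurred."
}

fun main() {
    println(hintFor(ErrorKindSketch.ENCRYPTED))
}
```

With the real types, the argument would be e.kind() inside a catch (e: PdfException) block.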

Complete Example

import fyi.oxide.pdf.AutoExtractor
import fyi.oxide.pdf.DocumentEditor
import fyi.oxide.pdf.Pdf
import fyi.oxide.pdf.PdfDocument
import fyi.oxide.pdf.geometry.BBox
import fyi.oxide.pdf.producerOrNull

// --- Creation ---
val bytes = Pdf.fromMarkdown("# Report\n\nGenerated by PDF Oxide.").use { it.save() }

// --- Extraction ---
PdfDocument.open(bytes).use { doc ->
    println("Pages: ${doc.pageCount()}")
    println("Producer: ${doc.producerOrNull() ?: "(none)"}")

    val page = doc.page(0)
    println("Words: ${page.words().map { it.text() }}")
    println("Tables: ${page.tables().size}")

    // Search with options
    val matches = doc.search("Report", caseInsensitive = true, regex = false, maxResults = 0)
    matches.forEach { m -> println("p${m.pageIndex()} '${m.text()}' @ ${m.bbox()}") }

    // Adaptive extraction
    val result = AutoExtractor.balanced(doc).extractDocument()
    println("confidence=${result.confidence()} ocr=${result.ocrUsed()}")
}

// --- Editing: redact + fill forms ---
DocumentEditor.open(bytes).use { editor ->
    editor.setFormField("name", "Jane Doe")
        .addRedaction(0, BBox(72.0, 700.0, 272.0, 720.0))
        .scrubMetadata()
    val redaction = editor.applyRedactionsDestructive()
    println("Redacted ${redaction.regionsApplied()} regions")
    val out: ByteArray = editor.save()
}

Other Language Bindings

PDF Oxide ships native bindings for every major ecosystem: Rust, Python, Node.js, WASM, C#, Go, Java, PHP, Ruby, C++, Swift, Dart, R, Julia, Zig, Scala, Clojure, Objective-C, and Elixir.

Next Steps

Types & Enums — all shared types and enums
Page API Reference — consistent per-page iteration across bindings
Getting Started with Kotlin — tutorial