Kotlin API Reference

PDF Oxide ships idiomatic Kotlin/JVM bindings (Android-ready) as a thin facade over the mature fyi.oxide:pdf-oxide Java binding, which owns the single JNI native bridge (the pdf_oxide_jni crate). The Kotlin module adds no native code. It re-exports the Java types (PdfDocument, Pdf, PdfPage, DocumentEditor, PdfSigner, PdfValidator, AutoExtractor, and the geometry / text / table / search value types) and layers Kotlin sugar on top: Optional<T> to T? extensions, plus use { } on the AutoCloseable handles.

// build.gradle.kts
dependencies {
    implementation("fyi.oxide:pdf-oxide-kotlin:0.3.69")
}

The JNI native library (libpdf_oxide_jni) is not bundled — load it via System.loadLibrary("pdf_oxide_jni") (ship the .so/.dylib on your java.library.path, or in jniLibs/<abi>/ on Android), or point the Java NativeLoader at it with -Dfyi.oxide.pdf.lib.path=<path>.
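
Because the library is not bundled, a missing or misplaced binary surfaces as an UnsatisfiedLinkError at load time. The sketch below probes for the library up front and reports the search path instead of crashing. The helper name tryLoadNative is hypothetical and not part of PDF Oxide; only System.loadLibrary and the pdf_oxide_jni name come from the text above.

```kotlin
/** Attempts to load a JNI library by name; returns false instead of throwing when it is missing. */
fun tryLoadNative(name: String): Boolean =
    try {
        System.loadLibrary(name)
        true
    } catch (e: UnsatisfiedLinkError) {
        false
    }

fun main() {
    // "pdf_oxide_jni" is the library name given in this reference.
    if (!tryLoadNative("pdf_oxide_jni")) {
        System.err.println(
            "pdf_oxide_jni not found; java.library.path=" +
                System.getProperty("java.library.path")
        )
    }
}
```

Running the probe once at application start keeps the failure close to its cause, rather than deferring it to the first PdfDocument call.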

For the Java API, see the Java API Reference. For the Rust API, see the Rust API Reference. For type details, see Types & Enums.

import fyi.oxide.pdf.Pdf
import fyi.oxide.pdf.PdfDocument
import fyi.oxide.pdf.producerOrNull

Pdf.fromMarkdown("# Hello\n\nbody\n").use { pdf ->
    PdfDocument.open(pdf.save()).use { doc ->
        println(doc.pageCount())
        println(doc.extractText(0))
        println(doc.toMarkdown())
        println(doc.page(0).words().map { it.text() })
        println(doc.producerOrNull() ?: "(no producer)")   // Optional -> nullable
    }
}

All handles (PdfDocument, Pdf, DocumentEditor) implement AutoCloseable, so the Kotlin use { } block closes native memory deterministically. Errors raise PdfException (and its subclasses); see Exceptions.


PdfDocument

The primary read-only entry point to a PDF — open, extract, convert, render, search, and inspect form fields. Instances own native memory and must be closed; use use { }.

import fyi.oxide.pdf.PdfDocument

Factory Methods

PdfDocument.open(path: Path): PdfDocument

Open a PDF from a filesystem path.

PdfDocument.open(path: String): PdfDocument

Open a PDF from a path string.

PdfDocument.open(bytes: ByteArray): PdfDocument

Open a PDF from in-memory bytes (e.g. downloaded from S3 or received over HTTP).

PdfDocument.open(path: Path, password: String): PdfDocument

Open an encrypted PDF from a path with the user or owner password.

PdfDocument.open(path: String, password: String): PdfDocument

Open an encrypted PDF from a path string with a password.

PdfDocument.open(bytes: ByteArray, password: String): PdfDocument

Open an encrypted PDF from bytes with a password.

PdfDocument.open(stream: InputStream): PdfDocument

Open a PDF by reading all bytes from an InputStream.
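
As a minimal sketch of the factory overloads above, the two helpers below open the same file once by path and once through an InputStream, closing every handle with use { }. The helper names are hypothetical; the PdfDocument calls follow the signatures listed in this section.

```kotlin
import fyi.oxide.pdf.PdfDocument
import java.nio.file.Files
import java.nio.file.Path

/** Counts pages by opening the file directly from its path. */
fun pageCountFromPath(path: Path): Int =
    PdfDocument.open(path).use { doc -> doc.pageCount() }

/** Counts pages by streaming the file; use { } closes both the stream and the document. */
fun pageCountFromStream(path: Path): Int =
    Files.newInputStream(path).use { input ->
        PdfDocument.open(input).use { doc -> doc.pageCount() }
    }
```

The path overload avoids an extra copy through the JVM heap, so the stream overload is mainly useful when the source is not a file (for example an HTTP response body).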

Static One-Shots

PdfDocument.extractText(path: String): String
PdfDocument.extractText(path: Path): String

Open, extract all text, and close in a single call — for the simple case where you do not need a live handle.

Authentication

doc.authenticate(password: String): Boolean
doc.authenticate(password: ByteArray): Boolean

Authenticate an encrypted document after opening. Returns true if the password matched.

Document Info

doc.pageCount(): Int

Number of pages in the document.

doc.producer(): Optional<String>
doc.creator(): Optional<String>

Document /Producer and /Creator metadata. Use the Kotlin producerOrNull() / creatorOrNull() extensions for null-based access.

val doc.isOpen: Boolean

Whether the native handle is still open (Kotlin property over the Java isOpen() getter).

Text Extraction

doc.extractText(pageIndex: Int): String

Extract plain text from a single zero-indexed page.

doc.extractTextAuto(pageIndex: Int): String

Extract text with automatic strategy selection (falls back to OCR for scanned pages when the OCR feature is available).

doc.extractStructured(page: Int): String

Extract a structured (JSON) representation of the page’s text and layout.

Conversion

doc.toMarkdown(): String
doc.toMarkdown(pageIndex: Int): String

Convert the whole document or a single page to Markdown.

doc.toHtml(): String
doc.toHtml(pageIndex: Int): String

Convert the whole document or a single page to HTML.

Search

doc.search(query: String): List<SearchMatch>

Search the document for a literal string. Returns per-page matches with bounding boxes.

doc.search(query: String, caseInsensitive: Boolean, regex: Boolean, maxResults: Int): List<SearchMatch>

Search with case sensitivity, regex, and a result cap (maxResults = 0 means no cap).
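
The four-argument overload might be used for a regex search as sketched below. The arguments are passed positionally because PdfDocument is a Java type and Kotlin does not allow named arguments for Java methods. The helper name findIsoDates is hypothetical, and the sketch assumes the regex engine accepts the \d character class.

```kotlin
import fyi.oxide.pdf.PdfDocument

/** Returns up to 20 ISO-style dates (YYYY-MM-DD) found in the document, tagged with their page. */
fun findIsoDates(pdfBytes: ByteArray): List<String> =
    PdfDocument.open(pdfBytes).use { doc ->
        // Positional arguments: caseInsensitive = false, regex = true, maxResults = 20.
        doc.search("""\d{4}-\d{2}-\d{2}""", false, true, 20)
            .map { m -> "p${m.pageIndex()}: ${m.text()}" }
    }
```

The matches are mapped to plain strings inside the use { } block, so nothing that refers to the document escapes after it is closed.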

Forms

doc.formFields(): List<FormField>

Get all AcroForm fields with their type, value, widget bounds, and page index. See FormField.

Rendering

doc.render(pageIndex: Int): ByteArray
doc.render(pageIndex: Int, dpi: Int): ByteArray

Render a page to PNG image bytes at the default DPI or a specified DPI.
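
A short sketch of rendering to a file, together with the points-to-pixels arithmetic that a DPI setting implies. A PDF point is 1/72 inch, so a length in points scales by dpi / 72. Whether the renderer rounds pixel dimensions exactly this way is an assumption, and both helper names are hypothetical.

```kotlin
import fyi.oxide.pdf.PdfDocument
import java.nio.file.Files
import java.nio.file.Path
import kotlin.math.roundToInt

/** Converts a length in PDF points (1/72 inch) to pixels at the given DPI. */
fun pointsToPixels(points: Double, dpi: Int): Int = (points * dpi / 72.0).roundToInt()

/** Renders one page at the given DPI and writes the PNG bytes to outFile. */
fun renderPageToPng(pdf: Path, pageIndex: Int, dpi: Int, outFile: Path) {
    PdfDocument.open(pdf).use { doc ->
        Files.write(outFile, doc.render(pageIndex, dpi))
    }
}
```

For example, a US Letter page is 612 points wide, so pointsToPixels(612.0, 144) gives 1224 pixels.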

Page Access

doc.page(index: Int): PdfPage

Get a lazy PdfPage handle for the given zero-based index.

doc.pages(): List<PdfPage>

Get all pages as a list.

doc.pagesStream(): Stream<PdfPage>

Get all pages as a Java Stream for fluent processing.

Lifecycle

doc.close()

Free native memory. Idempotent — a second call is a no-op. Prefer use { }.


PdfPage

A lazy page handle returned by PdfDocument.page(), pages(), or pagesStream(). All accessors dispatch to the parent document on access.

PdfDocument.open(bytes).use { doc ->
    val page = doc.page(0)
    val words = page.words()
    val tables = page.tables()
}

Geometry

page.parent(): PdfDocument
page.index(): Int
page.mediaBox(): BBox
page.cropBox(): BBox
page.width(): Double
page.height(): Double
page.rotation(): Int

Parent document, zero-based index, MediaBox / CropBox rectangles, dimensions in PDF points, and page rotation in degrees.

Content Extraction

page.text(): String

Extract all text on the page.

page.text(region: BBox): String

Extract text within a bounding-box region.

page.words(): List<TextWord>
page.lines(): List<TextLine>
page.chars(): List<TextChar>

Structured text at word, line, and character granularity.

page.images(): List<ExtractedImage>
page.tables(): List<Table>
page.annotations(): List<Annotation>

Extracted images, detected tables, and page annotations.
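
The accessors above can be combined as in the following sketch: one helper keeps only words at or above a confidence threshold, the other extracts text from a horizontal band next to the y1 edge of the MediaBox. Whether that band is the visual top of the page depends on the coordinate origin, which this reference does not state, so treat that as an assumption. Both helper names are hypothetical.

```kotlin
import fyi.oxide.pdf.PdfDocument
import fyi.oxide.pdf.geometry.BBox

/** Text of the words on page 0 whose confidence is at least minConfidence. */
fun confidentWords(pdfBytes: ByteArray, minConfidence: Float): List<String> =
    PdfDocument.open(pdfBytes).use { doc ->
        doc.page(0).words()
            .filter { it.confidence() >= minConfidence }
            .map { it.text() }
    }

/** Text inside a band of the given height, measured back from the y1 edge of the MediaBox. */
fun bandText(pdfBytes: ByteArray, bandHeight: Double): String =
    PdfDocument.open(pdfBytes).use { doc ->
        val page = doc.page(0)
        val box = page.mediaBox()
        page.text(BBox(box.x0(), box.y1() - bandHeight, box.x1(), box.y1()))
    }
```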


Pdf

Create PDFs from source formats, split by bookmarks, and serialize. Implements AutoCloseable.

import fyi.oxide.pdf.Pdf

Factory Methods

Pdf.fromMarkdown(markdown: String): Pdf

Create a PDF from Markdown content.

Pdf.fromHtml(html: String): Pdf

Create a PDF from HTML content.

Pdf.fromImages(images: List<ByteArray>): Pdf

Create a multi-page PDF from a list of image byte arrays, one page per image.

Splitting

pdf.planSplitByBookmarks(opts: SplitByBookmarksOptions): List<BookmarkSegment>

Plan a split by outline bookmarks without producing output — returns the segments (title, page range, filename) that would be created.

pdf.splitByBookmarks(opts: SplitByBookmarksOptions): List<ByteArray>

Split into multiple PDFs by bookmark level. Returns one byte array per segment.

Pdf.planSplitByBookmarksCount(sourcePdf: ByteArray, level: Int): Int

Static helper: count how many segments a bookmark split at the given level would produce.

Pdf.splitByBookmarksFromBytes(sourcePdf: ByteArray, level: Int): Array<ByteArray>

Static helper: split source PDF bytes by bookmark level directly.
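
A sketch of the two static helpers used together: count the segments first, then split and write each part to disk. The helper name and the part-NNN.pdf naming scheme are hypothetical, and the sketch assumes a count of 0 means there is nothing to split at that level.

```kotlin
import fyi.oxide.pdf.Pdf
import java.nio.file.Files
import java.nio.file.Path

/** Splits source by bookmark level into outDir and returns the number of files written. */
fun splitToDirectory(source: Path, level: Int, outDir: Path): Int {
    val bytes = Files.readAllBytes(source)

    // Cheap pre-check: skip the split when no segments would be produced (assumed meaning of 0).
    if (Pdf.planSplitByBookmarksCount(bytes, level) == 0) return 0

    Files.createDirectories(outDir)
    val parts = Pdf.splitByBookmarksFromBytes(bytes, level)
    parts.forEachIndexed { i, part ->
        // Hypothetical naming scheme: part-001.pdf, part-002.pdf, ...
        Files.write(outDir.resolve("part-%03d.pdf".format(i + 1)), part)
    }
    return parts.size
}
```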

Saving

pdf.save(): ByteArray

Serialize the PDF to bytes.

pdf.saveTo(out: Path)

Write the PDF to a file.

val pdf.isOpen: Boolean
pdf.close()

Lifecycle (Kotlin isOpen property and close()). Prefer use { }.


DocumentEditor

Mutating editor for redaction, form filling, metadata scrubbing, and incremental saves. Implements AutoCloseable. Setter methods return this for fluent chaining.

import fyi.oxide.pdf.DocumentEditor

Factory Methods

DocumentEditor.open(path: Path): DocumentEditor
DocumentEditor.open(path: String): DocumentEditor
DocumentEditor.open(bytes: ByteArray): DocumentEditor

Open a document for editing from a path or in-memory bytes.

Form Filling

editor.setFormField(name: String, value: String): DocumentEditor

Set a text/choice field value by fully qualified name.

editor.setFormField(name: String, checked: Boolean): DocumentEditor

Set a checkbox/radio field state by name.

Redaction

editor.addRedaction(pageIndex: Int, region: BBox): DocumentEditor

Queue a redaction over a rectangular region on a page.

editor.redactionCount(pageIndex: Int): Int
editor.redactionCount(): Int

Number of queued redactions on a page, or across the whole document.

editor.applyRedactionsDestructive(): RedactResult

Permanently apply all queued redactions, removing underlying content. Returns a RedactResult with the count applied and oracle-verification status.

Metadata

editor.scrubMetadata(): DocumentEditor

Strip document metadata (Info dictionary, XMP) for privacy.

Saving

editor.save(): ByteArray
editor.saveTo(out: Path)

Serialize the edited document with a full rewrite.

editor.saveIncremental(): ByteArray
editor.saveIncrementalTo(out: Path)

Serialize using an incremental update (appends changes, preserving the original bytes).
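
Since an incremental update appends to the original bytes, the output should begin with the input. The sketch below wraps an edit in saveIncremental() and includes a small prefix check that makes the property visible. Both helper names are hypothetical, and the field name in the usage note is a placeholder.

```kotlin
import fyi.oxide.pdf.DocumentEditor

/** Applies edit to the document and returns the incrementally saved bytes. */
fun editIncrementally(original: ByteArray, edit: (DocumentEditor) -> Unit): ByteArray =
    DocumentEditor.open(original).use { editor ->
        edit(editor)
        editor.saveIncremental()
    }

/** True when data begins with every byte of prefix, in order. */
fun startsWith(data: ByteArray, prefix: ByteArray): Boolean =
    data.size >= prefix.size && prefix.indices.all { data[it] == prefix[it] }
```

A call such as editIncrementally(bytes) { it.setFormField("applicant.name", "Jane Doe") } would then be expected to satisfy startsWith(result, bytes), which is what keeps existing signatures over the original byte range intact.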

val editor.isOpen: Boolean
editor.close()

Lifecycle. Prefer use { }.


AutoExtractor

Adaptive extraction pipeline that classifies pages (text-layer vs. scanned), applies OCR where needed, and emits text / Markdown / HTML with confidence scores.

import fyi.oxide.pdf.AutoExtractor

Factory Methods

AutoExtractor.of(doc: PdfDocument): AutoExtractor
AutoExtractor.of(doc: PdfDocument, config: AutoExtractConfig): AutoExtractor

Create an extractor over a document, optionally with a custom AutoExtractConfig.

AutoExtractor.fast(doc: PdfDocument): AutoExtractor
AutoExtractor.balanced(doc: PdfDocument): AutoExtractor
AutoExtractor.highFidelity(doc: PdfDocument): AutoExtractor

Preset configurations trading speed for fidelity.

Extraction

extractor.extractText(): String
extractor.extractTextForPage(pageIndex: Int): String

Plain-text extraction for the whole document or a single page.

extractor.extractDocument(): AutoResult
extractor.extractPage(pageIndex: Int): AutoResult

Full adaptive extraction returning an AutoResult (text, optional Markdown/HTML, reason, confidence, OCR flag, regions).

extractor.extractAutoDocument(): AutoResult
extractor.extractAutoPage(pageIndex: Int): AutoResult

Auto-mode variants of the document- and page-level extraction.

extractor.extractDocumentJson(): String
extractor.extractPageJson(pageIndex: Int): String

Extraction serialized as a JSON string.

Classification

extractor.classifyDocument(): ClassifyResult
extractor.classifyPage(pageIndex: Int): ClassifyResult

Classify the document or a page, returning a ClassifyResult (per-page class plus lists of pages needing OCR, containing charts, or encrypted).

extractor.classifyPageKind(pageIndex: Int): PageClass
extractor.classifyDocumentKinds(): List<PageClass>

Get the PageClass (TEXT_LAYER / SCANNED / MIXED) for a page or all pages.
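
One way to use the per-page classification is to tally pages by class before deciding whether an OCR-capable preset is worth the cost. The sketch below assumes classifyDocumentKinds() returns one entry per page. The helper name is hypothetical, and the enum is read through its name so that no import path for PageClass has to be assumed.

```kotlin
import fyi.oxide.pdf.AutoExtractor
import fyi.oxide.pdf.PdfDocument

/** Counts pages per class name, e.g. TEXT_LAYER, SCANNED, MIXED. */
fun pageClassTally(pdfBytes: ByteArray): Map<String, Int> =
    PdfDocument.open(pdfBytes).use { doc ->
        AutoExtractor.fast(doc)
            .classifyDocumentKinds()
            .groupingBy { it.name }
            .eachCount()
    }
```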

Accessors

extractor.document(): PdfDocument
extractor.config(): AutoExtractConfig

The wrapped document and the active configuration.


MarkdownConverter

Stateless, thread-safe converter from PdfDocument to Markdown or HTML.

import fyi.oxide.pdf.MarkdownConverter

MarkdownConverter.toMarkdown(doc: PdfDocument): String
MarkdownConverter.toMarkdown(doc: PdfDocument, pageIndex: Int): String
MarkdownConverter.toHtml(doc: PdfDocument): String
MarkdownConverter.toHtml(doc: PdfDocument, pageIndex: Int): String

Convert the whole document or a single page to Markdown / HTML.


PdfSigner

Digitally sign and verify PDFs with PKCS#12 keystores (PAdES B-B / B-T / B-LT levels).

import fyi.oxide.pdf.PdfSigner

PdfSigner.fromPkcs12(keystore: Path, password: String): PdfSigner
PdfSigner.fromPkcs12(keystoreBytes: ByteArray, password: String): PdfSigner

Load a signer from a PKCS#12 keystore on disk or in memory.

signer.sign(pdf: ByteArray, opts: SignOptions): ByteArray

Sign PDF bytes with the given SignOptions (level, reason, location, contact, TSA URL). Returns the signed PDF.

signer.verify(pdf: ByteArray): Boolean

Verify all signatures in a PDF. Returns true if every signature is cryptographically valid.

PdfSigner.classifyLevel(pdf: ByteArray): SignatureLevel

Static helper: detect the PAdES conformance level of an existing signed PDF.


PdfValidator

Stateless, thread-safe validation against PDF/A, PDF/X, and PDF/UA conformance levels.

import fyi.oxide.pdf.PdfValidator

PdfValidator.isPdfA(doc: PdfDocument, level: PdfALevel): Boolean
PdfValidator.isPdfUa(doc: PdfDocument, level: PdfUaLevel): Boolean

Quick boolean conformance checks.

PdfValidator.validatePdfA(doc: PdfDocument, level: PdfALevel): ValidationResult
PdfValidator.validatePdfX(doc: PdfDocument, level: PdfXLevel): ValidationResult
PdfValidator.validatePdfUa(doc: PdfDocument, level: PdfUaLevel): ValidationResult

Full validation returning a ValidationResult with the list of violations.


PdfPolicy

Global security-policy controls governing which cryptographic algorithms are permitted.

import fyi.oxide.pdf.PdfPolicy

PdfPolicy.current(): PolicyMode
PdfPolicy.set(mode: PolicyMode)
PdfPolicy.compat(): PolicyMode
PdfPolicy.strict(): PolicyMode
PdfPolicy.fipsStrict(): PolicyMode

Read or set the active PolicyMode, and obtain the built-in compat / strict / FIPS-strict modes.
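
Because the policy is global, a caller that tightens it temporarily should restore the previous mode afterwards. The following is a minimal sketch of that pattern; withStrictPolicy is a hypothetical helper, and since the setting is process-wide it does not isolate concurrent threads from each other.

```kotlin
import fyi.oxide.pdf.PdfPolicy

/** Runs block under the strict policy, then restores whatever mode was active before. */
inline fun <T> withStrictPolicy(block: () -> T): T {
    val previous = PdfPolicy.current()
    PdfPolicy.set(PdfPolicy.strict())
    try {
        return block()
    } finally {
        // Restore even when block throws, so the global mode never leaks.
        PdfPolicy.set(previous)
    }
}
```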


Kotlin Extensions

The Kotlin facade’s only added surface: Optional<T> to T? converters and the generic orNull() helper. Import from fyi.oxide.pdf.

fun <T : Any> Optional<T>.orNull(): T?

Generic: empty Optional becomes null.
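
For illustration, an extension with this behavior can be written in a few lines of plain Kotlin. The sketch below is not the library's source; it uses the hypothetical name toNullable() to avoid clashing with the real orNull().

```kotlin
import java.util.Optional

/** Illustrative equivalent of orNull(): a present value is returned, an empty Optional becomes null. */
fun <T : Any> Optional<T>.toNullable(): T? = if (isPresent) get() else null

fun main() {
    val producer: Optional<String> = Optional.of("PDF Oxide")
    val creator: Optional<String> = Optional.empty()

    // Nullable values compose with the elvis operator.
    println(producer.toNullable() ?: "(none)") // prints "PDF Oxide"
    println(creator.toNullable() ?: "(none)")  // prints "(none)"
}
```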

fun PdfDocument.producerOrNull(): String?
fun PdfDocument.creatorOrNull(): String?

Document /Producer and /Creator, or null if absent.

fun FormField.valueOrNull(): String?
fun FormField.bboxOrNull(): BBox?

Form-field value and widget bounding box, or null.

fun Annotation.contentsOrNull(): String?
fun Annotation.uriOrNull(): String?

Annotation /Contents and link target URI, or null.

fun AutoResult.markdownOrNull(): String?
fun AutoResult.htmlOrNull(): String?

Markdown / HTML rendering of an auto-extraction, or null if not produced.

fun ValidationViolation.pageIndexOrNull(): Int?

Page index a violation applies to, or null for document-level rules.


Geometry Types

BBox

Axis-aligned bounding box in PDF points.

BBox(x0: Double, y0: Double, x1: Double, y1: Double)

| Accessor | Type | Description |
| --- | --- | --- |
| x0(), y0(), x1(), y1() | Double | Corner coordinates |
| width() | Double | x1 - x0 |
| height() | Double | y1 - y0 |

Color

8-bit RGBA color with named constants Color.BLACK, Color.WHITE, Color.TRANSPARENT.

Color(r: Int, g: Int, b: Int, a: Int)
Color(r: Int, g: Int, b: Int)            // a = 255

Accessors: r(): Int, g(): Int, b(): Int, a(): Int.

Point

Point(x: Double, y: Double)

Accessors: x(): Double, y(): Double.

Rect

Position-plus-size rectangle.

Rect(x: Double, y: Double, width: Double, height: Double)

Accessors: x(), y(), width(), height() (all Double), and toBBox(): BBox.
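
The relationship between the two rectangle forms can be sketched with stand-in types. RectSketch and BoxSketch below are hypothetical local classes, not the library's Rect and BBox, and the conversion (x1 = x + width, y1 = y + height) is the assumed behavior of toBBox(), inferred from BBox.width() being x1 - x0.

```kotlin
/** Stand-in for a corner-based box: width and height are derived from the corners. */
data class BoxSketch(val x0: Double, val y0: Double, val x1: Double, val y1: Double) {
    val width: Double get() = x1 - x0
    val height: Double get() = y1 - y0
}

/** Stand-in for a position-plus-size rectangle. */
data class RectSketch(val x: Double, val y: Double, val width: Double, val height: Double) {
    /** Assumed conversion: the far corner is the origin plus the size. */
    fun toBox(): BoxSketch = BoxSketch(x, y, x + width, y + height)
}

fun main() {
    val box = RectSketch(72.0, 700.0, 200.0, 20.0).toBox()
    println(box)       // prints "BoxSketch(x0=72.0, y0=700.0, x1=272.0, y1=720.0)"
    println(box.width) // prints "200.0"
}
```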


Text Types

TextChar

A single extracted character.

TextChar(codepoint: Int, bbox: BBox, confidence: Float)

Accessors: codepoint(): Int, bbox(): BBox, confidence(): Float, asString(): String.

TextWord

TextWord(text: String, bbox: BBox, confidence: Float)

Accessors: text(): String, bbox(): BBox, confidence(): Float.

TextLine

TextLine(text: String, bbox: BBox, words: List<TextWord>)

Accessors: text(): String, bbox(): BBox, words(): List<TextWord>.

TextSpan

A run of identically-styled text.

TextSpan(text: String, bbox: BBox, style: TextStyle)

Accessors: text(): String, bbox(): BBox, style(): TextStyle.

TextStyle

TextStyle(font: String?, size: Double, color: Color, bold: Boolean, italic: Boolean)

Accessors: font(): String?, size(): Double, color(): Color, bold(): Boolean, italic(): Boolean.


Table Types

Table

Table(bbox: BBox, rows: Int, cols: Int, cells: List<TableCell>)

Accessors: bbox(): BBox, rows(): Int, cols(): Int, cells(): List<TableCell>.

TableCell

TableCell(text: String, bbox: BBox, row: Int, col: Int, rowSpan: Int, colSpan: Int)

Accessors: text(): String, bbox(): BBox, row(): Int, col(): Int, rowSpan(): Int, colSpan(): Int.
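
Since cells arrive as a flat list, a common first step is to place them into a rows-by-cols grid. The sketch below assumes row() and col() are zero-based and ignores spans (a spanning cell fills only its anchor position). The helper name is hypothetical.

```kotlin
import fyi.oxide.pdf.PdfDocument

/** Returns each table on the page as a grid of cell text, indexed [row][col]. */
fun tablesAsGrids(pdfBytes: ByteArray, pageIndex: Int): List<List<List<String>>> =
    PdfDocument.open(pdfBytes).use { doc ->
        doc.page(pageIndex).tables().map { table ->
            // Start from an empty grid so positions with no cell stay "".
            val grid = List(table.rows()) { MutableList(table.cols()) { "" } }
            for (cell in table.cells()) {
                grid[cell.row()][cell.col()] = cell.text()
            }
            grid
        }
    }
```

Each grid row can then be joined for export, for example row.joinToString(",") for a naive CSV line.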


Search Types

SearchMatch

SearchMatch(pageIndex: Int, bbox: BBox, text: String)

Accessors: pageIndex(): Int, bbox(): BBox, text(): String.

SearchResult

SearchResult(query: String, matches: List<SearchMatch>)

Accessors: query(): String, matches(): List<SearchMatch>, count(): Int, isEmpty(): Boolean.

SearchOptions

Immutable options built via a fluent builder. SearchOptions.DEFAULT is the default instance. Note: not currently wired to PdfDocument.search() — use the caseInsensitive/regex/maxResults overload above instead.

SearchOptions.builder()
    .withCaseSensitive(true)
    .withWholeWord(true)
    .withRegex(false)
    .withMaxResults(50)
    .build()

Accessors: caseSensitive(): Boolean, wholeWord(): Boolean, regex(): Boolean, maxResults(): Optional<Int>. Builder methods: withCaseSensitive(Boolean), withWholeWord(Boolean), withRegex(Boolean), withMaxResults(Int) / withMaxResults(Int?), build().


Form Types

FormField

FormField(name: String, type: FormFieldType, value: String?, bbox: BBox?, pageIndex: Int)

Accessors: name(): String, type(): FormFieldType, value(): Optional<String>, bbox(): Optional<BBox>, pageIndex(): Int. Use valueOrNull() / bboxOrNull() for null-based access.


Annotation Types

Annotation

Annotation(type: AnnotationType, pageIndex: Int, bbox: BBox, contents: String?, uri: String?)

Accessors: type(): AnnotationType, pageIndex(): Int, bbox(): BBox, contents(): Optional<String>, uri(): Optional<String>. Use contentsOrNull() / uriOrNull() for null-based access.


Image Types

ExtractedImage

ExtractedImage(bytes: ByteArray, format: ImageFormat, bbox: BBox, width: Int, height: Int)

Accessors: bytes(): ByteArray, format(): ImageFormat, bbox(): BBox, width(): Int, height(): Int.


Auto-Extraction Types

AutoResult

Result of an adaptive extraction.

result.text(): String
result.markdown(): Optional<String>
result.html(): Optional<String>
result.reason(): ExtractReason
result.confidence(): Double
result.ocrUsed(): Boolean
result.regions(): List<RegionResult>
result.pagesNeedingOcr(): List<Int>

Use markdownOrNull() / htmlOrNull() for null-based access to the rendered output.

RegionResult

Per-region extraction detail within an AutoResult.

region.pageIndex(): Int
region.bbox(): BBox
region.text(): String
region.reason(): ExtractReason
region.confidence(): Double
region.ocrUsed(): Boolean
region.table(): Optional<Table>

ClassifyResult

result.pages(): List<PageClass>
result.pagesNeedingOcr(): List<Int>
result.pagesWithChart(): List<Int>
result.pagesEncrypted(): List<Int>

AutoExtractConfig

Immutable configuration built via a fluent builder; AutoExtractConfig.DEFAULT is the default. Convert an existing config back to a builder with toBuilder().

AutoExtractConfig.builder()
    .withMode(ExtractMode.AUTO)
    .withForceOcrPages(listOf(2, 5))
    .withMinOcrConfidence(0.6)
    .withOcrLanguages("eng", "deu")
    .withPasswords("secret")
    .withTopMarginFraction(0.05)
    .withBottomMarginFraction(0.05)
    .withAllowSingleColumnTables(true)
    .withOcrInlineImages(false)
    .withCancelToken("token-id")
    .build()

Accessors return Optional<...> for each field: mode(), forceOcrPages(), minOcrConfidence(), ocrLanguages(), passwords(), topMarginFraction(), bottomMarginFraction(), allowSingleColumnTables(), ocrInlineImages(), cancelToken(). Builder setters accept both boxed-nullable and primitive overloads (e.g. withMinOcrConfidence(Double?) and withTopMarginFraction(Double)), plus the varargs forms withOcrLanguages(vararg String) and withPasswords(vararg String).


Compliance Types

ValidationResult

ValidationResult(valid: Boolean, violations: List<ValidationViolation>)

Accessors: valid(): Boolean, violations(): List<ValidationViolation>.

ValidationViolation

ValidationViolation(ruleId: String, description: String, pageIndex: Int?)

Accessors: ruleId(): String, description(): String, pageIndex(): Optional<Int>. Use pageIndexOrNull() for null-based access.


Metadata Types

DocumentInfo

DocumentInfo(/* title, author, subject, keywords, creator, producer, creationDate, modificationDate */)

Accessors all return Optional<String>: title(), author(), subject(), keywords(), creator(), producer(), creationDate(), modificationDate().

XmpMetadata

Raw XMP packet. XmpMetadata.EMPTY is the empty instance.

XmpMetadata(xml: String)

Accessors: xml(): String, isEmpty(): Boolean.


Security & Redaction Types

SecurityPolicy

Immutable policy built via a fluent builder.

SecurityPolicy.builder()
    .withMode(PolicyMode.STRICT)
    .allow("algorithm-id")
    .deny("algorithm-id")
    .build()

Accessors: mode(): PolicyMode, additionalAllow(): List<String>, additionalDeny(): List<String>. Builder methods: withMode(PolicyMode), allow(String), deny(String), build().

RedactResult

RedactResult(regionsApplied: Int, oracleVerified: Boolean)

Accessors: regionsApplied(): Int, oracleVerified(): Boolean.


Signature Types

SignOptions

Immutable signing options built via a fluent builder.

SignOptions.builder()
    .withLevel(SignatureLevel.B_T)
    .withReason("Approved")
    .withLocation("HQ")
    .withContactInfo("ops@example.com")
    .withTsaUrl("https://freetsa.org/tsr")
    .build()

Accessors: level(): SignatureLevel, reason(): Optional<String>, location(): Optional<String>, contactInfo(): Optional<String>, tsaUrl(): Optional<String>. Builder methods: withLevel, withReason, withLocation, withContactInfo, withTsaUrl, build().


Split Types

BookmarkSegment

BookmarkSegment(title: String, firstPage: Int, lastPage: Int, filename: String)

Accessors: title(): String, firstPage(): Int, lastPage(): Int, filename(): String.

SplitByBookmarksOptions

Immutable options built via a fluent builder.

SplitByBookmarksOptions.builder()
    .withLevel(1)
    .withFilenamePrefix("chapter-")
    .build()

Accessors: level(): Int, filenamePrefix(): Optional<String>. Builder methods: withLevel(Int), withFilenamePrefix(String?), build().


Enums

| Enum | Values |
| --- | --- |
| FormFieldType | TEXT, CHECKBOX, RADIO, CHOICE |
| AnnotationType | HIGHLIGHT, TEXT, LINK, STAMP, UNDERLINE, STRIKEOUT, SQUIGGLY, FREE_TEXT, LINE, SQUARE, CIRCLE, FILE_ATTACHMENT |
| ImageFormat | JPEG, PNG, CCITT, RAW |
| ExtractMode | TEXT_ONLY, AUTO |
| ExtractReason | OK, SCANNED_NO_TEXT_LAYER, GLYPH_MAPPING_MISSING, ENCRYPTED_NO_EXTRACT_PERMISSION, IMAGE_TABLE_NO_STRUCTURE, CHART_NOT_TRANSCRIBED, OCR_REQUESTED_BUT_UNAVAILABLE, OCR_LOW_CONFIDENCE, EMPTY |
| PageClass | TEXT_LAYER, SCANNED, MIXED |
| PixelFormat | RGBA_8888, RGB_888, GRAY_8, PNG |
| PolicyMode | COMPAT, STRICT |
| SignatureLevel | B_B, B_T, B_LT |
| PdfALevel | A_1B, A_1A, A_2B, A_2A, A_2U, A_3B, A_3A, A_3U, A_4, A_4E, A_4F |
| PdfXLevel | X_1A_2001, X_1A_2003, X_3_2002, X_3_2003, X_4, X_4P, X_5G, X_5N, X_5PG, X_6, X_6P, X_6N |
| PdfUaLevel | UA_1, UA_2 (each exposes code(): Int) |
| PdfErrorKind | PARSE, ENCRYPTED, PERMISSION, IO, OCR_UNAVAILABLE, SIGNATURE, INVALID_STATE, UNSUPPORTED, OTHER |

Exceptions

All failures raise PdfException (an unchecked exception) or one of its kind-specific subclasses. The kind() accessor returns a PdfErrorKind.

import fyi.oxide.pdf.PdfDocument
import fyi.oxide.pdf.exception.PdfException

// bytes: a ByteArray holding the PDF to read
try {
    PdfDocument.open(bytes).use { doc ->
        println(doc.extractText(0))
    }
} catch (e: PdfException) {
    println("PDF error [${e.kind()}]: ${e.message}")
}

PdfException(message: String)
PdfException(kind: PdfErrorKind, message: String)
PdfException(kind: PdfErrorKind, message: String, cause: Throwable)

e.kind(): PdfErrorKind

| Exception | Cause |
| --- | --- |
| PdfParseException | Malformed or corrupt PDF |
| PdfEncryptedException | Encrypted document opened without a valid password |
| PdfPermissionException | Operation blocked by document permissions |
| PdfIoException | Underlying I/O failure |
| PdfOcrUnavailableException | OCR requested but the ocr feature is not built in |
| PdfSignatureException | Signing or signature-verification failure |
| PdfInvalidStateException | Operation invalid for the current handle state |
| PdfUnsupportedException | Unsupported feature or format |
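
Catching a subclass before the base type lets a caller treat a password problem differently from other failures. The sketch below assumes PdfEncryptedException lives in the same fyi.oxide.pdf.exception package as PdfException, which this reference does not state explicitly. The helper name is hypothetical.

```kotlin
import fyi.oxide.pdf.PdfDocument
import fyi.oxide.pdf.exception.PdfEncryptedException // assumed package, same as PdfException
import fyi.oxide.pdf.exception.PdfException

/** Returns the text of page 0, or null when the document cannot be opened or read. */
fun extractFirstPageOrNull(bytes: ByteArray, password: String? = null): String? =
    try {
        val doc = if (password == null) PdfDocument.open(bytes) else PdfDocument.open(bytes, password)
        doc.use { it.extractText(0) }
    } catch (e: PdfEncryptedException) {
        // Missing or wrong password: the caller can prompt and retry.
        System.err.println("Password required or incorrect: ${e.message}")
        null
    } catch (e: PdfException) {
        // Any other failure kind (parse, I/O, unsupported, ...).
        System.err.println("PDF error [${e.kind()}]: ${e.message}")
        null
    }
```

The subclass clause must come first; Kotlin tries catch blocks in order, so a leading catch (e: PdfException) would swallow the encrypted case as well.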

Complete Example

import fyi.oxide.pdf.AutoExtractor
import fyi.oxide.pdf.DocumentEditor
import fyi.oxide.pdf.Pdf
import fyi.oxide.pdf.PdfDocument
import fyi.oxide.pdf.geometry.BBox
import fyi.oxide.pdf.producerOrNull

// --- Creation ---
val bytes = Pdf.fromMarkdown("# Report\n\nGenerated by PDF Oxide.").use { it.save() }

// --- Extraction ---
PdfDocument.open(bytes).use { doc ->
    println("Pages: ${doc.pageCount()}")
    println("Producer: ${doc.producerOrNull() ?: "(none)"}")

    val page = doc.page(0)
    println("Words: ${page.words().map { it.text() }}")
    println("Tables: ${page.tables().size}")

    // Search with options. PdfDocument is a Java type, so arguments are positional:
    // caseInsensitive = true, regex = false, maxResults = 0 (no cap)
    val matches = doc.search("Report", true, false, 0)
    matches.forEach { m -> println("p${m.pageIndex()} '${m.text()}' @ ${m.bbox()}") }

    // Adaptive extraction
    val result = AutoExtractor.balanced(doc).extractDocument()
    println("confidence=${result.confidence()} ocr=${result.ocrUsed()}")
}

// --- Editing: redact + fill forms ---
DocumentEditor.open(bytes).use { editor ->
    editor.setFormField("name", "Jane Doe")
        .addRedaction(0, BBox(72.0, 700.0, 272.0, 720.0))
        .scrubMetadata()
    val redaction = editor.applyRedactionsDestructive()
    println("Redacted ${redaction.regionsApplied()} regions")
    val out: ByteArray = editor.save()
}

Other Language Bindings

PDF Oxide also ships native bindings for Rust, Python, Node.js, WASM, C#, Go, Java, PHP, Ruby, C++, Swift, Dart, R, Julia, Zig, Scala, Clojure, Objective-C, and Elixir.
