Getting Started with PDF Oxide (Kotlin)
PDF Oxide is the fastest PDF library for the JVM with built-in text extraction: 0.8 ms mean, 100% pass rate on 3,830 PDFs. The Kotlin binding is an idiomatic, Android-ready facade over the Java binding. It adds use { } on the closable handles and turns Java Optional<T> returns into nullable T?. One library for extracting, creating, and editing PDFs. MIT licensed, built on a Rust core.
Installation
Add the Kotlin binding to your build.gradle.kts. It transitively pulls in the Java binding that owns the JNI native bridge:
dependencies {
    implementation("fyi.oxide:pdf-oxide-kotlin:0.3.69")
}
Requirements: JDK 17+. On Android, ship the native libpdf_oxide_jni.so in jniLibs/<abi>/; on the desktop JVM the loader finds it automatically (override with -Dfyi.oxide.pdf.lib.path=<path> when needed).
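On the desktop JVM, the override property above can be set once in the build instead of on every command line. The following is a minimal sketch for build.gradle.kts, assuming your app is launched through Gradle's JavaExec-based tasks (for example the application plugin's run task). The path is a placeholder; this page does not say whether the property expects a file or a directory, so check that before relying on it:

```kotlin
// build.gradle.kts
// Forward the native-library override to every JavaExec task (e.g. `run`) and to tests.
// "/opt/pdf-oxide/native" is a hypothetical location; substitute your own.
tasks.withType<JavaExec>().configureEach {
    systemProperty("fyi.oxide.pdf.lib.path", "/opt/pdf-oxide/native")
}
tasks.withType<Test>().configureEach {
    systemProperty("fyi.oxide.pdf.lib.path", "/opt/pdf-oxide/native")
}
```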
Quick Start
Build a PDF from Markdown, open it, and read the text back. The Pdf and PdfDocument handles are AutoCloseable, so wrap them in use { }:
import fyi.oxide.pdf.Pdf
import fyi.oxide.pdf.PdfDocument
import fyi.oxide.pdf.producerOrNull

Pdf.fromMarkdown("# Hello pdf_oxide\n\nThis is a **Kotlin** binding.\n").use { pdf ->
    PdfDocument.open(pdf.save()).use { doc ->
        println("pages: ${doc.pageCount()}")
        println("producer: ${doc.producerOrNull() ?: "(none)"}")
        println(doc.extractText(0))
    }
}
Pdf.fromMarkdown(String) returns a closable Pdf builder; pdf.save() serializes it to a ByteArray. PdfDocument.open(ByteArray) opens that for reading.
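To see why use { } matters for handles like these, here is a self-contained sketch of what Kotlin's use does with any AutoCloseable. FakeHandle is a made-up stand-in, not a PDF Oxide class; the point is only that close() runs on both the normal and the failure path:

```kotlin
// Stand-in for a native-backed handle; hypothetical, not part of PDF Oxide.
class FakeHandle(private val name: String, private val log: MutableList<String>) : AutoCloseable {
    fun read(): String = "data from $name"
    override fun close() { log += "closed $name" }
}

fun main() {
    val log = mutableListOf<String>()

    // Normal path: close() runs after the block has produced its value.
    val text = FakeHandle("a", log).use { it.read() }

    // Failure path: close() still runs before the exception propagates.
    try {
        FakeHandle("b", log).use { error("boom") }
    } catch (e: IllegalStateException) {
        log += "caught ${e.message}"
    }

    println(text) // data from a
    println(log)  // [closed a, closed b, caught boom]
}
```

Because use returns the block's value, patterns such as val bytes = Pdf.fromMarkdown(...).use { it.save() } release the builder as soon as the bytes exist.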
Opening a PDF
Open an existing document from bytes and inspect its metadata. producerOrNull() and creatorOrNull() are the Kotlin nullable views over the Java Optional getters:
import fyi.oxide.pdf.PdfDocument
import fyi.oxide.pdf.producerOrNull
import fyi.oxide.pdf.creatorOrNull

// pdfBytes: a ByteArray holding the PDF, e.g. the result of pdf.save()
PdfDocument.open(pdfBytes).use { doc ->
    println("open: ${doc.isOpen}")
    println("pages: ${doc.pageCount()}")
    println("producer: ${doc.producerOrNull() ?: "(none)"}")
    println("creator: ${doc.creatorOrNull() ?: "(none)"}")
}
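The ...OrNull() pattern can be illustrated without the library. The sketch below shows one way an Optional getter can be exposed as a nullable value through an extension function. JavaStyleInfo is a hypothetical stand-in, and this is not the binding's actual source:

```kotlin
import java.util.Optional

// Hypothetical stand-in for a Java-style class with an Optional getter.
class JavaStyleInfo(private val producer: String?) {
    fun producer(): Optional<String> = Optional.ofNullable(producer)
}

// Extension in the style of producerOrNull(): unwrap Optional<T> to T?.
fun JavaStyleInfo.producerOrNull(): String? = producer().orElse(null)

fun main() {
    println(JavaStyleInfo("pdf_oxide").producerOrNull() ?: "(none)") // pdf_oxide
    println(JavaStyleInfo(null).producerOrNull() ?: "(none)")        // (none)
}
```

A nullable return lets callers use ?:, ?.let, and smart casts directly instead of calling isPresent and get.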
Text Extraction
Extract plain text from any page by its zero-based index, or loop over every page:
import fyi.oxide.pdf.PdfDocument

PdfDocument.open(pdfBytes).use { doc ->
    // a single page
    println(doc.extractText(0))

    // every page
    for (i in 0 until doc.pageCount()) {
        println("--- Page ${i + 1} ---")
        println(doc.extractText(i))
    }
}
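The snippets on this page assume a pdfBytes value is already in scope. The following sketch shows one way to get there from a file on disk and collect every page into a single string. It uses only the calls shown above, and assumes pageCount() returns an Int and extractText() a String, as the loop above suggests. The extractAllText helper and the input.pdf path are illustrative names:

```kotlin
import fyi.oxide.pdf.PdfDocument
import java.io.File

// Reads a PDF from disk and returns the text of all pages,
// separated by form-feed characters (a common page delimiter in plain text).
fun extractAllText(file: File): String {
    val pdfBytes = file.readBytes()
    return PdfDocument.open(pdfBytes).use { doc ->
        (0 until doc.pageCount()).joinToString("\u000C") { i -> doc.extractText(i) }
    }
}

fun main() {
    // "input.pdf" is a placeholder path.
    println(extractAllText(File("input.pdf")))
}
```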
Page Elements
doc.page(i) returns a PdfPage exposing structured geometry — words, lines, characters, tables, images, and annotations. Each word carries its text and a bounding box:
import fyi.oxide.pdf.PdfDocument

PdfDocument.open(pdfBytes).use { doc ->
    val page = doc.page(0)
    println("size: ${page.width()} x ${page.height()}")

    page.words().take(8).forEach { word ->
        println("${word.text()} @ ${word.bbox()}")
    }

    println("lines: ${page.lines().size}")
    println("chars: ${page.chars().size}")
    println("tables: ${page.tables().size}")
    println("images: ${page.images().size}")
    println("annotations: ${page.annotations().size}")
}
A word’s bbox() is a BBox with helpers like width() and height().
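As an example of putting the bounding box to work, the sketch below picks the word with the tallest box on a page, a rough heuristic for spotting heading text. It assumes words() is iterable and that height() returns a comparable numeric type. Neither return type is spelled out on this page, and tallestWord is an illustrative helper name:

```kotlin
import fyi.oxide.pdf.PdfDocument

// Returns the text of the word with the tallest bounding box on the page,
// or null when the page has no words.
fun tallestWord(pdfBytes: ByteArray, pageIndex: Int = 0): String? =
    PdfDocument.open(pdfBytes).use { doc ->
        doc.page(pageIndex).words()
            .maxByOrNull { it.bbox().height() }
            ?.text()
    }
```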
Markdown & HTML Conversion
Convert the whole document to Markdown, or render a page to HTML:
import fyi.oxide.pdf.PdfDocument

PdfDocument.open(pdfBytes).use { doc ->
    val markdown = doc.toMarkdown() // all pages
    println(markdown)

    val html = doc.toHtml()
    println(html)
}
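In practice the converted output usually goes to a file rather than the console. A minimal sketch, assuming toMarkdown() and toHtml() return String as the snippet above implies; exportConversions and the output file names are illustrative:

```kotlin
import fyi.oxide.pdf.PdfDocument
import java.io.File

// Writes the Markdown and HTML conversions next to each other in outDir.
fun exportConversions(pdfBytes: ByteArray, outDir: File) {
    outDir.mkdirs()
    PdfDocument.open(pdfBytes).use { doc ->
        File(outDir, "document.md").writeText(doc.toMarkdown())
        File(outDir, "document.html").writeText(doc.toHtml())
    }
}
```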
Search
Search for text across the document. Each match exposes its text via text():
import fyi.oxide.pdf.PdfDocument

PdfDocument.open(pdfBytes).use { doc ->
    val matches = doc.search("configuration")
    matches.forEach { m ->
        println("match: ${m.text()}")
    }
}
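When checking a document for several terms at once, the same call can be folded into a small tally. This sketch assumes search() returns a collection that Kotlin can count. The return type is not stated on this page, and countMatches is an illustrative helper:

```kotlin
import fyi.oxide.pdf.PdfDocument

// Maps each search term to the number of matches found in the document.
fun countMatches(pdfBytes: ByteArray, terms: List<String>): Map<String, Int> =
    PdfDocument.open(pdfBytes).use { doc ->
        terms.associateWith { term -> doc.search(term).count() }
    }
```

For example, countMatches(pdfBytes, listOf("configuration", "install")) opens the document once and runs both searches against it.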
Auto-Extraction
AutoExtractor runs the full extraction pipeline in one call and returns an AutoResult with the text plus optional Markdown/HTML renderings. The markdownOrNull() / htmlOrNull() extensions turn the Java Optional returns into nullable values:
import fyi.oxide.pdf.PdfDocument
import fyi.oxide.pdf.AutoExtractor
import fyi.oxide.pdf.markdownOrNull
import fyi.oxide.pdf.htmlOrNull

PdfDocument.open(pdfBytes).use { doc ->
    val result = AutoExtractor.of(doc).extractDocument()
    println(result.text())
    result.markdownOrNull()?.let { println(it) }
    result.htmlOrNull()?.let { println(it) }
}
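Because the optional renderings arrive as nullable values, a fallback chain is a single expression. The sketch below prefers Markdown, then HTML, then plain text. It uses only the calls shown above, assumes text() returns a String, and bestRendering is an illustrative helper name:

```kotlin
import fyi.oxide.pdf.PdfDocument
import fyi.oxide.pdf.AutoExtractor
import fyi.oxide.pdf.markdownOrNull
import fyi.oxide.pdf.htmlOrNull

// Returns the richest rendering available: Markdown, else HTML, else plain text.
fun bestRendering(pdfBytes: ByteArray): String =
    PdfDocument.open(pdfBytes).use { doc ->
        val result = AutoExtractor.of(doc).extractDocument()
        result.markdownOrNull() ?: result.htmlOrNull() ?: result.text()
    }
```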
Editing
DocumentEditor opens a PDF for structural edits — for example, scrubbing metadata before sharing — then serializes the result back to bytes:
import fyi.oxide.pdf.DocumentEditor

DocumentEditor.open(pdfBytes).use { editor ->
    editor.scrubMetadata()
    val cleaned: ByteArray = editor.save()
    println("cleaned: ${cleaned.size} bytes")
}
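A file-to-file version of the same edit might look like the sketch below. It combines the editor calls above with PdfDocument.open to confirm that the scrubbed bytes still parse. The scrubFile helper is illustrative, and which metadata fields scrubMetadata() clears is not covered on this page:

```kotlin
import fyi.oxide.pdf.DocumentEditor
import fyi.oxide.pdf.PdfDocument
import java.io.File

// Scrubs metadata from `input` and writes the result to `output`.
fun scrubFile(input: File, output: File) {
    // use { } returns the block's last expression, so the editor is closed
    // before the bytes are written out.
    val cleaned: ByteArray = DocumentEditor.open(input.readBytes()).use { editor ->
        editor.scrubMetadata()
        editor.save()
    }
    output.writeBytes(cleaned)

    // Sanity check: the cleaned bytes should still open as a PDF.
    PdfDocument.open(cleaned).use { doc ->
        println("pages after scrub: ${doc.pageCount()}")
    }
}
```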
Next Steps
- Java Getting Started – the JVM binding the Kotlin facade wraps
- Python Getting Started – using PDF Oxide from Python
- Text Extraction – detailed extraction options and recipes
- PDF Creation – advanced creation with builders, encryption, and metadata
- Editing – modifying existing PDFs, annotations, and form fields