Getting Started with PDF Oxide (Kotlin)

PDF Oxide is a fast PDF library for the JVM with built-in text extraction: the project reports a 0.8 ms mean and a 100% pass rate on 3,830 PDFs. The Kotlin binding is an idiomatic, Android-ready facade over the Java binding. It adds use { } on the closable handles and turns Java Optional<T> returns into nullable T?. One library covers extracting, creating, and editing PDFs. It is MIT licensed and built on a Rust core.

Installation

Add the Kotlin binding to your build.gradle.kts. It transitively pulls in the Java binding that owns the JNI native bridge:

dependencies {
    implementation("fyi.oxide:pdf-oxide-kotlin:0.3.69")
}

Requirements: JDK 17+. On Android, ship the native libpdf_oxide_jni.so in jniLibs/<abi>/; on the desktop JVM the loader finds it automatically (override with -Dfyi.oxide.pdf.lib.path=<path> when needed).
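If the automatic lookup does not find the native library on a desktop JVM, the override can be set from the build script instead of the command line. The fragment below is a sketch: the property name comes from the paragraph above, while the path is a placeholder and the choice of task types is an assumption about how the application and tests are launched.

```kotlin
// build.gradle.kts (sketch; the path below is a placeholder, not a real location)
tasks.withType<JavaExec>().configureEach {
    // Equivalent to passing -Dfyi.oxide.pdf.lib.path=<path> to the launched JVM
    systemProperty("fyi.oxide.pdf.lib.path", "/path/to/native/library")
}

tasks.withType<Test>().configureEach {
    // Test JVMs are forked separately, so they need the property as well
    systemProperty("fyi.oxide.pdf.lib.path", "/path/to/native/library")
}
```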

Quick Start

Build a PDF from Markdown, open it, and read the text back. The Pdf and PdfDocument handles are AutoCloseable, so wrap them in use { }:

import fyi.oxide.pdf.Pdf
import fyi.oxide.pdf.PdfDocument
import fyi.oxide.pdf.producerOrNull

Pdf.fromMarkdown("# Hello pdf_oxide\n\nThis is a **Kotlin** binding.\n").use { pdf ->
    PdfDocument.open(pdf.save()).use { doc ->
        println("pages:    ${doc.pageCount()}")
        println("producer: ${doc.producerOrNull() ?: "(none)"}")
        println(doc.extractText(0))
    }
}

Pdf.fromMarkdown(String) returns a closable Pdf builder, and pdf.save() serializes it to a ByteArray. PdfDocument.open(ByteArray) then opens those bytes for reading.
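To see what use { } guarantees for nested handles, the sketch below uses a stand-in Handle class (hypothetical, not part of PDF Oxide) that records when it is closed. It illustrates two properties of the Kotlin standard library's use: inner handles close before outer ones, and close() still runs when the block throws.

```kotlin
// Stand-in for a native-backed handle; NOT a PDF Oxide class.
class Handle(private val name: String, private val log: MutableList<String>) : AutoCloseable {
    override fun close() {
        log += "closed $name"
    }
}

// Nested use blocks: the inner handle is closed first, then the outer one.
fun nestedUse(log: MutableList<String>): String =
    Handle("builder", log).use { _ ->
        Handle("document", log).use { _ ->
            log += "working"
            "done"
        }
    }

// close() runs even when the block throws; the exception still propagates.
fun failingUse(log: MutableList<String>) {
    try {
        Handle("document", log).use { _ -> error("boom") }
    } catch (e: IllegalStateException) {
        log += "caught ${e.message}"
    }
}

fun main() {
    val log = mutableListOf<String>()
    println(nestedUse(log))
    failingUse(log)
    log.forEach(::println)
}
```

The same ordering applies to the Pdf and PdfDocument handles in the Quick Start, assuming their close() methods release the underlying native resources.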

Opening a PDF

Open an existing document from bytes and inspect its metadata. producerOrNull() and creatorOrNull() are the Kotlin nullable views over the Java Optional getters:

import fyi.oxide.pdf.PdfDocument
import fyi.oxide.pdf.producerOrNull
import fyi.oxide.pdf.creatorOrNull

PdfDocument.open(pdfBytes).use { doc ->
    println("open:     ${doc.isOpen}")
    println("pages:    ${doc.pageCount()}")
    println("producer: ${doc.producerOrNull() ?: "(none)"}")
    println("creator:  ${doc.creatorOrNull() ?: "(none)"}")
}
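For readers curious how an OrNull view over an Optional getter can be written, here is a minimal sketch. The Metadata class and its producer() / creator() getters are stand-ins invented for illustration; the binding's actual Java signatures and extension implementations may differ.

```kotlin
import java.util.Optional

// Stand-in for a Java-style class with Optional getters; NOT the real PdfDocument.
class Metadata(private val producer: String?, private val creator: String?) {
    fun producer(): Optional<String> = Optional.ofNullable(producer)
    fun creator(): Optional<String> = Optional.ofNullable(creator)
}

// Hypothetical extensions: unwrap the Optional into a Kotlin nullable.
fun Metadata.producerOrNull(): String? = producer().orElse(null)
fun Metadata.creatorOrNull(): String? = creator().orElse(null)

// With nullable values, the Elvis operator supplies the fallback.
fun describe(meta: Metadata): String =
    "producer=${meta.producerOrNull() ?: "(none)"}, creator=${meta.creatorOrNull() ?: "(none)"}"

fun main() {
    println(describe(Metadata("pdf_oxide", null)))
}
```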

Text Extraction

Extract plain text from any page by its zero-based index, or loop over every page:

import fyi.oxide.pdf.PdfDocument

PdfDocument.open(pdfBytes).use { doc ->
    // a single page
    println(doc.extractText(0))

    // every page
    for (i in 0 until doc.pageCount()) {
        println("--- Page ${i + 1} ---")
        println(doc.extractText(i))
    }
}
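When the per-page text is needed as a single string rather than printed, the loop can be folded into a helper. The sketch below runs against a stand-in PageText interface (hypothetical) that mirrors only the pageCount() and extractText(Int) calls used above, so it can be tried without the native library.

```kotlin
// Stand-in exposing only the two calls used in this guide; NOT the real PdfDocument.
interface PageText {
    fun pageCount(): Int
    fun extractText(pageIndex: Int): String
}

// Join every page's text, labelling pages with one-based numbers for readers.
fun extractAll(doc: PageText): String =
    (0 until doc.pageCount()).joinToString(separator = "\n") { i ->
        "--- Page ${i + 1} ---\n${doc.extractText(i)}"
    }

// In-memory fake used only to exercise the helper.
class FakeDoc(private val pages: List<String>) : PageText {
    override fun pageCount(): Int = pages.size
    override fun extractText(pageIndex: Int): String = pages[pageIndex]
}

fun main() {
    println(extractAll(FakeDoc(listOf("first", "second"))))
}
```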

Page Elements

doc.page(i) returns a PdfPage exposing structured geometry — words, lines, characters, tables, images, and annotations. Each word carries its text and a bounding box:

import fyi.oxide.pdf.PdfDocument

PdfDocument.open(pdfBytes).use { doc ->
    val page = doc.page(0)
    println("size: ${page.width()} x ${page.height()}")

    page.words().take(8).forEach { word ->
        println("${word.text()} @ ${word.bbox()}")
    }

    println("lines:       ${page.lines().size}")
    println("chars:       ${page.chars().size}")
    println("tables:      ${page.tables().size}")
    println("images:      ${page.images().size}")
    println("annotations: ${page.annotations().size}")
}

A word’s bbox() is a BBox with helpers like width() and height().
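As one illustration of what the box helpers make possible, the sketch below flags words whose boxes are noticeably taller than the page's median word height, a rough heuristic for spotting heading text. Box and Word are stand-ins (hypothetical) modelled on the text(), bbox(), width(), and height() calls mentioned above; the Double return type and the 1.5 ratio are assumptions.

```kotlin
// Stand-ins modelled on the calls named in this guide; NOT the real BBox or word types.
class Box(private val w: Double, private val h: Double) {
    fun width(): Double = w
    fun height(): Double = h
}

class Word(private val content: String, private val box: Box) {
    fun text(): String = content
    fun bbox(): Box = box
}

// Return the text of words at least `ratio` times taller than the median word.
fun headingCandidates(words: List<Word>, ratio: Double = 1.5): List<String> {
    if (words.isEmpty()) return emptyList()
    val median = words.map { it.bbox().height() }.sorted()[words.size / 2]
    return words.filter { it.bbox().height() >= median * ratio }.map { it.text() }
}

fun main() {
    val words = listOf(
        Word("Report", Box(80.0, 24.0)),
        Word("the", Box(18.0, 10.0)),
        Word("quick", Box(30.0, 10.0)),
        Word("fox", Box(17.0, 10.0)),
    )
    println(headingCandidates(words))
}
```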

Markdown & HTML Conversion

Convert the whole document to Markdown, or render it to HTML:

import fyi.oxide.pdf.PdfDocument

PdfDocument.open(pdfBytes).use { doc ->
    val markdown = doc.toMarkdown()  // all pages
    println(markdown)

    val html = doc.toHtml()
    println(html)
}

Search

Search for text across the document. Each match exposes its text via text():

import fyi.oxide.pdf.PdfDocument

PdfDocument.open(pdfBytes).use { doc ->
    val matches = doc.search("configuration")
    matches.forEach { m ->
        println("match: ${m.text()}")
    }
}

Auto-Extraction

AutoExtractor runs the full extraction pipeline in one call and returns an AutoResult with the text plus optional Markdown/HTML renderings. The markdownOrNull() / htmlOrNull() extensions turn the Java Optional returns into nullable values:

import fyi.oxide.pdf.PdfDocument
import fyi.oxide.pdf.AutoExtractor
import fyi.oxide.pdf.markdownOrNull
import fyi.oxide.pdf.htmlOrNull

PdfDocument.open(pdfBytes).use { doc ->
    val result = AutoExtractor.of(doc).extractDocument()
    println(result.text())
    result.markdownOrNull()?.let { println(it) }
    result.htmlOrNull()?.let { println(it) }
}
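Because the optional renderings arrive as nullable values, picking the richest one available reduces to an Elvis chain. The sketch below uses a stand-in FakeResult class (hypothetical) with the same accessor names as above; the preference order of Markdown, then HTML, then plain text is an illustrative choice, not something the library prescribes.

```kotlin
// Stand-in with the accessor names used in this guide; NOT the real AutoResult.
class FakeResult(
    private val plain: String,
    private val md: String?,
    private val htmlText: String?,
) {
    fun text(): String = plain
    fun markdownOrNull(): String? = md
    fun htmlOrNull(): String? = htmlText
}

// Prefer Markdown, then HTML, and fall back to the always-present plain text.
fun bestRendering(result: FakeResult): String =
    result.markdownOrNull() ?: result.htmlOrNull() ?: result.text()

fun main() {
    println(bestRendering(FakeResult("plain text", null, "<p>plain text</p>")))
}
```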

Editing

DocumentEditor opens a PDF for structural edits — for example, scrubbing metadata before sharing — then serializes the result back to bytes:

import fyi.oxide.pdf.DocumentEditor

DocumentEditor.open(pdfBytes).use { editor ->
    editor.scrubMetadata()
    val cleaned: ByteArray = editor.save()
    println("cleaned: ${cleaned.size} bytes")
}
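The examples in this guide take pdfBytes as given. A minimal sketch of the surrounding file I/O follows, using only java.nio.file; transformFile is a hypothetical helper, and the identity lambda in main stands in for an editor call such as the one above.

```kotlin
import java.nio.file.Files
import java.nio.file.Path

// Read a file into bytes, apply a byte-level transform, and write the result.
// Returns the number of bytes written.
fun transformFile(input: Path, output: Path, transform: (ByteArray) -> ByteArray): Int {
    val pdfBytes = Files.readAllBytes(input)
    val result = transform(pdfBytes)
    Files.write(output, result)
    return result.size
}

fun main() {
    val input = Files.createTempFile("in", ".pdf")
    val output = Files.createTempFile("out", ".pdf")
    // Placeholder content, not a valid PDF; enough to exercise the I/O path.
    Files.write(input, "%PDF-1.7 placeholder".toByteArray())

    // The identity transform stands in for opening an editor, scrubbing, and saving.
    val written = transformFile(input, output) { it }
    println("wrote $written bytes to $output")
}
```

In real use the lambda would wrap the DocumentEditor block, returning editor.save() as the transformed bytes.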

Next Steps