Java API Reference

PDF Oxide provides native Java bindings via a JNI layer over the Rust core. The bundled native library is loaded automatically at class-load time; pre-built natives ship for Linux, macOS, and Windows (x86_64 and ARM64).

<dependency>
  <groupId>fyi.oxide</groupId>
  <artifactId>pdf-oxide</artifactId>
  <version>0.3.69</version>
</dependency>

All classes live in the fyi.oxide.pdf package and its sub-packages (fyi.oxide.pdf.geometry, fyi.oxide.pdf.text, fyi.oxide.pdf.form, etc.).

import fyi.oxide.pdf.PdfDocument;
import fyi.oxide.pdf.DocumentEditor;
import fyi.oxide.pdf.Pdf;

Lifecycle. PdfDocument, DocumentEditor, and Pdf own native memory and implement AutoCloseable. Always use try-with-resources. close() is idempotent; a Cleaner backstop frees leaked handles but must not be relied on for timely cleanup.
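
To make the lifecycle contract above concrete, here is a minimal sketch of how an idempotent close() with a Cleaner backstop is commonly built in Java. NativeHandle, its FREED counter, and the simulated free are hypothetical stand-ins that use only the JDK; this is not PDF Oxide's actual implementation.

```java
import java.lang.ref.Cleaner;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a class that owns native memory.
final class NativeHandle implements AutoCloseable {
    private static final Cleaner CLEANER = Cleaner.create();
    // Counts simulated native frees so the behavior can be observed.
    static final AtomicInteger FREED = new AtomicInteger();

    // The cleanup action is a static nested class so it never captures `this`;
    // otherwise the handle could not become unreachable.
    private static final class Release implements Runnable {
        @Override public void run() { FREED.incrementAndGet(); }
    }

    private final Cleaner.Cleanable cleanable;
    private volatile boolean open = true;

    NativeHandle() {
        this.cleanable = CLEANER.register(this, new Release());
    }

    boolean isOpen() { return open; }

    @Override public void close() {
        open = false;
        cleanable.clean(); // the registered action runs at most once
    }
}

final class LifecycleSketch {
    public static void main(String[] args) {
        NativeHandle kept;
        try (NativeHandle h = new NativeHandle()) {
            kept = h;
            System.out.println("open inside try: " + h.isOpen());
        }
        kept.close(); // a second close is a no-op
        System.out.println("frees: " + NativeHandle.FREED.get());
    }
}
```

The try-with-resources path releases the handle deterministically; the Cleaner only runs the same action later if a handle is dropped without close().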

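Thread safety. Document instances are not thread-safe; open one per worker. The static helpers MarkdownConverter and PdfValidator are stateless and thread-safe, and the static PdfPolicy accessors are thread-safe even though they read and write process-wide state.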
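
The one-instance-per-worker rule can be sketched as follows. FakeDocument is a hypothetical stand-in for a non-thread-safe handle so that the example runs with the JDK alone; with the real library each task would open its own PdfDocument instead.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical stand-in for a document handle that must not be shared across threads.
final class FakeDocument implements AutoCloseable {
    private final String path;
    FakeDocument(String path) { this.path = path; }
    String extractText(int pageIndex) { return path + "#" + pageIndex; }
    @Override public void close() { }
}

final class PerWorkerSketch {
    // Each task opens, uses, and closes its own handle; only the String result crosses threads.
    static List<String> extractAll(List<String> paths, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (String path : paths) {
                futures.add(pool.submit(() -> {
                    try (FakeDocument doc = new FakeDocument(path)) {
                        return doc.extractText(0);
                    }
                }));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> f : futures) {
                results.add(f.get()); // collected in submission order
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(extractAll(Arrays.asList("a.pdf", "b.pdf"), 2));
    }
}
```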

For the Rust API, see the Rust API Reference. For the Python API, see the Python API Reference.


PdfDocument

The primary read-only entry point to a PDF: open, extract, render, and convert. Implements AutoCloseable.

import fyi.oxide.pdf.PdfDocument;
import java.nio.file.Paths;

try (PdfDocument doc = PdfDocument.open(Paths.get("invoice.pdf"))) {
    System.out.println(doc.extractText(0));
}

Opening (static factory methods)

static PdfDocument open(Path path)

Open a PDF from a filesystem path.

static PdfDocument open(String path)

Open a PDF from a path string.

static PdfDocument open(byte[] bytes)

Open a PDF from in-memory bytes (e.g. downloaded from S3 or HTTP).

static PdfDocument open(Path path, String password)

Open an encrypted PDF from a path with a user or owner password.

static PdfDocument open(String path, String password)

Open an encrypted PDF from a path string with a password.

static PdfDocument open(byte[] bytes, String password)

Open an encrypted PDF from bytes with a password.

static PdfDocument open(InputStream stream)

Open a PDF by reading all bytes from an InputStream.
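
When the bytes need inspecting before they reach open(byte[]), the stream can be drained by hand. The sketch below uses only the JDK; readAll and looksLikePdf are hypothetical helper names, and the header test is a quick sanity check, not the library's validation.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

final class StreamBytesSketch {
    // Drain the stream fully; the caller remains responsible for closing it.
    static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
        return out.toByteArray();
    }

    // A PDF file normally starts with the "%PDF-" marker.
    static boolean looksLikePdf(byte[] bytes) {
        byte[] magic = "%PDF-".getBytes(StandardCharsets.US_ASCII);
        if (bytes.length < magic.length) return false;
        for (int i = 0; i < magic.length; i++) {
            if (bytes[i] != magic[i]) return false;
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        byte[] fake = "%PDF-1.7\n...".getBytes(StandardCharsets.US_ASCII);
        try (InputStream in = new ByteArrayInputStream(fake)) {
            byte[] bytes = readAll(in);
            System.out.println(bytes.length + " bytes, header ok: " + looksLikePdf(bytes));
        }
    }
}
```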

One-shot static helpers

static String extractText(String path)

Open, extract all text, and close in a single call (path string).

static String extractText(Path path)

Open, extract all text, and close in a single call (Path).

Authentication

boolean authenticate(String password)

Authenticate an encrypted PDF after opening; returns true on success.

boolean authenticate(byte[] password)

Authenticate with a raw byte password.

Document info

int pageCount()

Return the number of pages in the document.

boolean isOpen()

Return true if the document handle is still open.

Text extraction

String extractText(int pageIndex)

Extract plain text from a single zero-indexed page.

String extractTextAuto(int pageIndex)

Extract text from a page, automatically falling back to OCR for scanned pages.

String extractStructured(int page)

Extract structured page content (spans, lines, layout) as a JSON string.

Conversion

String toMarkdown()

Convert the entire document to Markdown.

String toMarkdown(int pageIndex)

Convert a single page to Markdown.

String toHtml()

Convert the entire document to HTML.

String toHtml(int pageIndex)

Convert a single page to HTML.

DOM access

PdfPage page(int index)

Return a lazy PdfPage handle for the given zero-based index.

Rendering

byte[] render(int pageIndex)

Render a page to PNG bytes at the default DPI.

byte[] render(int pageIndex, int dpi)

Render a page to PNG bytes at the given DPI.
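
To choose a DPI it helps to estimate the output size first. One PDF point is 1/72 inch, so a page dimension in points scales to pixels by dpi / 72. The sketch below is plain arithmetic with a hypothetical helper name; the renderer's exact rounding is an assumption.

```java
final class RenderSizeSketch {
    // pixels = points * dpi / 72, rounded to the nearest integer (rounding mode is an assumption).
    static int pixels(double points, int dpi) {
        return (int) Math.round(points * dpi / 72.0);
    }

    public static void main(String[] args) {
        // US Letter is 612 x 792 points.
        int w = pixels(612, 150);
        int h = pixels(792, 150);
        System.out.println(w + " x " + h); // 1275 x 1650
    }
}
```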

Lifecycle

void close()

Free the underlying native handle. Idempotent.


DocumentEditor

Mutable editing session over a PDF: form filling, redaction, metadata scrubbing, and saving. Returns this from mutators for fluent chaining. Implements AutoCloseable.

import fyi.oxide.pdf.DocumentEditor;
import java.nio.file.Paths;

try (DocumentEditor editor = DocumentEditor.open("form.pdf")) {
    editor.setFormField("name", "Jane Doe")
          .setFormField("subscribe", true)
          .saveTo(Paths.get("filled.pdf"));
}

Opening (static factory methods)

static DocumentEditor open(Path path)

Open a PDF for editing from a Path.

static DocumentEditor open(String path)

Open a PDF for editing from a path string.

static DocumentEditor open(byte[] bytes)

Open a PDF for editing from in-memory bytes.

Form fields

DocumentEditor setFormField(String name, String value)

Set a text or choice form field value by name; returns this.

DocumentEditor setFormField(String name, boolean checked)

Set a checkbox / radio form field by name; returns this.

Redaction

DocumentEditor addRedaction(int pageIndex, BBox region)

Queue a redaction over a rectangular region on a page; returns this.

int redactionCount(int pageIndex)

Return the number of pending redactions on a page.

int redactionCount()

Return the total number of pending redactions across the document.

RedactResult applyRedactionsDestructive()

Apply all queued redactions, permanently removing covered content; returns a RedactResult.

Metadata

DocumentEditor scrubMetadata()

Remove document information and XMP metadata; returns this.

Saving

byte[] save()

Serialize the edited document to a new byte array (full rewrite).

void saveTo(Path out)

Write the edited document to a file (full rewrite).

byte[] saveIncremental()

Serialize using an incremental update, preserving the original bytes.

void saveIncrementalTo(Path out)

Write an incremental update to a file.
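
"Preserving the original bytes" can be checked mechanically. The sketch below assumes saveIncremental() follows the usual PDF incremental-update layout, in which new objects are appended after the unmodified original, so the original is a byte-for-byte prefix of the output. isAppendOnly is a hypothetical helper, not part of the library.

```java
final class IncrementalCheckSketch {
    // True when `updated` begins with every byte of `original`, in order.
    static boolean isAppendOnly(byte[] original, byte[] updated) {
        if (updated.length < original.length) return false;
        for (int i = 0; i < original.length; i++) {
            if (updated[i] != original[i]) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] original = {1, 2, 3};
        byte[] appended = {1, 2, 3, 4, 5};
        byte[] rewritten = {1, 9, 3, 4};
        System.out.println(isAppendOnly(original, appended));
        System.out.println(isAppendOnly(original, rewritten));
    }
}
```

A check like this is useful when existing signatures must remain valid, since a full rewrite via save() changes the bytes those signatures cover.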

Lifecycle

boolean isOpen()
void close()

Check whether the editor is open, and free its native handle.


Pdf

Create new PDFs from Markdown, HTML, or images, and split existing PDFs. Implements AutoCloseable.

import fyi.oxide.pdf.Pdf;
import java.nio.file.Paths;

try (Pdf pdf = Pdf.fromMarkdown("# Report\n\nGenerated by PDF Oxide.")) {
    pdf.saveTo(Paths.get("report.pdf"));
}

Creation (static factory methods)

static Pdf fromMarkdown(String markdown)

Create a PDF from Markdown content.

static Pdf fromHtml(String html)

Create a PDF from HTML content.

static Pdf fromImages(List<byte[]> images)

Create a multi-page PDF, one page per image (JPEG/PNG bytes).

Splitting

List<BookmarkSegment> planSplitByBookmarks(SplitByBookmarksOptions opts)

Compute the BookmarkSegment plan for splitting at a bookmark level without writing output.

List<byte[]> splitByBookmarks(SplitByBookmarksOptions opts)

Split the PDF at the configured bookmark level, returning one byte array per segment.

static int planSplitByBookmarksCount(byte[] sourcePdf, int level)

Return how many segments a bookmark-level split would produce, without opening a Pdf.

static byte[][] splitByBookmarksFromBytes(byte[] sourcePdf, int level)

Split source PDF bytes at the given bookmark level in one static call.
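
As a rough illustration of what a split plan contains, the sketch below turns a sorted list of bookmark start pages into page ranges. It assumes zero-based, inclusive first/last pages and drops any pages before the first bookmark; the library's actual planning rules may differ, and SplitPlanSketch is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

final class SplitPlanSketch {
    // Returns {firstPage, lastPage} pairs; each segment runs until the page before the next bookmark.
    // `bookmarkStartPages` must be sorted ascending.
    static List<int[]> plan(int[] bookmarkStartPages, int pageCount) {
        List<int[]> segments = new ArrayList<>();
        for (int i = 0; i < bookmarkStartPages.length; i++) {
            int first = bookmarkStartPages[i];
            int next = (i + 1 < bookmarkStartPages.length) ? bookmarkStartPages[i + 1] : pageCount;
            int last = next - 1;
            if (last >= first) { // skip a bookmark that shares its page with the next one
                segments.add(new int[] {first, last});
            }
        }
        return segments;
    }

    public static void main(String[] args) {
        for (int[] s : plan(new int[] {0, 3, 7}, 10)) {
            System.out.println(s[0] + "-" + s[1]);
        }
    }
}
```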

Saving and lifecycle

byte[] save()

Serialize the PDF to a byte array.

void saveTo(Path out)

Write the PDF to a file.

boolean isOpen()
void close()

Check whether the handle is open, and free native resources.


AutoExtractor

Adaptive extraction that classifies each page (text layer vs. scanned) and applies OCR where needed. Built from an open PdfDocument.

import fyi.oxide.pdf.AutoExtractor;
import fyi.oxide.pdf.PdfDocument;
import fyi.oxide.pdf.auto.AutoResult;

try (PdfDocument doc = PdfDocument.open("scan.pdf")) {
    AutoExtractor extractor = AutoExtractor.balanced(doc);
    AutoResult result = extractor.extractDocument();
    System.out.println(result.text());
}

Construction (static factory methods)

static AutoExtractor of(PdfDocument doc)

Create an extractor with default configuration.

static AutoExtractor of(PdfDocument doc, AutoExtractConfig config)

Create an extractor with an explicit AutoExtractConfig.

static AutoExtractor fast(PdfDocument doc)

Create an extractor tuned for speed (text-layer first).

static AutoExtractor balanced(PdfDocument doc)

Create an extractor with a balanced speed/fidelity preset.

static AutoExtractor highFidelity(PdfDocument doc)

Create an extractor tuned for maximum fidelity (aggressive OCR).

Extraction

String extractText()

Extract plain text across the whole document.

String extractTextForPage(int pageIndex)

Extract plain text for a single page.

AutoResult extractDocument()

Run full adaptive extraction over the document; returns an AutoResult.

AutoResult extractAutoDocument()

Alias of extractDocument() returning the complete document result.

AutoResult extractPage(int pageIndex)

Run adaptive extraction for a single page.

AutoResult extractAutoPage(int pageIndex)

Alias of extractPage() for a single page.

Classification

ClassifyResult classifyDocument()

Classify every page without extracting; returns a ClassifyResult.

ClassifyResult classifyPage(int pageIndex)

Classify a single page.

JSON output

String extractDocumentJson()

Extract the whole document and serialize the result to JSON.

String extractPageJson(int pageIndex)

Extract a page and serialize the result to JSON.

Accessors

PdfDocument document()

Return the underlying PdfDocument.

AutoExtractConfig config()

Return the active configuration.


MarkdownConverter

Stateless, thread-safe static helpers for Markdown and HTML conversion.

static String toMarkdown(PdfDocument doc, int pageIndex)

Convert a single page to Markdown.

static String toMarkdown(PdfDocument doc)

Convert the whole document to Markdown.

static String toHtml(PdfDocument doc, int pageIndex)

Convert a single page to HTML.

static String toHtml(PdfDocument doc)

Convert the whole document to HTML.


PdfSigner

Digital signing and verification using a PKCS#12 keystore.

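import fyi.oxide.pdf.PdfSigner;
import fyi.oxide.pdf.signature.SignOptions;
import java.nio.file.Files;
import java.nio.file.Paths;

// Unsigned input bytes; the path is illustrative.
byte[] pdfBytes = Files.readAllBytes(Paths.get("unsigned.pdf"));
PdfSigner signer = PdfSigner.fromPkcs12(Paths.get("cert.p12"), "keystore-pw");
byte[] signed = signer.sign(pdfBytes, SignOptions.builder().withReason("Approved").build());

static PdfSigner fromPkcs12(Path keystore, String password)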

Load a signer from a PKCS#12 keystore file.

static PdfSigner fromPkcs12(byte[] keystoreBytes, String password)

Load a signer from in-memory PKCS#12 keystore bytes.

byte[] sign(byte[] pdf, SignOptions opts)

Sign PDF bytes with the configured certificate and SignOptions; returns the signed PDF.

boolean verify(byte[] pdf)

Verify the signatures embedded in a PDF; returns true if valid.

static SignatureLevel classifyLevel(byte[] pdf)

Classify the PAdES signature level of a signed PDF; returns a SignatureLevel.


PdfValidator

Stateless, thread-safe PDF/A, PDF/X, and PDF/UA compliance validation.

static boolean isPdfA(PdfDocument doc, PdfALevel level)

Quick boolean check for PDF/A conformance at a given level.

static boolean isPdfUa(PdfDocument doc, PdfUaLevel level)

Quick boolean check for PDF/UA conformance at a given level.

static ValidationResult validatePdfA(PdfDocument doc, PdfALevel level)

Validate against a PDF/A level; returns a ValidationResult with violations.

static ValidationResult validatePdfX(PdfDocument doc, PdfXLevel level)

Validate against a PDF/X level.

static ValidationResult validatePdfUa(PdfDocument doc, PdfUaLevel level)

Validate against a PDF/UA level.


PdfPolicy

Process-wide security policy controlling which cryptographic algorithms are permitted. Thread-safe static accessors.

static PolicyMode current()

Return the currently active policy mode.

static void set(PolicyMode mode)

Set the process-wide policy mode.

static PolicyMode compat()

Return the permissive compatibility mode constant.

static PolicyMode strict()

Return the strict mode constant.

static PolicyMode fipsStrict()

Return the FIPS-strict mode constant.


PdfPage

A lazy page handle returned by PdfDocument.page(int). Properties dispatch to the parent document on access.

PdfDocument parent()

Return the owning document.

int index()

Return the zero-based page index.

BBox mediaBox()

Return the page MediaBox as a BBox.

BBox cropBox()

Return the page CropBox.

double width()
double height()

Return page width and height in PDF points.

int rotation()

Return the page rotation in degrees (0, 90, 180, 270).

String text()

Extract all plain text on the page.

String text(BBox region)

Extract text within a rectangular region.

List<TextWord> words()

Return per-word text with bounding boxes (TextWord).

List<TextLine> lines()

Return per-line text (TextLine).

List<TextChar> chars()

Return per-character data (TextChar).


Geometry types

BBox

Immutable axis-aligned bounding box in PDF coordinates.

BBox(double x0, double y0, double x1, double y1)
double x0()
double y0()
double x1()
double y1()
double width()
double height()

Rect

Position-and-size rectangle (origin + width/height).

Rect(double x, double y, double width, double height)
double x()
double y()
double width()
double height()
BBox toBBox()
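
The relationship between the two rectangle types can be sketched as below. SketchBBox and SketchRect are hypothetical stand-ins written so the example runs without the library; they assume that BBox stores two corners and that toBBox() adds the size to the origin.

```java
// Stand-in for a corner-based box: (x0, y0) and (x1, y1).
final class SketchBBox {
    final double x0, y0, x1, y1;
    SketchBBox(double x0, double y0, double x1, double y1) {
        this.x0 = x0; this.y0 = y0; this.x1 = x1; this.y1 = y1;
    }
    double width() { return x1 - x0; }
    double height() { return y1 - y0; }
}

// Stand-in for an origin-plus-size rectangle.
final class SketchRect {
    final double x, y, width, height;
    SketchRect(double x, double y, double width, double height) {
        this.x = x; this.y = y; this.width = width; this.height = height;
    }
    // The far corner is the origin shifted by the size.
    SketchBBox toBBox() {
        return new SketchBBox(x, y, x + width, y + height);
    }
}

final class GeometrySketch {
    public static void main(String[] args) {
        SketchBBox box = new SketchRect(72, 100, 200, 50).toBBox();
        System.out.println(box.x0 + ", " + box.y0 + ", " + box.x1 + ", " + box.y1);
    }
}
```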

Point

A 2-D point.

Point(double x, double y)
double x()
double y()

Color

8-bit RGBA color.

Color(int r, int g, int b, int a)
Color(int r, int g, int b)
int r()
int g()
int b()
int a()

Text types

TextChar

A single decoded character with position and OCR confidence.

TextChar(int codepoint, BBox bbox, float confidence)
int codepoint()
BBox bbox()
float confidence()
String asString()
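
Because codepoint() is an int and not a char, characters outside the Basic Multilingual Plane need two UTF-16 units when turned into a String. A minimal JDK-only sketch of that conversion follows; whether asString() is implemented this way is an assumption.

```java
final class CodepointSketch {
    // Character.toChars yields one char for BMP code points and a surrogate pair otherwise.
    static String asString(int codepoint) {
        return new String(Character.toChars(codepoint));
    }

    public static void main(String[] args) {
        String a = asString(0x41);      // LATIN CAPITAL LETTER A
        String face = asString(0x1F600); // outside the BMP
        System.out.println(a + " length=" + a.length());
        System.out.println("emoji length=" + face.length()
                + " codePoints=" + face.codePointCount(0, face.length()));
    }
}
```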

TextWord

A whitespace-delimited word with bounds and confidence.

TextWord(String text, BBox bbox, float confidence)
String text()
BBox bbox()
float confidence()

TextLine

A line of text composed of words.

TextLine(String text, BBox bbox, List<TextWord> words)
String text()
BBox bbox()
List<TextWord> words()

TextSpan

A run of identically-styled text.

TextSpan(String text, BBox bbox, TextStyle style)
String text()
BBox bbox()
TextStyle style()

TextStyle

Font and style metadata for a span.

TextStyle(String font, double size, Color color, boolean bold, boolean italic)
double size()
Color color()
boolean bold()
boolean italic()

Table types

Table

A detected table with cell grid.

Table(BBox bbox, int rows, int cols, List<TableCell> cells)
BBox bbox()
int rows()
int cols()
List<TableCell> cells()

TableCell

A single cell, including span information.

TableCell(String text, BBox bbox, int row, int col, int rowSpan, int colSpan)
String text()
BBox bbox()
int row()
int col()
int rowSpan()
int colSpan()
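
A flat cell list with span information can be expanded into a dense grid for display or CSV export. The sketch below uses a hypothetical SketchCell stand-in with the same row, column, and span fields, and assumes zero-based indices with spans of at least 1.

```java
import java.util.Arrays;
import java.util.List;

// Stand-in carrying the row/col/span fields of a table cell (bbox omitted).
final class SketchCell {
    final String text;
    final int row, col, rowSpan, colSpan;
    SketchCell(String text, int row, int col, int rowSpan, int colSpan) {
        this.text = text; this.row = row; this.col = col;
        this.rowSpan = rowSpan; this.colSpan = colSpan;
    }
}

final class TableGridSketch {
    // Every grid slot covered by a cell's span receives that cell's text.
    static String[][] toGrid(int rows, int cols, List<SketchCell> cells) {
        String[][] grid = new String[rows][cols];
        for (String[] r : grid) {
            Arrays.fill(r, "");
        }
        for (SketchCell c : cells) {
            int rowEnd = Math.min(rows, c.row + c.rowSpan);
            int colEnd = Math.min(cols, c.col + c.colSpan);
            for (int r = c.row; r < rowEnd; r++) {
                for (int k = c.col; k < colEnd; k++) {
                    grid[r][k] = c.text;
                }
            }
        }
        return grid;
    }

    public static void main(String[] args) {
        List<SketchCell> cells = Arrays.asList(
                new SketchCell("Item", 0, 0, 1, 1),
                new SketchCell("Price", 0, 1, 1, 1),
                new SketchCell("Total: 5", 1, 0, 1, 2)); // spans both columns
        for (String[] row : toGrid(2, 2, cells)) {
            System.out.println(String.join(" | ", row));
        }
    }
}
```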

Image types

ExtractedImage

A raster image extracted from a page.

ExtractedImage(byte[] bytes, ImageFormat format, BBox bbox, int width, int height)
byte[] bytes()
ImageFormat format()
BBox bbox()
int width()
int height()

ImageFormat (enum)

JPEG, PNG, CCITT, RAW.


Search types

SearchOptions

Builder-configured search parameters.

boolean caseSensitive()
boolean wholeWord()
boolean regex()
Optional<Integer> maxResults()
static SearchOptions.Builder builder()

Builder: withCaseSensitive(boolean), withWholeWord(boolean), withRegex(boolean), withMaxResults(Integer), withMaxResults(int), build().
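
To show how the three boolean options interact, the sketch below maps them onto a java.util.regex.Pattern. This is only an illustration of the semantics one would expect; the library's own matcher runs in the Rust core and may behave differently, and SearchPatternSketch is a hypothetical name.

```java
import java.util.regex.Pattern;

final class SearchPatternSketch {
    static Pattern compile(String query, boolean caseSensitive, boolean wholeWord, boolean regex) {
        // Literal queries are quoted so characters such as '.' lose their regex meaning.
        String body = regex ? query : Pattern.quote(query);
        if (wholeWord) {
            body = "\\b(?:" + body + ")\\b";
        }
        int flags = caseSensitive ? 0 : (Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);
        return Pattern.compile(body, flags);
    }

    public static void main(String[] args) {
        Pattern p = compile("total", false, true, false);
        System.out.println(p.matcher("Total due").find());
        System.out.println(p.matcher("subtotal").find());
    }
}
```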

SearchResult

The full result of a query.

SearchResult(String query, List<SearchMatch> matches)
String query()
List<SearchMatch> matches()
int count()
boolean isEmpty()

SearchMatch

A single hit with page and location.

SearchMatch(int pageIndex, BBox bbox, String text)
int pageIndex()
BBox bbox()
String text()

Form types

FormField

An AcroForm field.

FormField(String name, FormFieldType type, String value, BBox bbox, int pageIndex)
String name()
FormFieldType type()
Optional<String> value()
Optional<BBox> bbox()
int pageIndex()

FormFieldType (enum)

TEXT, CHECKBOX, RADIO, CHOICE.


Annotation types

Annotation

A page annotation.

Annotation(AnnotationType type, int pageIndex, BBox bbox, String contents, String uri)
AnnotationType type()
int pageIndex()
BBox bbox()
Optional<String> contents()
Optional<String> uri()

AnnotationType (enum)

HIGHLIGHT, TEXT, LINK, STAMP, UNDERLINE, STRIKEOUT, SQUIGGLY, FREE_TEXT, LINE, SQUARE, CIRCLE, FILE_ATTACHMENT.


Metadata types

DocumentInfo

Standard document information dictionary fields.

Optional<String> title()
Optional<String> author()
Optional<String> subject()
Optional<String> keywords()
Optional<String> creator()
Optional<String> producer()
Optional<String> creationDate()
Optional<String> modificationDate()

XmpMetadata

Raw XMP metadata packet.

XmpMetadata(String xml)
String xml()
boolean isEmpty()

Auto-extraction types

AutoExtractConfig

Immutable, builder-constructed configuration for AutoExtractor.

Optional<ExtractMode> mode()
Optional<List<Integer>> forceOcrPages()
Optional<Double> minOcrConfidence()
Optional<List<String>> ocrLanguages()
Optional<List<String>> passwords()
Optional<Double> topMarginFraction()
Optional<Double> bottomMarginFraction()
Optional<Boolean> allowSingleColumnTables()
Optional<Boolean> ocrInlineImages()
Optional<String> cancelToken()
static AutoExtractConfig.Builder builder()
AutoExtractConfig.Builder toBuilder()

Builder methods: withMode(ExtractMode), withForceOcrPages(List<Integer>), withMinOcrConfidence(Double), withOcrLanguages(List<String>), withOcrLanguages(String...), withPasswords(List<String>), withPasswords(String...), withTopMarginFraction(Double), withTopMarginFraction(double), withBottomMarginFraction(Double), withBottomMarginFraction(double), withAllowSingleColumnTables(Boolean), withAllowSingleColumnTables(boolean), withOcrInlineImages(Boolean), withOcrInlineImages(boolean), withCancelToken(String), build().

AutoResult

Result of an adaptive extraction.

String text()
Optional<String> markdown()
Optional<String> html()
ExtractReason reason()
double confidence()
boolean ocrUsed()
List<RegionResult> regions()
List<Integer> pagesNeedingOcr()

RegionResult

Per-region extraction outcome.

int pageIndex()
BBox bbox()
String text()
ExtractReason reason()
double confidence()
boolean ocrUsed()
Optional<Table> table()

ClassifyResult

Result of page classification.

List<PageClass> pages()
List<Integer> pagesNeedingOcr()
List<Integer> pagesWithChart()
List<Integer> pagesEncrypted()

ExtractMode (enum)

TEXT_ONLY, AUTO.

PageClass (enum)

TEXT_LAYER, SCANNED, MIXED.

ExtractReason (enum)

OK, SCANNED_NO_TEXT_LAYER, GLYPH_MAPPING_MISSING, ENCRYPTED_NO_EXTRACT_PERMISSION, IMAGE_TABLE_NO_STRUCTURE, CHART_NOT_TRANSCRIBED, OCR_REQUESTED_BUT_UNAVAILABLE, OCR_LOW_CONFIDENCE, EMPTY.


Signature types

SignOptions

Builder-constructed signing parameters for PdfSigner.

SignatureLevel level()
Optional<String> reason()
Optional<String> location()
Optional<String> contactInfo()
Optional<String> tsaUrl()
static SignOptions.Builder builder()

Builder: withLevel(SignatureLevel), withReason(String), withLocation(String), withContactInfo(String), withTsaUrl(String), build().

SignatureLevel (enum)

PAdES baseline levels: B_B (basic), B_T (with trusted timestamp).


Split types

SplitByBookmarksOptions

Builder-constructed options for bookmark-based splitting.

int level()
Optional<String> filenamePrefix()
static SplitByBookmarksOptions.Builder builder()

Builder: withLevel(int), withFilenamePrefix(String), build().

BookmarkSegment

A planned output segment from a bookmark split.

BookmarkSegment(String title, int firstPage, int lastPage, String filename)
String title()
int firstPage()
int lastPage()
String filename()

Redaction types

RedactResult

Outcome of DocumentEditor.applyRedactionsDestructive().

RedactResult(int regionsApplied, boolean oracleVerified)
int regionsApplied()
boolean oracleVerified()

Compliance types

ValidationResult

Result of a PdfValidator check.

ValidationResult(boolean valid, List<ValidationViolation> violations)
boolean valid()
List<ValidationViolation> violations()

ValidationViolation

A single conformance violation.

ValidationViolation(String ruleId, String description, Integer pageIndex)
String ruleId()
String description()
Optional<Integer> pageIndex()

PdfALevel (enum)

A_1B, A_1A, A_2B, A_2A, A_2U, A_3B, A_3A, A_3U, A_4, A_4E.

PdfXLevel (enum)

X_1A_2001, X_1A_2003, X_3_2002, X_3_2003, X_4, X_4P, X_5G, X_5N, X_5PG, X_6, X_6P.

PdfUaLevel (enum)

UA_1, UA_2. Each constant exposes int code().


Policy types

PolicyMode (enum)

COMPAT, STRICT.

SecurityPolicy

Builder-constructed per-operation security policy.

PolicyMode mode()
List<String> additionalAllow()
List<String> additionalDeny()
static SecurityPolicy.Builder builder()

Builder: withMode(PolicyMode), allow(String algId), deny(String algId), build().


Render types

PixelFormat (enum)

RGBA_8888, RGB_888, GRAY_8.


Error handling

All PDF-specific failures throw PdfException (an unchecked RuntimeException) or one of its subclasses. Every exception carries a PdfErrorKind kind().

import fyi.oxide.pdf.PdfDocument;
import fyi.oxide.pdf.exception.PdfException;

try (PdfDocument doc = PdfDocument.open("file.pdf")) {
    String text = doc.extractText(0);
} catch (PdfException e) {
    System.err.println(e.kind() + ": " + e.getMessage());
}

PdfException

PdfException(String message)
PdfException(PdfErrorKind kind, String message)
PdfException(PdfErrorKind kind, String message, Throwable cause)
PdfErrorKind kind()

Subclasses

Exception                    Thrown when
---------------------------  ----------------------------------------------------------
PdfParseException            The file is malformed or not a valid PDF
PdfEncryptedException        The PDF is encrypted and no/invalid password was supplied
PdfPermissionException       The requested operation is denied by document permissions
PdfIoException               An underlying I/O error occurred
PdfOcrUnavailableException   OCR was requested but the OCR backend is unavailable
PdfSignatureException        A signing or verification operation failed
PdfInvalidStateException     An operation was called on a closed or invalid handle
PdfUnsupportedException      A requested feature is not supported

PdfErrorKind (enum)

PARSE, ENCRYPTED, PERMISSION, IO, OCR_UNAVAILABLE, SIGNATURE, INVALID_STATE, UNSUPPORTED.
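
Since every exception carries a kind, a caller can branch on it without catching each subclass separately. The sketch below uses hypothetical stand-ins (SketchErrorKind, SketchPdfException) that mirror the listed kinds and the kind() accessor so that it runs without the library; the recovery actions are illustrative only.

```java
// Stand-in mirroring the error kinds listed above.
enum SketchErrorKind {
    PARSE, ENCRYPTED, PERMISSION, IO, OCR_UNAVAILABLE, SIGNATURE, INVALID_STATE, UNSUPPORTED
}

// Stand-in for an unchecked exception that carries a kind.
final class SketchPdfException extends RuntimeException {
    private final SketchErrorKind kind;
    SketchPdfException(SketchErrorKind kind, String message) {
        super(message);
        this.kind = kind;
    }
    SketchErrorKind kind() { return kind; }
}

final class ErrorKindSketch {
    // Pick a recovery step from the kind; anything unhandled is reported as-is.
    static String nextStep(SketchPdfException e) {
        switch (e.kind()) {
            case ENCRYPTED:
                return "prompt for a password and reopen";
            case IO:
                return "retry the read";
            case OCR_UNAVAILABLE:
                return "fall back to text-layer extraction";
            default:
                return "report: " + e.kind() + ": " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        try {
            throw new SketchPdfException(SketchErrorKind.ENCRYPTED, "password required");
        } catch (SketchPdfException e) {
            System.out.println(nextStep(e));
        }
    }
}
```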


Complete example

import fyi.oxide.pdf.PdfDocument;
import fyi.oxide.pdf.DocumentEditor;
import fyi.oxide.pdf.Pdf;
import fyi.oxide.pdf.AutoExtractor;
import fyi.oxide.pdf.auto.AutoResult;
import java.nio.file.Paths;

public class Example {
    public static void main(String[] args) throws Exception {
        // --- Extraction ---
        try (PdfDocument doc = PdfDocument.open(Paths.get("input.pdf"))) {
            System.out.println("Pages: " + doc.pageCount());
            for (int i = 0; i < doc.pageCount(); i++) {
                System.out.println(doc.extractText(i));
            }
            String markdown = doc.toMarkdown();

            // Adaptive extraction with OCR fallback
            AutoResult auto = AutoExtractor.balanced(doc).extractDocument();
            System.out.println("OCR used: " + auto.ocrUsed());
        }

        // --- Creation ---
        try (Pdf pdf = Pdf.fromMarkdown("# Report\n\nGenerated by PDF Oxide.")) {
            pdf.saveTo(Paths.get("report.pdf"));
        }

        // --- Editing ---
        try (DocumentEditor editor = DocumentEditor.open("form.pdf")) {
            editor.setFormField("name", "Jane Doe")
                  .setFormField("subscribe", true)
                  .scrubMetadata()
                  .saveTo(Paths.get("filled.pdf"));
        }
    }
}

Other Language Bindings

PDF Oxide also ships native bindings for Rust, Python, Node.js, WASM, C#, Go, PHP, Ruby, C++, Swift, Kotlin, Dart, R, Julia, Zig, Scala, Clojure, Objective-C, and Elixir.

Next Steps