Java API Reference
PDF Oxide provides native Java bindings via a JNI layer over the Rust core. The bundled native library is loaded automatically at class-load time; pre-built natives ship for Linux, macOS, and Windows (x86_64 and ARM64).
<dependency>
    <groupId>fyi.oxide</groupId>
    <artifactId>pdf-oxide</artifactId>
    <version>0.3.69</version>
</dependency>
All classes live in the fyi.oxide.pdf package and its sub-packages (fyi.oxide.pdf.geometry, fyi.oxide.pdf.text, fyi.oxide.pdf.form, etc.).
import fyi.oxide.pdf.PdfDocument;
import fyi.oxide.pdf.DocumentEditor;
import fyi.oxide.pdf.Pdf;
Lifecycle. PdfDocument, DocumentEditor, and Pdf implement AutoCloseable, so always use try-with-resources. close() is idempotent; a Cleaner backstop frees leaked handles but must not be relied on for timely cleanup.
Thread safety. Document instances are not thread-safe; open one per worker. The stateless static helpers (MarkdownConverter, PdfValidator, PdfPolicy) are thread-safe.
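The lifecycle rules above follow a standard JDK pattern: an idempotent close() for deterministic cleanup, plus a java.lang.ref.Cleaner that frees the handle if the object becomes unreachable without being closed. The sketch below shows that pattern in isolation. It is an illustration, not PDF Oxide's source: NativeHandleSketch and its FREED counter are hypothetical stand-ins for a real JNI handle and its free call.

```java
import java.lang.ref.Cleaner;
import java.util.concurrent.atomic.AtomicInteger;

final class NativeHandleSketch implements AutoCloseable {
    // One shared Cleaner (and background thread) for all handles.
    private static final Cleaner CLEANER = Cleaner.create();

    // Hypothetical stand-in for a native free call: counts freed handles.
    static final AtomicInteger FREED = new AtomicInteger();

    // The cleanup action must not reference the owning object,
    // otherwise the owner could never become phantom-reachable.
    private static final class FreeAction implements Runnable {
        @Override
        public void run() {
            FREED.incrementAndGet();
        }
    }

    private final Cleaner.Cleanable cleanable;

    NativeHandleSketch() {
        this.cleanable = CLEANER.register(this, new FreeAction());
    }

    @Override
    public void close() {
        // Cleanable.clean() runs the action at most once, so close() is idempotent.
        cleanable.clean();
    }

    public static void main(String[] args) {
        try (NativeHandleSketch handle = new NativeHandleSketch()) {
            handle.close(); // an explicit early close is harmless
        }                   // the implicit close() here is a no-op
        System.out.println("freed = " + FREED.get()); // prints "freed = 1"
    }
}
```

Because Cleanable.clean() runs its action at most once, the explicit close() and the implicit one at the end of the try block free the handle only once. The Cleaner itself runs at an unspecified time after the object becomes unreachable, which is why timely cleanup still depends on try-with-resources.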
For the Rust API, see the Rust API Reference. For the Python API, see the Python API Reference.
PdfDocument
The primary read-only entry point to a PDF — open, extract, render, and convert. Implements AutoCloseable.
import fyi.oxide.pdf.PdfDocument;
import java.nio.file.Paths;
try (PdfDocument doc = PdfDocument.open(Paths.get("invoice.pdf"))) {
    System.out.println(doc.extractText(0));
}
Opening (static factory methods)
static PdfDocument open(Path path)
Open a PDF from a filesystem path.
static PdfDocument open(String path)
Open a PDF from a path string.
static PdfDocument open(byte[] bytes)
Open a PDF from in-memory bytes (e.g. downloaded from S3 or HTTP).
static PdfDocument open(Path path, String password)
Open an encrypted PDF from a path with a user or owner password.
static PdfDocument open(String path, String password)
Open an encrypted PDF from a path string with a password.
static PdfDocument open(byte[] bytes, String password)
Open an encrypted PDF from bytes with a password.
static PdfDocument open(InputStream stream)
Open a PDF by reading all bytes from an InputStream.
One-shot static helpers
static String extractText(String path)
Open, extract all text, and close in a single call (path string).
static String extractText(Path path)
Open, extract all text, and close in a single call (Path).
Authentication
boolean authenticate(String password)
Authenticate an encrypted PDF after opening; returns true on success.
boolean authenticate(byte[] password)
Authenticate with a raw byte password.
Document info
int pageCount()
Return the number of pages in the document.
boolean isOpen()
Return true if the document handle is still open.
Text extraction
String extractText(int pageIndex)
Extract plain text from a single zero-indexed page.
String extractTextAuto(int pageIndex)
Extract text from a page, automatically falling back to OCR for scanned pages.
String extractStructured(int page)
Extract structured page content (spans, lines, layout) as a JSON string.
Conversion
String toMarkdown()
Convert the entire document to Markdown.
String toMarkdown(int pageIndex)
Convert a single page to Markdown.
String toHtml()
Convert the entire document to HTML.
String toHtml(int pageIndex)
Convert a single page to HTML.
DOM access
PdfPage page(int index)
Return a lazy PdfPage handle for the given zero-based index.
Rendering
byte[] render(int pageIndex)
Render a page to PNG bytes at the default DPI.
byte[] render(int pageIndex, int dpi)
Render a page to PNG bytes at the given DPI.
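PDF page sizes are measured in points (1/72 inch), so the pixel size of a rendered page scales linearly with the DPI argument. A minimal sketch of that arithmetic, assuming render(pageIndex, dpi) scales the page's width() and height() by dpi / 72 and rounds to the nearest pixel; the exact rounding, and whether CropBox or rotation is applied first, is not specified in this reference.

```java
final class RenderSizeSketch {
    // 72 PDF points per inch, by definition in the PDF specification.
    static final double POINTS_PER_INCH = 72.0;

    // Convert a length in PDF points to pixels at the given DPI.
    static long toPixels(double points, int dpi) {
        return Math.round(points / POINTS_PER_INCH * dpi);
    }

    public static void main(String[] args) {
        double letterWidth = 612, letterHeight = 792; // US Letter in points
        for (int dpi : new int[] {72, 150, 300}) {
            System.out.println(dpi + " dpi -> "
                + toPixels(letterWidth, dpi) + " x " + toPixels(letterHeight, dpi));
        }
    }
}
```

For a US Letter page this gives 612 x 792 px at 72 DPI, 1275 x 1650 px at 150 DPI, and 2550 x 3300 px at 300 DPI, which is a quick way to estimate memory use before rendering many pages at high resolution.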
Lifecycle
void close()
Free the underlying native handle. Idempotent.
DocumentEditor
Mutable editing session over a PDF: form filling, redaction, metadata scrubbing, and saving. Mutators return this for fluent chaining. Implements AutoCloseable.
import fyi.oxide.pdf.DocumentEditor;
import java.nio.file.Paths;
try (DocumentEditor editor = DocumentEditor.open("form.pdf")) {
    editor.setFormField("name", "Jane Doe")
          .setFormField("subscribe", true)
          .saveTo(Paths.get("filled.pdf"));
}
Opening (static factory methods)
static DocumentEditor open(Path path)
Open a PDF for editing from a Path.
static DocumentEditor open(String path)
Open a PDF for editing from a path string.
static DocumentEditor open(byte[] bytes)
Open a PDF for editing from in-memory bytes.
Form fields
DocumentEditor setFormField(String name, String value)
Set a text or choice form field value by name; returns this.
DocumentEditor setFormField(String name, boolean checked)
Set a checkbox / radio form field by name; returns this.
Redaction
DocumentEditor addRedaction(int pageIndex, BBox region)
Queue a redaction over a rectangular region on a page; returns this.
int redactionCount(int pageIndex)
Return the number of pending redactions on a page.
int redactionCount()
Return the total number of pending redactions across the document.
RedactResult applyRedactionsDestructive()
Apply all queued redactions, permanently removing covered content; returns a RedactResult.
Metadata
DocumentEditor scrubMetadata()
Remove document information and XMP metadata; returns this.
Saving
byte[] save()
Serialize the edited document to a new byte array (full rewrite).
void saveTo(Path out)
Write the edited document to a file (full rewrite).
byte[] saveIncremental()
Serialize using an incremental update, preserving the original bytes.
void saveIncrementalTo(Path out)
Write an incremental update to a file.
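An incremental update appends new objects, a new cross-reference section, and a new trailer after the original file, leaving the original bytes untouched; this is what keeps existing signatures valid. Under that definition, the output of saveIncremental() should begin with the exact bytes the editor was opened with. The helper below is a hypothetical, library-independent sanity check (isIncrementalUpdateOf is not part of the API), shown with stand-in byte strings rather than real PDFs.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class IncrementalCheck {
    // True if `updated` starts with every byte of `original` and appends data after it,
    // which is how an incremental update is laid out on disk.
    static boolean isIncrementalUpdateOf(byte[] original, byte[] updated) {
        if (updated.length <= original.length) {
            return false;
        }
        return Arrays.equals(original, 0, original.length, updated, 0, original.length);
    }

    public static void main(String[] args) {
        String originalText = "%PDF-1.7\n% original objects\n%%EOF\n";
        byte[] original = originalText.getBytes(StandardCharsets.US_ASCII);
        // Stand-in for saveIncremental(): original bytes followed by an appended section.
        byte[] incremental = (originalText + "% appended objects, xref, trailer\n%%EOF\n")
            .getBytes(StandardCharsets.US_ASCII);
        // Stand-in for save(): a full rewrite that does not preserve the original bytes.
        byte[] rewritten = "%PDF-1.7\n% rewritten objects\n%%EOF\n"
            .getBytes(StandardCharsets.US_ASCII);

        System.out.println(isIncrementalUpdateOf(original, incremental)); // true
        System.out.println(isIncrementalUpdateOf(original, rewritten));   // false
    }
}
```

In practice you would pass the bytes you opened the editor with and the array returned by saveIncremental(); a full rewrite from save() is expected to fail this check.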
Lifecycle
boolean isOpen()
void close()
isOpen() reports whether the editor is still open; close() frees its native handle and is idempotent.
Pdf
Create new PDFs from Markdown, HTML, or images, and split existing PDFs. Implements AutoCloseable.
import fyi.oxide.pdf.Pdf;
import java.nio.file.Paths;
try (Pdf pdf = Pdf.fromMarkdown("# Report\n\nGenerated by PDF Oxide.")) {
    pdf.saveTo(Paths.get("report.pdf"));
}
Creation (static factory methods)
static Pdf fromMarkdown(String markdown)
Create a PDF from Markdown content.
static Pdf fromHtml(String html)
Create a PDF from HTML content.
static Pdf fromImages(List<byte[]> images)
Create a multi-page PDF, one page per image (JPEG/PNG bytes).
Splitting
List<BookmarkSegment> planSplitByBookmarks(SplitByBookmarksOptions opts)
Compute the BookmarkSegment plan for splitting at a bookmark level without writing output.
List<byte[]> splitByBookmarks(SplitByBookmarksOptions opts)
Split the PDF at the configured bookmark level, returning one byte array per segment.
static int planSplitByBookmarksCount(byte[] sourcePdf, int level)
Return how many segments a bookmark-level split would produce, without opening a Pdf.
static byte[][] splitByBookmarksFromBytes(byte[] sourcePdf, int level)
Split the source PDF bytes at the given bookmark level in a single static call, returning one byte array per segment.
Saving and lifecycle
byte[] save()
Serialize the PDF to a byte array.
void saveTo(Path out)
Write the PDF to a file.
boolean isOpen()
void close()
isOpen() reports whether the native handle is still open; close() frees it and is idempotent.
AutoExtractor
Adaptive extraction that classifies each page (text layer vs. scanned) and applies OCR where needed. Built from an open PdfDocument.
import fyi.oxide.pdf.AutoExtractor;
import fyi.oxide.pdf.PdfDocument;
import fyi.oxide.pdf.auto.AutoResult;
try (PdfDocument doc = PdfDocument.open("scan.pdf")) {
    AutoExtractor extractor = AutoExtractor.balanced(doc);
    AutoResult result = extractor.extractDocument();
    System.out.println(result.text());
}
Construction (static factory methods)
static AutoExtractor of(PdfDocument doc)
Create an extractor with default configuration.
static AutoExtractor of(PdfDocument doc, AutoExtractConfig config)
Create an extractor with an explicit AutoExtractConfig.
static AutoExtractor fast(PdfDocument doc)
Create an extractor tuned for speed (text-layer first).
static AutoExtractor balanced(PdfDocument doc)
Create an extractor with a balanced speed/fidelity preset.
static AutoExtractor highFidelity(PdfDocument doc)
Create an extractor tuned for maximum fidelity (aggressive OCR).
Extraction
String extractText()
Extract plain text across the whole document.
String extractTextForPage(int pageIndex)
Extract plain text for a single page.
AutoResult extractDocument()
Run full adaptive extraction over the document; returns an AutoResult.
AutoResult extractAutoDocument()
Alias of extractDocument() returning the complete document result.
AutoResult extractPage(int pageIndex)
Run adaptive extraction for a single page.
AutoResult extractAutoPage(int pageIndex)
Alias of extractPage() for a single page.
Classification
ClassifyResult classifyDocument()
Classify every page without extracting; returns a ClassifyResult.
ClassifyResult classifyPage(int pageIndex)
Classify a single page.
JSON output
String extractDocumentJson()
Extract the whole document and serialize the result to JSON.
String extractPageJson(int pageIndex)
Extract a page and serialize the result to JSON.
Accessors
PdfDocument document()
Return the underlying PdfDocument.
AutoExtractConfig config()
Return the active configuration.
MarkdownConverter
Stateless, thread-safe static helpers for Markdown and HTML conversion.
static String toMarkdown(PdfDocument doc, int pageIndex)
Convert a single page to Markdown.
static String toMarkdown(PdfDocument doc)
Convert the whole document to Markdown.
static String toHtml(PdfDocument doc, int pageIndex)
Convert a single page to HTML.
static String toHtml(PdfDocument doc)
Convert the whole document to HTML.
PdfSigner
Digital signing and verification using a PKCS#12 keystore.
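import fyi.oxide.pdf.PdfSigner;
import fyi.oxide.pdf.signature.SignOptions;
import java.nio.file.Files;
import java.nio.file.Paths;
// Files.readAllBytes throws IOException; call this from a method that handles or declares it.
byte[] pdfBytes = Files.readAllBytes(Paths.get("contract.pdf"));
PdfSigner signer = PdfSigner.fromPkcs12(Paths.get("cert.p12"), "keystore-pw");
byte[] signed = signer.sign(pdfBytes, SignOptions.builder().withReason("Approved").build());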
static PdfSigner fromPkcs12(Path keystore, String password)
Load a signer from a PKCS#12 keystore file.
static PdfSigner fromPkcs12(byte[] keystoreBytes, String password)
Load a signer from in-memory PKCS#12 keystore bytes.
byte[] sign(byte[] pdf, SignOptions opts)
Sign PDF bytes with the configured certificate and SignOptions; returns the signed PDF.
boolean verify(byte[] pdf)
Verify the signatures embedded in a PDF; returns true if valid.
static SignatureLevel classifyLevel(byte[] pdf)
Classify the PAdES signature level of a signed PDF; returns a SignatureLevel.
PdfValidator
Stateless, thread-safe PDF/A, PDF/X, and PDF/UA compliance validation.
static boolean isPdfA(PdfDocument doc, PdfALevel level)
Quick boolean check for PDF/A conformance at a given level.
static boolean isPdfUa(PdfDocument doc, PdfUaLevel level)
Quick boolean check for PDF/UA conformance at a given level.
static ValidationResult validatePdfA(PdfDocument doc, PdfALevel level)
Validate against a PDF/A level; returns a ValidationResult with violations.
static ValidationResult validatePdfX(PdfDocument doc, PdfXLevel level)
Validate against a PDF/X level.
static ValidationResult validatePdfUa(PdfDocument doc, PdfUaLevel level)
Validate against a PDF/UA level.
PdfPolicy
Process-wide security policy controlling which cryptographic algorithms are permitted. Thread-safe static accessors.
static PolicyMode current()
Return the currently active policy mode.
static void set(PolicyMode mode)
Set the process-wide policy mode.
static PolicyMode compat()
Return the permissive compatibility mode constant.
static PolicyMode strict()
Return the strict mode constant.
static PolicyMode fipsStrict()
Return the FIPS-strict mode constant.
PdfPage
A lazy page handle returned by PdfDocument.page(int). Properties dispatch to the parent document on access.
PdfDocument parent()
Return the owning document.
int index()
Return the zero-based page index.
BBox mediaBox()
Return the page MediaBox as a BBox.
BBox cropBox()
Return the page CropBox.
double width()
double height()
Return page width and height in PDF points.
int rotation()
Return the page rotation in degrees (0, 90, 180, 270).
String text()
Extract all plain text on the page.
String text(BBox region)
Extract text within a rectangular region.
List<TextWord> words()
Return per-word text with bounding boxes (TextWord).
List<TextLine> lines()
Return per-line text (TextLine).
List<TextChar> chars()
Return per-character data (TextChar).
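rotation() reports the page's /Rotate value, which turns the page clockwise when it is displayed or printed; at 90 and 270 degrees the width and height a viewer shows are swapped relative to the MediaBox. A sketch of that rule, assuming width() and height() return the unrotated MediaBox dimensions (this reference does not say whether they already account for rotation); displayedSize is an illustrative helper, not part of the API.

```java
final class DisplaySizeSketch {
    // Width and height as seen by a viewer after applying the page's /Rotate value.
    static double[] displayedSize(double width, double height, int rotation) {
        int r = Math.floorMod(rotation, 360); // normalize, e.g. -90 -> 270
        if (r % 90 != 0) {
            throw new IllegalArgumentException("rotation must be a multiple of 90: " + rotation);
        }
        boolean quarterTurn = r == 90 || r == 270;
        return quarterTurn ? new double[] {height, width} : new double[] {width, height};
    }

    public static void main(String[] args) {
        double[] size = displayedSize(612, 792, 90);
        System.out.println(size[0] + " x " + size[1]); // 792.0 x 612.0
    }
}
```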
Geometry types
BBox
Immutable axis-aligned bounding box in PDF coordinates.
BBox(double x0, double y0, double x1, double y1)
double x0()
double y0()
double x1()
double y1()
double width()
double height()
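To overlay a BBox on a page image from render(pageIndex, dpi), the box has to move from PDF space into pixel space. This sketch assumes BBox uses PDF's native convention (origin at the bottom-left, y growing upward; the reference says only "PDF coordinates"), an unrotated page whose MediaBox starts at (0, 0), and a render that covers the full MediaBox. PNG pixels have a top-left origin, so y is flipped against the page height before scaling by dpi / 72. PixelRect and toPixelRect are illustrative names, not part of the API.

```java
final class BBoxToPixels {
    // Pixel-space rectangle with a top-left origin (illustrative type).
    record PixelRect(int left, int top, int right, int bottom) {}

    // Convert PDF-space corners (x0, y0, x1, y1) to image pixels, growing outward to whole pixels.
    static PixelRect toPixelRect(double x0, double y0, double x1, double y1,
                                 double pageHeight, int dpi) {
        double scale = dpi / 72.0;
        int left = (int) Math.floor(x0 * scale);
        int right = (int) Math.ceil(x1 * scale);
        // PDF y grows upward and image y grows downward, so flip against the page height.
        int top = (int) Math.floor((pageHeight - y1) * scale);
        int bottom = (int) Math.ceil((pageHeight - y0) * scale);
        return new PixelRect(left, top, right, bottom);
    }

    public static void main(String[] args) {
        // A 100 x 20 pt box, 72 pt from the left edge and 72 pt below the top of a Letter page.
        PixelRect r = toPixelRect(72, 700, 172, 720, 792, 144);
        System.out.println(r); // PixelRect[left=144, top=144, right=344, bottom=184]
    }
}
```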
Rect
Position-and-size rectangle (origin + width/height).
Rect(double x, double y, double width, double height)
double x()
double y()
double width()
double height()
BBox toBBox()
Point
A 2-D point.
Point(double x, double y)
double x()
double y()
Color
RGBA color with 8 bits per channel.
Color(int r, int g, int b, int a)
Color(int r, int g, int b)
int r()
int g()
int b()
int a()
Text types
TextChar
A single decoded character with position and OCR confidence.
TextChar(int codepoint, BBox bbox, float confidence)
int codepoint()
BBox bbox()
float confidence()
String asString()
TextWord
A whitespace-delimited word with bounds and confidence.
TextWord(String text, BBox bbox, float confidence)
String text()
BBox bbox()
float confidence()
TextLine
A line of text composed of words.
TextLine(String text, BBox bbox, List<TextWord> words)
String text()
BBox bbox()
List<TextWord> words()
TextSpan
A run of identically-styled text.
TextSpan(String text, BBox bbox, TextStyle style)
String text()
BBox bbox()
TextStyle style()
TextStyle
Font and style metadata for a span.
TextStyle(String font, double size, Color color, boolean bold, boolean italic)
double size()
Color color()
boolean bold()
boolean italic()
Table types
Table
A detected table with cell grid.
Table(BBox bbox, int rows, int cols, List<TableCell> cells)
BBox bbox()
int rows()
int cols()
List<TableCell> cells()
TableCell
A single cell, including span information.
TableCell(String text, BBox bbox, int row, int col, int rowSpan, int colSpan)
String text()
BBox bbox()
int row()
int col()
int rowSpan()
int colSpan()
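Because a TableCell can span several rows or columns, cells() may hold fewer entries than rows() * cols(), and a common first step is to expand them into a dense grid. The sketch uses a local Cell record that mirrors TableCell's documented text/row/col/rowSpan/colSpan accessors so it compiles without the library, and it assumes row and col are the zero-based indices of a cell's top-left position with spans of at least 1; both are assumptions, not documented behavior.

```java
import java.util.List;

final class TableGridSketch {
    // Local mirror of the documented TableCell accessors used here.
    record Cell(String text, int row, int col, int rowSpan, int colSpan) {}

    // Expand spanning cells so every grid position they cover holds the cell's text.
    static String[][] toGrid(int rows, int cols, List<Cell> cells) {
        String[][] grid = new String[rows][cols];
        for (Cell c : cells) {
            int rowEnd = Math.min(rows, c.row() + Math.max(1, c.rowSpan()));
            int colEnd = Math.min(cols, c.col() + Math.max(1, c.colSpan()));
            for (int r = c.row(); r < rowEnd; r++) {
                for (int k = c.col(); k < colEnd; k++) {
                    grid[r][k] = c.text();
                }
            }
        }
        return grid;
    }

    public static void main(String[] args) {
        List<Cell> cells = List.of(
            new Cell("Quarter", 0, 0, 1, 2), // header spanning two columns
            new Cell("Q1", 1, 0, 1, 1),
            new Cell("1200", 1, 1, 1, 1));
        for (String[] row : toGrid(2, 2, cells)) {
            System.out.println(String.join(" | ", row));
        }
    }
}
```

Repeating the spanned text in every covered position keeps row-by-row processing simple; if you need to tell merged cells apart, store the Cell object in the grid instead of its text.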
Image types
ExtractedImage
A raster image extracted from a page.
ExtractedImage(byte[] bytes, ImageFormat format, BBox bbox, int width, int height)
byte[] bytes()
ImageFormat format()
BBox bbox()
int width()
int height()
ImageFormat (enum)
JPEG, PNG, CCITT, RAW.
Search types
SearchOptions
Builder-configured search parameters.
boolean caseSensitive()
boolean wholeWord()
boolean regex()
Optional<Integer> maxResults()
static SearchOptions.Builder builder()
Builder: withCaseSensitive(boolean), withWholeWord(boolean), withRegex(boolean), withMaxResults(Integer), withMaxResults(int), build().
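To make the SearchOptions flags concrete, here is what caseSensitive, wholeWord, regex, and maxResults typically mean when applied to a plain string. This is an illustration built on java.util.regex, not PDF Oxide's matcher, which runs over extracted page text and may define word boundaries differently; compile and find are hypothetical helpers.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class SearchFlagsSketch {
    static Pattern compile(String query, boolean caseSensitive, boolean wholeWord, boolean regex) {
        // Literal queries are quoted so characters like '.' or '(' match themselves.
        String body = regex ? query : Pattern.quote(query);
        if (wholeWord) {
            body = "\\b(?:" + body + ")\\b";
        }
        int flags = caseSensitive ? 0 : Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE;
        return Pattern.compile(body, flags);
    }

    // Collect matching substrings; maxResults == null means unlimited.
    static List<String> find(String text, Pattern p, Integer maxResults) {
        List<String> hits = new ArrayList<>();
        Matcher m = p.matcher(text);
        while ((maxResults == null || hits.size() < maxResults) && m.find()) {
            hits.add(m.group());
        }
        return hits;
    }

    public static void main(String[] args) {
        String text = "Invoice, invoice, INVOICE, invoices";
        System.out.println(find(text, compile("invoice", false, true, false), null).size()); // 3
        System.out.println(find(text, compile("invoice", true, false, false), null).size()); // 2
        System.out.println(find(text, compile("invoice", false, false, false), 2).size());   // 2
    }
}
```

Whole-word matching excludes "invoices", case sensitivity excludes "Invoice" and "INVOICE", and maxResults stops the scan early rather than truncating a full result list.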
SearchResult
The full result of a query.
SearchResult(String query, List<SearchMatch> matches)
String query()
List<SearchMatch> matches()
int count()
boolean isEmpty()
SearchMatch
A single hit with page and location.
SearchMatch(int pageIndex, BBox bbox, String text)
int pageIndex()
BBox bbox()
String text()
Form types
FormField
An AcroForm field.
FormField(String name, FormFieldType type, String value, BBox bbox, int pageIndex)
String name()
FormFieldType type()
Optional<String> value()
Optional<BBox> bbox()
int pageIndex()
FormFieldType (enum)
TEXT, CHECKBOX, RADIO, CHOICE.
Annotation types
Annotation
A page annotation.
Annotation(AnnotationType type, int pageIndex, BBox bbox, String contents, String uri)
AnnotationType type()
int pageIndex()
BBox bbox()
Optional<String> contents()
Optional<String> uri()
AnnotationType (enum)
HIGHLIGHT, TEXT, LINK, STAMP, UNDERLINE, STRIKEOUT, SQUIGGLY, FREE_TEXT, LINE, SQUARE, CIRCLE, FILE_ATTACHMENT.
Metadata types
DocumentInfo
Standard document information dictionary fields.
Optional<String> title()
Optional<String> author()
Optional<String> subject()
Optional<String> keywords()
Optional<String> creator()
Optional<String> producer()
Optional<String> creationDate()
Optional<String> modificationDate()
XmpMetadata
Raw XMP metadata packet.
XmpMetadata(String xml)
String xml()
boolean isEmpty()
Auto-extraction types
AutoExtractConfig
Immutable, builder-constructed configuration for AutoExtractor.
Optional<ExtractMode> mode()
Optional<List<Integer>> forceOcrPages()
Optional<Double> minOcrConfidence()
Optional<List<String>> ocrLanguages()
Optional<List<String>> passwords()
Optional<Double> topMarginFraction()
Optional<Double> bottomMarginFraction()
Optional<Boolean> allowSingleColumnTables()
Optional<Boolean> ocrInlineImages()
Optional<String> cancelToken()
static AutoExtractConfig.Builder builder()
AutoExtractConfig.Builder toBuilder()
Builder methods: withMode(ExtractMode), withForceOcrPages(List<Integer>), withMinOcrConfidence(Double), withOcrLanguages(List<String>), withOcrLanguages(String...), withPasswords(List<String>), withPasswords(String...), withTopMarginFraction(Double), withTopMarginFraction(double), withBottomMarginFraction(Double), withBottomMarginFraction(double), withAllowSingleColumnTables(Boolean), withAllowSingleColumnTables(boolean), withOcrInlineImages(Boolean), withOcrInlineImages(boolean), withCancelToken(String), build().
AutoResult
Result of an adaptive extraction.
String text()
Optional<String> markdown()
Optional<String> html()
ExtractReason reason()
double confidence()
boolean ocrUsed()
List<RegionResult> regions()
List<Integer> pagesNeedingOcr()
RegionResult
Per-region extraction outcome.
int pageIndex()
BBox bbox()
String text()
ExtractReason reason()
double confidence()
boolean ocrUsed()
Optional<Table> table()
ClassifyResult
Result of page classification.
List<PageClass> pages()
List<Integer> pagesNeedingOcr()
List<Integer> pagesWithChart()
List<Integer> pagesEncrypted()
ExtractMode (enum)
TEXT_ONLY, AUTO.
PageClass (enum)
TEXT_LAYER, SCANNED, MIXED.
ExtractReason (enum)
OK, SCANNED_NO_TEXT_LAYER, GLYPH_MAPPING_MISSING, ENCRYPTED_NO_EXTRACT_PERMISSION, IMAGE_TABLE_NO_STRUCTURE, CHART_NOT_TRANSCRIBED, OCR_REQUESTED_BUT_UNAVAILABLE, OCR_LOW_CONFIDENCE, EMPTY.
Signature types
SignOptions
Builder-constructed signing parameters for PdfSigner.
SignatureLevel level()
Optional<String> reason()
Optional<String> location()
Optional<String> contactInfo()
Optional<String> tsaUrl()
static SignOptions.Builder builder()
Builder: withLevel(SignatureLevel), withReason(String), withLocation(String), withContactInfo(String), withTsaUrl(String), build().
SignatureLevel (enum)
PAdES baseline levels: B_B (basic), B_T (with trusted timestamp).
Split types
SplitByBookmarksOptions
Builder-constructed options for bookmark-based splitting.
int level()
Optional<String> filenamePrefix()
static SplitByBookmarksOptions.Builder builder()
Builder: withLevel(int), withFilenamePrefix(String), build().
BookmarkSegment
A planned output segment from a bookmark split.
BookmarkSegment(String title, int firstPage, int lastPage, String filename)
String title()
int firstPage()
int lastPage()
String filename()
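A bookmark split turns the start pages of bookmarks at one outline level into contiguous page ranges, which is what planSplitByBookmarks() reports as BookmarkSegment values. The sketch below shows one way to compute such a plan, assuming zero-based, inclusive firstPage/lastPage values, that each segment ends on the page before the next bookmark, and that the last segment runs to the end of the document. The exact rules PDF Oxide applies (for example, to pages before the first bookmark) are not specified in this reference, and the local Segment record mirrors BookmarkSegment without its filename.

```java
import java.util.ArrayList;
import java.util.List;

final class BookmarkPlanSketch {
    // Local mirror of BookmarkSegment's title/firstPage/lastPage accessors.
    record Segment(String title, int firstPage, int lastPage) {}

    // titles.get(i) starts on startPages.get(i); start pages must be ascending.
    static List<Segment> plan(List<String> titles, List<Integer> startPages, int pageCount) {
        List<Segment> segments = new ArrayList<>();
        for (int i = 0; i < startPages.size(); i++) {
            int first = startPages.get(i);
            int last = (i + 1 < startPages.size()) ? startPages.get(i + 1) - 1 : pageCount - 1;
            if (last >= first) { // skip a bookmark that shares its start page with the next one
                segments.add(new Segment(titles.get(i), first, last));
            }
        }
        return segments;
    }

    public static void main(String[] args) {
        List<Segment> segments = plan(
            List.of("Introduction", "Methods", "Results"),
            List.of(0, 4, 9),
            12);
        segments.forEach(System.out::println);
    }
}
```

For a 12-page document this yields pages 0-3, 4-8, and 9-11; counting the resulting segments corresponds to what planSplitByBookmarksCount() reports.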
Redaction types
RedactResult
Outcome of DocumentEditor.applyRedactionsDestructive().
RedactResult(int regionsApplied, boolean oracleVerified)
int regionsApplied()
boolean oracleVerified()
Compliance types
ValidationResult
Result of a PdfValidator check.
ValidationResult(boolean valid, List<ValidationViolation> violations)
boolean valid()
List<ValidationViolation> violations()
ValidationViolation
A single conformance violation.
ValidationViolation(String ruleId, String description, Integer pageIndex)
String ruleId()
String description()
Optional<Integer> pageIndex()
PdfALevel (enum)
A_1B, A_1A, A_2B, A_2A, A_2U, A_3B, A_3A, A_3U, A_4, A_4E.
PdfXLevel (enum)
X_1A_2001, X_1A_2003, X_3_2002, X_3_2003, X_4, X_4P, X_5G, X_5N, X_5PG, X_6, X_6P.
PdfUaLevel (enum)
UA_1, UA_2 — each exposes int code().
Policy types
PolicyMode (enum)
COMPAT, STRICT.
SecurityPolicy
Builder-constructed per-operation security policy.
PolicyMode mode()
List<String> additionalAllow()
List<String> additionalDeny()
static SecurityPolicy.Builder builder()
Builder: withMode(PolicyMode), allow(String algId), deny(String algId), build().
Render types
PixelFormat (enum)
RGBA_8888, RGB_888, GRAY_8.
Error handling
All PDF-specific failures throw PdfException (an unchecked RuntimeException) or one of its subclasses. Every exception carries a PdfErrorKind kind().
import fyi.oxide.pdf.PdfDocument;
import fyi.oxide.pdf.exception.PdfException;
try (PdfDocument doc = PdfDocument.open("file.pdf")) {
    String text = doc.extractText(0);
} catch (PdfException e) {
    System.err.println(e.kind() + ": " + e.getMessage());
}
PdfException
PdfException(String message)
PdfException(PdfErrorKind kind, String message)
PdfException(PdfErrorKind kind, String message, Throwable cause)
PdfErrorKind kind()
Subclasses
| Exception | Thrown when |
|---|---|
| PdfParseException | The file is malformed or not a valid PDF |
| PdfEncryptedException | The PDF is encrypted and no password, or an invalid one, was supplied |
| PdfPermissionException | The requested operation is denied by document permissions |
| PdfIoException | An underlying I/O error occurred |
| PdfOcrUnavailableException | OCR was requested but the OCR backend is unavailable |
| PdfSignatureException | A signing or verification operation failed |
| PdfInvalidStateException | An operation was called on a closed or invalid handle |
| PdfUnsupportedException | A requested feature is not supported |
PdfErrorKind (enum)
PARSE, ENCRYPTED, PERMISSION, IO, OCR_UNAVAILABLE, SIGNATURE, INVALID_STATE, UNSUPPORTED.
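Because every PdfException carries a kind(), a caller can branch on a single enum instead of catching each subclass. The sketch shows one possible triage policy for a batch job; the policy itself (retry only IO, prompt for a password on ENCRYPTED, treat INVALID_STATE as a caller bug) is an illustration rather than library guidance, and the local Kind enum copies the documented PdfErrorKind constants only so the example compiles without the PDF Oxide jar. In real code you would switch on e.kind() inside a catch (PdfException e) block.

```java
final class ErrorTriageSketch {
    // Local copy of the documented PdfErrorKind constants.
    enum Kind { PARSE, ENCRYPTED, PERMISSION, IO, OCR_UNAVAILABLE, SIGNATURE, INVALID_STATE, UNSUPPORTED }

    // Hypothetical actions a batch job might take.
    enum Action { RETRY, ASK_FOR_PASSWORD, SKIP_FILE, FIX_CALLER }

    static Action triage(Kind kind) {
        // Exhaustive switch: adding a constant to Kind becomes a compile error here.
        return switch (kind) {
            case IO -> Action.RETRY;                   // may be transient (network share, disk)
            case ENCRYPTED -> Action.ASK_FOR_PASSWORD; // reopen with a password overload of open()
            case PARSE, PERMISSION, UNSUPPORTED, SIGNATURE, OCR_UNAVAILABLE -> Action.SKIP_FILE;
            case INVALID_STATE -> Action.FIX_CALLER;   // e.g. use after close(): a programming error
        };
    }

    public static void main(String[] args) {
        for (Kind k : Kind.values()) {
            System.out.println(k + " -> " + triage(k));
        }
    }
}
```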
Complete example
import fyi.oxide.pdf.PdfDocument;
import fyi.oxide.pdf.DocumentEditor;
import fyi.oxide.pdf.Pdf;
import fyi.oxide.pdf.AutoExtractor;
import fyi.oxide.pdf.auto.AutoResult;
import java.nio.file.Paths;
public class Example {
    public static void main(String[] args) throws Exception {
        // --- Extraction ---
        try (PdfDocument doc = PdfDocument.open(Paths.get("input.pdf"))) {
            System.out.println("Pages: " + doc.pageCount());
            for (int i = 0; i < doc.pageCount(); i++) {
                System.out.println(doc.extractText(i));
            }
            String markdown = doc.toMarkdown();
            // Adaptive extraction with OCR fallback
            AutoResult auto = AutoExtractor.balanced(doc).extractDocument();
            System.out.println("OCR used: " + auto.ocrUsed());
        }
        // --- Creation ---
        try (Pdf pdf = Pdf.fromMarkdown("# Report\n\nGenerated by PDF Oxide.")) {
            pdf.saveTo(Paths.get("report.pdf"));
        }
        // --- Editing ---
        try (DocumentEditor editor = DocumentEditor.open("form.pdf")) {
            editor.setFormField("name", "Jane Doe")
                  .setFormField("subscribe", true)
                  .scrubMetadata()
                  .saveTo(Paths.get("filled.pdf"));
        }
    }
}
Other Language Bindings
PDF Oxide ships native bindings for every major ecosystem: Rust, Python, Node.js, WASM, C#, Golang, PHP, Ruby, C++, Swift, Kotlin, Dart, R, Julia, Zig, Scala, Clojure, Objective-C, and Elixir.
Next Steps
- Types & Enums — all shared types and enums
- Page API Reference — consistent per-page iteration across bindings
- Getting Started with Java — tutorial