What is the fastest Python PDF library?

PDF Oxide is the fastest Python PDF library, with a mean text extraction time of 0.8ms: 5.8× faster than PyMuPDF (4.6ms) and 15× faster than pypdf (12.1ms). Benchmarked on 3,830 real-world PDFs with a 100% pass rate.

Is PDF Oxide free for commercial use?

Yes. PDF Oxide is MIT licensed — free for all uses including commercial products, SaaS, and proprietary software. No license fees, no sales calls, no AGPL restrictions.

Can PDF Oxide handle scanned PDFs with OCR?

Yes. PDF Oxide includes built-in OCR via PaddleOCR and ONNX Runtime. No Tesseract installation needed — just pip install pdf_oxide and use extract_text_ocr(). Supports PP-OCRv3, v4, and v5 models.

Does PDF Oxide support XFA forms?

Yes. PDF Oxide is the only Python PDF library that can detect, analyze, and extract data from XFA forms (XML Forms Architecture). PyMuPDF, pypdf, pdfplumber, and pdfminer cannot read XFA form data.

How does PDF Oxide compare to PyMuPDF?

PDF Oxide is 5.8× faster than PyMuPDF (0.8ms vs 4.6ms mean), has a 100% pass rate vs 99.3%, and is MIT licensed vs PyMuPDF's AGPL-3.0. PDF Oxide also has built-in Markdown/HTML output and XFA form support that PyMuPDF lacks.

Can PDF Oxide convert PDF to Markdown?

Yes. PDF Oxide has built-in PDF to Markdown conversion with heading detection, table preservation, and list formatting — ideal for LLM and RAG pipelines. No separate package needed, unlike PyMuPDF which requires pymupdf4llm (69× slower).

Kotlin API Reference

PDF Oxide provides idiomatic Kotlin/JVM bindings (Android-compatible) as a thin facade over the mature fyi.oxide:pdf-oxide Java binding. The Java binding owns the only JNI native bridge (the pdf_oxide_jni crate); the Kotlin module adds no native code of its own. It re-exports the Java types (PdfDocument, Pdf, PdfPage, DocumentEditor, PdfSigner, PdfValidator, AutoExtractor, and the geometry / text / table / search value types) and layers Kotlin-flavored sugar on top: extension functions that convert Optional<T> to T?, and use { } for AutoCloseable handles.

// build.gradle.kts
dependencies {
    implementation("fyi.oxide:pdf-oxide-kotlin:0.3.69")
}

The JNI native library (libpdf_oxide_jni) is not bundled. Either load it with System.loadLibrary("pdf_oxide_jni") (placing the .so/.dylib on java.library.path, or under jniLibs/<abi>/ on Android), or point the Java NativeLoader at it with -Dfyi.oxide.pdf.lib.path=<path>.

For the Java API, see the Java API Reference. For the Rust API, see the Rust API Reference. For type details, see Types and Enums.

import fyi.oxide.pdf.Pdf
import fyi.oxide.pdf.PdfDocument
import fyi.oxide.pdf.producerOrNull

Pdf.fromMarkdown("# Hello\n\nbody\n").use { pdf ->
    PdfDocument.open(pdf.save()).use { doc ->
        println(doc.pageCount())
        println(doc.extractText(0))
        println(doc.toMarkdown())
        println(doc.page(0).words().map { it.text() })
        println(doc.producerOrNull() ?: "(no producer)")   // Optional -> nullable
    }
}

All handles (PdfDocument, Pdf, DocumentEditor) implement AutoCloseable, so Kotlin's use { } block releases their native memory deterministically. Errors are thrown as PdfException (or one of its subclasses); see Exceptions.

PdfDocument

The main read-only entry point to a PDF: opening, extraction, conversion, rendering, search, and form-field inspection. Instances own native memory and must be closed, so wrap them in use { }.

import fyi.oxide.pdf.PdfDocument

Factory methods

PdfDocument.open(path: Path): PdfDocument

Opens a PDF from a file-system path.

PdfDocument.open(path: String): PdfDocument

Opens a PDF from a path string.

PdfDocument.open(bytes: ByteArray): PdfDocument

Opens a PDF from in-memory bytes (for example, a file downloaded from S3 or received over HTTP).

PdfDocument.open(path: Path, password: String): PdfDocument

Opens an encrypted PDF from a path, using either the user password or the owner password.

PdfDocument.open(path: String, password: String): PdfDocument

Opens an encrypted PDF from a path string with a password.

PdfDocument.open(bytes: ByteArray, password: String): PdfDocument

Opens an encrypted PDF from bytes with a password.

PdfDocument.open(stream: InputStream): PdfDocument

Opens a PDF by reading all bytes from an InputStream.

Static one-shot

PdfDocument.extractText(path: String): String
PdfDocument.extractText(path: Path): String

Opens the file, extracts all of its text, and closes it in a single call. Intended for simple cases that do not need a live handle.
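A minimal sketch contrasting the static one-shot call with a live handle, assuming the signatures listed in this reference, the library on the classpath, and a loadable native library. The helper names (writeSamplePdf, quickText, pageTexts) are hypothetical; the sample file is generated with Pdf.fromMarkdown so the snippet does not depend on an existing PDF.

```kotlin
import fyi.oxide.pdf.Pdf
import fyi.oxide.pdf.PdfDocument
import java.nio.file.Files
import java.nio.file.Path

// Hypothetical helper: generates a small PDF and writes it to a temp file.
fun writeSamplePdf(markdown: String): Path {
    val path = Files.createTempFile("pdf-oxide-sample", ".pdf")
    Pdf.fromMarkdown(markdown).use { it.saveTo(path) }
    return path
}

// One-shot: open, extract all text, and close in a single static call.
fun quickText(path: Path): String = PdfDocument.extractText(path)

// Live handle: keep the document open while extracting each page in turn.
fun pageTexts(path: Path): List<String> =
    PdfDocument.open(path).use { doc ->
        (0 until doc.pageCount()).map { doc.extractText(it) }
    }
```

The one-shot call suits scripts that need the text once; when several operations touch the same file, a single open handle avoids reopening the file for each call.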

Authentication

doc.authenticate(password: String): Boolean
doc.authenticate(password: ByteArray): Boolean

Authenticates an encrypted document after it has been opened. Returns true if the password matches.

Document information

doc.pageCount(): Int

The number of pages in the document.

doc.producer(): Optional<String>
doc.creator(): Optional<String>

The document's /Producer and /Creator metadata. For null-based access, use the Kotlin producerOrNull() / creatorOrNull() extension functions.

val doc.isOpen: Boolean

Whether the native handle is still open (a Kotlin property backed by the Java isOpen() getter).

Text extraction

doc.extractText(pageIndex: Int): String

Extracts plain text from a single page, identified by its 0-based index.

doc.extractTextAuto(pageIndex: Int): String

Extracts text using an automatically selected strategy (for scanned pages, it falls back to OCR when the OCR feature is available).

doc.extractStructured(page: Int): String

Extracts a structured (JSON) representation of the page's text and layout.

Conversion

doc.toMarkdown(): String
doc.toMarkdown(pageIndex: Int): String

Converts the whole document, or a single page, to Markdown.

doc.toHtml(): String
doc.toHtml(pageIndex: Int): String

Converts the whole document, or a single page, to HTML.

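Search

doc.search(query: String): List<SearchMatch>

Searches the document for a literal string. Returns the matches on each page, with their bounding boxes.

doc.search(query: String, caseInsensitive: Boolean, regex: Boolean, maxResults: Int): List<SearchMatch>

Searches with control over case sensitivity, regular-expression matching, and a cap on the number of results (maxResults = 0 means no limit). Because search is a Java method, Kotlin callers must pass these arguments positionally; Kotlin does not allow named arguments for Java methods.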
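Since Kotlin cannot use named arguments for a Java method, the search flags have to be passed positionally. The sketch below labels each position with a comment; findMatches is a hypothetical helper, and the signature is taken from this reference.

```kotlin
import fyi.oxide.pdf.PdfDocument

// Returns (pageIndex, matchedText) pairs for a case-insensitive regex search.
// search() is a Java method, so its arguments are positional, not named.
fun findMatches(doc: PdfDocument, pattern: String): List<Pair<Int, String>> =
    doc.search(
        pattern,
        true,  // caseInsensitive
        true,  // regex
        0      // maxResults: 0 means no limit
    ).map { m -> m.pageIndex() to m.text() }
```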

Forms

doc.formFields(): List<FormField>

Returns every AcroForm field together with its type, value, widget bounds, and page index. See FormField.
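A small sketch that collects the filled-in values by field name, using the valueOrNull() extension described under Kotlin extension functions. formValues is a hypothetical helper; a document without an AcroForm should yield an empty map.

```kotlin
import fyi.oxide.pdf.PdfDocument
import fyi.oxide.pdf.valueOrNull

// Maps each AcroForm field's fully qualified name to its value (null if unset).
fun formValues(doc: PdfDocument): Map<String, String?> =
    doc.formFields().associate { field -> field.name() to field.valueOrNull() }
```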

Rendering

doc.render(pageIndex: Int): ByteArray
doc.render(pageIndex: Int, dpi: Int): ByteArray

Renders a page to PNG image bytes, at the default DPI or at the specified DPI.
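A sketch that renders a page and checks the result against the standard eight-byte PNG signature before writing it to disk. renderToFile and isPng are hypothetical helpers; the 150 DPI default here is an arbitrary choice, not the library's default DPI.

```kotlin
import fyi.oxide.pdf.PdfDocument
import java.nio.file.Files
import java.nio.file.Path

// The eight-byte signature that starts every PNG file.
private val PNG_SIGNATURE = byteArrayOf(
    0x89.toByte(), 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A
)

fun isPng(bytes: ByteArray): Boolean =
    bytes.size >= PNG_SIGNATURE.size &&
        bytes.copyOfRange(0, PNG_SIGNATURE.size).contentEquals(PNG_SIGNATURE)

// Renders one page at the requested DPI and writes the PNG bytes to `out`.
fun renderToFile(doc: PdfDocument, pageIndex: Int, out: Path, dpi: Int = 150): Path {
    val png = doc.render(pageIndex, dpi)
    require(isPng(png)) { "render() did not return PNG data" }
    return Files.write(out, png)
}
```

Output size grows with DPI: a US Letter page (612 × 792 pt) rendered at 150 DPI comes out at about 1275 × 1650 pixels, since one PDF point is 1/72 inch.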

Page access

doc.page(index: Int): PdfPage

Returns a lazy PdfPage handle for the given 0-based index.

doc.pages(): List<PdfPage>

Returns all pages as a list.

doc.pagesStream(): Stream<PdfPage>

Returns all pages as a Java Stream for fluent processing.

Lifecycle

doc.close()

Releases native memory. The call is idempotent: a second call does nothing. Prefer use { }.
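use { } is the idiomatic form, but spelling out what it does makes the lifecycle rules concrete: close() runs in finally, a second close() is harmless, and isOpen reports false afterwards. openAndClose is a hypothetical helper written against the documented close() and isOpen members.

```kotlin
import fyi.oxide.pdf.PdfDocument

// Opens a document, reads its page count, and closes it explicitly.
// Returns the page count together with isOpen as observed after closing.
fun openAndClose(bytes: ByteArray): Pair<Int, Boolean> {
    val doc = PdfDocument.open(bytes)
    val pages = try {
        doc.pageCount()
    } finally {
        doc.close()
    }
    doc.close() // idempotent: the second call is a no-op
    return pages to doc.isOpen
}
```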

PdfPage

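A lazy page handle returned by PdfDocument.page(), pages(), and pagesStream(). Every accessor dispatches to the parent document when it is called.

// bytes: ByteArray containing a PDF
PdfDocument.open(bytes).use { doc ->
    val page = doc.page(0)
    val words = page.words()    // List<TextWord>
    val tables = page.tables()  // List<Table>
    println("${words.size} words, ${tables.size} tables")
}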

Geometry

page.parent(): PdfDocument
page.index(): Int
page.mediaBox(): BBox
page.cropBox(): BBox
page.width(): Double
page.height(): Double
page.rotation(): Int

The parent document, the 0-based index, the MediaBox and CropBox rectangles, the dimensions in PDF points, and the page rotation in degrees.

Content extraction

page.text(): String

Extracts all text on the page.

page.text(region: BBox): String

Extracts the text inside a bounding-box region.

page.words(): List<TextWord>
page.lines(): List<TextLine>
page.chars(): List<TextChar>

Structured text at word, line, and character granularity.

page.images(): List<ExtractedImage>
page.tables(): List<Table>
page.annotations(): List<Annotation>

Extracted images, detected tables, and the page's annotations.

Pdf

Creates PDFs from source formats, splits them by bookmarks, and serializes them. Implements AutoCloseable.

import fyi.oxide.pdf.Pdf

Factory methods

Pdf.fromMarkdown(markdown: String): Pdf

Creates a PDF from Markdown content.

Pdf.fromHtml(html: String): Pdf

Creates a PDF from HTML content.

Pdf.fromImages(images: List<ByteArray>): Pdf

Creates a multi-page PDF from a list of image byte arrays, one page per image.
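Combining render with Pdf.fromImages gives a simple rasterized, image-only copy of a document. This sketch assumes fromImages accepts the PNG bytes that render produces (the reference does not list the supported image formats); rasterize and its 100 DPI default are hypothetical.

```kotlin
import fyi.oxide.pdf.Pdf
import fyi.oxide.pdf.PdfDocument

// Renders every page of `source` to PNG and rebuilds a PDF with one image per page.
fun rasterize(source: ByteArray, dpi: Int = 100): ByteArray {
    val images = PdfDocument.open(source).use { doc ->
        (0 until doc.pageCount()).map { doc.render(it, dpi) }
    }
    return Pdf.fromImages(images).use { it.save() }
}
```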

Splitting

pdf.planSplitByBookmarks(opts: SplitByBookmarksOptions): List<BookmarkSegment>

Plans a split by outline bookmarks without producing any output. Returns the segments that would be created (title, page range, file name).

pdf.splitByBookmarks(opts: SplitByBookmarksOptions): List<ByteArray>

Splits the document into multiple PDFs at a bookmark level. Returns one byte array per segment.

Pdf.planSplitByBookmarksCount(sourcePdf: ByteArray, level: Int): Int

Static helper: counts how many segments a bookmark split at the given level would produce.

Pdf.splitByBookmarksFromBytes(sourcePdf: ByteArray, level: Int): Array<ByteArray>

Static helper: splits source PDF bytes directly at a bookmark level.

Saving

pdf.save(): ByteArray

Serializes the PDF to bytes.

pdf.saveTo(out: Path)

Writes the PDF to a file.

val pdf.isOpen: Boolean
pdf.close()

Lifecycle (the Kotlin isOpen property and close()). Prefer use { }.
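A sketch that builds a PDF from HTML, writes it with saveTo, and also keeps the serialized bytes from save(). It assumes save() may be called after saveTo() on the same handle; htmlToPdf and hasPdfHeader are hypothetical helpers, and the header check relies only on the PDF file format, whose files begin with a %PDF- header line.

```kotlin
import fyi.oxide.pdf.Pdf
import java.nio.file.Files
import java.nio.file.Path

// A PDF file starts with the "%PDF-" header, followed by the version number.
fun hasPdfHeader(bytes: ByteArray): Boolean =
    bytes.size >= 5 && String(bytes, 0, 5, Charsets.US_ASCII) == "%PDF-"

// Builds a PDF from HTML, writes it to `out`, and also returns the serialized bytes.
fun htmlToPdf(html: String, out: Path): ByteArray =
    Pdf.fromHtml(html).use { pdf ->
        pdf.saveTo(out)
        pdf.save()
    }
```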

DocumentEditor

A mutating editor for redaction, form filling, metadata scrubbing, and incremental saves. Implements AutoCloseable. Setter methods return this, so calls can be chained fluently.

import fyi.oxide.pdf.DocumentEditor

Factory methods

DocumentEditor.open(path: Path): DocumentEditor
DocumentEditor.open(path: String): DocumentEditor
DocumentEditor.open(bytes: ByteArray): DocumentEditor

Opens a document for editing, from a path or from in-memory bytes.

Form filling

editor.setFormField(name: String, value: String): DocumentEditor

Sets the value of a text or choice field, identified by its fully qualified name.

editor.setFormField(name: String, checked: Boolean): DocumentEditor

Sets the state of a checkbox or radio field by name.

Redaction

editor.addRedaction(pageIndex: Int, region: BBox): DocumentEditor

Queues a redaction for a rectangular region on a page.

editor.redactionCount(pageIndex: Int): Int
editor.redactionCount(): Int

The number of queued redactions on one page, or across the whole document.

editor.applyRedactionsDestructive(): RedactResult

Permanently applies all queued redactions, removing the underlying content. Returns a RedactResult holding the number of regions applied and the oracle verification status.

Metadata

editor.scrubMetadata(): DocumentEditor

Removes document metadata (the Info dictionary and XMP) for privacy.

Saving

editor.save(): ByteArray
editor.saveTo(out: Path)

Serializes the edited document with a full rewrite.

editor.saveIncremental(): ByteArray
editor.saveIncrementalTo(out: Path)

Serializes the document as an incremental update, appending the changes while preserving the original bytes.

val editor.isOpen: Boolean
editor.close()

Lifecycle. Prefer use { }.
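Incremental saves matter most for signed documents: a full rewrite changes the byte ranges that existing signatures cover, while an incremental update leaves the signed bytes untouched. The sketch below scrubs metadata, saves incrementally, and checks that the original bytes survive as a prefix, as the description above implies. scrubIncremental and preservesOriginal are hypothetical helpers.

```kotlin
import fyi.oxide.pdf.DocumentEditor

// Scrubs metadata and saves as an incremental update (changes appended at the end).
fun scrubIncremental(original: ByteArray): ByteArray =
    DocumentEditor.open(original).use { editor ->
        editor.scrubMetadata()
        editor.saveIncremental()
    }

// True if `updated` begins with every byte of `original`, unchanged.
fun preservesOriginal(original: ByteArray, updated: ByteArray): Boolean =
    updated.size >= original.size &&
        updated.copyOfRange(0, original.size).contentEquals(original)
```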

AutoExtractor

An adaptive extraction pipeline that classifies pages (text layer vs. scanned), applies OCR where needed, and outputs text, Markdown, or HTML with confidence scores.

import fyi.oxide.pdf.AutoExtractor

Factory methods

AutoExtractor.of(doc: PdfDocument): AutoExtractor
AutoExtractor.of(doc: PdfDocument, config: AutoExtractConfig): AutoExtractor

Creates an extractor for a document, optionally with a custom AutoExtractConfig.

AutoExtractor.fast(doc: PdfDocument): AutoExtractor
AutoExtractor.balanced(doc: PdfDocument): AutoExtractor
AutoExtractor.highFidelity(doc: PdfDocument): AutoExtractor

Preset configurations that trade speed against fidelity.

Extraction

extractor.extractText(): String
extractor.extractTextForPage(pageIndex: Int): String

Plain-text extraction for the whole document or for a single page.

extractor.extractDocument(): AutoResult
extractor.extractPage(pageIndex: Int): AutoResult

Full adaptive extraction, returning an AutoResult (text, optional Markdown/HTML, reason, confidence, OCR flag, regions).

extractor.extractAutoDocument(): AutoResult
extractor.extractAutoPage(pageIndex: Int): AutoResult

Auto-mode variants of document-level and page-level extraction.

extractor.extractDocumentJson(): String
extractor.extractPageJson(pageIndex: Int): String

Extraction results serialized as a JSON string.

Classification

extractor.classifyDocument(): ClassifyResult
extractor.classifyPage(pageIndex: Int): ClassifyResult

Classifies the document or a page and returns a ClassifyResult: the per-page classes, plus lists of the pages that need OCR, the pages containing charts, and the encrypted pages.

extractor.classifyPageKind(pageIndex: Int): PageClass
extractor.classifyDocumentKinds(): List<PageClass>

Returns the PageClass (TEXT_LAYER / SCANNED / MIXED) of one page or of every page.
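A sketch that uses the classifier to decide where OCR is needed before running a full extraction. It assumes the fast preset is adequate for classification; pagesNeedingOcr and pageClassHistogram are hypothetical helpers, and PageClass values are compared by name to avoid assuming the enum's package.

```kotlin
import fyi.oxide.pdf.AutoExtractor
import fyi.oxide.pdf.PdfDocument

// 0-based indices of the pages the classifier flags as needing OCR.
fun pagesNeedingOcr(doc: PdfDocument): List<Int> =
    AutoExtractor.fast(doc).classifyDocument().pagesNeedingOcr()

// Number of pages per PageClass name (TEXT_LAYER / SCANNED / MIXED).
fun pageClassHistogram(doc: PdfDocument): Map<String, Int> =
    AutoExtractor.fast(doc).classifyDocumentKinds()
        .groupingBy { it.name }
        .eachCount()
```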

Accessors

extractor.document(): PdfDocument
extractor.config(): AutoExtractConfig

The wrapped document and the active configuration.

MarkdownConverter

A stateless, thread-safe converter from a PdfDocument to Markdown or HTML.

import fyi.oxide.pdf.MarkdownConverter

MarkdownConverter.toMarkdown(doc: PdfDocument): String
MarkdownConverter.toMarkdown(doc: PdfDocument, pageIndex: Int): String
MarkdownConverter.toHtml(doc: PdfDocument): String
MarkdownConverter.toHtml(doc: PdfDocument, pageIndex: Int): String

Converts the whole document or a single page to Markdown or HTML.

PdfSigner

Digitally signs PDFs and verifies their signatures using a PKCS#12 keystore (PAdES levels B-B, B-T, and B-LT).

import fyi.oxide.pdf.PdfSigner

PdfSigner.fromPkcs12(keystore: Path, password: String): PdfSigner
PdfSigner.fromPkcs12(keystoreBytes: ByteArray, password: String): PdfSigner

Loads a signer from a PKCS#12 keystore on disk or in memory.

signer.sign(pdf: ByteArray, opts: SignOptions): ByteArray

Signs PDF bytes with the given SignOptions (level, reason, location, contact info, TSA URL) and returns the signed PDF.

signer.verify(pdf: ByteArray): Boolean

Verifies every signature in the PDF. Returns true if all signatures are cryptographically valid.

PdfSigner.classifyLevel(pdf: ByteArray): SignatureLevel

Static helper: detects the PAdES conformance level of an existing signed PDF.

PdfValidator

Stateless, thread-safe validation against PDF/A, PDF/X, and PDF/UA conformance levels.

import fyi.oxide.pdf.PdfValidator

PdfValidator.isPdfA(doc: PdfDocument, level: PdfALevel): Boolean
PdfValidator.isPdfUa(doc: PdfDocument, level: PdfUaLevel): Boolean

Quick boolean conformance checks.

PdfValidator.validatePdfA(doc: PdfDocument, level: PdfALevel): ValidationResult
PdfValidator.validatePdfX(doc: PdfDocument, level: PdfXLevel): ValidationResult
PdfValidator.validatePdfUa(doc: PdfDocument, level: PdfUaLevel): ValidationResult

Full validation, returning a ValidationResult that lists the violations.

PdfPolicy

Global security-policy control that determines which cryptographic algorithms are allowed.

import fyi.oxide.pdf.PdfPolicy

PdfPolicy.current(): PolicyMode
PdfPolicy.set(mode: PolicyMode)
PdfPolicy.compat(): PolicyMode
PdfPolicy.strict(): PolicyMode
PdfPolicy.fipsStrict(): PolicyMode

Reads or sets the active PolicyMode, and returns the built-in compat, strict, and FIPS-strict modes.
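Because the policy is global, a scoped change should restore the previous mode even if the work inside it fails. withStrictPolicy is a hypothetical helper built on current(), set(), and strict(); it does not coordinate with other threads, which observe the strict mode for as long as the block runs.

```kotlin
import fyi.oxide.pdf.PdfPolicy

// Runs `block` under the strict policy and restores the previous mode afterwards,
// whether the block returns normally or throws.
fun <T> withStrictPolicy(block: () -> T): T {
    val previous = PdfPolicy.current()
    PdfPolicy.set(PdfPolicy.strict())
    try {
        return block()
    } finally {
        PdfPolicy.set(previous)
    }
}
```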

Kotlin extension functions

The only surface the Kotlin facade adds: converters from Optional<T> to T?, plus a generic orNull() helper. Import them from fyi.oxide.pdf.

fun <T : Any> Optional<T>.orNull(): T?

Generic: an empty Optional becomes null.

fun PdfDocument.producerOrNull(): String?
fun PdfDocument.creatorOrNull(): String?

The document's /Producer and /Creator, or null if absent.

fun FormField.valueOrNull(): String?
fun FormField.bboxOrNull(): BBox?

The form field's value and widget bounding box, or null.

fun Annotation.contentsOrNull(): String?
fun Annotation.uriOrNull(): String?

The annotation's /Contents and link URI, or null.

fun AutoResult.markdownOrNull(): String?
fun AutoResult.htmlOrNull(): String?

The Markdown / HTML rendering produced by auto-extraction, or null if none was generated.

fun ValidationViolation.pageIndexOrNull(): Int?

The page index the violation applies to, or null for document-level rules.
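A short sketch of the extension functions in use: uriOrNull() turns the Optional returned by Annotation.uri() into a nullable value that mapNotNull can filter, and the generic orNull() works on any java.util.Optional. linkTargets and greetingOrDefault are hypothetical helpers.

```kotlin
import fyi.oxide.pdf.PdfDocument
import fyi.oxide.pdf.orNull
import fyi.oxide.pdf.uriOrNull
import java.util.Optional

// Link targets on one page; annotations without a URI are skipped.
fun linkTargets(doc: PdfDocument, pageIndex: Int): List<String> =
    doc.page(pageIndex).annotations().mapNotNull { it.uriOrNull() }

// The generic helper works on any Optional, including ones from your own Java code.
fun greetingOrDefault(name: Optional<String>): String =
    "Hello, ${name.orNull() ?: "stranger"}"
```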

Geometry types

BBox

An axis-aligned bounding box in PDF points.

BBox(x0: Double, y0: Double, x1: Double, y1: Double)

Accessor	Type	Description
`x0()`, `y0()`, `x1()`, `y1()`	`Double`	Corner coordinates
`width()`	`Double`	`x1 - x0`
`height()`	`Double`	`y1 - y0`
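The accessors above are enough to build small geometric helpers, for example to test whether a word's box overlaps a redaction region. The extension functions area and overlapOrNull below are hypothetical additions, not part of the library.

```kotlin
import fyi.oxide.pdf.geometry.BBox

// Area in square PDF points (1 pt = 1/72 inch).
fun BBox.area(): Double = width() * height()

// Overlapping region of two boxes, or null when they do not intersect.
fun BBox.overlapOrNull(other: BBox): BBox? {
    val xMin = maxOf(x0(), other.x0())
    val yMin = maxOf(y0(), other.y0())
    val xMax = minOf(x1(), other.x1())
    val yMax = minOf(y1(), other.y1())
    return if (xMin < xMax && yMin < yMax) BBox(xMin, yMin, xMax, yMax) else null
}
```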

Color

An 8-bit-per-channel RGBA color, with the named constants Color.BLACK, Color.WHITE, and Color.TRANSPARENT.

Color(r: Int, g: Int, b: Int, a: Int)
Color(r: Int, g: Int, b: Int)            // a = 255

Accessors: r(): Int, g(): Int, b(): Int, a(): Int.

Point

Point(x: Double, y: Double)

Accessors: x(): Double, y(): Double.

Rect

A rectangle defined by its position and size.

Rect(x: Double, y: Double, width: Double, height: Double)

Accessors: x(), y(), width(), height() (all Double), and toBBox(): BBox.

Text types

TextChar

A single extracted character.

TextChar(codepoint: Int, bbox: BBox, confidence: Float)

Accessors: codepoint(): Int, bbox(): BBox, confidence(): Float, asString(): String.

TextWord

TextWord(text: String, bbox: BBox, confidence: Float)

Accessors: text(): String, bbox(): BBox, confidence(): Float.

TextLine

TextLine(text: String, bbox: BBox, words: List<TextWord>)

Accessors: text(): String, bbox(): BBox, words(): List<TextWord>.

TextSpan

A run of text that shares a single style.

TextSpan(text: String, bbox: BBox, style: TextStyle)

Accessors: text(): String, bbox(): BBox, style(): TextStyle.

TextStyle

TextStyle(font: String?, size: Double, color: Color, bold: Boolean, italic: Boolean)

Accessors: font(): String?, size(): Double, color(): Color, bold(): Boolean, italic(): Boolean.

Table types

Table

Table(bbox: BBox, rows: Int, cols: Int, cells: List<TableCell>)

Accessors: bbox(): BBox, rows(): Int, cols(): Int, cells(): List<TableCell>.

TableCell

TableCell(text: String, bbox: BBox, row: Int, col: Int, rowSpan: Int, colSpan: Int)

Accessors: text(): String, bbox(): BBox, row(): Int, col(): Int, rowSpan(): Int, colSpan(): Int.

Search types

SearchMatch

SearchMatch(pageIndex: Int, bbox: BBox, text: String)

Accessors: pageIndex(): Int, bbox(): BBox, text(): String.

SearchResult

SearchResult(query: String, matches: List<SearchMatch>)

Accessors: query(): String, matches(): List<SearchMatch>, count(): Int, isEmpty(): Boolean.

SearchOptions

Immutable options built with a fluent builder. SearchOptions.DEFAULT is the default instance.

SearchOptions.builder()
    .withCaseSensitive(true)
    .withWholeWord(true)
    .withRegex(false)
    .withMaxResults(50)
    .build()

Accessors: caseSensitive(): Boolean, wholeWord(): Boolean, regex(): Boolean, maxResults(): Optional<Int>. Builder methods: withCaseSensitive(Boolean), withWholeWord(Boolean), withRegex(Boolean), withMaxResults(Int) / withMaxResults(Int?), build().

Note: SearchOptions is not yet wired into PdfDocument.search(). Use the search overload that takes caseInsensitive, regex, and maxResults instead.

Form types

FormField

FormField(name: String, type: FormFieldType, value: String?, bbox: BBox?, pageIndex: Int)

Accessors: name(): String, type(): FormFieldType, value(): Optional<String>, bbox(): Optional<BBox>, pageIndex(): Int. For null-based access, use valueOrNull() / bboxOrNull().

Annotation types

Annotation

Annotation(type: AnnotationType, pageIndex: Int, bbox: BBox, contents: String?, uri: String?)

Accessors: type(): AnnotationType, pageIndex(): Int, bbox(): BBox, contents(): Optional<String>, uri(): Optional<String>. For null-based access, use contentsOrNull() / uriOrNull().

Image types

ExtractedImage

ExtractedImage(bytes: ByteArray, format: ImageFormat, bbox: BBox, width: Int, height: Int)

Accessors: bytes(): ByteArray, format(): ImageFormat, bbox(): BBox, width(): Int, height(): Int.

Auto-extraction types

AutoResult

The result of adaptive extraction.

result.text(): String
result.markdown(): Optional<String>
result.html(): Optional<String>
result.reason(): ExtractReason
result.confidence(): Double
result.ocrUsed(): Boolean
result.regions(): List<RegionResult>
result.pagesNeedingOcr(): List<Int>

For null-based access to the rendered output, use markdownOrNull() / htmlOrNull().

RegionResult

Per-region extraction details within an AutoResult.

region.pageIndex(): Int
region.bbox(): BBox
region.text(): String
region.reason(): ExtractReason
region.confidence(): Double
region.ocrUsed(): Boolean
region.table(): Optional<Table>

ClassifyResult

result.pages(): List<PageClass>
result.pagesNeedingOcr(): List<Int>
result.pagesWithChart(): List<Int>
result.pagesEncrypted(): List<Int>

AutoExtractConfig

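Immutable configuration built with a fluent builder. AutoExtractConfig.DEFAULT is the default configuration. An existing configuration can be converted back into a builder with toBuilder().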

AutoExtractConfig.builder()
    .withMode(ExtractMode.AUTO)
    .withForceOcrPages(listOf(2, 5))
    .withMinOcrConfidence(0.6)
    .withOcrLanguages("eng", "deu")
    .withPasswords("secret")
    .withTopMarginFraction(0.05)
    .withBottomMarginFraction(0.05)
    .withAllowSingleColumnTables(true)
    .withOcrInlineImages(false)
    .withCancelToken("token-id")
    .build()

Each field's accessor returns Optional<...>: mode(), forceOcrPages(), minOcrConfidence(), ocrLanguages(), passwords(), topMarginFraction(), bottomMarginFraction(), allowSingleColumnTables(), ocrInlineImages(), cancelToken(). Builder setters accept both boxed nullable and primitive overloads (for example, withMinOcrConfidence(Double?) and withTopMarginFraction(double)), and withOcrLanguages(vararg String) / withPasswords(vararg String) are also available in vararg form.
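toBuilder() makes it possible to start from a preset and change a single field. This sketch assumes the preset's configuration is exposed through extractor.config(), as listed under the AutoExtractor accessors; balancedWithForcedOcr is a hypothetical helper.

```kotlin
import fyi.oxide.pdf.AutoExtractor
import fyi.oxide.pdf.PdfDocument

// Starts from the "balanced" preset and forces OCR on the given 0-based pages,
// leaving every other setting as the preset defined it.
fun balancedWithForcedOcr(doc: PdfDocument, pages: List<Int>): AutoExtractor {
    val preset = AutoExtractor.balanced(doc).config()
    val config = preset.toBuilder()
        .withForceOcrPages(pages)
        .build()
    return AutoExtractor.of(doc, config)
}
```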

Compliance types

ValidationResult

ValidationResult(valid: Boolean, violations: List<ValidationViolation>)

Accessors: valid(): Boolean, violations(): List<ValidationViolation>.

ValidationViolation

ValidationViolation(ruleId: String, description: String, pageIndex: Int?)

Accessors: ruleId(): String, description(): String, pageIndex(): Optional<Int>. For null-based access, use pageIndexOrNull().

Metadata types

DocumentInfo

DocumentInfo(/* title, author, subject, keywords, creator, producer, creationDate, modificationDate */)

Every accessor returns Optional<String>: title(), author(), subject(), keywords(), creator(), producer(), creationDate(), modificationDate().

XmpMetadata

A raw XMP packet. XmpMetadata.EMPTY is the empty instance.

XmpMetadata(xml: String)

Accessors: xml(): String, isEmpty(): Boolean.

Security and redaction types

SecurityPolicy

Immutable policy built with a fluent builder.

SecurityPolicy.builder()
    .withMode(PolicyMode.STRICT)
    .allow("algorithm-id")
    .deny("algorithm-id")
    .build()

Accessors: mode(): PolicyMode, additionalAllow(): List<String>, additionalDeny(): List<String>. Builder methods: withMode(PolicyMode), allow(String), deny(String), build().

RedactResult

RedactResult(regionsApplied: Int, oracleVerified: Boolean)

Accessors: regionsApplied(): Int, oracleVerified(): Boolean.

Signature types

SignOptions

Immutable signing options built with a fluent builder.

SignOptions.builder()
    .withLevel(SignatureLevel.B_T)
    .withReason("Approved")
    .withLocation("HQ")
    .withContactInfo("ops@example.com")
    .withTsaUrl("https://freetsa.org/tsr")
    .build()

Accessors: level(): SignatureLevel, reason(): Optional<String>, location(): Optional<String>, contactInfo(): Optional<String>, tsaUrl(): Optional<String>. Builder methods: withLevel, withReason, withLocation, withContactInfo, withTsaUrl, build().

Splitting types

BookmarkSegment

BookmarkSegment(title: String, firstPage: Int, lastPage: Int, filename: String)

Accessors: title(): String, firstPage(): Int, lastPage(): Int, filename(): String.

SplitByBookmarksOptions

Immutable options built with a fluent builder.

SplitByBookmarksOptions.builder()
    .withLevel(1)
    .withFilenamePrefix("chapter-")
    .build()

Accessors: level(): Int, filenamePrefix(): Optional<String>. Builder methods: withLevel(Int), withFilenamePrefix(String?), build().

Enums

Enum	Values
`FormFieldType`	`TEXT`, `CHECKBOX`, `RADIO`, `CHOICE`
`AnnotationType`	`HIGHLIGHT`, `TEXT`, `LINK`, `STAMP`, `UNDERLINE`, `STRIKEOUT`, `SQUIGGLY`, `FREE_TEXT`, `LINE`, `SQUARE`, `CIRCLE`, `FILE_ATTACHMENT`
`ImageFormat`	`JPEG`, `PNG`, `CCITT`, `RAW`
`ExtractMode`	`TEXT_ONLY`, `AUTO`
`ExtractReason`	`OK`, `SCANNED_NO_TEXT_LAYER`, `GLYPH_MAPPING_MISSING`, `ENCRYPTED_NO_EXTRACT_PERMISSION`, `IMAGE_TABLE_NO_STRUCTURE`, `CHART_NOT_TRANSCRIBED`, `OCR_REQUESTED_BUT_UNAVAILABLE`, `OCR_LOW_CONFIDENCE`, `EMPTY`
`PageClass`	`TEXT_LAYER`, `SCANNED`, `MIXED`
`PixelFormat`	`RGBA_8888`, `RGB_888`, `GRAY_8`, `PNG`
`PolicyMode`	`COMPAT`, `STRICT`
`SignatureLevel`	`B_B`, `B_T`, `B_LT`
`PdfALevel`	`A_1B`, `A_1A`, `A_2B`, `A_2A`, `A_2U`, `A_3B`, `A_3A`, `A_3U`, `A_4`, `A_4E`, `A_4F`
`PdfXLevel`	`X_1A_2001`, `X_1A_2003`, `X_3_2002`, `X_3_2003`, `X_4`, `X_4P`, `X_5G`, `X_5N`, `X_5PG`, `X_6`, `X_6P`, `X_6N`
`PdfUaLevel`	`UA_1`, `UA_2` (each value exposes `code(): Int`)
`PdfErrorKind`	`PARSE`, `ENCRYPTED`, `PERMISSION`, `IO`, `OCR_UNAVAILABLE`, `SIGNATURE`, `INVALID_STATE`, `UNSUPPORTED`, `OTHER`

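Exceptions

Every failure throws PdfException (an unchecked exception) or one of its kind-specific subclasses. The kind() accessor returns the PdfErrorKind.

import fyi.oxide.pdf.PdfDocument
import fyi.oxide.pdf.exception.PdfException

// bytes: ByteArray containing the PDF to open
try {
    PdfDocument.open(bytes).use { doc ->
        println(doc.extractText(0))
    }
} catch (e: PdfException) {
    println("PDF error [${e.kind()}]: ${e.message}")
}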

PdfException(message: String)
PdfException(kind: PdfErrorKind, message: String)
PdfException(kind: PdfErrorKind, message: String, cause: Throwable)

e.kind(): PdfErrorKind

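Exception	Cause
`PdfParseException`	Malformed or corrupted PDF
`PdfEncryptedException`	Encrypted document opened without a valid password
`PdfPermissionException`	Operation blocked by the document's permissions
`PdfIoException`	Underlying I/O failure
`PdfOcrUnavailableException`	OCR was requested, but the `ocr` feature is not compiled in
`PdfSignatureException`	Signing or signature verification failed
`PdfInvalidStateException`	Operation is invalid in the handle's current state
`PdfUnsupportedException`	Unsupported feature or format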
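A sketch that turns PdfException into a Result for callers handling untrusted uploads, plus a user-facing hint keyed on kind(). It compares kind().name against the PdfErrorKind constant names so it does not have to assume that enum's import path; importing PdfErrorKind and matching its constants directly would be more type-safe. pageCountOrError and hintFor are hypothetical helpers.

```kotlin
import fyi.oxide.pdf.PdfDocument
import fyi.oxide.pdf.exception.PdfException

// Opens untrusted bytes and reports failure as a Result instead of throwing.
fun pageCountOrError(bytes: ByteArray): Result<Int> =
    try {
        PdfDocument.open(bytes).use { doc -> Result.success(doc.pageCount()) }
    } catch (e: PdfException) {
        Result.failure(e)
    }

// Maps a failure to a hint, keyed on the PdfErrorKind constant name.
fun hintFor(e: PdfException): String = when (e.kind().name) {
    "ENCRYPTED" -> "The file is password-protected."
    "PERMISSION" -> "The document's permissions forbid this operation."
    "PARSE" -> "The file is not a valid PDF or is damaged."
    else -> "PDF error: ${e.message}"
}
```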

Complete example

import fyi.oxide.pdf.AutoExtractor
import fyi.oxide.pdf.DocumentEditor
import fyi.oxide.pdf.Pdf
import fyi.oxide.pdf.PdfDocument
import fyi.oxide.pdf.geometry.BBox
import fyi.oxide.pdf.producerOrNull

fun main() {
    // --- Creation ---
    val bytes = Pdf.fromMarkdown("# Report\n\nGenerated by PDF Oxide.").use { it.save() }

    // --- Extraction ---
    PdfDocument.open(bytes).use { doc ->
        println("Pages: ${doc.pageCount()}")
        println("Producer: ${doc.producerOrNull() ?: "(none)"}")

        val page = doc.page(0)
        println("Words: ${page.words().map { it.text() }}")
        println("Tables: ${page.tables().size}")

        // Search with options. search() is a Java method, so arguments are positional:
        // (query, caseInsensitive, regex, maxResults), where maxResults = 0 means no limit.
        val matches = doc.search("Report", true, false, 0)
        matches.forEach { m -> println("p${m.pageIndex()} '${m.text()}' @ ${m.bbox()}") }

        // Adaptive extraction
        val result = AutoExtractor.balanced(doc).extractDocument()
        println("confidence=${result.confidence()} ocr=${result.ocrUsed()}")
    }

    // --- Editing: redact + fill forms ---
    DocumentEditor.open(bytes).use { editor ->
        // "name" must be the fully qualified name of an existing form field.
        editor.setFormField("name", "Jane Doe")
            .addRedaction(0, BBox(72.0, 700.0, 272.0, 720.0))
            .scrubMetadata()
        val redaction = editor.applyRedactionsDestructive()
        println("Redacted ${redaction.regionsApplied()} regions")
        val out: ByteArray = editor.save()
        println("Saved ${out.size} bytes")
    }
}

Bindings for other languages

PDF Oxide provides native bindings for every major ecosystem: Rust, Python, Node.js, WASM, C#, Golang, Java, PHP, Ruby, C++, Swift, Dart, R, Julia, Zig, Scala, Clojure, Objective-C, Elixir.

Next steps

Types and Enums: all shared types and enums
Page API Reference: consistent per-page iteration across bindings
Getting Started with Kotlin: tutorial