What is the fastest Python PDF library?

PDF Oxide is the fastest Python PDF library, with 0.8ms mean text extraction time — 5.8× faster than PyMuPDF (4.6ms) and 15× faster than pypdf (12.1ms). Benchmarked on 3,830 real-world PDFs with 100% pass rate.

Is PDF Oxide free for commercial use?

Yes. PDF Oxide is MIT licensed — free for all uses including commercial products, SaaS, and proprietary software. No license fees, no sales calls, no AGPL restrictions.

Can PDF Oxide handle scanned PDFs with OCR?

Yes. PDF Oxide includes built-in OCR via PaddleOCR and ONNX Runtime. No Tesseract installation needed — just pip install pdf_oxide and use extract_text_ocr(). Supports PP-OCRv3, v4, and v5 models.

Does PDF Oxide support XFA forms?

Yes. PDF Oxide is the only Python PDF library that can detect, analyze, and extract data from XFA forms (XML Forms Architecture). PyMuPDF, pypdf, pdfplumber, and pdfminer cannot read XFA form data.

How does PDF Oxide compare to PyMuPDF?

PDF Oxide is 5.8× faster than PyMuPDF (0.8ms vs 4.6ms mean), has a 100% pass rate vs 99.3%, and is MIT licensed vs PyMuPDF's AGPL-3.0. PDF Oxide also has built-in Markdown/HTML output and XFA form support that PyMuPDF lacks.

Can PDF Oxide convert PDF to Markdown?

Yes. PDF Oxide has built-in PDF to Markdown conversion with heading detection, table preservation, and list formatting — ideal for LLM and RAG pipelines. No separate package needed, unlike PyMuPDF which requires pymupdf4llm (69× slower).

Java API Reference

PDF Oxide provides native Java bindings through a JNI layer built on top of its Rust core. The bundled native library is loaded automatically at class-load time, and prebuilt natives ship for Linux, macOS, and Windows (x86_64 and ARM64).

<dependency>
  <groupId>fyi.oxide</groupId>
  <artifactId>pdf-oxide</artifactId>
  <version>0.3.69</version>
</dependency>

All classes live in the fyi.oxide.pdf package and its subpackages (fyi.oxide.pdf.geometry, fyi.oxide.pdf.text, fyi.oxide.pdf.form, and so on).

import fyi.oxide.pdf.PdfDocument;
import fyi.oxide.pdf.DocumentEditor;
import fyi.oxide.pdf.Pdf;

Lifecycle. PdfDocument, DocumentEditor, and Pdf own native memory and implement AutoCloseable. Always use try-with-resources. close() is idempotent. A Cleaner acts as a backstop that frees leaked handles, but do not rely on it for timely cleanup.
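The lifecycle contract above (idempotent close() with a Cleaner as backstop) can be illustrated with a minimal sketch. NativeHandleSketch and its RELEASES counter are hypothetical names used only for this illustration; this is not PDF Oxide's implementation, just one common way to get the same behavior with java.lang.ref.Cleaner.

```java
import java.lang.ref.Cleaner;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for a native-backed handle; not PDF Oxide's actual implementation. */
final class NativeHandleSketch implements AutoCloseable {
    private static final Cleaner CLEANER = Cleaner.create();

    /** Counts how many times the (simulated) native memory was released. */
    static final AtomicInteger RELEASES = new AtomicInteger();

    /** Cleanup action. It must not reference the owning object, or the owner could never become unreachable. */
    private static final class Release implements Runnable {
        @Override
        public void run() {
            RELEASES.incrementAndGet(); // a real binding would free native memory here
        }
    }

    private final Cleaner.Cleanable cleanable;
    private volatile boolean open = true;

    NativeHandleSketch() {
        // Backstop: the Cleaner runs Release if the handle is garbage-collected without close().
        this.cleanable = CLEANER.register(this, new Release());
    }

    boolean isOpen() {
        return open;
    }

    @Override
    public void close() {
        open = false;
        // Cleanable.clean() invokes the action at most once, so repeated close() calls are harmless.
        cleanable.clean();
    }
}
```

With try-with-resources the release happens deterministically at the end of the block. The Cleaner path only runs at some unspecified point after garbage collection, which is why it should not be relied on for timely cleanup.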

Thread safety. Document instances are not thread-safe: open one per worker. The stateless static helpers (MarkdownConverter, PdfValidator, PdfPolicy) are thread-safe.
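One way to follow the one-document-per-worker rule is to have every task open, use, and close its own handle. The sketch below is a generic helper under that assumption; PerWorkerSketch, Opener, and Reader are hypothetical names that are not part of PDF Oxide, and any AutoCloseable handle can be plugged in.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical helper: each task owns its handle, so no handle is ever shared between threads. */
final class PerWorkerSketch {
    private PerWorkerSketch() {}

    interface Opener<S, H extends AutoCloseable> { H open(S source) throws Exception; }
    interface Reader<H, R> { R read(H handle) throws Exception; }

    static <S, H extends AutoCloseable, R> List<R> mapEach(
            List<S> sources, int threads, Opener<S, H> opener, Reader<H, R> reader) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<R>> futures = new ArrayList<>();
            for (S source : sources) {
                Callable<R> task = () -> {
                    // The handle is opened and closed inside the task, on the worker thread.
                    try (H handle = opener.open(source)) {
                        return reader.read(handle);
                    }
                };
                futures.add(pool.submit(task));
            }
            List<R> results = new ArrayList<>();
            for (Future<R> future : futures) {
                try {
                    results.add(future.get()); // results keep the order of the inputs
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while waiting for a worker", e);
                } catch (ExecutionException e) {
                    throw new IllegalStateException("worker failed", e.getCause());
                }
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }
}
```

With the API documented on this page, a call would presumably look like PerWorkerSketch.mapEach(paths, 4, path -> PdfDocument.open(path), doc -> doc.extractText(0)), where paths is a List<String>.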

For the Rust API, see the Rust API Reference; for the Python API, see the Python API Reference.

PdfDocument

The primary read-only entry point for a PDF: open, extract, render, and convert. Implements AutoCloseable.

import fyi.oxide.pdf.PdfDocument;
import java.nio.file.Paths;

try (PdfDocument doc = PdfDocument.open(Paths.get("invoice.pdf"))) {
    System.out.println(doc.extractText(0));
}

Opening (static factory methods)

static PdfDocument open(Path path)

Opens a PDF from a filesystem path.

static PdfDocument open(String path)

Opens a PDF from a path string.

static PdfDocument open(byte[] bytes)

Opens a PDF from in-memory bytes (for example, downloaded from S3 or over HTTP).

static PdfDocument open(Path path, String password)

Opens an encrypted PDF from a path using the user password or the owner password.

static PdfDocument open(String path, String password)

Opens an encrypted PDF from a path string with a password.

static PdfDocument open(byte[] bytes, String password)

Opens an encrypted PDF from bytes with a password.

static PdfDocument open(InputStream stream)

Opens a PDF by reading all bytes from an InputStream.

One-shot static helpers

static String extractText(String path)

Opens the PDF, extracts all text, and closes it in a single call (path string).

static String extractText(Path path)

Opens the PDF, extracts all text, and closes it in a single call (Path).

Authentication

boolean authenticate(String password)

Authenticates an encrypted PDF after it has been opened. Returns true on success.

boolean authenticate(byte[] password)

Authenticates with a raw byte password.

Document information

int pageCount()

Returns the number of pages in the document.

boolean isOpen()

Returns true if the document handle is still open.

Text extraction

String extractText(int pageIndex)

Extracts plain text from a single page, identified by its zero-based index.

String extractTextAuto(int pageIndex)

Extracts text from a page, falling back to OCR automatically for scanned pages.

String extractStructured(int page)

Extracts structured page content (spans, lines, layout) as a JSON string.

Conversion

String toMarkdown()

Converts the whole document to Markdown.

String toMarkdown(int pageIndex)

Converts a single page to Markdown.

String toHtml()

Converts the whole document to HTML.

String toHtml(int pageIndex)

Converts a single page to HTML.

DOM access

PdfPage page(int index)

Returns a lazy PdfPage handle for the given zero-based index.

Rendering

byte[] render(int pageIndex)

Renders a page to PNG bytes at the default DPI.

byte[] render(int pageIndex, int dpi)

Renders a page to PNG bytes at the specified DPI.
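To estimate how large a rendered PNG will be, the page size in PDF points can be converted to pixels. The sketch below assumes the default PDF user unit of 1/72 inch and rounds to the nearest pixel; RenderSizeSketch is a hypothetical helper, and the library's own rounding and default DPI are not stated on this page.

```java
/** Hypothetical helper for sizing render output; assumes the default PDF user unit of 1/72 inch. */
final class RenderSizeSketch {
    private RenderSizeSketch() {}

    /** Converts a length in PDF points to pixels at the given DPI, rounded to the nearest pixel. */
    static int pointsToPixels(double points, int dpi) {
        return (int) Math.round(points * dpi / 72.0);
    }
}
```

For example, a US Letter page (612 x 792 points) at 150 DPI comes out at about 1275 x 1650 pixels, so memory use grows with the square of the DPI.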

Lifecycle

void close()

Releases the underlying native handle. Idempotent.

DocumentEditor

A mutable editing session on a PDF: form filling, redaction, metadata scrubbing, and saving. Mutators return this, so calls can be chained fluently. Implements AutoCloseable.

import fyi.oxide.pdf.DocumentEditor;
import java.nio.file.Paths;

try (DocumentEditor editor = DocumentEditor.open("form.pdf")) {
    editor.setFormField("name", "Jane Doe")
          .setFormField("subscribe", true)
          .saveTo(Paths.get("filled.pdf"));
}

Opening (static factory methods)

static DocumentEditor open(Path path)

Opens a PDF for editing from a Path.

static DocumentEditor open(String path)

Opens a PDF for editing from a path string.

static DocumentEditor open(byte[] bytes)

Opens a PDF for editing from in-memory bytes.

Form fields

DocumentEditor setFormField(String name, String value)

Sets the value of a text or choice form field by name. Returns this.

DocumentEditor setFormField(String name, boolean checked)

Sets a checkbox or radio form field by name. Returns this.

Redaction

DocumentEditor addRedaction(int pageIndex, BBox region)

Queues a redaction for a rectangular region on a page. Returns this.

int redactionCount(int pageIndex)

Returns the number of pending redactions on a page.

int redactionCount()

Returns the total number of pending redactions across the document.

RedactResult applyRedactionsDestructive()

Applies all queued redactions and permanently removes the covered content. Returns a RedactResult.

Metadata

DocumentEditor scrubMetadata()

Removes the document information and XMP metadata. Returns this.

Saving

byte[] save()

Serializes the edited document to a new byte array (full rewrite).

void saveTo(Path out)

Writes the edited document to a file (full rewrite).

byte[] saveIncremental()

Serializes using an incremental update, preserving the original bytes.

void saveIncrementalTo(Path out)

Writes an incremental update to a file.

Lifecycle

boolean isOpen()
void close()

Check whether the editor is open, and release the native handle.

Pdf

Creates new PDFs from Markdown, HTML, or images, and splits existing PDFs. Implements AutoCloseable.

import fyi.oxide.pdf.Pdf;
import java.nio.file.Paths;

try (Pdf pdf = Pdf.fromMarkdown("# Report\n\nGenerated by PDF Oxide.")) {
    pdf.saveTo(Paths.get("report.pdf"));
}

Creation (static factory methods)

static Pdf fromMarkdown(String markdown)

Creates a PDF from Markdown content.

static Pdf fromHtml(String html)

Creates a PDF from HTML content.

static Pdf fromImages(List<byte[]> images)

Creates a multi-page PDF with one page per image (JPEG/PNG bytes).

Splitting

List<BookmarkSegment> planSplitByBookmarks(SplitByBookmarksOptions opts)

Computes the plan of BookmarkSegment entries for a split at the given bookmark level, without writing any output.

List<byte[]> splitByBookmarks(SplitByBookmarksOptions opts)

Splits the PDF at the configured bookmark level and returns one byte array per segment.

static int planSplitByBookmarksCount(byte[] sourcePdf, int level)

Returns the number of segments that a split at the given bookmark level would produce, without opening a Pdf.

static byte[][] splitByBookmarksFromBytes(byte[] sourcePdf, int level)

Splits the source PDF bytes at the given bookmark level in a single static call.

Saving and lifecycle

byte[] save()

Serializes the PDF to a byte array.

void saveTo(Path out)

Writes the PDF to a file.

boolean isOpen()
void close()

Check whether the handle is open, and release the native resources.

AutoExtractor

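Adaptive extraction that classifies each page (text layer versus scanned) and applies OCR where needed. Constructed from an open PdfDocument.

import fyi.oxide.pdf.AutoExtractor;
import fyi.oxide.pdf.PdfDocument;
import fyi.oxide.pdf.auto.AutoResult;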

try (PdfDocument doc = PdfDocument.open("scan.pdf")) {
    AutoExtractor extractor = AutoExtractor.balanced(doc);
    AutoResult result = extractor.extractDocument();
    System.out.println(result.text());
}

Construction (static factory methods)

static AutoExtractor of(PdfDocument doc)

Creates an extractor with the default configuration.

static AutoExtractor of(PdfDocument doc, AutoExtractConfig config)

Creates an extractor with an explicit AutoExtractConfig.

static AutoExtractor fast(PdfDocument doc)

Creates an extractor tuned for speed (text layer preferred).

static AutoExtractor balanced(PdfDocument doc)

Creates an extractor with a preset that balances speed and fidelity.

static AutoExtractor highFidelity(PdfDocument doc)

Creates an extractor tuned for maximum fidelity (aggressive OCR).

Extraction

String extractText()

Extracts plain text across the whole document.

String extractTextForPage(int pageIndex)

Extracts plain text for a single page.

AutoResult extractDocument()

Runs full adaptive extraction over the whole document. Returns an AutoResult.

AutoResult extractAutoDocument()

Alias for extractDocument(); returns the full document result.

AutoResult extractPage(int pageIndex)

Runs adaptive extraction on a single page.

AutoResult extractAutoPage(int pageIndex)

Alias for extractPage() on a single page.

Classification

ClassifyResult classifyDocument()

Classifies every page without extracting. Returns a ClassifyResult.

ClassifyResult classifyPage(int pageIndex)

Classifies a single page.

JSON output

String extractDocumentJson()

Extracts the whole document and serializes the result to JSON.

String extractPageJson(int pageIndex)

Extracts a page and serializes the result to JSON.

Accessors

PdfDocument document()

Returns the underlying PdfDocument.

AutoExtractConfig config()

Returns the effective configuration.

MarkdownConverter

Stateless, thread-safe static helpers for conversion to Markdown and HTML.

static String toMarkdown(PdfDocument doc, int pageIndex)

Converts a single page to Markdown.

static String toMarkdown(PdfDocument doc)

Converts the whole document to Markdown.

static String toHtml(PdfDocument doc, int pageIndex)

Converts a single page to HTML.

static String toHtml(PdfDocument doc)

Converts the whole document to HTML.

PdfSigner

Digital signing and verification using a PKCS#12 keystore.

import fyi.oxide.pdf.PdfSigner;
import fyi.oxide.pdf.signature.SignOptions;
import java.nio.file.Paths;

// pdfBytes is the byte[] content of the PDF to sign, loaded by the caller.
PdfSigner signer = PdfSigner.fromPkcs12(Paths.get("cert.p12"), "keystore-pw");
byte[] signed = signer.sign(pdfBytes, SignOptions.builder().withReason("Approved").build());

static PdfSigner fromPkcs12(Path keystore, String password)

Loads a signer from a PKCS#12 keystore file.

static PdfSigner fromPkcs12(byte[] keystoreBytes, String password)

Loads a signer from in-memory PKCS#12 keystore bytes.

byte[] sign(byte[] pdf, SignOptions opts)

Signs PDF bytes using the configured certificate and SignOptions. Returns the signed PDF.

boolean verify(byte[] pdf)

Verifies the signatures embedded in a PDF. Returns true if they are valid.

static SignatureLevel classifyLevel(byte[] pdf)

Classifies the PAdES signature level of a signed PDF. Returns a SignatureLevel.

PdfValidator

Stateless, thread-safe conformance validation for PDF/A, PDF/X, and PDF/UA.

static boolean isPdfA(PdfDocument doc, PdfALevel level)

Quick boolean check of PDF/A conformance at the given level.

static boolean isPdfUa(PdfDocument doc, PdfUaLevel level)

Quick boolean check of PDF/UA conformance at the given level.

static ValidationResult validatePdfA(PdfDocument doc, PdfALevel level)

Validates against a PDF/A level. Returns a ValidationResult that contains any violations.

static ValidationResult validatePdfX(PdfDocument doc, PdfXLevel level)

Validates against a PDF/X level.

static ValidationResult validatePdfUa(PdfDocument doc, PdfUaLevel level)

Validates against a PDF/UA level.

PdfPolicy

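A process-wide security policy that controls which cryptographic algorithms are allowed. Thread-safe static accessors.

static PolicyMode current()

Returns the currently active policy mode.

static void set(PolicyMode mode)

Sets the process-wide policy mode.

static PolicyMode compat()

Returns the constant for the permissive compatibility mode.

static PolicyMode strict()

Returns the constant for strict mode.

static PolicyMode fipsStrict()

Returns the constant for FIPS strict mode.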

PdfPage

A lazy page handle returned by PdfDocument.page(int). Properties are dispatched to the parent document on access.

PdfDocument parent()

Returns the owning document.

int index()

Returns the zero-based page index.

BBox mediaBox()

Returns the page's MediaBox as a BBox.

BBox cropBox()

Returns the page's CropBox.

double width()
double height()

Return the page width and height in PDF points.

int rotation()

Returns the page rotation in degrees (0, 90, 180, or 270).

String text()

Extracts all plain text on the page.

String text(BBox region)

Extracts the text within a rectangular region.

List<TextWord> words()

Returns word-level text with bounding boxes (TextWord).

List<TextLine> lines()

Returns line-level text (TextLine).

List<TextChar> chars()

Returns character-level data (TextChar).

Geometry types

BBox

An immutable axis-aligned bounding box in the PDF coordinate system.

BBox(double x0, double y0, double x1, double y1)
double x0()
double y0()
double x1()
double y1()
double width()
double height()

Rect

A rectangle expressed as a position and a size (origin plus width and height).

Rect(double x, double y, double width, double height)
double x()
double y()
double width()
double height()
BBox toBBox()
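
The relationship between the two rectangle representations can be sketched as follows. BBoxSketch and RectSketch are hypothetical local mirrors of the documented shapes, declared here so the example compiles without the library. The conversion rule (opposite corner equals origin plus size) is an assumption about what toBBox() does, not something this page states.

```java
/** Hypothetical mirror of the documented BBox shape: two opposite corners. */
record BBoxSketch(double x0, double y0, double x1, double y1) {
    double width() { return x1 - x0; }
    double height() { return y1 - y0; }
}

/** Hypothetical mirror of the documented Rect shape: origin plus size. */
record RectSketch(double x, double y, double width, double height) {
    /** Assumed conversion: the origin becomes (x0, y0) and the opposite corner is origin plus size. */
    BBoxSketch toBBox() {
        return new BBoxSketch(x, y, x + width, y + height);
    }
}
```

Under this assumption, a Rect at (10, 20) with size 100 x 50 corresponds to the BBox (10, 20, 110, 70).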

Point

A 2D point.

Point(double x, double y)
double x()
double y()

Color

An 8-bit RGBA color.

Color(int r, int g, int b, int a)
Color(int r, int g, int b)
int r()
int g()
int b()
int a()

Text types

TextChar

A single decoded character with its position and OCR confidence.

TextChar(int codepoint, BBox bbox, float confidence)
int codepoint()
BBox bbox()
float confidence()
String asString()
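
TextChar carries an int code point rather than a char, which matters for characters outside the Basic Multilingual Plane. The sketch below shows one standard way to turn a code point into a String; CodepointSketch is a hypothetical helper, and whether asString() is implemented this way is an assumption.

```java
/** Hypothetical helper illustrating code point to String conversion with the standard library. */
final class CodepointSketch {
    private CodepointSketch() {}

    /** Converts a Unicode code point to a String; supplementary characters become a surrogate pair. */
    static String asString(int codepoint) {
        return new String(Character.toChars(codepoint));
    }
}
```

A supplementary code point such as U+1F600 yields a String of length 2 (one surrogate pair), so String.length() is not a reliable character count for extracted text.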

TextWord

A whitespace-delimited word with its bounds and confidence.

TextWord(String text, BBox bbox, float confidence)
String text()
BBox bbox()
float confidence()

TextLine

A line of text made up of multiple words.

TextLine(String text, BBox bbox, List<TextWord> words)
String text()
BBox bbox()
List<TextWord> words()

TextSpan

A run of consecutive text that shares the same style.

TextSpan(String text, BBox bbox, TextStyle style)
String text()
BBox bbox()
TextStyle style()

TextStyle

Font and style metadata for a span.

TextStyle(String font, double size, Color color, boolean bold, boolean italic)
double size()
Color color()
boolean bold()
boolean italic()

Table types

Table

A detected table with its grid of cells.

Table(BBox bbox, int rows, int cols, List<TableCell> cells)
BBox bbox()
int rows()
int cols()
List<TableCell> cells()

TableCell

A single cell, including span information.

TableCell(String text, BBox bbox, int row, int col, int rowSpan, int colSpan)
String text()
BBox bbox()
int row()
int col()
int rowSpan()
int colSpan()

Image types

ExtractedImage

A raster image extracted from a page.

ExtractedImage(byte[] bytes, ImageFormat format, BBox bbox, int width, int height)
byte[] bytes()
ImageFormat format()
BBox bbox()
int width()
int height()

ImageFormat (enum)

JPEG, PNG, CCITT, RAW.

Search types

SearchOptions

Search parameters configured through a builder.

boolean caseSensitive()
boolean wholeWord()
boolean regex()
Optional<Integer> maxResults()
static SearchOptions.Builder builder()

Builder: withCaseSensitive(boolean), withWholeWord(boolean), withRegex(boolean), withMaxResults(Integer), withMaxResults(int), build().

SearchResult

The complete result of a query.

SearchResult(String query, List<SearchMatch> matches)
String query()
List<SearchMatch> matches()
int count()
boolean isEmpty()

SearchMatch

A single hit with its page and position.

SearchMatch(int pageIndex, BBox bbox, String text)
int pageIndex()
BBox bbox()
String text()

Form types

FormField

An AcroForm field.

FormField(String name, FormFieldType type, String value, BBox bbox, int pageIndex)
String name()
FormFieldType type()
Optional<String> value()
Optional<BBox> bbox()
int pageIndex()

FormFieldType (enum)

TEXT, CHECKBOX, RADIO, CHOICE.

Annotation types

Annotation

A page annotation.

Annotation(AnnotationType type, int pageIndex, BBox bbox, String contents, String uri)
AnnotationType type()
int pageIndex()
BBox bbox()
Optional<String> contents()
Optional<String> uri()

AnnotationType (enum)

HIGHLIGHT, TEXT, LINK, STAMP, UNDERLINE, STRIKEOUT, SQUIGGLY, FREE_TEXT, LINE, SQUARE, CIRCLE, FILE_ATTACHMENT.

Metadata types

DocumentInfo

The fields of the standard document information dictionary.

Optional<String> title()
Optional<String> author()
Optional<String> subject()
Optional<String> keywords()
Optional<String> creator()
Optional<String> producer()
Optional<String> creationDate()
Optional<String> modificationDate()

XmpMetadata

The raw XMP metadata packet.

XmpMetadata(String xml)
String xml()
boolean isEmpty()

Auto-extraction types

AutoExtractConfig

Immutable, builder-constructed configuration for AutoExtractor.

Optional<ExtractMode> mode()
Optional<List<Integer>> forceOcrPages()
Optional<Double> minOcrConfidence()
Optional<List<String>> ocrLanguages()
Optional<List<String>> passwords()
Optional<Double> topMarginFraction()
Optional<Double> bottomMarginFraction()
Optional<Boolean> allowSingleColumnTables()
Optional<Boolean> ocrInlineImages()
Optional<String> cancelToken()
static AutoExtractConfig.Builder builder()
AutoExtractConfig.Builder toBuilder()

Builder methods: withMode(ExtractMode), withForceOcrPages(List<Integer>), withMinOcrConfidence(Double), withOcrLanguages(List<String>), withOcrLanguages(String...), withPasswords(List<String>), withPasswords(String...), withTopMarginFraction(Double), withTopMarginFraction(double), withBottomMarginFraction(Double), withBottomMarginFraction(double), withAllowSingleColumnTables(Boolean), withAllowSingleColumnTables(boolean), withOcrInlineImages(Boolean), withOcrInlineImages(boolean), withCancelToken(String), build().

AutoResult

The result of adaptive extraction.

String text()
Optional<String> markdown()
Optional<String> html()
ExtractReason reason()
double confidence()
boolean ocrUsed()
List<RegionResult> regions()
List<Integer> pagesNeedingOcr()

RegionResult

The extraction result for a single region.

int pageIndex()
BBox bbox()
String text()
ExtractReason reason()
double confidence()
boolean ocrUsed()
Optional<Table> table()

ClassifyResult

The result of page classification.

List<PageClass> pages()
List<Integer> pagesNeedingOcr()
List<Integer> pagesWithChart()
List<Integer> pagesEncrypted()

ExtractMode (enum)

TEXT_ONLY, AUTO.

PageClass (enum)

TEXT_LAYER, SCANNED, MIXED.

ExtractReason (enum)

OK, SCANNED_NO_TEXT_LAYER, GLYPH_MAPPING_MISSING, ENCRYPTED_NO_EXTRACT_PERMISSION, IMAGE_TABLE_NO_STRUCTURE, CHART_NOT_TRANSCRIBED, OCR_REQUESTED_BUT_UNAVAILABLE, OCR_LOW_CONFIDENCE, EMPTY.

Signature types

SignOptions

Builder-constructed signing parameters for PdfSigner.

SignatureLevel level()
Optional<String> reason()
Optional<String> location()
Optional<String> contactInfo()
Optional<String> tsaUrl()
static SignOptions.Builder builder()

Builder: withLevel(SignatureLevel), withReason(String), withLocation(String), withContactInfo(String), withTsaUrl(String), build().

SignatureLevel (enum)

PAdES baseline levels: B_B (basic) and B_T (with a trusted timestamp).

Split types

SplitByBookmarksOptions

Builder-constructed options for bookmark-based splitting.

int level()
Optional<String> filenamePrefix()
static SplitByBookmarksOptions.Builder builder()

Builder: withLevel(int), withFilenamePrefix(String), build().

BookmarkSegment

An output segment planned by a bookmark split.

BookmarkSegment(String title, int firstPage, int lastPage, String filename)
String title()
int firstPage()
int lastPage()
String filename()

Redaction types

RedactResult

The result of DocumentEditor.applyRedactionsDestructive().

RedactResult(int regionsApplied, boolean oracleVerified)
int regionsApplied()
boolean oracleVerified()

Compliance types

ValidationResult

The result of a PdfValidator check.

ValidationResult(boolean valid, List<ValidationViolation> violations)
boolean valid()
List<ValidationViolation> violations()

ValidationViolation

A single conformance violation.

ValidationViolation(String ruleId, String description, Integer pageIndex)
String ruleId()
String description()
Optional<Integer> pageIndex()

PdfALevel (enum)

A_1B, A_1A, A_2B, A_2A, A_2U, A_3B, A_3A, A_3U, A_4, A_4E.

PdfXLevel (enum)

X_1A_2001, X_1A_2003, X_3_2002, X_3_2003, X_4, X_4P, X_5G, X_5N, X_5PG, X_6, X_6P.

PdfUaLevel (enum)

UA_1, UA_2. Each exposes int code().

Policy types

PolicyMode (enum)

COMPAT, STRICT.

SecurityPolicy

A builder-constructed type for per-operation security policies.

PolicyMode mode()
List<String> additionalAllow()
List<String> additionalDeny()
static SecurityPolicy.Builder builder()

Builder: withMode(PolicyMode), allow(String algId), deny(String algId), build().

Rendering types

PixelFormat (enum)

RGBA_8888, RGB_888, GRAY_8.

Error handling

Every PDF-specific failure throws PdfException (an unchecked RuntimeException) or one of its subclasses. Every exception carries a PdfErrorKind, available through kind().

import fyi.oxide.pdf.PdfDocument;
import fyi.oxide.pdf.exception.PdfException;

try (PdfDocument doc = PdfDocument.open("file.pdf")) {
    String text = doc.extractText(0);
} catch (PdfException e) {
    System.err.println(e.kind() + ": " + e.getMessage());
}

PdfException

PdfException(String message)
PdfException(PdfErrorKind kind, String message)
PdfException(PdfErrorKind kind, String message, Throwable cause)
PdfErrorKind kind()

Subclasses

Exception	Thrown when
`PdfParseException`	The file is malformed or is not a valid PDF
`PdfEncryptedException`	The PDF is encrypted and the password is missing or wrong
`PdfPermissionException`	The requested operation is denied by the document's permissions
`PdfIoException`	An underlying I/O error occurred
`PdfOcrUnavailableException`	OCR was requested but no OCR backend is available
`PdfSignatureException`	A signing or verification operation failed
`PdfInvalidStateException`	An operation was called on a closed or invalid handle
`PdfUnsupportedException`	The requested feature is not supported

PdfErrorKind (enum)

PARSE, ENCRYPTED, PERMISSION, IO, OCR_UNAVAILABLE, SIGNATURE, INVALID_STATE, UNSUPPORTED.

Complete example

import fyi.oxide.pdf.PdfDocument;
import fyi.oxide.pdf.DocumentEditor;
import fyi.oxide.pdf.Pdf;
import fyi.oxide.pdf.AutoExtractor;
import fyi.oxide.pdf.auto.AutoResult;
import java.nio.file.Paths;

public class Example {
    public static void main(String[] args) throws Exception {
        // --- Extraction ---
        try (PdfDocument doc = PdfDocument.open(Paths.get("input.pdf"))) {
            System.out.println("Pages: " + doc.pageCount());
            for (int i = 0; i < doc.pageCount(); i++) {
                System.out.println(doc.extractText(i));
            }
            String markdown = doc.toMarkdown();

            // Adaptive extraction with OCR fallback
            AutoResult auto = AutoExtractor.balanced(doc).extractDocument();
            System.out.println("OCR used: " + auto.ocrUsed());
        }

        // --- Creation ---
        try (Pdf pdf = Pdf.fromMarkdown("# Report\n\nGenerated by PDF Oxide.")) {
            pdf.saveTo(Paths.get("report.pdf"));
        }

        // --- Editing ---
        try (DocumentEditor editor = DocumentEditor.open("form.pdf")) {
            editor.setFormField("name", "Jane Doe")
                  .setFormField("subscribe", true)
                  .scrubMetadata()
                  .saveTo(Paths.get("filled.pdf"));
        }
    }
}

Bindings for other languages

PDF Oxide provides native bindings for every major ecosystem: Rust, Python, Node.js, WASM, C#, Golang, PHP, Ruby, C++, Swift, Kotlin, Dart, R, Julia, Zig, Scala, Clojure, Objective-C, Elixir.

Next steps

Types and enums: all shared types and enums
Page API Reference: consistent page-by-page iteration across bindings
Getting started with Java: tutorial