JavaScript API Reference

PDF Oxide provides WebAssembly bindings for JavaScript and TypeScript. The npm package pdf-oxide-wasm runs in Node.js, browsers, bundlers, Deno, and Cloudflare Workers.

npm install pdf-oxide-wasm

Multi-target packaging (v0.3.38)

pdf-oxide-wasm now uses conditional exports in package.json to ship three builds side by side. Pick the subpath that matches your runtime. In most environments the exports field routes automatically, so the top-level import also resolves correctly.

Subpath	Target
`pdf-oxide-wasm/nodejs`	Node.js (CommonJS + ESM)
`pdf-oxide-wasm/bundler`	Vite, webpack, Rollup, esbuild, Bun
`pdf-oxide-wasm/web`	Browsers, Deno, Cloudflare Workers

// Node.js
import { WasmPdfDocument } from "pdf-oxide-wasm/nodejs";

// Vite / webpack / Rollup
import init, { WasmPdfDocument } from "pdf-oxide-wasm/bundler";
await init();

// Browsers / Deno / Workers
import init, { WasmPdfDocument } from "pdf-oxide-wasm/web";
await init();

This resolves the ReferenceError: Can't find variable: __dirname error that browser-targeted bundlers hit before v0.3.38.

For the Rust API, see the Rust API Reference; for the Python API, see the Python API Reference. For type details, see Types and Enums.

Some methods are gated behind Rust build features (rendering, signatures, barcodes, ocr-tract). The default pdf-oxide-wasm package enables the common set, and OCR ships in a separate wasm-ocr build. See Feature availability.

Module functions

Free functions exported at the top level of the package.

import {
  setLogLevel, disableLogging,
  generateBarcodeSvg, generateQrSvg,
  planSplitByBookmarks, splitByBookmarks,
  setCryptoPolicy, cryptoPolicy, cryptoInventory, cryptoCbom,
  modelManifest, prefetchAvailable,
  signPdfBytes, signPdfBytesPades, hasDocumentTimestamp,
} from "pdf-oxide-wasm";

Logging

setLogLevel(level)   // Set log verbosity: "off" | "error" | "warn" | "info" | "debug" | "trace"
disableLogging()     // Silence all log output

Barcodes

generateBarcodeSvg(barcodeType, data) -> string  // 1D barcode as SVG; type 0–7 (Code128, Code39, Ean13, Ean8, UpcA, Itf, Code93, Codabar)
generateQrSvg(data, errorCorrection, size) -> string  // QR code as SVG; errorCorrection 0=Low 1=Medium 2=Quartile 3=High
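
The numeric codes in these signatures are easy to mistype, so a small lookup can keep call sites readable. The following is only a sketch: `BARCODE_TYPES`, `QR_ERROR_CORRECTION`, and `barcodeTypeId` are hypothetical names that are not part of the package, and the mapping assumes the barcode types are numbered 0 to 7 in the order listed in the comment above.

```typescript
// Hypothetical lookup tables (not exported by pdf-oxide-wasm). The numbering
// assumes the order given in the generateBarcodeSvg comment:
// Code128 = 0 ... Codabar = 7.
const BARCODE_TYPES = [
  "Code128", "Code39", "Ean13", "Ean8", "UpcA", "Itf", "Code93", "Codabar",
] as const;
type BarcodeName = (typeof BARCODE_TYPES)[number];

// QR error-correction levels as listed in the generateQrSvg comment.
const QR_ERROR_CORRECTION = { Low: 0, Medium: 1, Quartile: 2, High: 3 } as const;

// Returns the numeric type id for a barcode name.
function barcodeTypeId(barcodeName: BarcodeName): number {
  return BARCODE_TYPES.indexOf(barcodeName);
}

console.log(barcodeTypeId("Ean13")); // 2
console.log(QR_ERROR_CORRECTION.Quartile); // 2
```

With the package imported, a call would then read `generateBarcodeSvg(barcodeTypeId("Ean13"), data)` instead of passing a bare `2`.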

Splitting by bookmarks

planSplitByBookmarks(srcBytes, titlePrefix, ignoreCase, level, includeFrontMatter) -> Array  // Plan a split without producing PDFs; returns segment descriptors
splitByBookmarks(srcBytes, titlePrefix, ignoreCase, level, includeFrontMatter) -> Array       // Split at bookmark boundaries; returns [segment, bytes] pairs (level 0=all depths, 1=top-level)

Crypto governance

setCryptoPolicy(spec)   // Install the process-wide crypto policy ("compat" | "strict" | "fips-strict"[;…]); fail-closed
cryptoPolicy() -> string  // The active crypto policy as its canonical grammar string
cryptoInventory() -> string[]  // Algorithm tokens exercised so far this process
cryptoCbom() -> string  // CycloneDX 1.6 Cryptographic Bill of Materials (JSON string)

OCR model provisioning

modelManifest() -> string   // JSON manifest of OCR detector/recognizer cache filenames and source URLs (host-side fetch)
prefetchAvailable() -> boolean  // Whether this build can download OCR models to a local cache (always false in WASM)

Signing (free functions)

signPdfBytes(pdfData, cert, reason?, location?) -> Uint8Array  // Sign raw PDF bytes with a WasmCertificate; returns the signed PDF
signPdfBytesPades(pdfData, cert, level, timestampToken?, revocation?, reason?, location?) -> Uint8Array  // Sign at a PAdES baseline level (BB/BT/BLt); pass a pre-fetched RFC 3161 token for BT/BLt
hasDocumentTimestamp(pdfData) -> boolean  // Whether the PDF carries a document-scoped /DocTimeStamp (PAdES-B-LTA)

WasmPdfDocument

The central class for opening, extracting from, editing, and saving PDFs.

import { WasmPdfDocument } from "pdf-oxide-wasm";

Constructor

`new WasmPdfDocument(data, password?)`

Loads a PDF document from raw bytes.

Parameter	Type	Description
`data`	`Uint8Array`	Contents of the PDF file
`password`	`string \| undefined`	Optional password for encrypted PDFs

Throws: an Error if the PDF is invalid or cannot be parsed.

import { readFileSync } from "node:fs";

const bytes = new Uint8Array(readFileSync("document.pdf"));
const doc = new WasmPdfDocument(bytes);

Static constructors

WasmPdfDocument.openFromDocxBytes(data) -> WasmPdfDocument  // Convert DOCX bytes to a PDF document
WasmPdfDocument.openFromPptxBytes(data) -> WasmPdfDocument  // Convert PPTX bytes to a PDF document
WasmPdfDocument.openFromXlsxBytes(data) -> WasmPdfDocument  // Convert XLSX bytes to a PDF document

Core (read-only)

`pageCount() -> number`

Returns the number of pages in the document.

`version() -> Uint8Array`

Returns the PDF version as [major, minor].

const [major, minor] = doc.version();
console.log(`PDF ${major}.${minor}`);
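
Because the version arrives as a two-element byte array, comparing it against a minimum version takes a little care. The sketch below uses a hypothetical helper, `isAtLeastPdfVersion`, and a hand-built `Uint8Array` standing in for the result of `doc.version()`.

```typescript
// Hypothetical helper (not part of the package): true when the
// [major, minor] pair is at least the requested version.
function isAtLeastPdfVersion(
  version: ArrayLike<number>,
  major: number,
  minor: number
): boolean {
  const docMajor = version[0];
  const docMinor = version[1];
  return docMajor > major || (docMajor === major && docMinor >= minor);
}

// Stand-in for doc.version() on a PDF 1.7 file.
const sampleVersion = new Uint8Array([1, 7]);
console.log(isAtLeastPdfVersion(sampleVersion, 1, 5)); // true
console.log(isAtLeastPdfVersion(sampleVersion, 2, 0)); // false
```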

`authenticate(password) -> boolean`

Decrypts an encrypted PDF. Returns true if authentication succeeds.

Parameter	Type	Description
`password`	`string`	Password string

`hasStructureTree() -> boolean`

Checks whether the document is a tagged PDF with a structure tree.

Signature inspection

signatureCount() -> number          // Number of digital signatures in the document
signatures() -> WasmSignature[]     // Parsed signatures (signer, reason, time, verify())
dss() -> Dss | null                 // Document Security Store (certs/CRLs/OCSP), or null

Text extraction

`extractText(pageIndex, region?) -> string`

Extracts plain text from a single page. Pass an optional [x, y, w, h] region to limit the extraction area.

Parameter	Type	Description
`pageIndex`	`number`	Zero-based page number
`region`	`number[] \| undefined`	Optional `[x, y, width, height]` clip

const text = doc.extractText(0);

`extractAllText() -> string`

Extracts plain text from all pages, separated by form feed characters.
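
Since the pages are joined by form feeds, the combined string can be split back into per-page text. This is a minimal sketch: `splitPages` is a hypothetical helper, the sample string stands in for the output of `extractAllText()`, and it assumes exactly one form feed (`"\f"`, U+000C) between consecutive pages. If the real output ends with a trailing form feed, the last array entry would be an empty string.

```typescript
// Splits combined text into one string per page.
// Assumes pages are joined by a single form feed ("\f", U+000C).
function splitPages(allText: string): string[] {
  return allText.split("\f");
}

// Stand-in for doc.extractAllText() on a three-page document.
const sampleAllText = "Page one\fPage two\fPage three";
const pageTexts = splitPages(sampleAllText);
pageTexts.forEach((pageText, index) => {
  console.log(`Page ${index + 1}: ${pageText.length} characters`);
});
```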

`extractStructured(pageIndex) -> string`

Extracts a structured JSON representation of the page (blocks, lines, style information).

`extractChars(pageIndex, region?) -> Array`

Extracts individual characters with precise position information and font metadata.

Parameter	Type	Description
`pageIndex`	`number`	Zero-based page number
`region`	`number[] \| undefined`	Optional `[x, y, width, height]` clip

Returns: an array of objects with the following fields:

Field	Type	Description
`char`	`string`	Character
`bbox`	`{x, y, width, height}`	Bounding box
`fontName`	`string`	Font name
`fontSize`	`number`	Font size (points)
`fontWeight`	`string`	Weight (Normal, Bold, etc.)
`isItalic`	`boolean`	Italic flag
`color`	`{r, g, b}`	RGB color (0.0–1.0)

const chars = doc.extractChars(0);
for (const c of chars) {
  console.log(`'${c.char}' at (${c.bbox.x}, ${c.bbox.y})`);
}
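
Character-level output is often regrouped into lines before further processing. The sketch below shows one simple way to do that using only the `char` and `bbox` fields documented above. `CharBox` and `groupCharsIntoLines` are hypothetical names, the sample array stands in for `doc.extractChars(0)`, and the top-to-bottom ordering assumes PDF-style coordinates with the origin at the bottom-left. Bucketing by rounded `y` is deliberately crude: two characters on the same line can land in different buckets when they straddle a bucket boundary.

```typescript
// Subset of the documented extractChars() object shape used here.
interface CharBox {
  char: string;
  bbox: { x: number; y: number; width: number; height: number };
}

// Groups characters into lines by rounding bbox.y into tolerance-sized
// buckets, then sorts each line left to right. Lines are returned in
// descending y order (assumes a bottom-left origin).
function groupCharsIntoLines(chars: CharBox[], tolerance: number = 2): string[] {
  const buckets: { [key: string]: CharBox[] } = {};
  for (const c of chars) {
    const key = String(Math.round(c.bbox.y / tolerance));
    if (!buckets[key]) buckets[key] = [];
    buckets[key].push(c);
  }
  return Object.keys(buckets)
    .sort((a, b) => Number(b) - Number(a))
    .map((key) =>
      buckets[key]
        .slice()
        .sort((a, b) => a.bbox.x - b.bbox.x)
        .map((c) => c.char)
        .join("")
    );
}

// Hand-written sample standing in for doc.extractChars(0).
const sampleChars: CharBox[] = [
  { char: "i", bbox: { x: 18, y: 700, width: 4, height: 10 } },
  { char: "H", bbox: { x: 10, y: 700, width: 8, height: 10 } },
  { char: "o", bbox: { x: 10, y: 680, width: 6, height: 10 } },
  { char: "k", bbox: { x: 16, y: 680.4, width: 6, height: 10 } },
];
console.log(groupCharsIntoLines(sampleChars)); // [ 'Hi', 'ok' ]
```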

`extractPageText(pageIndex, readingOrder?) -> object`

Retrieves spans, characters, and page dimensions in a single extraction pass. This is more efficient than calling extractSpans() and extractChars() separately. Pass "column_aware" for multi-column PDFs.

Parameter	Type	Description
`pageIndex`	`number`	Zero-based page number
`readingOrder`	`string \| undefined`	`"column_aware"` or `"top_to_bottom"` (default)

Returns: an object with the following fields:

Field	Type	Description
`spans`	`Array`	Array of span objects
`chars`	`Array`	Array of character objects
`pageWidth`	`number`	Page width (PDF points)
`pageHeight`	`number`	Page height (PDF points)
`text`	`string`	Full text

const result = doc.extractPageText(0);
console.log(`Page: ${result.pageWidth}x${result.pageHeight} pt`);
for (const span of result.spans) {
  console.log(`'${span.text}' font=${span.fontName} size=${span.fontSize}`);
}

`extractSpans(pageIndex, region?, readingOrder?) -> Array`

Extracts styled text spans with font metadata. For multi-column PDFs, pass "column_aware" as readingOrder.

Parameter	Type	Description
`pageIndex`	`number`	Zero-based page number
`region`	`number[] \| undefined`	Optional `[x, y, width, height]` clip
`readingOrder`	`string \| undefined`	`"column_aware"` or `"top_to_bottom"` (default)

Returns: an array of objects with the following fields:

Field	Type	Description
`text`	`string`	Text content
`bbox`	`{x, y, width, height}`	Bounding box
`fontName`	`string`	Font name
`fontSize`	`number`	Font size (points)
`fontWeight`	`string`	Weight (Normal, Bold, etc.)
`isItalic`	`boolean`	Italic flag
`isMonospace`	`boolean`	Whether the font is monospaced
`charWidths`	`number[]`	Per-glyph advance widths
`color`	`{r, g, b}`	RGB color (0.0–1.0)

const spans = doc.extractSpans(0);
for (const span of spans) {
  console.log(`"${span.text}" size=${span.fontSize}`);
}
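
The font metadata on each span makes simple layout heuristics possible. As an illustration only, the sketch below flags spans whose font size is noticeably larger than the page's dominant size. `SpanInfo` and `findHeadingCandidates` are hypothetical names, the 1.2 ratio is an arbitrary choice, and this is not how the library's own heading detection works.

```typescript
// Subset of the documented extractSpans() object shape used here.
interface SpanInfo {
  text: string;
  fontSize: number;
}

// Returns spans whose font size is at least `ratio` times the dominant
// (body) font size, where "dominant" means the size covering the most
// characters on the page.
function findHeadingCandidates(spans: SpanInfo[], ratio: number = 1.2): SpanInfo[] {
  const charsPerSize: { [size: string]: number } = {};
  for (const s of spans) {
    const key = String(s.fontSize);
    charsPerSize[key] = (charsPerSize[key] || 0) + s.text.length;
  }
  let bodySize = 0;
  let bestCount = -1;
  for (const key of Object.keys(charsPerSize)) {
    if (charsPerSize[key] > bestCount) {
      bestCount = charsPerSize[key];
      bodySize = Number(key);
    }
  }
  return spans.filter((s) => s.fontSize >= bodySize * ratio);
}

// Hand-written sample standing in for doc.extractSpans(0).
const sampleSpans: SpanInfo[] = [
  { text: "Introduction", fontSize: 18 },
  { text: "This is body text that runs longer.", fontSize: 10 },
  { text: "More body text.", fontSize: 10 },
];
console.log(findHeadingCandidates(sampleSpans).map((s) => s.text)); // [ 'Introduction' ]
```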

Words, lines, and tables

extractWords(pageIndex, region?) -> Array       // Word-level boxes with text + font metadata
extractTextLines(pageIndex, region?) -> Array   // Line-level boxes, each with its words
extractTables(pageIndex, region?) -> Array      // Detected tables with rows/cells (text + bboxes)

Header/footer artifacts

Detects recurring headers, footers, and page-furniture artifacts, then removes or erases them.

removeHeaders(threshold) -> number     // Remove detected headers across the document; returns count removed
removeFooters(threshold) -> number     // Remove detected footers; returns count removed
removeArtifacts(threshold) -> number   // Remove detected page artifacts; returns count removed
eraseHeader(pageIndex)                 // Queue an erase of the header region on a page
editHeader(pageIndex)                  // Mark the header region for editing on a page
eraseFooter(pageIndex)                 // Queue an erase of the footer region on a page
editFooter(pageIndex)                  // Mark the footer region for editing on a page
eraseArtifacts(pageIndex)              // Queue an erase of detected artifacts on a page

Region extraction

`within(pageIndex, region) -> WasmPdfPageRegion`

Scopes subsequent extraction to a rectangular region of the page. region is [x, y, width, height]. See WasmPdfPageRegion.

const region = doc.within(0, [50, 600, 400, 150]);
const text = region.extractText();

Format conversion

`toMarkdown(pageIndex, detectHeadings?, includeImages?, includeFormFields?) -> string`

Converts a single page to Markdown.

Parameter	Type	Default	Description
`pageIndex`	`number`	–	Zero-based page number
`detectHeadings`	`boolean`	`true`	Detect headings from font size
`includeImages`	`boolean`	`true`	Include images
`includeFormFields`	`boolean`	`true`	Include form field values

`toMarkdownAll(detectHeadings?, includeImages?, includeFormFields?) -> string`

Converts all pages to Markdown.

`toHtml(pageIndex, preserveLayout?, detectHeadings?, includeFormFields?) -> string`

Converts a single page to HTML.

Parameter	Type	Default	Description
`pageIndex`	`number`	–	Zero-based page number
`preserveLayout`	`boolean`	`false`	Preserve the visual layout
`detectHeadings`	`boolean`	`true`	Detect headings
`includeFormFields`	`boolean`	`true`	Include form field values

`toHtmlAll(preserveLayout?, detectHeadings?, includeFormFields?) -> string`

Converts all pages to HTML.

`toPlainText(pageIndex) -> string`

Converts a single page to plain text.

`toPlainTextAll() -> string`

Converts all pages to plain text.

Office round-trip

toDocxBytes() -> Uint8Array   // Export the document as a DOCX file
toPptxBytes() -> Uint8Array   // Export the document as a PPTX file
toXlsxBytes() -> Uint8Array   // Export the document as an XLSX file

Search

`search(pattern, caseInsensitive?, literal?, wholeWord?, maxResults?) -> Array`

Searches for text across all pages.

Parameter	Type	Default	Description
`pattern`	`string`	–	Search pattern (string or regular expression)
`caseInsensitive`	`boolean`	`false`	Case-insensitive search
`literal`	`boolean`	`false`	Treat the pattern as a literal string
`wholeWord`	`boolean`	`false`	Match whole words only
`maxResults`	`number`	`0`	Maximum number of results (0 = unlimited)

Returns: an array of objects with the following fields:

Field	Type	Description
`page`	`number`	Page number
`text`	`string`	Matched text
`bbox`	`object`	Bounding box
`startIndex`	`number`	Start index within the page text
`endIndex`	`number`	End index within the page text
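
Search results come back as a flat array, so a common follow-up is to tally them per page. The sketch below does that with the documented `page` field. `SearchHit` and `countHitsByPage` are hypothetical names, and the sample array stands in for the result of a `doc.search(...)` call.

```typescript
// Subset of the documented search() result shape used here.
interface SearchHit {
  page: number;
  text: string;
  startIndex: number;
  endIndex: number;
}

// Counts how many matches fall on each page.
function countHitsByPage(hits: SearchHit[]): { [page: number]: number } {
  const counts: { [page: number]: number } = {};
  for (const hit of hits) {
    counts[hit.page] = (counts[hit.page] || 0) + 1;
  }
  return counts;
}

// Hand-written sample standing in for doc.search("invoice", true).
const sampleHits: SearchHit[] = [
  { page: 0, text: "Invoice", startIndex: 0, endIndex: 7 },
  { page: 0, text: "invoice", startIndex: 120, endIndex: 127 },
  { page: 2, text: "invoice", startIndex: 42, endIndex: 49 },
];
console.log(countHitsByPage(sampleHits)); // { '0': 2, '2': 1 }
```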

`searchPage(pageIndex, pattern, caseInsensitive?, literal?, wholeWord?, maxResults?) -> Array`

Searches for text within a single page.

Image information

`extractImages(pageIndex) -> Array`

Returns image metadata for the page.

Field	Type	Description
`width`	`number`	Image width (pixels)
`height`	`number`	Image height (pixels)
`colorSpace`	`string`	Color space (e.g. `DeviceRGB`)
`bitsPerComponent`	`number`	Bits per color channel
`bbox`	`object`	Position on the page

`extractImageBytes(pageIndex) -> Array`

Extracts raw image bytes from the page. Returns an array of objects with the following fields:

Field	Type	Description
`width`	`number`	Image width (pixels)
`height`	`number`	Image height (pixels)
`data`	`Uint8Array`	Raw image bytes
`format`	`string`	Image format

`pageImages(pageIndex) -> Array`

Returns image names and bounds for repositioning operations.

Field	Type	Description
`name`	`string`	XObject name
`bounds`	`number[]`	`[x, y, width, height]`
`matrix`	`number[]`	Transformation matrix `[a, b, c, d, e, f]`

Vector content

extractPaths(pageIndex, region?) -> Array   // Vector paths (lines, curves, shapes) on a page
extractRects(pageIndex, region?) -> Array   // Axis-aligned rectangles detected from path segments
extractLines(pageIndex, region?) -> Array   // Straight line segments detected from path data

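Document structure

`getOutline() -> Array | null`

Returns the document's bookmarks / table of contents. Returns null if no outline exists.

`getAnnotations(pageIndex) -> Array`

Returns annotation metadata for the page (type, rectangle, contents, etc.).

`pageLabels() -> Array`

Returns the page label ranges as an array of objects with the following fields:

Field	Type	Description
`startPage`	`number`	First page of this range
`style`	`string`	Numbering style
`prefix`	`string`	Label prefix
`startValue`	`number`	Starting number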

`xmpMetadata() -> object | null`

Returns the XMP metadata, or null if none is present. The object has the following fields:

Field	Type	Description
`dcTitle`	`string \| null`	Document title
`dcCreator`	`string[] \| null`	List of creators
`dcDescription`	`string \| null`	Description
`xmpCreatorTool`	`string \| null`	Creator tool
`xmpCreateDate`	`string \| null`	Creation date
`xmpModifyDate`	`string \| null`	Modification date
`pdfProducer`	`string \| null`	PDF producer

Form fields

`getFormFields() -> Array`

Returns all form fields with their name, type, value, and flags.

Field	Type	Description
`name`	`string`	Field name
`fieldType`	`string`	Field type (text, checkbox, etc.)
`value`	`string`	Current value
`flags`	`number`	Field flags

const fields = doc.getFormFields();
for (const f of fields) {
  console.log(`${f.name} (${f.fieldType}) = ${f.value}`);
}
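
The `flags` value is a plain number, so it usually needs decoding before it is useful. The sketch below assumes, without confirmation from this reference, that `flags` carries the raw PDF field-flags (`/Ff`) integer. Under that assumption the three flags common to all field types sit at bit positions 1 to 3 as defined in the PDF specification. `describeFieldFlags` and the constants are hypothetical names.

```typescript
// Field flags common to all field types in the PDF specification.
// ASSUMPTION: the `flags` number returned by getFormFields() is the raw /Ff value.
const FIELD_FLAG_READ_ONLY = 1 << 0; // bit position 1
const FIELD_FLAG_REQUIRED = 1 << 1;  // bit position 2
const FIELD_FLAG_NO_EXPORT = 1 << 2; // bit position 3

// Returns the names of the common flags set in a flags integer.
function describeFieldFlags(flags: number): string[] {
  const names: string[] = [];
  if ((flags & FIELD_FLAG_READ_ONLY) !== 0) names.push("ReadOnly");
  if ((flags & FIELD_FLAG_REQUIRED) !== 0) names.push("Required");
  if ((flags & FIELD_FLAG_NO_EXPORT) !== 0) names.push("NoExport");
  return names;
}

console.log(describeFieldFlags(3)); // [ 'ReadOnly', 'Required' ]
console.log(describeFieldFlags(0)); // []
```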

`hasXfa() -> boolean`

Checks whether the document contains an XFA form.

`getFormFieldValue(name) -> any`

Returns the value of a form field by name. Depending on the field type, the result is a string, a boolean, or null.

`setFormFieldValue(name, value) -> void`

Sets the value of a form field by name.

Parameter	Type	Description
`name`	`string`	Field name
`value`	`string \| boolean`	New field value

`exportFormData(format?) -> Uint8Array`

Exports the form data as FDF (default) or XFDF.

Parameter	Type	Default	Description
`format`	`string`	`"fdf"`	Export format: `"fdf"` or `"xfdf"`

Form flattening

flattenForms()                    // Flatten all form fields into page content
flattenFormsOnPage(pageIndex)     // Flatten forms on a specific page
flattenWarnings() -> string[]     // Warnings produced by the last flatten operation

Editing

Metadata

Method	Parameters	Description
`setTitle(title)`	`string`	Set the document title
`setAuthor(author)`	`string`	Set the document author
`setSubject(subject)`	`string`	Set the document subject
`setKeywords(keywords)`	`string`	Set the document keywords

Page rotation

Method	Parameters	Description
`pageRotation(pageIndex)`	`number`	Get the current rotation angle (0, 90, 180, 270)
`setPageRotation(pageIndex, degrees)`	`number, number`	Set the rotation to an absolute angle
`rotatePage(pageIndex, degrees)`	`number, number`	Add to the current rotation angle
`rotateAllPages(degrees)`	`number`	Rotate all pages
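
The difference between the absolute and relative rotation methods comes down to modular arithmetic on multiples of 90. The sketch below predicts the angle a relative rotation would produce. `normalizeRotation` and `rotationAfter` are hypothetical helpers, and whether the library itself accepts negative or over-360 values is not stated in this reference.

```typescript
// Normalizes any multiple of 90 degrees into 0, 90, 180, or 270.
function normalizeRotation(degrees: number): number {
  if (degrees % 90 !== 0) {
    throw new RangeError(`Rotation must be a multiple of 90, got ${degrees}`);
  }
  return ((degrees % 360) + 360) % 360;
}

// Predicts the absolute rotation after adding `delta` to `current`,
// mirroring the "add to the current rotation" behavior described above.
function rotationAfter(current: number, delta: number): number {
  return normalizeRotation(current + delta);
}

console.log(rotationAfter(270, 90)); // 0
console.log(rotationAfter(0, -90)); // 270
```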

Page dimensions

Method	Parameters	Description
`pageMediaBox(pageIndex)`	`number`	Get the MediaBox `[llx, lly, urx, ury]`
`setPageMediaBox(pageIndex, llx, lly, urx, ury)`	`number, ...`	Set the MediaBox
`pageCropBox(pageIndex)`	`number`	Get the CropBox (may be null)
`setPageCropBox(pageIndex, llx, lly, urx, ury)`	`number, ...`	Set the CropBox
`cropMargins(left, right, top, bottom)`	`number, ...`	Crop the margins of all pages
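
A box given as `[llx, lly, urx, ury]` has to be converted before it reads as a page size. The sketch below derives width and height and converts points to millimetres. `MediaBox`, `boxSize`, and `pointsToMillimetres` are hypothetical names, the sample tuple stands in for `doc.pageMediaBox(0)` (the actual JavaScript return type is not specified here), and the conversion assumes the default user-space unit of 1/72 inch.

```typescript
// A box in the documented [llx, lly, urx, ury] order, in PDF points.
type MediaBox = [number, number, number, number];

// Width and height of a box in points.
function boxSize(box: MediaBox): { width: number; height: number } {
  return { width: box[2] - box[0], height: box[3] - box[1] };
}

// Converts PDF points to millimetres (1 pt = 1/72 inch, 1 inch = 25.4 mm).
function pointsToMillimetres(points: number): number {
  return (points / 72) * 25.4;
}

// Stand-in for doc.pageMediaBox(0) on a US Letter page.
const letterBox: MediaBox = [0, 0, 612, 792];
const letterSize = boxSize(letterBox);
console.log(`${letterSize.width} x ${letterSize.height} pt`); // 612 x 792 pt
console.log(pointsToMillimetres(letterSize.width).toFixed(1)); // 215.9
```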

Page operations

deletePage(index)                 // Delete a page by index
movePage(fromIndex, toIndex)      // Move a page to a new position
extractPages(pages) -> Uint8Array // Build a new PDF from the given page indices

Erase / whiteout

Method	Parameters	Description
`eraseRegion(pageIndex, llx, lly, urx, ury)`	`number, ...`	Erase a region
`eraseRegions(pageIndex, rects)`	`number, Float32Array`	Erase multiple regions
`clearEraseRegions(pageIndex)`	`number`	Clear pending erasures

Annotations and redaction

Method	Parameters	Description
`flattenPageAnnotations(pageIndex)`	`number`	Flatten the annotations on a page
`flattenAllAnnotations()`	–	Flatten all annotations
`applyPageRedactions(pageIndex)`	`number`	Apply the redactions on a page
`applyAllRedactions()`	–	Apply all redactions
`addRedaction(page, x0, y0, x1, y1, fill?)`	`number, ...`	Queue a redaction box (optional `[r,g,b]` fill)
`redactionCount(page)`	`number`	Get the number of redactions queued on a page
`applyRedactionsDestructive(scrubMetadata?)`	`boolean`	Destructively remove content and return a redaction report
`sanitizeDocument(scrubMetadata?, removeJavascript?, removeEmbeddedFiles?)`	`boolean, ...`	Strip metadata, scripts, and embedded files, and return a report

Merging and embedding

`mergeFrom(data) -> number`

Merges pages from another PDF. Returns the number of pages merged.

Parameter	Type	Description
`data`	`Uint8Array`	Bytes of the source PDF file

`embedFile(name, data) -> void`

Attaches a file to the PDF.

Parameter	Type	Description
`name`	`string`	File name of the attachment
`data`	`Uint8Array`	File contents

Image manipulation

Method	Parameters	Description
`repositionImage(pageIndex, name, x, y)`	`number, string, number, number`	Move an image
`resizeImage(pageIndex, name, w, h)`	`number, string, number, number`	Resize an image
`setImageBounds(pageIndex, name, x, y, w, h)`	`number, string, ...`	Set the image bounds

Classification and auto-extraction

classifyDocument() -> string                 // Classify the whole document (e.g. born-digital vs scanned)
classifyPage(pageIndex) -> string            // Classify a single page
extractTextAuto(pageIndex) -> string         // Auto-pick native vs OCR extraction for a page
extractPageAuto(pageIndex, optionsJson?) -> string  // Auto-extraction returning a structured JSON page

Validation

validatePdfA(level) -> object        // Validate against a PDF/A conformance level (e.g. "2b")
convertToPdfA(level) -> object       // Convert toward a PDF/A level; returns a report
validatePdfUa(level?) -> object      // Validate against PDF/UA accessibility
validatePdfX(level?) -> object       // Validate against a PDF/X print level

Rendering

Requires the rendering feature.

Method	Parameters	Returns	Description
`renderPage(pageIndex, dpi?)`	`number, number`	`Uint8Array`	Render a page to PNG bytes (default 150 dpi)
`flattenToImages(dpi?)`	`number`	`Uint8Array`	Flatten all pages into an image-based PDF

OCR

Requires the wasm-ocr build. See WasmOcrEngine.

`extractTextOcr(pageIndex, engine) -> string`

Runs the in-WASM OCR pipeline on a page using a WasmOcrEngine constructed on the host side. Returns the recognized text in reading order.

const text = doc.extractTextOcr(0, engine);

Saving

`save() -> Uint8Array`

Saves the edited PDF as bytes. saveToBytes() is available as an alias.

`saveWithOptions(compress?, garbageCollect?, linearize?) -> Uint8Array`

Saves with explicit serialization options.

Parameter	Type	Default	Description
`compress`	`boolean`	`true`	Compress object streams
`garbageCollect`	`boolean`	`true`	Discard unreferenced objects
`linearize`	`boolean`	`false`	Produce a linearized ("Fast Web View") PDF

`saveEncryptedToBytes(password, ownerPassword?, allowPrint?, allowCopy?, allowModify?, allowAnnotate?) -> Uint8Array`

Saves with AES-256 encryption.

Parameter	Type	Default	Description
`password`	`string`	–	User password
`ownerPassword`	`string`	user password	Owner password
`allowPrint`	`boolean`	`true`	Allow printing
`allowCopy`	`boolean`	`true`	Allow copying
`allowModify`	`boolean`	`true`	Allow modification
`allowAnnotate`	`boolean`	`true`	Allow annotation

`free()`

Frees the WASM memory. Always call this when you are done with the document.
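
Because `free()` must run even when extraction throws, a `try`/`finally` wrapper is a natural fit. The sketch below is one way to structure that. `Freeable`, `withResource`, and `FakeDocument` are hypothetical names, and the fake class only exists so the example runs without the package; in real use the resource would be a `WasmPdfDocument`.

```typescript
// Anything that exposes free(), such as a WasmPdfDocument.
interface Freeable {
  free(): void;
}

// Runs `use` with the resource and always calls free() afterwards,
// even if `use` throws.
function withResource<T extends Freeable, R>(resource: T, use: (resource: T) => R): R {
  try {
    return use(resource);
  } finally {
    resource.free();
  }
}

// Fake document used only to make this sketch runnable without the package.
class FakeDocument implements Freeable {
  freed = false;
  pageCount(): number {
    return 3;
  }
  free(): void {
    this.freed = true;
  }
}

const fakeDoc = new FakeDocument();
const fakePageCount = withResource(fakeDoc, (d) => d.pageCount());
console.log(fakePageCount, fakeDoc.freed); // 3 true
```

Note that this wrapper is synchronous: if `use` returns a promise, `free()` runs before the promise settles, so asynchronous work would need an `async` variant that awaits the callback inside the `try` block.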

WasmPdfPageRegion

The region handle returned by WasmPdfDocument.within(pageIndex, region). Its extraction methods are scoped to the rectangle.

extractText() -> string       // Plain text within the region
extractChars() -> Array       // Characters within the region
extractWords() -> Array       // Words within the region
extractTextLines() -> Array   // Text lines within the region
extractTables() -> Array      // Tables within the region
extractImages() -> Array      // Images within the region
extractPaths() -> Array       // Vector paths within the region
extractRects() -> Array       // Rectangles within the region
extractLines() -> Array       // Line segments within the region
extractTextOcr(engine?) -> string  // OCR text within the region (wasm-ocr build)

WasmPdf

A factory class for creating new PDFs.

import { WasmPdf } from "pdf-oxide-wasm";

Static methods

WasmPdf.fromMarkdown(content, title?, author?) -> WasmPdf  // Create a PDF from Markdown text
WasmPdf.fromHtml(content, title?, author?) -> WasmPdf      // Create a PDF from HTML
WasmPdf.fromText(content, title?, author?) -> WasmPdf      // Create a PDF from plain text
WasmPdf.fromBytes(data) -> WasmPdf                         // Open an existing PDF from bytes for modification
WasmPdf.fromImageBytes(data) -> WasmPdf                    // Single-page PDF from one image (JPEG/PNG)
WasmPdf.fromMultipleImageBytes(imagesArray) -> WasmPdf     // Multi-page PDF, one page per image
WasmPdf.merge(pdfs) -> WasmPdf                             // Merge an array of PDF byte buffers into one
WasmPdf.fromHtmlCss(html, css, fontBytes) -> WasmPdf       // HTML + CSS with a single embedded font
WasmPdf.fromHtmlCssWithFonts(html, css, fonts) -> WasmPdf  // HTML + CSS with multiple [name, bytes] fonts

Parameter	Type	Description
`content`	`string`	Source content (Markdown / HTML / text)
`title`	`string \| undefined`	Document title
`author`	`string \| undefined`	Document author
`data`	`Uint8Array`	Bytes of a PDF or image file
`imagesArray`	`Uint8Array[]`	Array of image file bytes
`pdfs`	`Uint8Array[]`	Array of PDF file bytes to merge

Instance methods

`toBytes() -> Uint8Array`

Returns the PDF as bytes.

`size -> number`

Size of the PDF in bytes (read-only getter).

import { writeFileSync } from "node:fs";

const pdf = WasmPdf.fromMarkdown("# Hello World\n\nThis is a PDF.");
console.log(`PDF size: ${pdf.size} bytes`);
writeFileSync("output.pdf", pdf.toBytes());

WasmDocumentBuilder

A fluent, low-level page layout builder for assembling a PDF page by page. Use it together with WasmFluentPageBuilder.

import { WasmDocumentBuilder } from "pdf-oxide-wasm";
const builder = new WasmDocumentBuilder();

Document setup

new WasmDocumentBuilder()          // Create an empty builder
title(title)                       // Set document title
author(author)                     // Set document author
subject(subject)                   // Set document subject
keywords(keywords)                 // Set document keywords
creator(creator)                   // Set the creator tool name
onOpen(script)                     // Set a document-level open JavaScript action
taggedPdfUa1()                     // Enable Tagged PDF / PDF/UA-1 output
language(lang)                     // Set the document language (e.g. "en-US")
roleMap(custom, standard)          // Map a custom structure tag to a standard role
registerEmbeddedFont(name, font)   // Register a WasmEmbeddedFont under a name

Page creation and output

a4Page() -> WasmFluentPageBuilder         // Start a new A4 page
letterPage() -> WasmFluentPageBuilder     // Start a new US Letter page
page(width, height) -> WasmFluentPageBuilder  // Start a custom-size page (points)
commitPage(page)                          // Commit a completed page builder
build() -> Uint8Array                     // Finish and return the PDF bytes
toBytesEncrypted(userPassword, ownerPassword?) -> Uint8Array  // Finish with AES-256 encryption

WasmFluentPageBuilder

The per-page builder returned by a4Page() / letterPage() / page(). Queue operations, then commit them with done(builder) (or builder.commitPage(page)).

Text and flow

font(name, size)                 // Set the current font and size
at(x, y)                         // Move the cursor to an absolute position
text(text)                       // Draw text at the cursor
heading(level, text)             // Draw a heading (level 1–6)
paragraph(text)                  // Draw a wrapped paragraph
space(points)                    // Advance the cursor vertically
horizontalRule()                 // Draw a horizontal rule
newline()                        // Advance to the next line
columns(columnCount, gapPt, text)  // Lay text out in N balanced columns
footnote(refMark, noteText)      // Add a footnote marker + bottom-of-page note

Inline runs

inline(text)                     // Append an inline text run
inlineBold(text)                 // Append a bold inline run
inlineItalic(text)               // Append an italic inline run
inlineColor(r, g, b, text)       // Append a colored inline run (RGB 0.0–1.0)

Links and form actions

linkUrl(url)                     // Wrap the last element in a URL link
linkPage(page)                   // Link to another page index
linkNamed(destination)           // Link to a named destination
linkJavascript(script)           // Attach a JavaScript link action
onOpen(script)                   // Page open action
onClose(script)                  // Page close action
fieldKeystroke(script)           // Keystroke JavaScript for the last field
fieldFormat(script)              // Format JavaScript for the last field
fieldValidate(script)            // Validate JavaScript for the last field
fieldCalculate(script)           // Calculate JavaScript for the last field

Markup annotations

highlight(r, g, b)               // Highlight the last text run (RGB 0.0–1.0)
underline(r, g, b)               // Underline the last text run
strikeout(r, g, b)               // Strike out the last text run
squiggly(r, g, b)                // Squiggly-underline the last text run
stickyNote(text)                 // Add a sticky note at the cursor
stickyNoteAt(x, y, text)         // Add a sticky note at an absolute position
stamp(name)                      // Add a rubber-stamp annotation (e.g. "Approved")
freeText(x, y, w, h, text)       // Add a free-text annotation box
watermark(text)                  // Add a text watermark
watermarkConfidential()          // Add a "CONFIDENTIAL" watermark
watermarkDraft()                 // Add a "DRAFT" watermark

AcroForm widgets

textField(name, x, y, w, h, defaultValue?)            // Add a text field
checkbox(name, x, y, w, h, checked)                   // Add a checkbox
comboBox(name, x, y, w, h, options, selected?)        // Add a dropdown combo box
radioGroup(name, values, xs, ys, ws, hs, selected?)   // Add a radio-button group (parallel arrays)
pushButton(name, x, y, w, h, caption)                 // Add a clickable push button
signatureField(name, x, y, w, h)                      // Add an unsigned signature placeholder

Barcodes and images

barcode1d(barcodeType, data, x, y, w, h)   // Draw a 1D barcode (type 0–7)
barcodeQr(data, x, y, size)                // Draw a QR code
imageWithAlt(bytes, x, y, w, h, altText)   // Embed an image with accessibility alt text
imageArtifact(bytes, x, y, w, h)           // Embed a decorative image as an /Artifact

Graphics primitives

rect(x, y, w, h)                                  // Stroked 1pt rectangle outline
filledRect(x, y, w, h, r, g, b)                   // Filled rectangle (RGB 0.0–1.0)
line(x1, y1, x2, y2)                              // 1pt black line
strokeRect(x, y, w, h, width, r, g, b)            // Stroked rectangle, explicit width + color
strokeRectDashed(x, y, w, h, width, r, g, b, dash, phase)  // Dashed rectangle border
strokeLine(x1, y1, x2, y2, width, r, g, b)        // Line with explicit width + color
strokeLineDashed(x1, y1, x2, y2, width, r, g, b, dash, phase)  // Dashed line
textInRect(x, y, w, h, text, align)               // Lay text inside a rectangle (align 0/1/2)

Layout helpers and finalization

measure(text) -> number                  // Rendered width of text in the current font (points)
remainingSpace() -> number               // Vertical space left on the page (points)
newPageSameSize()                        // Start a new page with the same dimensions
table(spec)                              // Draw a buffered table from a spec object
streamingTable(spec) -> WasmStreamingTable  // Open a streaming table for large datasets
done(builder)                            // Commit this page's queued ops to the document builder

The spec object for table(spec) uses { columns: [{ header, width, align }], rows: [[...]], hasHeader }. The spec for streamingTable(spec) additionally accepts { repeatHeader, mode, sampleRows, minColWidthPt, maxColWidthPt, maxRowspan, batchSize }.
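
To make the spec shape concrete, the sketch below builds a `table(spec)` object and checks that every row matches the column count. The field names come from the description above; everything else is an assumption made for illustration: `ColumnSpec`, `TableSpec`, `AlignValue`, and `buildTableSpec` are hypothetical names, cells are assumed to be strings, `width` is assumed to be in points, and `align` is assumed to take the Align enum's numeric values.

```typescript
// Local mirror of the package's Align enum values (Left = 0, Center = 1, Right = 2).
const AlignValue = { Left: 0, Center: 1, Right: 2 } as const;

// ASSUMED shapes: field names follow the reference, types are guesses.
interface ColumnSpec {
  header: string;
  width: number; // assumed to be points
  align: number; // assumed to be an Align value
}
interface TableSpec {
  columns: ColumnSpec[];
  rows: string[][];
  hasHeader: boolean;
}

// Builds a spec and rejects rows whose cell count differs from the column count.
function buildTableSpec(columns: ColumnSpec[], rows: string[][]): TableSpec {
  rows.forEach((row, index) => {
    if (row.length !== columns.length) {
      throw new Error(`Row ${index} has ${row.length} cells, expected ${columns.length}`);
    }
  });
  return { columns: columns, rows: rows, hasHeader: true };
}

const invoiceSpec = buildTableSpec(
  [
    { header: "Item", width: 300, align: AlignValue.Left },
    { header: "Total", width: 100, align: AlignValue.Right },
  ],
  [
    ["Widget", "12.00"],
    ["Gadget", "30.50"],
  ]
);
console.log(invoiceSpec.columns.length, invoiceSpec.rows.length); // 2 2
```

The resulting object would then be passed to `table(spec)` on a page builder.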

WasmStreamingTable

The row-streaming table handle returned by WasmFluentPageBuilder.streamingTable(spec). Push rows incrementally, then call finish().

columnCount() -> number       // Number of columns
pendingRowCount() -> number   // Rows in the current un-flushed batch
batchCount() -> number        // Number of completed batches
pushRow(cells)                // Push one row (array of cell strings)
pushRowSpan(cells)            // Push a row whose cells may carry rowspans
flush()                       // Flush the current batch
finish()                      // Finalize the table and replay it into the page
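
The push-then-finish flow can be wrapped in a small driver that pulls rows from a producer callback. This is a sketch under stated assumptions: `StreamingTableLike`, `streamRows`, and the recording fake are hypothetical and exist so the example runs without the package. Whether manual `flush()` calls are needed at all, given the `batchSize` option in the spec, is not stated in this reference.

```typescript
// The subset of the WasmStreamingTable methods this sketch relies on.
interface StreamingTableLike {
  pushRow(cells: string[]): void;
  flush(): void;
  finish(): void;
}

// Pulls rows from `nextRow` until it returns null, flushing every
// `flushEvery` rows, then finishes the table. Returns the row count.
function streamRows(
  table: StreamingTableLike,
  nextRow: () => string[] | null,
  flushEvery: number
): number {
  let pushed = 0;
  let row = nextRow();
  while (row !== null) {
    table.pushRow(row);
    pushed += 1;
    if (pushed % flushEvery === 0) table.flush();
    row = nextRow();
  }
  table.finish();
  return pushed;
}

// Recording fake standing in for a real streaming table handle.
const tableCalls: string[] = [];
const fakeTable: StreamingTableLike = {
  pushRow: (cells) => { tableCalls.push(`row:${cells.join("|")}`); },
  flush: () => { tableCalls.push("flush"); },
  finish: () => { tableCalls.push("finish"); },
};

// Producer that yields three sample rows, then null.
let sampleRowIndex = 0;
const nextSampleRow = (): string[] | null => {
  if (sampleRowIndex >= 3) return null;
  const cells = [`id-${sampleRowIndex}`, String(sampleRowIndex * 10)];
  sampleRowIndex += 1;
  return cells;
};

const streamedCount = streamRows(fakeTable, nextSampleRow, 2);
console.log(streamedCount, tableCalls);
```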

WasmEmbeddedFont

A font registered for embedding through WasmDocumentBuilder.registerEmbeddedFont.

WasmEmbeddedFont.fromBytes(data, name?) -> WasmEmbeddedFont  // Load a TTF/OTF font from bytes
font.name -> string                                          // The font's resolved name (getter)

Page templates

Reusable header/footer decorations applied across pages.

WasmArtifactStyle

new WasmArtifactStyle()        // Default style
font(name, size) -> this       // Set font family and size
bold() -> this                 // Make the text bold
color(r, g, b) -> this         // Set the text color (RGB 0.0–1.0)

WasmArtifact

new WasmArtifact()                       // Empty artifact
WasmArtifact.left(text) -> WasmArtifact   // Left-aligned artifact text
WasmArtifact.center(text) -> WasmArtifact // Center-aligned artifact text
WasmArtifact.right(text) -> WasmArtifact  // Right-aligned artifact text
withStyle(style) -> this                  // Apply a WasmArtifactStyle
withOffset(offset) -> this                // Set the vertical offset from the edge

WasmHeader / WasmFooter

new WasmHeader()                  // Empty header (WasmFooter is identical)
WasmHeader.left(text) -> WasmHeader     // Left-aligned header text
WasmHeader.center(text) -> WasmHeader   // Center-aligned header text
WasmHeader.right(text) -> WasmHeader    // Right-aligned header text

WasmPageTemplate

new WasmPageTemplate()         // Empty template
header(header) -> this         // Set the page header artifact
footer(footer) -> this         // Set the page footer artifact
skipFirstPage() -> this        // Omit header/footer on the first page

Digital signatures

Requires the signatures feature.

WasmCertificate

WasmCertificate.load(data) -> WasmCertificate                  // Load a DER certificate + key bundle
WasmCertificate.loadPem(certPem, keyPem) -> WasmCertificate    // Load from PEM cert + key strings
WasmCertificate.loadPkcs12(data, password) -> WasmCertificate  // Load from a PKCS#12 (.p12/.pfx) blob
cert.subject -> string         // Subject distinguished name (getter)
cert.issuer -> string          // Issuer distinguished name (getter)
cert.serial -> string          // Serial number (getter)
cert.validity -> bigint[]      // [notBefore, notAfter] as unix seconds (getter)
cert.isValid -> boolean        // Whether the certificate is currently valid (getter)

WasmSignature

Returned by WasmPdfDocument.signatures().

sig.signerName -> string | null          // Signer common name (getter)
sig.reason -> string | null              // Signing reason (getter)
sig.location -> string | null            // Signing location (getter)
sig.contactInfo -> string | null         // Signer contact info (getter)
sig.signingTime -> bigint | null         // Signing time as unix seconds (getter)
sig.coversWholeDocument -> boolean       // Whether the signature covers the entire file (getter)
sig.padesLevel -> PadesLevel             // PAdES baseline level of the signature (getter)
sig.verify() -> boolean                  // Verify the signature cryptographically
sig.verifyDetached(pdfData) -> boolean   // Verify including a messageDigest check against the bytes

WasmTimestamp

WasmTimestamp.parse(data) -> WasmTimestamp  // Parse a DER TimeStampToken / TSTInfo
ts.time -> bigint              // Timestamp time as unix seconds (getter)
ts.serial -> string            // Serial number (getter)
ts.policyOid -> string         // TSA policy OID (getter)
ts.tsaName -> string           // TSA name (getter)
ts.hashAlgorithm -> number     // Imprint hash algorithm id (getter)
ts.messageImprint -> Uint8Array  // The message imprint digest (getter)
ts.verify() -> boolean         // Verify the timestamp token

WasmRevocationMaterial

Offline PAdES-B-LT validation material for signPdfBytesPades.

new WasmRevocationMaterial()   // Empty material set
addCert(der)                   // Add a DER X.509 certificate
addCrl(der)                    // Add a DER CRL
addOcsp(der)                   // Add a DER OCSP response

Dss

The parsed Document Security Store returned by WasmPdfDocument.dss().

dss.certCount -> number        // Number of DER certificates (getter)
getCert(i) -> Uint8Array | undefined   // i-th DER certificate
dss.crlCount -> number         // Number of DER CRLs (getter)
getCrl(i) -> Uint8Array | undefined    // i-th DER CRL
dss.ocspCount -> number        // Number of DER OCSP responses (getter)
getOcsp(i) -> Uint8Array | undefined   // i-th DER OCSP response
dss.vri -> string[]            // Per-signature VRI keys (uppercase-hex SHA-1 of /Contents) (getter)

OCR

OCR runs entirely inside WASM through the pure-Rust tract backend, in the separate wasm-ocr build. The host supplies the models: fetch the detector/recognizer ONNX files and the dictionary (see modelManifest()), then pass their bytes to the constructor.

WasmOcrEngine

new WasmOcrEngine(detModel, recModel, dict, config?)  // Build from host-supplied model bytes
engine.ocrImage(imageBytes) -> string                 // OCR a raw image (PNG/JPEG/TIFF); returns JSON {text, confidence, spans}

Parameter	Type	Description
`detModel`	`Uint8Array`	ONNX bytes of the DBNet detector
`recModel`	`Uint8Array`	ONNX bytes of the SVTR recognizer
`dict`	`string`	Character dictionary for the recognizer, one character per line
`config`	`WasmOcrConfig \| undefined`	Reserved (tuned defaults are used)

WasmOcrConfig

new WasmOcrConfig()   // OCR configuration object (reserved for future tuning)

Enums

Align

Text/cell alignment discriminant used by textInRect and table column specs.

Align.Left   // 0
Align.Center // 1
Align.Right  // 2

PadesLevel

PAdES baseline level. Used by signPdfBytesPades and WasmSignature.padesLevel.

PadesLevel.BB    // 0 — signed attrs incl. ESS signing-certificate-v2
PadesLevel.BT    // 1 — B-B + RFC 3161 signature-time-stamp
PadesLevel.BLt   // 2 — B-T + Document Security Store (DSS/VRI)
PadesLevel.BLta  // 3 — B-LT + document-scoped /DocTimeStamp
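
Each level builds on the previous one, which determines what has to be gathered before calling `signPdfBytesPades`. The sketch below encodes that as a lookup. `PadesLevelValue`, `PadesInputs`, and `padesInputsFor` are hypothetical names. The timestamp requirement for BT and BLt follows the signature comment earlier in this reference; treating revocation material as needed for BLt is an inference from the B-LT description, and rejecting BLta is a conservative assumption based on the signing function listing only BB/BT/BLt.

```typescript
// Local mirror of the PadesLevel enum values listed above.
const PadesLevelValue = { BB: 0, BT: 1, BLt: 2, BLta: 3 } as const;

interface PadesInputs {
  needsTimestampToken: boolean;
  needsRevocationMaterial: boolean;
}

// Hypothetical helper: which optional signing inputs a level calls for.
function padesInputsFor(level: number): PadesInputs {
  switch (level) {
    case PadesLevelValue.BB:
      return { needsTimestampToken: false, needsRevocationMaterial: false };
    case PadesLevelValue.BT:
      return { needsTimestampToken: true, needsRevocationMaterial: false };
    case PadesLevelValue.BLt:
      // ASSUMPTION: B-LT signing wants revocation data for the DSS.
      return { needsTimestampToken: true, needsRevocationMaterial: true };
    default:
      // BLta is not among the levels listed for signPdfBytesPades.
      throw new RangeError(`Unsupported signing level: ${level}`);
  }
}

console.log(padesInputsFor(PadesLevelValue.BT));
```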

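Feature availability

Some features are gated behind Rust build features. The default pdf-oxide-wasm package enables the common set, and OCR ships in a separate wasm-ocr build.

Feature	WASM	Notes
Text extraction	Yes	Full support
Structured extraction	Yes	Characters, spans, words, lines, tables
PDF creation	Yes	Markdown, HTML, text, images, DocumentBuilder
PDF editing	Yes	Metadata, rotation, dimensions, erasing, pages
Form fields	Yes	Read, write, export, flatten, build
Search	Yes	Full regular expression support
Encryption	Yes	AES-256 read and write
Annotations	Yes	Read, flatten, redact, sanitize
PDF merge/split	Yes	Page merging and splitting by bookmarks
Embedded files	Yes	Attach files to a PDF
Page labels / XMP	Yes	Read page labels and XMP metadata
Office round-trip	Yes	DOCX/PPTX/XLSX import and export
Validation	Yes	PDF/A, PDF/UA, PDF/X
Barcodes	Yes (`barcodes`)	1D + QR as SVG or page images
Rendering	Yes (`rendering`)	Page to PNG, flatten to images
Digital signatures	Yes (`signatures`)	Signing, PAdES B-LT, verification, timestamps
OCR	`wasm-ocr` build	In-WASM tract OCR; models are fetched on the host side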

Error handling

Every method that can fail throws a JavaScript Error object:

try {
  const doc = new WasmPdfDocument(new Uint8Array([0, 1, 2]));
} catch (e) {
  console.error(`Failed to open: ${e.message}`);
}
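
When many call sites need the same handling, the `try`/`catch` can be folded into a helper that returns a result object instead of throwing. The sketch below is one such pattern. `OpenResult` and `tryOpen` are hypothetical names, and the throwing callback stands in for `() => new WasmPdfDocument(bytes)` so the example runs without the package.

```typescript
// Result of an operation that may throw.
type OpenResult<T> = { ok: true; value: T } | { ok: false; message: string };

// Runs `open` and converts a thrown value into a result object.
function tryOpen<T>(open: () => T): OpenResult<T> {
  try {
    return { ok: true, value: open() };
  } catch (e) {
    // Thrown values are Error objects per the reference; fall back to
    // String() in case something else is thrown.
    const message = e instanceof Error ? e.message : String(e);
    return { ok: false, message: message };
  }
}

// Stand-in for `() => new WasmPdfDocument(bytes)` with invalid input.
const failedOpen = tryOpen<number>(() => {
  throw new Error("not a PDF");
});
console.log(failedOpen.ok ? "opened" : `Failed to open: ${failedOpen.message}`);

// A callback that succeeds yields the value.
const okOpen = tryOpen(() => 42);
console.log(okOpen.ok ? okOpen.value : okOpen.message); // 42
```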

TypeScript

Full type definitions are included in the package:

import { WasmPdfDocument, WasmPdf } from "pdf-oxide-wasm";

const doc: WasmPdfDocument = new WasmPdfDocument(bytes);
const text: string = doc.extractText(0);
const pdf: WasmPdf = WasmPdf.fromMarkdown("# Hello");

Bindings for other languages

PDF Oxide provides native bindings for every major ecosystem: Rust, Python, Node.js, C#, Golang, Java, PHP, Ruby, C++, Swift, Kotlin, Dart, R, Julia, Zig, Scala, Clojure, Objective-C, Elixir.

Next steps

Types and Enums: all shared types and enums
Page API Reference: consistent page-level iteration across bindings
Getting Started with WASM: tutorial