What is the fastest Python PDF library?

Measured on a benchmark of 3,830 real-world PDFs, PDF Oxide is the fastest Python PDF library. Its mean text extraction time is 0.8ms, which is 5.8× faster than PyMuPDF (4.6ms) and 15× faster than pypdf (12.1ms), and it has a 100% pass rate.

Is PDF Oxide free for commercial use?

Yes. PDF Oxide is MIT licensed — free for all uses including commercial products, SaaS, and proprietary software. No license fees, no sales calls, no AGPL restrictions.

Can PDF Oxide handle scanned PDFs with OCR?

Yes. PDF Oxide includes built-in OCR via PaddleOCR and ONNX Runtime. You do not need to install Tesseract: run pip install pdf_oxide and call extract_text_ocr(). PP-OCRv3, v4, and v5 models are supported.

Does PDF Oxide support XFA forms?

Yes. PDF Oxide is the only Python PDF library that can detect, analyze, and extract data from XFA forms (XML Forms Architecture). PyMuPDF, pypdf, pdfplumber, and pdfminer cannot read XFA form data.

How does PDF Oxide compare to PyMuPDF?

PDF Oxide is 5.8× faster than PyMuPDF (0.8ms vs 4.6ms mean), has a 100% pass rate vs 99.3%, and is MIT licensed vs PyMuPDF's AGPL-3.0. PDF Oxide also has built-in Markdown/HTML output and XFA form support that PyMuPDF lacks.

Can PDF Oxide convert PDF to Markdown?

Yes. PDF Oxide has built-in PDF-to-Markdown conversion with heading detection, table preservation, and list formatting, which makes it well suited to LLM and RAG pipelines. No separate package is needed. PyMuPDF, by contrast, requires the add-on pymupdf4llm, which is 69× slower than PDF Oxide.

JavaScript API Reference

PDF Oxide provides WebAssembly bindings for JavaScript and TypeScript. The npm package pdf-oxide-wasm works in Node.js, browsers, bundlers, Deno, and Cloudflare Workers.

npm install pdf-oxide-wasm

Multi-target packaging (v0.3.38)

pdf-oxide-wasm now ships three builds side by side, and package.json conditional exports select among them. Import the subpath that matches your runtime. The top-level import is also routed through the exports field and resolves correctly in most environments.

Subpath	Target
`pdf-oxide-wasm/nodejs`	Node.js (CommonJS + ESM)
`pdf-oxide-wasm/bundler`	Vite, webpack, Rollup, esbuild, Bun
`pdf-oxide-wasm/web`	Browsers, Deno, Cloudflare Workers

// Node.js
import { WasmPdfDocument } from "pdf-oxide-wasm/nodejs";

// Vite / webpack / Rollup
import init, { WasmPdfDocument } from "pdf-oxide-wasm/bundler";
await init();

// Browsers / Deno / Workers
import init, { WasmPdfDocument } from "pdf-oxide-wasm/web";
await init();

The separate builds fix the ReferenceError: Can't find variable: __dirname that browser bundlers threw before v0.3.38.

For the Rust API, see the Rust API Reference. For the Python API, see the Python API Reference. For type details, see Types & Enums.

Some methods are gated behind Rust build features (rendering, signatures, barcodes, ocr-tract). The default pdf-oxide-wasm package enables the common set; OCR ships in the separate wasm-ocr build. See Feature Availability.

Module Functions

Free functions exported at the top level of the package.

import {
  setLogLevel, disableLogging,
  generateBarcodeSvg, generateQrSvg,
  planSplitByBookmarks, splitByBookmarks,
  setCryptoPolicy, cryptoPolicy, cryptoInventory, cryptoCbom,
  modelManifest, prefetchAvailable,
  signPdfBytes, signPdfBytesPades, hasDocumentTimestamp,
} from "pdf-oxide-wasm";

Logging

setLogLevel(level)   // Set log verbosity: "off" | "error" | "warn" | "info" | "debug" | "trace"
disableLogging()     // Silence all log output

Barcodes

generateBarcodeSvg(barcodeType, data) -> string  // 1D barcode as SVG; type 0–7 (Code128, Code39, Ean13, Ean8, UpcA, Itf, Code93, Codabar)
generateQrSvg(data, errorCorrection, size) -> string  // QR code as SVG; errorCorrection 0=Low 1=Medium 2=Quartile 3=High
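Both generators take numeric codes, so naming them once keeps call sites readable. The following is a minimal sketch that assumes the codes follow the order listed above. `BarcodeType`, `QrErrorCorrection`, and `barcodeTypeCode` are hypothetical helpers, not package exports.

```javascript
// Hypothetical lookup tables; numeric codes follow the order documented above.
const BarcodeType = Object.freeze({
  Code128: 0,
  Code39: 1,
  Ean13: 2,
  Ean8: 3,
  UpcA: 4,
  Itf: 5,
  Code93: 6,
  Codabar: 7,
});

const QrErrorCorrection = Object.freeze({
  Low: 0,
  Medium: 1,
  Quartile: 2,
  High: 3,
});

// Resolve a symbology name to its code, failing loudly on typos instead of
// passing undefined to generateBarcodeSvg().
function barcodeTypeCode(name) {
  if (!Object.prototype.hasOwnProperty.call(BarcodeType, name)) {
    throw new Error(`Unknown barcode type: ${name}`);
  }
  return BarcodeType[name];
}
```

A call would then look like generateBarcodeSvg(barcodeTypeCode("Code128"), "ABC-123") or generateQrSvg("https://example.com", QrErrorCorrection.Medium, 256). The unit of `size` is not stated above, so treat 256 as illustrative only.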

Split by Bookmarks

planSplitByBookmarks(srcBytes, titlePrefix, ignoreCase, level, includeFrontMatter) -> Array  // Plan a split without producing PDFs; returns segment descriptors
splitByBookmarks(srcBytes, titlePrefix, ignoreCase, level, includeFrontMatter) -> Array       // Split at bookmark boundaries; returns [segment, bytes] pairs (level 0=all depths, 1=top-level)

Crypto Governance

setCryptoPolicy(spec)   // Install the process-wide crypto policy ("compat" | "strict" | "fips-strict"[;…]); fail-closed
cryptoPolicy() -> string  // The active crypto policy as its canonical grammar string
cryptoInventory() -> string[]  // Algorithm tokens exercised so far this process
cryptoCbom() -> string  // CycloneDX 1.6 Cryptographic Bill of Materials (JSON string)
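The CBOM is returned as a JSON string, so you can post-process it with ordinary JSON handling. This sketch assumes cryptoCbom() follows the standard CycloneDX 1.6 layout: a top-level `components` array in which crypto entries have `type: "cryptographic-asset"`. Verify the field names against real output before relying on them. `cbomAssetNames` is a hypothetical helper.

```javascript
// List cryptographic asset names from a CycloneDX 1.6 CBOM JSON string.
// Assumes crypto entries are components with type "cryptographic-asset".
function cbomAssetNames(cbomJson) {
  const bom = JSON.parse(cbomJson);
  const components = Array.isArray(bom.components) ? bom.components : [];
  return components
    .filter((c) => c.type === "cryptographic-asset")
    .map((c) => c.name)
    .filter((name) => typeof name === "string")
    .sort();
}
```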

OCR Model Provisioning

modelManifest() -> string   // JSON manifest of OCR detector/recognizer cache filenames and source URLs (host-side fetch)
prefetchAvailable() -> boolean  // Whether this build can download OCR models to a local cache (always false in WASM)

Signing (free functions)

signPdfBytes(pdfData, cert, reason?, location?) -> Uint8Array  // Sign raw PDF bytes with a WasmCertificate; returns the signed PDF
signPdfBytesPades(pdfData, cert, level, timestampToken?, revocation?, reason?, location?) -> Uint8Array  // Sign at a PAdES baseline level (BB/BT/BLt); pass a pre-fetched RFC 3161 token for BT/BLt
hasDocumentTimestamp(pdfData) -> boolean  // Whether the PDF carries a document-scoped /DocTimeStamp (PAdES-B-LTA)

WasmPdfDocument

The primary class for opening, extracting, editing, and saving PDFs.

import { WasmPdfDocument } from "pdf-oxide-wasm";

Constructor

`new WasmPdfDocument(data, password?)`

Load a PDF document from raw bytes.

Parameter	Type	Description
`data`	`Uint8Array`	The PDF file contents
`password`	`string | undefined`	Optional password for encrypted PDFs

Throws: Error if the PDF is invalid or cannot be parsed.

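import { readFileSync } from "node:fs";
import { WasmPdfDocument } from "pdf-oxide-wasm";
const bytes = new Uint8Array(readFileSync("document.pdf"));
const doc = new WasmPdfDocument(bytes); // throws if the PDF cannot be parsed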

Static Constructors

WasmPdfDocument.openFromDocxBytes(data) -> WasmPdfDocument  // Convert DOCX bytes to a PDF document
WasmPdfDocument.openFromPptxBytes(data) -> WasmPdfDocument  // Convert PPTX bytes to a PDF document
WasmPdfDocument.openFromXlsxBytes(data) -> WasmPdfDocument  // Convert XLSX bytes to a PDF document

Core Read-Only

`pageCount() -> number`

Get the number of pages in the document.

`version() -> Uint8Array`

Get the PDF version as [major, minor].

const [major, minor] = doc.version();
console.log(`PDF ${major}.${minor}`);
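Because version() returns a [major, minor] pair, a version gate is a simple comparison. For example, you might check whether a file declares at least PDF 1.5, the version that introduced object streams. `versionAtLeast` is a hypothetical helper.

```javascript
// True when a [major, minor] pair (Uint8Array or plain array) is at least major.minor.
function versionAtLeast(version, major, minor) {
  const [vMajor, vMinor] = version;
  return vMajor > major || (vMajor === major && vMinor >= minor);
}
```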

`authenticate(password) -> boolean`

Decrypt an encrypted PDF. Returns true if authentication succeeded.

Parameter	Type	Description
`password`	`string`	The password string

`hasStructureTree() -> boolean`

Check if the document is a Tagged PDF with a structure tree.

Signature inspection

signatureCount() -> number          // Number of digital signatures in the document
signatures() -> WasmSignature[]     // Parsed signatures (signer, reason, time, verify())
dss() -> Dss | null                 // Document Security Store (certs/CRLs/OCSP), or null

Text Extraction

`extractText(pageIndex, region?) -> string`

Extract plain text from a single page. Pass an optional [x, y, w, h] region to limit extraction.

Parameter	Type	Description
`pageIndex`	`number`	Zero-based page number
`region`	`number[] | undefined`	Optional `[x, y, width, height]` clip

const text = doc.extractText(0);
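Region arguments use [x, y, width, height]. The page-box and erase methods further down use corner coordinates [llx, lly, urx, ury] instead. The following sketch converts between the two forms. It assumes (x, y) is the lower-left corner in PDF user space, where the origin is at the bottom-left and y increases upward. `regionToBox` and `boxToRegion` are hypothetical helpers.

```javascript
// [x, y, width, height] -> [llx, lly, urx, ury], assuming (x, y) is lower-left.
function regionToBox([x, y, width, height]) {
  return [x, y, x + width, y + height];
}

// [llx, lly, urx, ury] -> [x, y, width, height].
// Min/abs normalization tolerates corners given in either order.
function boxToRegion([x0, y0, x1, y1]) {
  return [Math.min(x0, x1), Math.min(y0, y1), Math.abs(x1 - x0), Math.abs(y1 - y0)];
}
```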

`extractAllText() -> string`

Extract plain text from all pages, separated by form feed characters.
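Since pages are separated by form feeds (U+000C), a plain split recovers per-page text. `splitPages` is a hypothetical helper. This page does not say whether a separator also follows the last page, so check for an empty final entry in real output.

```javascript
// Split extractAllText() output into one string per page on the form feed "\f".
function splitPages(allText) {
  return allText.split("\f");
}
```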

`extractStructured(pageIndex) -> string`

Extract a structured JSON representation of the page (blocks, lines, styling).

`extractChars(pageIndex, region?) -> Array`

Extract individual characters with precise positioning and font metadata.

Parameter	Type	Description
`pageIndex`	`number`	Zero-based page number
`region`	`number[] | undefined`	Optional `[x, y, width, height]` clip

Returns: Array of objects with fields:

Field	Type	Description
`char`	`string`	The character
`bbox`	`{x, y, width, height}`	Bounding box
`fontName`	`string`	Font name
`fontSize`	`number`	Font size in points
`fontWeight`	`string`	Weight (Normal, Bold, etc.)
`isItalic`	`boolean`	Italic flag
`color`	`{r, g, b}`	RGB color (0.0–1.0)

const chars = doc.extractChars(0);
for (const c of chars) {
  console.log(`'${c.char}' at (${c.bbox.x}, ${c.bbox.y})`);
}
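extractTextLines() already returns line-level results. Grouping characters yourself is mainly useful when you need a custom tolerance. The following sketch assumes `bbox.y` is the bottom edge in PDF user space, where larger y is nearer the top of the page. It does not insert spaces beyond those present as characters. `groupCharsIntoLines` and its 2 pt default tolerance are hypothetical.

```javascript
// Group extractChars() output into lines, top to bottom.
// Assumes bbox.y grows upward (PDF user space); tolerance is in points.
function groupCharsIntoLines(chars, tolerance = 2) {
  const sorted = [...chars].sort((a, b) => b.bbox.y - a.bbox.y || a.bbox.x - b.bbox.x);
  const lines = [];
  for (const c of sorted) {
    const current = lines[lines.length - 1];
    if (current && Math.abs(current.y - c.bbox.y) <= tolerance) {
      current.chars.push(c);
    } else {
      lines.push({ y: c.bbox.y, chars: [c] });
    }
  }
  // Re-sort each line left to right: glyphs with slightly different y values
  // can join the same line out of x order.
  return lines.map((line) =>
    line.chars
      .sort((a, b) => a.bbox.x - b.bbox.x)
      .map((c) => c.char)
      .join("")
  );
}
```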

`extractPageText(pageIndex, readingOrder?) -> object`

Get spans, characters, and page dimensions from a single extraction pass. More efficient than calling extractSpans() + extractChars() separately. Pass "column_aware" for multi-column PDFs.

Parameter	Type	Description
`pageIndex`	`number`	Zero-based page number
`readingOrder`	`string | undefined`	`"column_aware"` or `"top_to_bottom"` (default)

Returns: An object with fields:

Field	Type	Description
`spans`	`Array`	Array of span objects
`chars`	`Array`	Array of character objects
`pageWidth`	`number`	Page width in PDF points
`pageHeight`	`number`	Page height in PDF points
`text`	`string`	Full text content

const result = doc.extractPageText(0);
console.log(`Page: ${result.pageWidth}x${result.pageHeight} pt`);
for (const span of result.spans) {
  console.log(`'${span.text}' font=${span.fontName} size=${span.fontSize}`);
}

`extractSpans(pageIndex, region?, readingOrder?) -> Array`

Extract styled text spans with font metadata. Pass "column_aware" as readingOrder for multi-column PDFs.

Parameter	Type	Description
`pageIndex`	`number`	Zero-based page number
`region`	`number[] | undefined`	Optional `[x, y, width, height]` clip
`readingOrder`	`string | undefined`	`"column_aware"` or `"top_to_bottom"` (default)

Returns: Array of objects with fields:

Field	Type	Description
`text`	`string`	The text content
`bbox`	`{x, y, width, height}`	Bounding box
`fontName`	`string`	Font name
`fontSize`	`number`	Font size in points
`fontWeight`	`string`	Weight (Normal, Bold, etc.)
`isItalic`	`boolean`	Italic flag
`isMonospace`	`boolean`	Whether the font is fixed-width
`charWidths`	`number[]`	Per-glyph advance widths
`color`	`{r, g, b}`	RGB color (0.0–1.0)

const spans = doc.extractSpans(0);
for (const span of spans) {
  console.log(`"${span.text}" size=${span.fontSize}`);
}
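toMarkdown() already detects headings from font size. If you need your own rule, span metadata supports a simple heuristic: take the size that covers the most characters as body text, then flag noticeably larger spans. `headingCandidates` and its 1.2× threshold are hypothetical choices, not the library's heading detector.

```javascript
// Flag spans at least `ratio` times the dominant (body) font size.
function headingCandidates(spans, ratio = 1.2) {
  // Weight each size (bucketed to 0.1 pt) by how many characters use it.
  const weightBySize = new Map();
  for (const s of spans) {
    const size = Math.round(s.fontSize * 10) / 10;
    weightBySize.set(size, (weightBySize.get(size) || 0) + s.text.length);
  }
  let bodySize = 0;
  let best = -1;
  for (const [size, weight] of weightBySize) {
    if (weight > best) {
      best = weight;
      bodySize = size;
    }
  }
  return spans.filter((s) => s.text.trim() !== "" && s.fontSize >= bodySize * ratio);
}
```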

Words, Lines, Tables

extractWords(pageIndex, region?) -> Array       // Word-level boxes with text + font metadata
extractTextLines(pageIndex, region?) -> Array   // Line-level boxes, each with its words
extractTables(pageIndex, region?) -> Array      // Detected tables with rows/cells (text + bboxes)
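The exact object shape returned by extractTables() is not listed here. The following sketch therefore starts from a plain string[][] grid, and mapping each table's rows and cells into that grid is left to you. It writes CSV using RFC 4180 quoting. `csvField` and `toCsv` are hypothetical helpers.

```javascript
// Quote a field per RFC 4180 when it contains a comma, quote, or line break.
function csvField(value) {
  const s = String(value ?? "");
  return /[",\r\n]/.test(s) ? `"${s.replace(/"/g, '""')}"` : s;
}

// Serialize rows of cell strings to CSV with CRLF row separators.
function toCsv(grid) {
  return grid.map((row) => row.map(csvField).join(",")).join("\r\n");
}
```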

Header / Footer Artifacts

Detect and remove or erase running headers, footers, and page-furniture artifacts.

removeHeaders(threshold) -> number     // Remove detected headers across the document; returns count removed
removeFooters(threshold) -> number     // Remove detected footers; returns count removed
removeArtifacts(threshold) -> number   // Remove detected page artifacts; returns count removed
eraseHeader(pageIndex)                 // Queue an erase of the header region on a page
editHeader(pageIndex)                  // Mark the header region for editing on a page
eraseFooter(pageIndex)                 // Queue an erase of the footer region on a page
editFooter(pageIndex)                  // Mark the footer region for editing on a page
eraseArtifacts(pageIndex)              // Queue an erase of detected artifacts on a page

Region Extraction

`within(pageIndex, region) -> WasmPdfPageRegion`

Scope subsequent extraction to a rectangular region of a page. region is [x, y, width, height]. See WasmPdfPageRegion.

const region = doc.within(0, [50, 600, 400, 150]);
const text = region.extractText();

Format Conversion

`toMarkdown(pageIndex, detectHeadings?, includeImages?, includeFormFields?) -> string`

Convert a single page to Markdown.

Parameter	Type	Default	Description
`pageIndex`	`number`	–	Zero-based page number
`detectHeadings`	`boolean`	`true`	Detect headings from font size
`includeImages`	`boolean`	`true`	Include images
`includeFormFields`	`boolean`	`true`	Include form field values

`toMarkdownAll(detectHeadings?, includeImages?, includeFormFields?) -> string`

Convert all pages to Markdown.

`toHtml(pageIndex, preserveLayout?, detectHeadings?, includeFormFields?) -> string`

Convert a single page to HTML.

Parameter	Type	Default	Description
`pageIndex`	`number`	–	Zero-based page number
`preserveLayout`	`boolean`	`false`	Preserve visual layout
`detectHeadings`	`boolean`	`true`	Detect headings
`includeFormFields`	`boolean`	`true`	Include form field values

`toHtmlAll(preserveLayout?, detectHeadings?, includeFormFields?) -> string`

Convert all pages to HTML.

`toPlainText(pageIndex) -> string`

Convert a single page to plain text.

`toPlainTextAll() -> string`

Convert all pages to plain text.

Office round-trip

toDocxBytes() -> Uint8Array   // Export the document as a DOCX file
toPptxBytes() -> Uint8Array   // Export the document as a PPTX file
toXlsxBytes() -> Uint8Array   // Export the document as an XLSX file

Search

`search(pattern, caseInsensitive?, literal?, wholeWord?, maxResults?) -> Array`

Search for text across all pages.

Parameter	Type	Default	Description
`pattern`	`string`	–	Search pattern, treated as a regular expression unless `literal` is true
`caseInsensitive`	`boolean`	`false`	Case-insensitive search
`literal`	`boolean`	`false`	Treat pattern as literal string
`wholeWord`	`boolean`	`false`	Match whole words only
`maxResults`	`number`	`0`	Maximum results (0 = unlimited)

Returns: Array of objects with fields:

Field	Type	Description
`page`	`number`	Page number
`text`	`string`	Matched text
`bbox`	`object`	Bounding box
`startIndex`	`number`	Start index in page text
`endIndex`	`number`	End index in page text
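Because every result carries its `page`, you can aggregate hits directly. For example, this sketch finds which pages mention a term most often. `hitsPerPage` is a hypothetical helper.

```javascript
// Count search() results per page, returned as [page, count] pairs in page order.
function hitsPerPage(results) {
  const counts = new Map();
  for (const r of results) {
    counts.set(r.page, (counts.get(r.page) || 0) + 1);
  }
  return [...counts.entries()].sort((a, b) => a[0] - b[0]);
}
```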

`searchPage(pageIndex, pattern, caseInsensitive?, literal?, wholeWord?, maxResults?) -> Array`

Search for text within a single page.

Image Info

`extractImages(pageIndex) -> Array`

Get image metadata for a page.

Field	Type	Description
`width`	`number`	Image width in pixels
`height`	`number`	Image height in pixels
`colorSpace`	`string`	Color space (e.g. `DeviceRGB`)
`bitsPerComponent`	`number`	Bits per color channel
`bbox`	`object`	Position on page

`extractImageBytes(pageIndex) -> Array`

Extract raw image bytes from a page. Returns an array of objects:

Field	Type	Description
`width`	`number`	Image width in pixels
`height`	`number`	Image height in pixels
`data`	`Uint8Array`	Raw image bytes
`format`	`string`	Image format

`pageImages(pageIndex) -> Array`

Get image names and bounds for positioning operations.

Field	Type	Description
`name`	`string`	XObject name
`bounds`	`number[]`	`[x, y, width, height]`
`matrix`	`number[]`	Transform matrix `[a, b, c, d, e, f]`
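In PDF, an image XObject is painted into the unit square, and the transform matrix maps that square onto the page. A point (u, v) maps to (a·u + c·v + e, b·u + d·v + f). Recomputing an axis-aligned box from `matrix` is one way to sanity-check `bounds` or to see how rotation affects an image's footprint. `matrixToBounds` is a hypothetical helper.

```javascript
// Axis-aligned [x, y, width, height] of the unit square under [a, b, c, d, e, f].
function matrixToBounds([a, b, c, d, e, f]) {
  const corners = [[0, 0], [1, 0], [0, 1], [1, 1]].map(([u, v]) => [
    a * u + c * v + e,
    b * u + d * v + f,
  ]);
  const xs = corners.map((p) => p[0]);
  const ys = corners.map((p) => p[1]);
  const minX = Math.min(...xs);
  const minY = Math.min(...ys);
  return [minX, minY, Math.max(...xs) - minX, Math.max(...ys) - minY];
}
```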

Vector Content

extractPaths(pageIndex, region?) -> Array   // Vector paths (lines, curves, shapes) on a page
extractRects(pageIndex, region?) -> Array   // Axis-aligned rectangles detected from path segments
extractLines(pageIndex, region?) -> Array   // Straight line segments detected from path data

Document Structure

`getOutline() -> Array | null`

Get document bookmarks / table of contents. Returns null if no outline exists.

`getAnnotations(pageIndex) -> Array`

Get annotation metadata (type, rect, contents, etc.) for a page.

`pageLabels() -> Array`

Get page label ranges. Returns an array of objects:

Field	Type	Description
`startPage`	`number`	First page in this range
`style`	`string`	Numbering style
`prefix`	`string`	Label prefix
`startValue`	`number`	Starting number
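Each range tells a viewer how to number pages from `startPage` onward. The following sketch turns the ranges into a display label for a zero-based page index. It assumes `style` uses the PDF page-label codes ("D" decimal, "R"/"r" upper/lower-case roman, "A"/"a" upper/lower-case letters), with an empty value meaning the label is the prefix alone. `pageLabel` and its helpers are hypothetical.

```javascript
function toRoman(n) {
  const table = [
    [1000, "M"], [900, "CM"], [500, "D"], [400, "CD"],
    [100, "C"], [90, "XC"], [50, "L"], [40, "XL"],
    [10, "X"], [9, "IX"], [5, "V"], [4, "IV"], [1, "I"],
  ];
  let out = "";
  for (const [value, symbol] of table) {
    while (n >= value) {
      out += symbol;
      n -= value;
    }
  }
  return out;
}

// PDF letter numbering: A..Z, then AA..ZZ, then AAA..., repeating one letter.
function toLetters(n) {
  const letter = String.fromCharCode(65 + ((n - 1) % 26));
  return letter.repeat(Math.floor((n - 1) / 26) + 1);
}

function formatNumber(n, style) {
  switch (style) {
    case "D": return String(n);
    case "R": return toRoman(n);
    case "r": return toRoman(n).toLowerCase();
    case "A": return toLetters(n);
    case "a": return toLetters(n).toLowerCase();
    default: return ""; // no numbering style: prefix only
  }
}

// Use the last range starting at or before pageIndex.
function pageLabel(ranges, pageIndex) {
  let range = null;
  for (const r of ranges) {
    if (r.startPage <= pageIndex && (!range || r.startPage > range.startPage)) {
      range = r;
    }
  }
  if (!range) return String(pageIndex + 1); // assumption: fall back to 1-based index
  const n = (range.startValue || 1) + (pageIndex - range.startPage);
  return (range.prefix || "") + formatNumber(n, range.style);
}
```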

`xmpMetadata() -> object | null`

Get XMP metadata. Returns null if not present. Object fields include:

Field	Type	Description
`dcTitle`	`string | null`	Document title
`dcCreator`	`string[] | null`	Creator list
`dcDescription`	`string | null`	Description
`xmpCreatorTool`	`string | null`	Creator tool
`xmpCreateDate`	`string | null`	Creation date
`xmpModifyDate`	`string | null`	Modification date
`pdfProducer`	`string | null`	PDF producer

Form Fields

`getFormFields() -> Array`

Get all form fields with name, type, value, and flags.

Field	Type	Description
`name`	`string`	Field name
`fieldType`	`string`	Field type (text, checkbox, etc.)
`value`	`string`	Current value
`flags`	`number`	Field flags

const fields = doc.getFormFields();
for (const f of fields) {
  console.log(`${f.name} (${f.fieldType}) = ${f.value}`);
}

`hasXfa() -> boolean`

Check if the document contains XFA forms.

`getFormFieldValue(name) -> any`

Get a form field value by name. Returns a string, boolean, or null depending on the field type.

`setFormFieldValue(name, value) -> void`

Set a form field value by name.

Parameter	Type	Description
`name`	`string`	Field name
`value`	`string | boolean`	New field value

`exportFormData(format?) -> Uint8Array`

Export form data as FDF (default) or XFDF.

Parameter	Type	Default	Description
`format`	`string`	`"fdf"`	Export format: `"fdf"` or `"xfdf"`

Form flattening

flattenForms()                    // Flatten all form fields into page content
flattenFormsOnPage(pageIndex)     // Flatten forms on a specific page
flattenWarnings() -> string[]     // Warnings produced by the last flatten operation

Editing

Metadata

Method	Parameters	Description
`setTitle(title)`	`string`	Set document title
`setAuthor(author)`	`string`	Set document author
`setSubject(subject)`	`string`	Set document subject
`setKeywords(keywords)`	`string`	Set document keywords

Page Rotation

Method	Parameters	Description
`pageRotation(pageIndex)`	`number`	Get current rotation (0, 90, 180, 270)
`setPageRotation(pageIndex, degrees)`	`number, number`	Set absolute rotation
`rotatePage(pageIndex, degrees)`	`number, number`	Add to current rotation
`rotateAllPages(degrees)`	`number`	Rotate all pages

Page Dimensions

Method	Parameters	Description
`pageMediaBox(pageIndex)`	`number`	Get MediaBox `[llx, lly, urx, ury]`
`setPageMediaBox(pageIndex, llx, lly, urx, ury)`	`number, ...`	Set MediaBox
`pageCropBox(pageIndex)`	`number`	Get CropBox (may be null)
`setPageCropBox(pageIndex, llx, lly, urx, ury)`	`number, ...`	Set CropBox
`cropMargins(left, right, top, bottom)`	`number, ...`	Crop all page margins
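A page's displayed size depends on both its box and its rotation. This sketch assumes you pass the values returned by pageMediaBox() (or pageCropBox() when it is not null) and pageRotation(). `normalizeRotation`, `displayedSize`, and `pointsToInches` are hypothetical helpers. PDF units are points, 72 per inch, so a [0, 0, 612, 792] MediaBox is US Letter (8.5 × 11 in).

```javascript
const POINTS_PER_INCH = 72;

// PDF page rotation must be a multiple of 90; normalize to 0, 90, 180, or 270.
function normalizeRotation(degrees) {
  if (degrees % 90 !== 0) {
    throw new Error(`Rotation must be a multiple of 90, got ${degrees}`);
  }
  return ((degrees % 360) + 360) % 360;
}

// Displayed size in points of a [llx, lly, urx, ury] box; 90/270 swap width and height.
function displayedSize(box, rotation = 0) {
  const [llx, lly, urx, ury] = box;
  const width = Math.abs(urx - llx);
  const height = Math.abs(ury - lly);
  const r = normalizeRotation(rotation);
  return r === 90 || r === 270 ? { width: height, height: width } : { width, height };
}

function pointsToInches(points) {
  return points / POINTS_PER_INCH;
}
```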

Page Operations

deletePage(index)                 // Delete a page by index
movePage(fromIndex, toIndex)      // Move a page to a new position
extractPages(pages) -> Uint8Array // Build a new PDF from the given page indices

Erase / Whiteout

Method	Parameters	Description
`eraseRegion(pageIndex, llx, lly, urx, ury)`	`number, ...`	Erase a region
`eraseRegions(pageIndex, rects)`	`number, Float32Array`	Erase multiple regions
`clearEraseRegions(pageIndex)`	`number`	Clear pending erases
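eraseRegions() takes a single flat Float32Array. This sketch assumes the array holds four numbers per rectangle, in the same [llx, lly, urx, ury] order that eraseRegion() uses. That layout is an assumption, so confirm it against the package's type definitions. `packRects` is a hypothetical helper.

```javascript
// Flatten [[llx, lly, urx, ury], ...] into a Float32Array, four floats per rect.
function packRects(rects) {
  const out = new Float32Array(rects.length * 4);
  rects.forEach((rect, i) => {
    if (rect.length !== 4) {
      throw new Error(`Rect ${i} must have 4 numbers`);
    }
    out.set(rect, i * 4);
  });
  return out;
}
```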

Annotations & Redaction

Method	Parameters	Description
`flattenPageAnnotations(pageIndex)`	`number`	Flatten annotations on page
`flattenAllAnnotations()`	–	Flatten all annotations
`applyPageRedactions(pageIndex)`	`number`	Apply redactions on page
`applyAllRedactions()`	–	Apply all redactions
`addRedaction(page, x0, y0, x1, y1, fill?)`	`number, ...`	Queue a redaction box (optional `[r,g,b]` fill)
`redactionCount(page)`	`number`	Count redactions queued for a page
`applyRedactionsDestructive(scrubMetadata?)`	`boolean`	Destructively remove content; returns a redaction report
`sanitizeDocument(scrubMetadata?, removeJavascript?, removeEmbeddedFiles?)`	`boolean, ...`	Strip metadata, scripts, embedded files; returns a report

Merge & Embed

`mergeFrom(data) -> number`

Merge pages from another PDF. Returns the number of pages merged.

Parameter	Type	Description
`data`	`Uint8Array`	The source PDF file bytes

`embedFile(name, data) -> void`

Attach a file to the PDF.

Parameter	Type	Description
`name`	`string`	Filename for the attachment
`data`	`Uint8Array`	File contents

Image Manipulation

Method	Parameters	Description
`repositionImage(pageIndex, name, x, y)`	`number, string, number, number`	Move image
`resizeImage(pageIndex, name, w, h)`	`number, string, number, number`	Resize image
`setImageBounds(pageIndex, name, x, y, w, h)`	`number, string, ...`	Set image bounds

Classification & Auto-Extraction

classifyDocument() -> string                 // Classify the whole document (e.g. born-digital vs scanned)
classifyPage(pageIndex) -> string            // Classify a single page
extractTextAuto(pageIndex) -> string         // Auto-pick native vs OCR extraction for a page
extractPageAuto(pageIndex, optionsJson?) -> string  // Auto-extraction returning a structured JSON page

Validation

validatePdfA(level) -> object        // Validate against a PDF/A conformance level (e.g. "2b")
convertToPdfA(level) -> object       // Convert toward a PDF/A level; returns a report
validatePdfUa(level?) -> object      // Validate against PDF/UA accessibility
validatePdfX(level?) -> object       // Validate against a PDF/X print level

Rendering

Requires the rendering feature.

Method	Parameters	Returns	Description
`renderPage(pageIndex, dpi?)`	`number, number`	`Uint8Array`	Render a page to PNG bytes (default 150 dpi)
`flattenToImages(dpi?)`	`number`	`Uint8Array`	Flatten all pages to an image-based PDF

OCR

Requires the wasm-ocr build. See WasmOcrEngine.

`extractTextOcr(pageIndex, engine) -> string`

Run the in-WASM OCR pipeline on a page using a host-built WasmOcrEngine. Returns recognized text in reading order.

const text = doc.extractTextOcr(0, engine);

Save

`save() -> Uint8Array`

Save the edited PDF as bytes. saveToBytes() is available as an alias.

`saveWithOptions(compress?, garbageCollect?, linearize?) -> Uint8Array`

Save with explicit serialization options.

Parameter	Type	Default	Description
`compress`	`boolean`	`true`	Compress object streams
`garbageCollect`	`boolean`	`true`	Drop unreferenced objects
`linearize`	`boolean`	`false`	Produce a linearized (“fast web view”) PDF

`saveEncryptedToBytes(password, ownerPassword?, allowPrint?, allowCopy?, allowModify?, allowAnnotate?) -> Uint8Array`

Save with AES-256 encryption.

Parameter	Type	Default	Description
`password`	`string`	–	User password
`ownerPassword`	`string`	user password	Owner password
`allowPrint`	`boolean`	`true`	Allow printing
`allowCopy`	`boolean`	`true`	Allow copying
`allowModify`	`boolean`	`true`	Allow modification
`allowAnnotate`	`boolean`	`true`	Allow annotations

`free()`

Release WASM memory. Always call this when done with the document.
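free() has to run even when extraction throws, and a try/finally wrapper guarantees that. The following is a minimal sketch for any object exposing free(). `withFree` is a hypothetical helper, not a package export.

```javascript
// Run fn(resource), then always release its WASM memory, even if fn throws.
function withFree(resource, fn) {
  try {
    return fn(resource);
  } finally {
    resource.free();
  }
}
```

Usage would look like const text = withFree(new WasmPdfDocument(bytes), (doc) => doc.extractAllText());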

WasmPdfPageRegion

A region handle returned by WasmPdfDocument.within(pageIndex, region). Extraction methods are scoped to the rectangle.

extractText() -> string       // Plain text within the region
extractChars() -> Array       // Characters within the region
extractWords() -> Array       // Words within the region
extractTextLines() -> Array   // Text lines within the region
extractTables() -> Array      // Tables within the region
extractImages() -> Array      // Images within the region
extractPaths() -> Array       // Vector paths within the region
extractRects() -> Array       // Rectangles within the region
extractLines() -> Array       // Line segments within the region
extractTextOcr(engine?) -> string  // OCR text within the region (wasm-ocr build)

WasmPdf

Factory class for creating new PDFs.

import { WasmPdf } from "pdf-oxide-wasm";

Static Methods

WasmPdf.fromMarkdown(content, title?, author?) -> WasmPdf  // Create a PDF from Markdown text
WasmPdf.fromHtml(content, title?, author?) -> WasmPdf      // Create a PDF from HTML
WasmPdf.fromText(content, title?, author?) -> WasmPdf      // Create a PDF from plain text
WasmPdf.fromBytes(data) -> WasmPdf                         // Open an existing PDF from bytes for modification
WasmPdf.fromImageBytes(data) -> WasmPdf                    // Single-page PDF from one image (JPEG/PNG)
WasmPdf.fromMultipleImageBytes(imagesArray) -> WasmPdf     // Multi-page PDF, one page per image
WasmPdf.merge(pdfs) -> WasmPdf                             // Merge an array of PDF byte buffers into one
WasmPdf.fromHtmlCss(html, css, fontBytes) -> WasmPdf       // HTML + CSS with a single embedded font
WasmPdf.fromHtmlCssWithFonts(html, css, fonts) -> WasmPdf  // HTML + CSS with multiple [name, bytes] fonts

Parameter	Type	Description
`content`	`string`	Source content (Markdown / HTML / text)
`title`	`string | undefined`	Document title
`author`	`string | undefined`	Document author
`data`	`Uint8Array`	PDF or image file bytes
`imagesArray`	`Uint8Array[]`	Array of image file bytes
`pdfs`	`Uint8Array[]`	Array of PDF file bytes to merge

Instance Methods

`toBytes() -> Uint8Array`

Get the PDF as bytes.

`size -> number`

PDF size in bytes (readonly getter).

import { writeFileSync } from "node:fs";
import { WasmPdf } from "pdf-oxide-wasm";

const pdf = WasmPdf.fromMarkdown("# Hello World\n\nThis is a PDF.");
console.log(`PDF size: ${pdf.size} bytes`);
writeFileSync("output.pdf", pdf.toBytes());

WasmDocumentBuilder

Fluent, low-level page-layout builder for composing PDFs page by page. Pair with WasmFluentPageBuilder.

import { WasmDocumentBuilder } from "pdf-oxide-wasm";
const builder = new WasmDocumentBuilder();

Document setup

new WasmDocumentBuilder()          // Create an empty builder
title(title)                       // Set document title
author(author)                     // Set document author
subject(subject)                   // Set document subject
keywords(keywords)                 // Set document keywords
creator(creator)                   // Set the creator tool name
onOpen(script)                     // Set a document-level open JavaScript action
taggedPdfUa1()                     // Enable Tagged PDF / PDF/UA-1 output
language(lang)                     // Set the document language (e.g. "en-US")
roleMap(custom, standard)          // Map a custom structure tag to a standard role
registerEmbeddedFont(name, font)   // Register a WasmEmbeddedFont under a name

Page creation & output

a4Page() -> WasmFluentPageBuilder         // Start a new A4 page
letterPage() -> WasmFluentPageBuilder     // Start a new US Letter page
page(width, height) -> WasmFluentPageBuilder  // Start a custom-size page (points)
commitPage(page)                          // Commit a completed page builder
build() -> Uint8Array                     // Finish and return the PDF bytes
toBytesEncrypted(userPassword, ownerPassword?) -> Uint8Array  // Finish with AES-256 encryption

WasmFluentPageBuilder

Per-page builder returned by a4Page() / letterPage() / page(). Queue operations, then commit with done(builder) (or builder.commitPage(page)).

Text & flow

font(name, size)                 // Set the current font and size
at(x, y)                         // Move the cursor to an absolute position
text(text)                       // Draw text at the cursor
heading(level, text)             // Draw a heading (level 1–6)
paragraph(text)                  // Draw a wrapped paragraph
space(points)                    // Advance the cursor vertically
horizontalRule()                 // Draw a horizontal rule
newline()                        // Advance to the next line
columns(columnCount, gapPt, text)  // Lay text out in N balanced columns
footnote(refMark, noteText)      // Add a footnote marker + bottom-of-page note

Inline runs

inline(text)                     // Append an inline text run
inlineBold(text)                 // Append a bold inline run
inlineItalic(text)               // Append an italic inline run
inlineColor(r, g, b, text)       // Append a colored inline run (RGB 0.0–1.0)
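Color arguments throughout this API are RGB components in 0.0–1.0, while design tools usually give hex codes. `hexToRgb01` is a hypothetical converter.

```javascript
// "#rrggbb" or "#rgb" -> [r, g, b] with each component in 0.0–1.0.
function hexToRgb01(hex) {
  let h = hex.replace(/^#/, "");
  if (h.length === 3) {
    h = h.split("").map((ch) => ch + ch).join("");
  }
  if (!/^[0-9a-fA-F]{6}$/.test(h)) {
    throw new Error(`Invalid hex color: ${hex}`);
  }
  return [0, 2, 4].map((i) => parseInt(h.slice(i, i + 2), 16) / 255);
}
```

For example: page.inlineColor(...hexToRgb01("#cc0000"), "Warning").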

Link & form actions

linkUrl(url)                     // Wrap the last element in a URL link
linkPage(page)                   // Link to another page index
linkNamed(destination)           // Link to a named destination
linkJavascript(script)           // Attach a JavaScript link action
onOpen(script)                   // Page open action
onClose(script)                  // Page close action
fieldKeystroke(script)           // Keystroke JavaScript for the last field
fieldFormat(script)              // Format JavaScript for the last field
fieldValidate(script)            // Validate JavaScript for the last field
fieldCalculate(script)           // Calculate JavaScript for the last field

Markup annotations

highlight(r, g, b)               // Highlight the last text run (RGB 0.0–1.0)
underline(r, g, b)               // Underline the last text run
strikeout(r, g, b)               // Strike out the last text run
squiggly(r, g, b)                // Squiggly-underline the last text run
stickyNote(text)                 // Add a sticky note at the cursor
stickyNoteAt(x, y, text)         // Add a sticky note at an absolute position
stamp(name)                      // Add a rubber-stamp annotation (e.g. "Approved")
freeText(x, y, w, h, text)       // Add a free-text annotation box
watermark(text)                  // Add a text watermark
watermarkConfidential()          // Add a "CONFIDENTIAL" watermark
watermarkDraft()                 // Add a "DRAFT" watermark

AcroForm widgets

textField(name, x, y, w, h, defaultValue?)            // Add a text field
checkbox(name, x, y, w, h, checked)                   // Add a checkbox
comboBox(name, x, y, w, h, options, selected?)        // Add a dropdown combo box
radioGroup(name, values, xs, ys, ws, hs, selected?)   // Add a radio-button group (parallel arrays)
pushButton(name, x, y, w, h, caption)                 // Add a clickable push button
signatureField(name, x, y, w, h)                      // Add an unsigned signature placeholder
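radioGroup() takes parallel arrays, one per coordinate. Keeping each option as a single record and converting at the call site avoids misaligned arrays. `radioArrays` is a hypothetical helper.

```javascript
// [{ value, x, y, w, h }, ...] -> the parallel arrays radioGroup() expects.
function radioArrays(options) {
  return {
    values: options.map((o) => o.value),
    xs: options.map((o) => o.x),
    ys: options.map((o) => o.y),
    ws: options.map((o) => o.w),
    hs: options.map((o) => o.h),
  };
}
```

Usage: const { values, xs, ys, ws, hs } = radioArrays(opts); page.radioGroup("size", values, xs, ys, ws, hs);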

Barcodes & images

barcode1d(barcodeType, data, x, y, w, h)   // Draw a 1D barcode (type 0–7)
barcodeQr(data, x, y, size)                // Draw a QR code
imageWithAlt(bytes, x, y, w, h, altText)   // Embed an image with accessibility alt text
imageArtifact(bytes, x, y, w, h)           // Embed a decorative image as an /Artifact

Graphics primitives

rect(x, y, w, h)                                  // Stroked 1pt rectangle outline
filledRect(x, y, w, h, r, g, b)                   // Filled rectangle (RGB 0.0–1.0)
line(x1, y1, x2, y2)                              // 1pt black line
strokeRect(x, y, w, h, width, r, g, b)            // Stroked rectangle, explicit width + color
strokeRectDashed(x, y, w, h, width, r, g, b, dash, phase)  // Dashed rectangle border
strokeLine(x1, y1, x2, y2, width, r, g, b)        // Line with explicit width + color
strokeLineDashed(x1, y1, x2, y2, width, r, g, b, dash, phase)  // Dashed line
textInRect(x, y, w, h, text, align)               // Lay text inside a rectangle (align 0/1/2)

Layout helpers & terminal

measure(text) -> number                  // Rendered width of text in the current font (points)
remainingSpace() -> number               // Vertical space left on the page (points)
newPageSameSize()                        // Start a new page with the same dimensions
table(spec)                              // Draw a buffered table from a spec object
streamingTable(spec) -> WasmStreamingTable  // Open a streaming table for large datasets
done(builder)                            // Commit this page's queued ops to the document builder
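paragraph() already wraps text. measure() and remainingSpace() matter when you place lines yourself with at() and text(). The following sketch shows greedy line breaking driven by a measuring function. `wrapText` is hypothetical. In real use you would pass (s) => page.measure(s) after setting the font, and compare the line count times the line height against remainingSpace() before calling newPageSameSize().

```javascript
// Greedy word wrap: add words to a line until the next one would exceed maxWidth.
// A single word wider than maxWidth still gets its own (overflowing) line.
function wrapText(text, maxWidth, measure) {
  const words = text.split(/\s+/).filter(Boolean);
  const lines = [];
  let line = "";
  for (const word of words) {
    const candidate = line ? `${line} ${word}` : word;
    if (line && measure(candidate) > maxWidth) {
      lines.push(line);
      line = word;
    } else {
      line = candidate;
    }
  }
  if (line) lines.push(line);
  return lines;
}
```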

A table(spec) spec object uses { columns: [{ header, width, align }], rows: [[...]], hasHeader }. A streamingTable(spec) spec adds { repeatHeader, mode, sampleRows, minColWidthPt, maxColWidthPt, maxRowspan, batchSize }.
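Application data often arrives as records rather than row arrays. This sketch builds a table(spec) object from records. The `key` property on each column is a hypothetical field used only to pick values, and it is stripped from the spec. The units of `width` and the accepted `align` values are not specified above, so the values in the test ("left", 120) are assumptions. `tableSpecFromRecords` is a hypothetical helper.

```javascript
// Build { columns, rows, hasHeader } from records; `key` selects each cell value.
function tableSpecFromRecords(records, columns) {
  return {
    columns: columns.map(({ header, width, align }) => ({ header, width, align })),
    rows: records.map((rec) => columns.map((col) => String(rec[col.key] ?? ""))),
    hasHeader: true,
  };
}
```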

WasmStreamingTable

Row-streaming table handle returned by WasmFluentPageBuilder.streamingTable(spec). Push rows incrementally, then finish().

columnCount() -> number       // Number of columns
pendingRowCount() -> number   // Rows in the current un-flushed batch
batchCount() -> number        // Number of completed batches
pushRow(cells)                // Push one row (array of cell strings)
pushRowSpan(cells)            // Push a row whose cells may carry rowspans
flush()                       // Flush the current batch
finish()                      // Finalize the table and replay it into the page

WasmEmbeddedFont

A font registered for embedding via WasmDocumentBuilder.registerEmbeddedFont.

WasmEmbeddedFont.fromBytes(data, name?) -> WasmEmbeddedFont  // Load a TTF/OTF font from bytes
font.name -> string                                          // The font's resolved name (getter)

Page Templates

Reusable header/footer furniture applied across pages.

WasmArtifactStyle

new WasmArtifactStyle()        // Default style
font(name, size) -> this       // Set font family and size
bold() -> this                 // Make the text bold
color(r, g, b) -> this         // Set the text color (RGB 0.0–1.0)

WasmArtifact

new WasmArtifact()                       // Empty artifact
WasmArtifact.left(text) -> WasmArtifact   // Left-aligned artifact text
WasmArtifact.center(text) -> WasmArtifact // Center-aligned artifact text
WasmArtifact.right(text) -> WasmArtifact  // Right-aligned artifact text
withStyle(style) -> this                  // Apply a WasmArtifactStyle
withOffset(offset) -> this                // Set the vertical offset from the edge

WasmHeader / WasmFooter

new WasmHeader()                        // Empty header (WasmFooter has the same API)
WasmHeader.left(text) -> WasmHeader     // Left-aligned header text
WasmHeader.center(text) -> WasmHeader   // Center-aligned header text
WasmHeader.right(text) -> WasmHeader    // Right-aligned header text

WasmPageTemplate

new WasmPageTemplate()         // Empty template
header(header) -> this         // Set the page header artifact
footer(footer) -> this         // Set the page footer artifact
skipFirstPage() -> this        // Omit header/footer on the first page

Digital Signatures

Requires the `signatures` build feature (see Feature Availability).

WasmCertificate

WasmCertificate.load(data) -> WasmCertificate                  // Load a DER certificate + key bundle
WasmCertificate.loadPem(certPem, keyPem) -> WasmCertificate    // Load from PEM cert + key strings
WasmCertificate.loadPkcs12(data, password) -> WasmCertificate  // Load from a PKCS#12 (.p12/.pfx) blob
cert.subject -> string         // Subject distinguished name (getter)
cert.issuer -> string          // Issuer distinguished name (getter)
cert.serial -> string          // Serial number (getter)
cert.validity -> bigint[]      // [notBefore, notAfter] as unix seconds (getter)
cert.isValid -> boolean        // Whether the certificate is currently valid (getter)
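cert.validity returns the validity window as two bigint unix timestamps. The sketch below converts it to Date objects and checks it against an arbitrary instant. This is useful when you need to know whether a certificate was valid at signing time, since cert.isValid only reports whether it is valid now. `HasValidity` is a hypothetical stand-in for WasmCertificate that exposes just the validity getter.

```typescript
// Hypothetical stand-in: only the documented validity getter is used.
interface HasValidity {
  readonly validity: bigint[]; // [notBefore, notAfter] in unix seconds
}

interface ValidityWindow {
  notBefore: Date;
  notAfter: Date;
}

function validityWindow(cert: HasValidity): ValidityWindow {
  if (cert.validity.length < 2) {
    throw new Error("expected [notBefore, notAfter]");
  }
  const [notBefore, notAfter] = cert.validity;
  // Unix seconds -> milliseconds for the Date constructor.
  return {
    notBefore: new Date(Number(notBefore) * 1000),
    notAfter: new Date(Number(notAfter) * 1000),
  };
}

// True if `at` falls inside the validity window (both bounds inclusive).
function validAt(cert: HasValidity, at: Date): boolean {
  const { notBefore, notAfter } = validityWindow(cert);
  const t = at.getTime();
  return t >= notBefore.getTime() && t <= notAfter.getTime();
}
```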

WasmSignature

Returned by WasmPdfDocument.signatures().

sig.signerName -> string | null          // Signer common name (getter)
sig.reason -> string | null              // Signing reason (getter)
sig.location -> string | null            // Signing location (getter)
sig.contactInfo -> string | null         // Signer contact info (getter)
sig.signingTime -> bigint | null         // Signing time as unix seconds (getter)
sig.coversWholeDocument -> boolean       // Whether the signature covers the entire file (getter)
sig.padesLevel -> PadesLevel             // PAdES baseline level of the signature (getter)
sig.verify() -> boolean                  // Verify the signature cryptographically
sig.verifyDetached(pdfData) -> boolean   // Verify, also checking the messageDigest against the supplied PDF bytes
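Several getters return null when a field is absent, and signingTime is a bigint, so a small adapter helps when logging or serializing results. The sketch below flattens a signature into a JSON-friendly report. `SignatureLike` is a hypothetical interface covering only the members used, and padesLevel is kept as its numeric discriminant (see PadesLevel under Enums).

```typescript
// Hypothetical subset of WasmSignature used by this sketch.
interface SignatureLike {
  readonly signerName: string | null;
  readonly reason: string | null;
  readonly signingTime: bigint | null; // unix seconds
  readonly coversWholeDocument: boolean;
  readonly padesLevel: number;
  verify(): boolean;
}

interface SignatureReport {
  signer: string;
  reason: string | null;
  signedAt: string | null; // ISO 8601, UTC
  coversWholeDocument: boolean;
  padesLevel: number;
  cryptographicallyValid: boolean;
}

function summarizeSignature(sig: SignatureLike): SignatureReport {
  return {
    signer: sig.signerName ?? "(unknown signer)",
    reason: sig.reason,
    signedAt:
      sig.signingTime === null ? null : new Date(Number(sig.signingTime) * 1000).toISOString(),
    coversWholeDocument: sig.coversWholeDocument,
    padesLevel: sig.padesLevel,
    cryptographicallyValid: sig.verify(),
  };
}
```

Assuming signatures() returns an array, `doc.signatures().map(summarizeSignature)` yields one report per signature. A signature that verifies but does not cover the whole document typically means content was appended after signing.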

WasmTimestamp

WasmTimestamp.parse(data) -> WasmTimestamp  // Parse a DER TimeStampToken / TSTInfo
ts.time -> bigint              // Timestamp time as unix seconds (getter)
ts.serial -> string            // Serial number (getter)
ts.policyOid -> string         // TSA policy OID (getter)
ts.tsaName -> string           // TSA name (getter)
ts.hashAlgorithm -> number     // Imprint hash algorithm id (getter)
ts.messageImprint -> Uint8Array  // The message imprint digest (getter)
ts.verify() -> boolean         // Verify the timestamp token
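ts.messageImprint is raw bytes; hex-encoding it makes it easy to log or compare with a digest computed elsewhere. The sketch below turns a parsed token into a printable record, written against a hypothetical `TimestampLike` interface. The mapping from hashAlgorithm ids to algorithm names is not documented here, so the id is passed through unchanged.

```typescript
// Hypothetical subset of WasmTimestamp used by this sketch.
interface TimestampLike {
  readonly time: bigint; // unix seconds
  readonly serial: string;
  readonly tsaName: string;
  readonly hashAlgorithm: number;
  readonly messageImprint: Uint8Array;
}

// Lowercase hex encoding of a byte array.
function toHex(bytes: Uint8Array): string {
  return Array.from(bytes, (b) => b.toString(16).padStart(2, "0")).join("");
}

function describeTimestamp(ts: TimestampLike) {
  return {
    time: new Date(Number(ts.time) * 1000).toISOString(),
    serial: ts.serial,
    tsa: ts.tsaName,
    hashAlgorithmId: ts.hashAlgorithm, // id-to-name mapping not documented here
    imprintHex: toHex(ts.messageImprint),
  };
}
```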

WasmRevocationMaterial

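Revocation material (certificates, CRLs, OCSP responses) supplied to signPdfBytesPades so that PAdES B-LT validation data can be embedded without network access.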

new WasmRevocationMaterial()   // Empty material set
addCert(der)                   // Add a DER X.509 certificate
addCrl(der)                    // Add a DER CRL
addOcsp(der)                   // Add a DER OCSP response

Dss

A parsed Document Security Store returned by WasmPdfDocument.dss().

dss.certCount -> number        // Number of DER certificates (getter)
getCert(i) -> Uint8Array | undefined   // i-th DER certificate
dss.crlCount -> number         // Number of DER CRLs (getter)
getCrl(i) -> Uint8Array | undefined    // i-th DER CRL
dss.ocspCount -> number        // Number of DER OCSP responses (getter)
getOcsp(i) -> Uint8Array | undefined   // i-th DER OCSP response
dss.vri -> string[]            // Per-signature VRI keys (uppercase-hex SHA-1 of /Contents) (getter)
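Dss exposes its contents through count getters and index accessors rather than arrays. The sketch below collects them into arrays and recomputes a VRI key (uppercase hex SHA-1 of /Contents, as noted above) with Node's crypto module, so a signature can be matched to its VRI entry. `DssLike` is a hypothetical interface; extracting the raw /Contents bytes is outside this sketch, and the exact bytes your signer hashes (for example, whether zero padding is included) should be confirmed.

```typescript
import { createHash } from "node:crypto";

// Hypothetical subset of Dss used by this sketch.
interface DssLike {
  readonly certCount: number;
  readonly crlCount: number;
  readonly ocspCount: number;
  readonly vri: string[];
  getCert(i: number): Uint8Array | undefined;
  getCrl(i: number): Uint8Array | undefined;
  getOcsp(i: number): Uint8Array | undefined;
}

// Read `count` items through an index accessor, skipping missing entries.
function collect(count: number, get: (i: number) => Uint8Array | undefined): Uint8Array[] {
  const out: Uint8Array[] = [];
  for (let i = 0; i < count; i++) {
    const item = get(i);
    if (item !== undefined) out.push(item);
  }
  return out;
}

// Arrow wrappers keep `this` bound to the Dss object when calling its methods.
function dssContents(dss: DssLike) {
  return {
    certs: collect(dss.certCount, (i) => dss.getCert(i)),
    crls: collect(dss.crlCount, (i) => dss.getCrl(i)),
    ocsps: collect(dss.ocspCount, (i) => dss.getOcsp(i)),
  };
}

// VRI key: uppercase hex SHA-1 of the signature's /Contents bytes.
function vriKey(contents: Uint8Array): string {
  return createHash("sha1").update(contents).digest("hex").toUpperCase();
}

function hasVriEntry(dss: DssLike, contents: Uint8Array): boolean {
  return dss.vri.includes(vriKey(contents));
}
```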

OCR

OCR runs entirely inside WASM using the pure-Rust tract inference backend, and ships in the separate wasm-ocr build. The models are not bundled: the host fetches the detector and recognizer ONNX files and the character dictionary (see modelManifest()), then passes the bytes to the constructor.

WasmOcrEngine

new WasmOcrEngine(detModel, recModel, dict, config?)  // Build from host-supplied model bytes
engine.ocrImage(imageBytes) -> string                 // OCR a raw image (PNG/JPEG/TIFF); returns JSON {text, confidence, spans}

Parameter	Type	Description
`detModel`	`Uint8Array`	DBNet detector ONNX bytes
`recModel`	`Uint8Array`	SVTR recognizer ONNX bytes
`dict`	`string`	Recognizer character dictionary, one char per line
`config`	`WasmOcrConfig | undefined`	Reserved; tuned defaults are used
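ocrImage returns its result as a JSON string. The sketch below parses it into a typed object with a light runtime check. Only text and confidence are typed precisely; the structure of each span is not documented here, so spans is left as `unknown[]`. Engine construction is omitted because model fetching depends on modelManifest(), whose shape is not covered in this section.

```typescript
interface OcrResult {
  text: string;
  confidence: number;
  spans: unknown[]; // per-span structure not documented here
}

// Parse and minimally validate the JSON string returned by ocrImage().
function parseOcrResult(json: string): OcrResult {
  const raw: unknown = JSON.parse(json);
  if (typeof raw !== "object" || raw === null) {
    throw new Error("OCR result is not a JSON object");
  }
  const r = raw as Record<string, unknown>;
  if (typeof r.text !== "string" || typeof r.confidence !== "number") {
    throw new Error("OCR result is missing text or confidence");
  }
  return {
    text: r.text,
    confidence: r.confidence,
    spans: Array.isArray(r.spans) ? r.spans : [],
  };
}
```

With an engine already built, usage would be `const { text } = parseOcrResult(engine.ocrImage(pngBytes));`.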

WasmOcrConfig

new WasmOcrConfig()   // OCR configuration object (reserved for future tuning)

Enums

Align

Text/cell alignment discriminant used by textInRect and table column specs.

Align.Left   // 0
Align.Center // 1
Align.Right  // 2

PadesLevel

PAdES baseline level, used by signPdfBytesPades and WasmSignature.padesLevel.

PadesLevel.BB    // 0: signed attributes, including ESS signing-certificate-v2
PadesLevel.BT    // 1: B-B plus an RFC 3161 signature-time-stamp
PadesLevel.BLt   // 2: B-T plus a Document Security Store (DSS/VRI)
PadesLevel.BLta  // 3: B-LT plus a document-scoped /DocTimeStamp
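Because each level extends the previous one, the numeric discriminants are ordered, and a "minimum level" policy reduces to a comparison. The sketch mirrors the documented values in a local const so it runs without the package; in real code you would use the exported PadesLevel enum instead.

```typescript
// Local mirror of the documented PadesLevel discriminants.
const PadesLevel = { BB: 0, BT: 1, BLt: 2, BLta: 3 } as const;
type PadesLevelValue = (typeof PadesLevel)[keyof typeof PadesLevel];

// Each level includes everything below it, so ">=" means "at least".
function meetsLevel(actual: number, required: PadesLevelValue): boolean {
  return actual >= required;
}

// Human-readable names, indexed by discriminant.
const PADES_NAMES = ["B-B", "B-T", "B-LT", "B-LTA"] as const;

function padesName(level: number): string {
  return PADES_NAMES[level] ?? `unknown (${level})`;
}
```

For example, a long-term archiving check could be written as `signatures.every((s) => meetsLevel(s.padesLevel, PadesLevel.BLt))`, where `signatures` is the list returned by WasmPdfDocument.signatures().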

Feature Availability

Some capabilities are gated behind Rust build features, whose names appear in parentheses in the WASM column below. The default pdf-oxide-wasm package enables the common set; OCR ships in the separate wasm-ocr build.

Feature	WASM	Notes
Text extraction	Yes	Full support
Structured extraction	Yes	Chars, spans, words, lines, tables
PDF creation	Yes	Markdown, HTML, text, images, DocumentBuilder
PDF editing	Yes	Metadata, rotation, dimensions, erase, pages
Form fields	Yes	Read, write, export, flatten, build
Search	Yes	Full regex support
Encryption	Yes	AES-256 read and write
Annotations	Yes	Read, flatten, redact, sanitize
Merge / split PDFs	Yes	Merge pages and split by bookmarks
Embedded files	Yes	Attach files to PDFs
Page labels / XMP	Yes	Read page labels and XMP metadata
Office round-trip	Yes	DOCX/PPTX/XLSX import and export
Validation	Yes	PDF/A, PDF/UA, PDF/X
Barcodes	Yes (`barcodes`)	1D + QR as SVG or page images
Rendering	Yes (`rendering`)	Page → PNG, flatten to images
Digital signatures	Yes (`signatures`)	Sign, PAdES B-LT, verify, timestamps
OCR	`wasm-ocr` build	In-WASM tract OCR; models fetched host-side

Error Handling

All methods that can fail throw JavaScript Error objects:

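import { WasmPdfDocument } from "pdf-oxide-wasm";

try {
  // Three bytes are not a valid PDF, so the constructor throws.
  const doc = new WasmPdfDocument(new Uint8Array([0, 1, 2]));
} catch (e) {
  // In TypeScript, `e` is `unknown`; narrow it before reading .message.
  console.error(`Failed to open: ${e instanceof Error ? e.message : String(e)}`);
}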

TypeScript

Full type definitions are included in the package:

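import { readFileSync } from "node:fs";
import { WasmPdfDocument, WasmPdf } from "pdf-oxide-wasm";

// Node.js: read a PDF from disk ("input.pdf" is a placeholder path).
const bytes = new Uint8Array(readFileSync("input.pdf"));

const doc: WasmPdfDocument = new WasmPdfDocument(bytes);
const text: string = doc.extractText(0); // text of page index 0
const pdf: WasmPdf = WasmPdf.fromMarkdown("# Hello");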

Other Language Bindings

Beyond WASM, PDF Oxide ships native bindings for Rust, Python, Node.js, C#, Golang, Java, PHP, Ruby, C++, Swift, Kotlin, Dart, R, Julia, Zig, Scala, Clojure, Objective-C, and Elixir.

Next Steps

Types & Enums — all shared types and enums
Page API Reference — consistent per-page iteration across bindings
Getting Started with WASM — tutorial