JavaScript API Reference
PDF Oxide provides WebAssembly bindings for JavaScript and TypeScript. The npm package pdf-oxide-wasm works in Node.js, browsers, bundlers, Deno, and Cloudflare Workers.
npm install pdf-oxide-wasm
Multi-target packaging (v0.3.38)
pdf-oxide-wasm now ships three builds side by side, selected through package.json conditional exports. Pick the subpath that matches your runtime. The top-level import is also routed through the exports field and resolves to the right build in most environments.
| Subpath | Target |
|---|---|
| pdf-oxide-wasm/nodejs | Node.js (CommonJS + ESM) |
| pdf-oxide-wasm/bundler | Vite, webpack, Rollup, esbuild, Bun |
| pdf-oxide-wasm/web | Browsers, Deno, Cloudflare Workers |
// Node.js
import { WasmPdfDocument } from "pdf-oxide-wasm/nodejs";
// Vite / webpack / Rollup
import init, { WasmPdfDocument } from "pdf-oxide-wasm/bundler";
await init();
// Browsers / Deno / Workers
import init, { WasmPdfDocument } from "pdf-oxide-wasm/web";
await init();
This fixes the ReferenceError: Can't find variable: __dirname thrown under browser bundlers prior to v0.3.38.
For the Rust API, see the Rust API Reference. For the Python API, see the Python API Reference. For type details, see Types & Enums.
WasmPdfDocument
The primary class for opening, extracting, editing, and saving PDFs.
import { WasmPdfDocument } from "pdf-oxide-wasm";
Constructor
new WasmPdfDocument(data)
Load a PDF document from raw bytes.
| Parameter | Type | Description |
|---|---|---|
| data | Uint8Array | The PDF file contents |
Throws: Error if the PDF is invalid or cannot be parsed.
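// Node.js: read the file from disk and pass its bytes to the constructor
import { readFileSync } from "node:fs";
const bytes = new Uint8Array(readFileSync("document.pdf"));
const doc = new WasmPdfDocument(bytes);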
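In browsers and Workers the bytes usually arrive as an ArrayBuffer (for example from fetch) rather than a Uint8Array. The following is a minimal sketch of normalizing common inputs before calling the constructor. `toPdfBytes` is a hypothetical helper name, not part of pdf-oxide-wasm.

```javascript
// Hypothetical helper (not part of pdf-oxide-wasm): normalize common binary
// inputs to the Uint8Array that the WasmPdfDocument constructor expects.
function toPdfBytes(input) {
  // Uint8Array, including Node.js Buffer (a Uint8Array subclass): use as-is.
  if (input instanceof Uint8Array) return input;
  // ArrayBuffer, e.g. from `await response.arrayBuffer()`: wrap without copying.
  if (input instanceof ArrayBuffer) return new Uint8Array(input);
  // Other typed arrays and DataView: view the same byte range.
  if (ArrayBuffer.isView(input)) {
    return new Uint8Array(input.buffer, input.byteOffset, input.byteLength);
  }
  throw new TypeError("Expected Uint8Array, ArrayBuffer, or an ArrayBuffer view");
}

// Usage sketch in a browser (the URL is a placeholder):
// const response = await fetch("/document.pdf");
// const doc = new WasmPdfDocument(toPdfBytes(await response.arrayBuffer()));
```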
Core Read-Only
pageCount() -> number
Get the number of pages in the document.
version() -> Uint8Array
Get the PDF version as [major, minor].
const [major, minor] = doc.version();
console.log(`PDF ${major}.${minor}`);
authenticate(password) -> boolean
Decrypt an encrypted PDF. Returns true if authentication succeeded.
| Parameter | Type | Description |
|---|---|---|
| password | string | The password string |
hasStructureTree() -> boolean
Check if the document is a Tagged PDF with a structure tree.
Text Extraction
extractText(pageIndex) -> string
Extract plain text from a single page.
| Parameter | Type | Description |
|---|---|---|
| pageIndex | number | Zero-based page number |
const text = doc.extractText(0);
extractAllText() -> string
Extract plain text from all pages, separated by form feed characters.
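Because pages are separated by form feed characters, the combined string can be split back into per-page text. This is a small sketch; `splitPages` is a hypothetical helper, and it assumes exactly one form feed between consecutive pages.

```javascript
// Sketch: split extractAllText() output into one string per page.
// Assumes a single form feed ("\f") between consecutive pages.
function splitPages(allText) {
  return allText.split("\f");
}

// Sample string standing in for the result of doc.extractAllText().
const sampleAllText = "Page one text\fPage two text\fPage three text";
const pageTexts = splitPages(sampleAllText);
pageTexts.forEach((text, i) => console.log(`page ${i}: ${text}`));
```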
extractChars(pageIndex) -> Array
Extract individual characters with precise positioning and font metadata.
| Parameter | Type | Description |
|---|---|---|
| pageIndex | number | Zero-based page number |
Returns: Array of objects with fields:
| Field | Type | Description |
|---|---|---|
| char | string | The character |
| bbox | {x, y, width, height} | Bounding box |
| fontName | string | Font name |
| fontSize | number | Font size in points |
| fontWeight | string | Weight (Normal, Bold, etc.) |
| isItalic | boolean | Italic flag |
| color | {r, g, b} | RGB color (0.0–1.0) |
const chars = doc.extractChars(0);
for (const c of chars) {
console.log(`'${c.char}' at (${c.bbox.x}, ${c.bbox.y})`);
}
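Character positions make it possible to rebuild lines of text yourself. The sketch below groups character objects by their `bbox.y` value and orders each group by `bbox.x`. `groupCharsIntoLines` and its `tolerance` parameter are hypothetical, and the top-to-bottom ordering assumes y grows upward (the usual PDF coordinate convention), which this reference does not state explicitly.

```javascript
// Sketch: group character objects (shaped like extractChars() results) into lines.
// Assumption: larger bbox.y means higher on the page, so lines sort by descending y.
function groupCharsIntoLines(chars, tolerance = 2) {
  const lines = [];
  for (const c of chars) {
    // Reuse an existing line whose baseline is within `tolerance` points.
    let line = lines.find((l) => Math.abs(l.y - c.bbox.y) <= tolerance);
    if (!line) {
      line = { y: c.bbox.y, chars: [] };
      lines.push(line);
    }
    line.chars.push(c);
  }
  lines.sort((a, b) => b.y - a.y); // top of page first
  return lines.map((l) =>
    l.chars
      .sort((a, b) => a.bbox.x - b.bbox.x) // left to right
      .map((c) => c.char)
      .join("")
  );
}

// Hand-written sample data; real input would come from doc.extractChars(0).
const sampleChars = [
  { char: "i", bbox: { x: 16, y: 700.5, width: 3, height: 10 } },
  { char: "!", bbox: { x: 10, y: 680, width: 3, height: 10 } },
  { char: "H", bbox: { x: 10, y: 700, width: 6, height: 10 } },
];
console.log(groupCharsIntoLines(sampleChars)); // [ 'Hi', '!' ]
```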
extractPageText(pageIndex) -> object
Get spans, characters, and page dimensions from a single extraction pass. More efficient than calling extractSpans() + extractChars() separately.
| Parameter | Type | Description |
|---|---|---|
| pageIndex | number | Zero-based page number |
Returns: An object with fields:
| Field | Type | Description |
|---|---|---|
| spans | Array | Array of span objects |
| chars | Array | Array of character objects |
| pageWidth | number | Page width in PDF points |
| pageHeight | number | Page height in PDF points |
| text | string | Full text content |
const result = doc.extractPageText(0);
console.log(`Page: ${result.pageWidth}x${result.pageHeight} pt`);
for (const span of result.spans) {
console.log(`'${span.text}' font=${span.fontName} size=${span.fontSize}`);
}
extractSpans(pageIndex, config?, readingOrder?) -> Array
Extract styled text spans with font metadata. Pass "column_aware" as readingOrder for multi-column PDFs.
| Parameter | Type | Description |
|---|---|---|
| pageIndex | number | Zero-based page number |
| config | object \| undefined | Optional span merging config |
| readingOrder | string \| undefined | Reading order: "column_aware" or undefined for default |
Returns: Array of objects with fields:
| Field | Type | Description |
|---|---|---|
| text | string | The text content |
| bbox | {x, y, width, height} | Bounding box |
| fontName | string | Font name |
| fontSize | number | Font size in points |
| fontWeight | string | Weight (Normal, Bold, etc.) |
| isItalic | boolean | Italic flag |
| isMonospace | boolean | Whether the font is fixed-width |
| charWidths | number[] | Per-glyph advance widths |
| color | {r, g, b} | RGB color (0.0–1.0) |
const spans = doc.extractSpans(0);
for (const span of spans) {
console.log(`"${span.text}" size=${span.fontSize}`);
}
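Span font metadata can drive simple layout heuristics. As an illustration, the sketch below finds the font size that covers the most characters and treats noticeably larger spans as heading candidates. `dominantFontSize`, `headingCandidates`, and the 1.2 ratio are hypothetical choices for this example, not library behavior.

```javascript
// Sketch: pick out likely headings from span objects (shaped like extractSpans() results).
function dominantFontSize(spans) {
  // Weight each font size by the number of characters set in it.
  const counts = new Map();
  for (const s of spans) {
    counts.set(s.fontSize, (counts.get(s.fontSize) ?? 0) + s.text.length);
  }
  let best = null;
  let bestCount = -1;
  for (const [size, count] of counts) {
    if (count > bestCount) {
      best = size;
      bestCount = count;
    }
  }
  return best; // null when there are no spans
}

function headingCandidates(spans, ratio = 1.2) {
  const body = dominantFontSize(spans);
  if (body === null) return [];
  return spans.filter((s) => s.fontSize >= body * ratio);
}

// Hand-written sample data; real input would come from doc.extractSpans(0)
// or, for multi-column pages, doc.extractSpans(0, undefined, "column_aware").
const sampleSpans = [
  { text: "Quarterly Report", fontSize: 24 },
  { text: "Revenue grew steadily over the period.", fontSize: 11 },
  { text: "Costs remained flat.", fontSize: 11 },
];
console.log(headingCandidates(sampleSpans).map((s) => s.text));
```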
Format Conversion
toMarkdown(pageIndex, detectHeadings?, includeImages?, includeFormFields?) -> string
Convert a single page to Markdown.
| Parameter | Type | Default | Description |
|---|---|---|---|
| pageIndex | number | – | Zero-based page number |
| detectHeadings | boolean | true | Detect headings from font size |
| includeImages | boolean | true | Include images |
| includeFormFields | boolean | true | Include form field values |
toMarkdownAll(detectHeadings?, includeImages?, includeFormFields?) -> string
Convert all pages to Markdown.
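The optional flags of toMarkdown are positional, so changing a later one means spelling out the earlier ones. One way to keep call sites readable is a thin options-object wrapper, sketched here. `toMarkdownWith` is a hypothetical helper; its defaults mirror the table above.

```javascript
// Sketch: options-object wrapper around the positional toMarkdown() flags.
// `doc` is any object exposing toMarkdown(pageIndex, detectHeadings, includeImages, includeFormFields).
function toMarkdownWith(doc, pageIndex, options = {}) {
  const {
    detectHeadings = true,
    includeImages = true,
    includeFormFields = true,
  } = options;
  return doc.toMarkdown(pageIndex, detectHeadings, includeImages, includeFormFields);
}

// Usage sketch: skip images but keep the other defaults.
// const markdown = toMarkdownWith(doc, 0, { includeImages: false });
```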
toHtml(pageIndex, preserveLayout?, detectHeadings?, includeFormFields?) -> string
Convert a single page to HTML.
| Parameter | Type | Default | Description |
|---|---|---|---|
| pageIndex | number | – | Zero-based page number |
| preserveLayout | boolean | false | Preserve visual layout |
| detectHeadings | boolean | true | Detect headings |
| includeFormFields | boolean | true | Include form field values |
toHtmlAll(preserveLayout?, detectHeadings?, includeFormFields?) -> string
Convert all pages to HTML.
toPlainText(pageIndex) -> string
Convert a single page to plain text.
toPlainTextAll() -> string
Convert all pages to plain text.
Search
search(pattern, caseInsensitive?, literal?, wholeWord?, maxResults?) -> Array
Search for text across all pages.
| Parameter | Type | Default | Description |
|---|---|---|---|
| pattern | string | – | Search pattern (string or regex) |
| caseInsensitive | boolean | false | Case-insensitive search |
| literal | boolean | false | Treat pattern as literal string |
| wholeWord | boolean | false | Match whole words only |
| maxResults | number | – | Maximum results to return |
Returns: Array of objects with fields:
| Field | Type | Description |
|---|---|---|
| page | number | Page number |
| text | string | Matched text |
| bbox | object | Bounding box |
| startIndex | number | Start index in page text |
| endIndex | number | End index in page text |
searchPage(pageIndex, pattern, caseInsensitive?, literal?, wholeWord?, maxResults?) -> Array
Search for text within a single page.
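Results from a document-wide search come back as one flat array. The sketch below buckets them by their `page` field, which is convenient for per-page highlighting. `groupResultsByPage` is a hypothetical helper operating on objects shaped like the result table above.

```javascript
// Sketch: bucket search results by page, preserving result order within each page.
function groupResultsByPage(results) {
  const byPage = new Map();
  for (const r of results) {
    if (!byPage.has(r.page)) byPage.set(r.page, []);
    byPage.get(r.page).push(r);
  }
  return byPage;
}

// Hand-written sample data; real input would come from e.g. doc.search("invoice", true).
const sampleResults = [
  { page: 0, text: "Invoice", startIndex: 4, endIndex: 11 },
  { page: 2, text: "invoice", startIndex: 40, endIndex: 47 },
  { page: 0, text: "invoice", startIndex: 90, endIndex: 97 },
];
for (const [page, hits] of groupResultsByPage(sampleResults)) {
  console.log(`page ${page}: ${hits.length} match(es)`);
}
```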
Image Info
extractImages(pageIndex) -> Array
Get image metadata for a page.
| Field | Type | Description |
|---|---|---|
| width | number | Image width in pixels |
| height | number | Image height in pixels |
| colorSpace | string | Color space (e.g. DeviceRGB) |
| bitsPerComponent | number | Bits per color channel |
| bbox | object | Position on page |
extractImageBytes(pageIndex) -> Array
Extract raw image bytes from a page. Returns an array of objects:
| Field | Type | Description |
|---|---|---|
| width | number | Image width in pixels |
| height | number | Image height in pixels |
| data | Uint8Array | Raw image bytes |
| format | string | Image format |
pageImages(pageIndex) -> Array
Get image names and bounds for positioning operations.
| Field | Type | Description |
|---|---|---|
| name | string | XObject name |
| bounds | number[] | [x, y, width, height] |
| matrix | number[] | Transform matrix [a, b, c, d, e, f] |
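In PDF, an image is painted into the unit square and placed by a six-number matrix, where a point (x, y) maps to (a·x + c·y + e, b·x + d·y + f). The sketch below derives an axis-aligned [x, y, width, height] box from such a matrix. `unitSquareBounds` is a hypothetical helper, and treating the `matrix` field as the image's placement matrix is an assumption; the library's own `bounds` field is the authoritative value.

```javascript
// Sketch: axis-aligned bounds of the unit square under a PDF matrix [a, b, c, d, e, f].
// Assumption: `matrix` places the image's unit square on the page.
function unitSquareBounds([a, b, c, d, e, f]) {
  const corners = [[0, 0], [1, 0], [0, 1], [1, 1]].map(([x, y]) => [
    a * x + c * y + e,
    b * x + d * y + f,
  ]);
  const xs = corners.map((p) => p[0]);
  const ys = corners.map((p) => p[1]);
  const minX = Math.min(...xs);
  const minY = Math.min(...ys);
  return [minX, minY, Math.max(...xs) - minX, Math.max(...ys) - minY];
}

// A 200x100 pt image placed at (50, 60), no rotation.
console.log(unitSquareBounds([200, 0, 0, 100, 50, 60])); // [ 50, 60, 200, 100 ]
```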
Document Structure
getOutline() -> Array | null
Get document bookmarks / table of contents. Returns null if no outline exists.
getAnnotations(pageIndex) -> Array
Get annotation metadata (type, rect, contents, etc.) for a page.
extractPaths(pageIndex) -> Array
Get vector paths (lines, curves, shapes) from a page.
pageLabels() -> Array
Get page label ranges. Returns an array of objects:
| Field | Type | Description |
|---|---|---|
| startPage | number | First page in this range |
| style | string | Numbering style |
| prefix | string | Label prefix |
| startValue | number | Starting number |
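Each range applies from its `startPage` until the next range begins, so the label number for a page is the range's `startValue` plus the page's offset into the range. A sketch of that lookup follows. `resolvePageLabel` is a hypothetical helper, it assumes `startPage` is zero-based like the other page indices in this reference, and it leaves rendering of the numbering style (the `style` strings are not enumerated here) to the caller.

```javascript
// Sketch: find the label range covering a page and compute its ordinal value.
// Assumption: startPage is a zero-based page index.
function resolvePageLabel(ranges, pageIndex) {
  let match = null;
  for (const r of ranges) {
    // Keep the range with the greatest startPage that is still <= pageIndex.
    if (r.startPage <= pageIndex && (match === null || r.startPage > match.startPage)) {
      match = r;
    }
  }
  if (match === null) return null;
  return {
    prefix: match.prefix,
    style: match.style,
    value: match.startValue + (pageIndex - match.startPage),
  };
}

// Hand-written sample; the style strings are placeholders, not documented values.
const sampleRanges = [
  { startPage: 0, style: "placeholder-a", prefix: "", startValue: 1 },
  { startPage: 4, style: "placeholder-b", prefix: "A-", startValue: 1 },
];
console.log(resolvePageLabel(sampleRanges, 6)); // prefix "A-", value 3
```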
xmpMetadata() -> object | null
Get XMP metadata. Returns null if not present. Object fields include:
| Field | Type | Description |
|---|---|---|
| dcTitle | string \| null | Document title |
| dcCreator | string[] \| null | Creator list |
| dcDescription | string \| null | Description |
| xmpCreatorTool | string \| null | Creator tool |
| xmpCreateDate | string \| null | Creation date |
| xmpModifyDate | string \| null | Modification date |
| pdfProducer | string \| null | PDF producer |
Form Fields
getFormFields() -> Array
Get all form fields with name, type, value, and flags.
| Field | Type | Description |
|---|---|---|
| name | string | Field name |
| fieldType | string | Field type (text, checkbox, etc.) |
| value | string | Current value |
| flags | number | Field flags |
const fields = doc.getFormFields();
for (const f of fields) {
console.log(`${f.name} (${f.fieldType}) = ${f.value}`);
}
hasXfa() -> boolean
Check if the document contains XFA forms.
getFormFieldValue(name) -> any
Get a form field value by name. Returns a string, boolean, or null depending on the field type.
| Parameter | Type | Description |
|---|---|---|
| name | string | Field name |
setFormFieldValue(name, value) -> void
Set a form field value by name.
| Parameter | Type | Description |
|---|---|---|
| name | string | Field name |
| value | string \| boolean | New field value |
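Filling a form usually means setting several fields at once. The sketch below applies a plain object of values and reports names that do not match any field returned by getFormFields(). `fillForm` is a hypothetical helper; how setFormFieldValue itself reacts to an unknown name is not covered by this reference, so the sketch avoids calling it in that case.

```javascript
// Sketch: set many form fields from a { name: value } object.
// Returns the names that were not found among the document's fields.
function fillForm(doc, values) {
  const known = new Set(doc.getFormFields().map((f) => f.name));
  const skipped = [];
  for (const [name, value] of Object.entries(values)) {
    if (known.has(name)) {
      doc.setFormFieldValue(name, value);
    } else {
      skipped.push(name);
    }
  }
  return skipped;
}

// Usage sketch (field names are placeholders):
// const skipped = fillForm(doc, { full_name: "Ada Lovelace", subscribe: true });
// if (skipped.length > 0) console.warn(`Unknown fields: ${skipped.join(", ")}`);
```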
exportFormData(format?) -> Uint8Array
Export form data as FDF (default) or XFDF.
| Parameter | Type | Default | Description |
|---|---|---|---|
| format | string | "fdf" | Export format: "fdf" or "xfdf" |
Editing
Metadata
| Method | Parameters | Description |
|---|---|---|
| setTitle(title) | string | Set document title |
| setAuthor(author) | string | Set document author |
| setSubject(subject) | string | Set document subject |
| setKeywords(keywords) | string | Set document keywords |
Page Rotation
| Method | Parameters | Description |
|---|---|---|
| pageRotation(pageIndex) | number | Get current rotation (0, 90, 180, 270) |
| setPageRotation(pageIndex, degrees) | number, number | Set absolute rotation |
| rotatePage(pageIndex, degrees) | number, number | Add to current rotation |
| rotateAllPages(degrees) | number | Rotate all pages |
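The difference between setPageRotation (absolute) and rotatePage (relative) comes down to modular arithmetic on the current value. The sketch below shows the arithmetic for predicting the result of a relative rotation. `addRotation` is a hypothetical helper; whether the library accepts negative or non-multiple-of-90 values is not stated here.

```javascript
// Sketch: rotation that results from adding `delta` degrees to `current`,
// wrapped into the range 0..359.
function addRotation(current, delta) {
  return (((current + delta) % 360) + 360) % 360;
}

console.log(addRotation(270, 90)); // 0
console.log(addRotation(0, -90)); // 270

// Usage sketch:
// const expected = addRotation(doc.pageRotation(0), 90);
// doc.rotatePage(0, 90);
```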
Page Dimensions
| Method | Parameters | Description |
|---|---|---|
| pageMediaBox(pageIndex) | number | Get MediaBox [llx, lly, urx, ury] |
| setPageMediaBox(pageIndex, llx, lly, urx, ury) | number, ... | Set MediaBox |
| pageCropBox(pageIndex) | number | Get CropBox (may be null) |
| setPageCropBox(pageIndex, llx, lly, urx, ury) | number, ... | Set CropBox |
| cropMargins(left, right, top, bottom) | number, ... | Crop all page margins |
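A box is given as lower-left and upper-right corners in PDF points (1/72 inch), so page size is the difference of the corners. The sketch below converts a MediaBox to width and height in points, inches, and millimetres. `mediaBoxSize` is a hypothetical helper.

```javascript
// Sketch: page size from a [llx, lly, urx, ury] box expressed in PDF points.
const POINTS_PER_INCH = 72;
const MM_PER_INCH = 25.4;

function mediaBoxSize([llx, lly, urx, ury]) {
  const widthPt = urx - llx;
  const heightPt = ury - lly;
  return {
    widthPt,
    heightPt,
    widthIn: widthPt / POINTS_PER_INCH,
    heightIn: heightPt / POINTS_PER_INCH,
    widthMm: (widthPt / POINTS_PER_INCH) * MM_PER_INCH,
    heightMm: (heightPt / POINTS_PER_INCH) * MM_PER_INCH,
  };
}

// US Letter is 612 x 792 pt, i.e. 8.5 x 11 in.
const letter = mediaBoxSize([0, 0, 612, 792]);
console.log(`${letter.widthIn} x ${letter.heightIn} in`);

// Usage sketch:
// const size = mediaBoxSize(doc.pageMediaBox(0));
```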
Erase / Whiteout
| Method | Parameters | Description |
|---|---|---|
| eraseRegion(pageIndex, llx, lly, urx, ury) | number, ... | Erase a region |
| eraseRegions(pageIndex, rects) | number, Float32Array | Erase multiple regions |
| clearEraseRegions(pageIndex) | number | Clear pending erases |
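eraseRegions takes its rectangles as a single Float32Array. The sketch below packs rectangle objects into a flat array of four numbers per rectangle. `packRects` is a hypothetical helper, and the [llx, lly, urx, ury] layout per rectangle is an assumption based on the argument order of eraseRegion; check the package's type definitions before relying on it.

```javascript
// Sketch: pack rectangles into a flat Float32Array, four values per rectangle.
// Assumption: eraseRegions() expects [llx, lly, urx, ury, llx, lly, urx, ury, ...].
function packRects(rects) {
  const packed = new Float32Array(rects.length * 4);
  rects.forEach((r, i) => {
    packed.set([r.llx, r.lly, r.urx, r.ury], i * 4);
  });
  return packed;
}

const packedRects = packRects([
  { llx: 72, lly: 700, urx: 300, ury: 720 },
  { llx: 72, lly: 100, urx: 540, ury: 130 },
]);
console.log(packedRects.length); // 8

// Usage sketch:
// doc.eraseRegions(0, packedRects);
```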
Annotations & Redaction
| Method | Parameters | Description |
|---|---|---|
| flattenPageAnnotations(pageIndex) | number | Flatten annotations on page |
| flattenAllAnnotations() | – | Flatten all annotations |
| applyPageRedactions(pageIndex) | number | Apply redactions on page |
| applyAllRedactions() | – | Apply all redactions |
Form Flattening
| Method | Parameters | Description |
|---|---|---|
| flattenForms() | – | Flatten all form fields into page content |
| flattenFormsOnPage(pageIndex) | number | Flatten forms on a specific page |
Merge & Embed
mergeFrom(data) -> number
Merge pages from another PDF. Returns the number of pages merged.
| Parameter | Type | Description |
|---|---|---|
| data | Uint8Array | The source PDF file bytes |
embedFile(name, data) -> void
Attach a file to the PDF.
| Parameter | Type | Description |
|---|---|---|
| name | string | Filename for the attachment |
| data | Uint8Array | File contents |
Image Manipulation
| Method | Parameters | Description |
|---|---|---|
| repositionImage(pageIndex, name, x, y) | number, string, number, number | Move image |
| resizeImage(pageIndex, name, w, h) | number, string, number, number | Resize image |
| setImageBounds(pageIndex, name, x, y, w, h) | number, string, ... | Set image bounds |
Rendering
| Method | Parameters | Returns | Description |
|---|---|---|---|
| renderPage(pageIndex, dpi?) | number, number | Uint8Array | Render a page to PNG bytes |
| flattenToImages(dpi?) | number | Uint8Array | Flatten all pages to image-based PDF |
Save
save() -> Uint8Array
Save the edited PDF as bytes. saveToBytes() is available as an alias.
saveEncryptedToBytes(password, ownerPassword?, allowPrint?, allowCopy?, allowModify?, allowAnnotate?) -> Uint8Array
Save with AES-256 encryption.
| Parameter | Type | Default | Description |
|---|---|---|---|
| password | string | – | User password |
| ownerPassword | string | user password | Owner password |
| allowPrint | boolean | true | Allow printing |
| allowCopy | boolean | true | Allow copying |
| allowModify | boolean | false | Allow modification |
| allowAnnotate | boolean | true | Allow annotations |
free()
Release WASM memory. Always call this when done with the document.
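Since free() must run even when extraction throws, a try/finally wrapper is a convenient pattern. The following is a minimal sketch; `withDocument` is a hypothetical helper, and it assumes the callback is synchronous (an async callback would see the document freed before its promise settles).

```javascript
// Sketch: run a synchronous callback against a document and always release it.
function withDocument(doc, fn) {
  try {
    return fn(doc);
  } finally {
    doc.free(); // runs on success and on error
  }
}

// Usage sketch:
// const text = withDocument(new WasmPdfDocument(bytes), (doc) => doc.extractText(0));
```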
WasmPdf
Factory class for creating new PDFs.
import { WasmPdf } from "pdf-oxide-wasm";
Static Methods
WasmPdf.fromMarkdown(content, title?, author?) -> WasmPdf
Create a PDF from Markdown text.
| Parameter | Type | Default | Description |
|---|---|---|---|
| content | string | – | Markdown content |
| title | string | – | Document title |
| author | string | – | Document author |
WasmPdf.fromHtml(content, title?, author?) -> WasmPdf
Create a PDF from HTML.
WasmPdf.fromText(content, title?, author?) -> WasmPdf
Create a PDF from plain text.
WasmPdf.fromImageBytes(data) -> WasmPdf
Create a single-page PDF from image bytes.
| Parameter | Type | Description |
|---|---|---|
| data | Uint8Array | Image file bytes (JPEG, PNG) |
WasmPdf.fromMultipleImageBytes(imagesArray) -> WasmPdf
Create a multi-page PDF from multiple images, one page per image.
| Parameter | Type | Description |
|---|---|---|
| imagesArray | Uint8Array[] | Array of image file bytes |
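When images come from user uploads, it can help to check that each buffer really is a JPEG or PNG before building the PDF. The sketch below inspects the standard file signatures. `sniffImageFormat` is a hypothetical helper and does not reflect how the library itself detects formats.

```javascript
// Sketch: identify JPEG and PNG data by their leading signature bytes.
const PNG_SIGNATURE = [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a];
const JPEG_SIGNATURE = [0xff, 0xd8, 0xff];

function startsWithBytes(bytes, signature) {
  return bytes.length >= signature.length && signature.every((b, i) => bytes[i] === b);
}

function sniffImageFormat(bytes) {
  if (startsWithBytes(bytes, PNG_SIGNATURE)) return "png";
  if (startsWithBytes(bytes, JPEG_SIGNATURE)) return "jpeg";
  return null; // unknown or unsupported
}

// Usage sketch: keep only buffers that look like supported images.
// const supported = uploads.filter((bytes) => sniffImageFormat(bytes) !== null);
// const pdf = WasmPdf.fromMultipleImageBytes(supported);
```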
Instance Methods
toBytes() -> Uint8Array
Get the PDF as bytes.
size -> number
PDF size in bytes (readonly property).
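// Node.js: create a PDF from Markdown and write it to disk
import { writeFileSync } from "node:fs";
const pdf = WasmPdf.fromMarkdown("# Hello World\n\nThis is a PDF.");
console.log(`PDF size: ${pdf.size} bytes`);
writeFileSync("output.pdf", pdf.toBytes());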
Feature Availability
Some features require native dependencies and are not available in WebAssembly:
| Feature | WASM | Notes |
|---|---|---|
| Text extraction | Yes | Full support |
| Structured extraction | Yes | Chars, spans |
| PDF creation | Yes | Markdown, HTML, text, images |
| PDF editing | Yes | Metadata, rotation, dimensions, erase |
| Form fields | Yes | Read, write, export, flatten |
| Search | Yes | Full regex support |
| Encryption | Yes | AES-256 read and write |
| Annotations | Yes | Read, flatten, redact |
| Merge PDFs | Yes | Merge pages from another PDF |
| Embedded files | Yes | Attach files to PDFs |
| Page labels | Yes | Read page label ranges |
| XMP metadata | Yes | Read XMP metadata |
| OCR | No | Requires native ONNX Runtime |
| Digital signatures | No | Requires native crypto libraries |
| Page rendering | No | Requires native tiny-skia |
| Barcodes | No | Requires native rendering |
| Office conversion | No | Requires native LibreOffice |
Error Handling
All methods that can fail throw JavaScript Error objects:
try {
const doc = new WasmPdfDocument(new Uint8Array([0, 1, 2]));
} catch (e) {
console.error(`Failed to open: ${e.message}`);
}
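For code that opens untrusted uploads, it can be convenient to turn the thrown Error into a result value instead of wrapping every call site in try/catch. A minimal sketch follows; `tryOpen` is a hypothetical helper that takes the document class as a parameter.

```javascript
// Sketch: open a document and return a result object instead of throwing.
// `DocumentClass` would be WasmPdfDocument in real use.
function tryOpen(DocumentClass, bytes) {
  try {
    return { ok: true, doc: new DocumentClass(bytes) };
  } catch (e) {
    return { ok: false, error: e instanceof Error ? e.message : String(e) };
  }
}

// Usage sketch:
// const result = tryOpen(WasmPdfDocument, bytes);
// if (!result.ok) console.error(`Failed to open: ${result.error}`);
```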
TypeScript
Full type definitions are included in the package:
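import { readFileSync } from "node:fs";
import { WasmPdfDocument, WasmPdf } from "pdf-oxide-wasm";
const bytes: Uint8Array = new Uint8Array(readFileSync("document.pdf"));
const doc: WasmPdfDocument = new WasmPdfDocument(bytes);
const text: string = doc.extractText(0);
const pdf: WasmPdf = WasmPdf.fromMarkdown("# Hello");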