JavaScript API Reference

PDF Oxide provides WebAssembly bindings for JavaScript and TypeScript. The npm package pdf-oxide-wasm works in both Node.js and browsers.

npm install pdf-oxide-wasm

For the Rust API, see the Rust API Reference. For the Python API, see the Python API Reference. For type details, see Types & Enums.


WasmPdfDocument

The primary class for opening, extracting, editing, and saving PDFs.

import { WasmPdfDocument } from "pdf-oxide-wasm";

Constructor

new WasmPdfDocument(data)

Load a PDF document from raw bytes.

| Parameter | Type | Description |
| --- | --- | --- |
| data | Uint8Array | The PDF file contents |

Throws: Error if the PDF is invalid or cannot be parsed.

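import { readFileSync } from "node:fs";

// Read the file into a Buffer, then copy it into a plain Uint8Array.
const bytes = new Uint8Array(readFileSync("document.pdf"));
const doc = new WasmPdfDocument(bytes);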
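Because the constructor takes a Uint8Array, bytes that arrive in another container (for example the ArrayBuffer returned by fetch in a browser) need converting first. The following is a minimal sketch of one way to do that; toUint8Array is a hypothetical helper name, not part of pdf-oxide-wasm.

```javascript
// Hypothetical helper (not part of pdf-oxide-wasm): normalize common byte
// containers to the Uint8Array that the constructor expects.
function toUint8Array(input) {
  // A Node.js Buffer is a Uint8Array subclass, so it passes through unchanged.
  if (input instanceof Uint8Array) return input;
  if (input instanceof ArrayBuffer) return new Uint8Array(input);
  if (ArrayBuffer.isView(input)) {
    // Other typed arrays and DataView: view the same bytes without copying.
    return new Uint8Array(input.buffer, input.byteOffset, input.byteLength);
  }
  throw new TypeError("Expected a Uint8Array, ArrayBuffer, or typed array view");
}

// Usage in a browser (the URL is a placeholder):
//   const response = await fetch("/document.pdf");
//   const doc = new WasmPdfDocument(toUint8Array(await response.arrayBuffer()));
```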

Core Read-Only

pageCount() -> number

Get the number of pages in the document.

version() -> Uint8Array

Get the PDF version as [major, minor].

const [major, minor] = doc.version();
console.log(`PDF ${major}.${minor}`);

authenticate(password) -> boolean

Decrypt an encrypted PDF. Returns true if authentication succeeded.

| Parameter | Type | Description |
| --- | --- | --- |
| password | string | The password string |
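A possible pattern for a PDF that is known to be encrypted is to open and authenticate in one step, releasing the document if the password is rejected. This sketch assumes the constructor succeeds on an encrypted file and that authenticate() is called afterwards; openWithPassword is a hypothetical helper, and the document class is passed in as a parameter so the sketch does not depend on how the package is imported.

```javascript
// Hypothetical helper: open a document and authenticate in one step.
// Assumption: constructing the document works before authentication.
function openWithPassword(DocClass, bytes, password) {
  const doc = new DocClass(bytes);
  if (!doc.authenticate(password)) {
    doc.free(); // release WASM memory before giving up
    throw new Error("Authentication failed: incorrect password");
  }
  return doc;
}

// Usage:
//   const doc = openWithPassword(WasmPdfDocument, bytes, "secret");
```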

hasStructureTree() -> boolean

Check if the document is a Tagged PDF with a structure tree.


Text Extraction

extractText(pageIndex) -> string

Extract plain text from a single page.

| Parameter | Type | Description |
| --- | --- | --- |
| pageIndex | number | Zero-based page number |

const text = doc.extractText(0);

extractAllText() -> string

Extract plain text from all pages, separated by form feed characters.
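Since pages are separated by form feed characters, the combined string can be split back into per-page text. A minimal sketch, assuming the separator is the single character U+000C ("\f") and that it appears only between pages; splitPages is a hypothetical helper name.

```javascript
// Hypothetical helper: split the output of extractAllText() into pages.
// Assumption: pages are delimited by exactly one form feed (U+000C).
function splitPages(allText) {
  return allText.split("\f");
}

// Usage:
//   const pages = splitPages(doc.extractAllText());
//   pages.forEach((text, i) => console.log(`Page ${i}: ${text.length} chars`));
```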

extractChars(pageIndex) -> Array

Extract individual characters with precise positioning and font metadata.

| Parameter | Type | Description |
| --- | --- | --- |
| pageIndex | number | Zero-based page number |

Returns: Array of objects with fields:

| Field | Type | Description |
| --- | --- | --- |
| char | string | The character |
| bbox | {x, y, width, height} | Bounding box |
| fontName | string | Font name |
| fontSize | number | Font size in points |
| fontWeight | string | Weight (Normal, Bold, etc.) |
| isItalic | boolean | Italic flag |
| color | {r, g, b} | RGB color (0.0–1.0) |

const chars = doc.extractChars(0);
for (const c of chars) {
  console.log(`'${c.char}' at (${c.bbox.x}, ${c.bbox.y})`);
}
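The per-character bounding boxes can be combined, for example to find the rectangle that encloses a word or a selection. The sketch below relies only on the bbox fields listed above and assumes width and height are non-negative; unionBBox is a hypothetical helper name.

```javascript
// Hypothetical helper: smallest box enclosing every item's bbox.
// Works for any objects with a {x, y, width, height} bbox (chars or spans).
function unionBBox(items) {
  if (items.length === 0) return null;
  let minX = Infinity, minY = Infinity, maxX = -Infinity, maxY = -Infinity;
  for (const { bbox } of items) {
    minX = Math.min(minX, bbox.x);
    minY = Math.min(minY, bbox.y);
    maxX = Math.max(maxX, bbox.x + bbox.width);
    maxY = Math.max(maxY, bbox.y + bbox.height);
  }
  return { x: minX, y: minY, width: maxX - minX, height: maxY - minY };
}

// Usage:
//   const firstTen = doc.extractChars(0).slice(0, 10);
//   console.log(unionBBox(firstTen));
```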

extractPageText(pageIndex) -> object

Get spans, characters, and page dimensions from a single extraction pass. More efficient than calling extractSpans() + extractChars() separately.

| Parameter | Type | Description |
| --- | --- | --- |
| pageIndex | number | Zero-based page number |

Returns: An object with fields:

| Field | Type | Description |
| --- | --- | --- |
| spans | Array | Array of span objects |
| chars | Array | Array of character objects |
| pageWidth | number | Page width in PDF points |
| pageHeight | number | Page height in PDF points |
| text | string | Full text content |

const result = doc.extractPageText(0);
console.log(`Page: ${result.pageWidth}x${result.pageHeight} pt`);
for (const span of result.spans) {
  console.log(`'${span.text}' font=${span.fontName} size=${span.fontSize}`);
}
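Having the page dimensions alongside the spans makes it possible to express positions as fractions of the page, which can help when overlaying highlights on a rendering of a different size. This is a sketch under the assumption that bounding boxes use the same units as pageWidth and pageHeight and that the page box starts at (0, 0); toRelativeBBox is a hypothetical helper name.

```javascript
// Hypothetical helper: express a bbox as fractions of the page size.
// Assumption: bbox and page dimensions share units and a (0, 0) origin.
function toRelativeBBox(bbox, pageWidth, pageHeight) {
  return {
    x: bbox.x / pageWidth,
    y: bbox.y / pageHeight,
    width: bbox.width / pageWidth,
    height: bbox.height / pageHeight,
  };
}

// Usage:
//   const page = doc.extractPageText(0);
//   const rel = page.spans.map((s) =>
//     toRelativeBBox(s.bbox, page.pageWidth, page.pageHeight));
```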

extractSpans(pageIndex, config?, readingOrder?) -> Array

Extract styled text spans with font metadata. Pass "column_aware" as readingOrder for multi-column PDFs.

| Parameter | Type | Description |
| --- | --- | --- |
| pageIndex | number | Zero-based page number |
| config | object \| undefined | Optional span merging config |
| readingOrder | string \| undefined | Reading order: "column_aware" or undefined for default |

Returns: Array of objects with fields:

| Field | Type | Description |
| --- | --- | --- |
| text | string | The text content |
| bbox | {x, y, width, height} | Bounding box |
| fontName | string | Font name |
| fontSize | number | Font size in points |
| fontWeight | string | Weight (Normal, Bold, etc.) |
| isItalic | boolean | Italic flag |
| isMonospace | boolean | Whether the font is fixed-width |
| charWidths | number[] | Per-glyph advance widths |
| color | {r, g, b} | RGB color (0.0–1.0) |

const spans = doc.extractSpans(0);
for (const span of spans) {
  console.log(`"${span.text}" size=${span.fontSize}`);
}
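Because readingOrder is the third positional parameter, requesting column-aware order without a custom config means passing undefined in the config slot. A minimal sketch of that call shape; both function names are hypothetical helpers, and doc is any object exposing extractSpans as documented above.

```javascript
// Hypothetical helper: request column-aware reading order while leaving
// the span merging config at its default (undefined).
function extractSpansColumnAware(doc, pageIndex) {
  return doc.extractSpans(pageIndex, undefined, "column_aware");
}

// Hypothetical helper: join span texts in the order they were returned.
function joinSpanText(spans) {
  return spans.map((span) => span.text).join(" ");
}

// Usage:
//   const spans = extractSpansColumnAware(doc, 0);
//   console.log(joinSpanText(spans));
```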

Format Conversion

toMarkdown(pageIndex, detectHeadings?, includeImages?, includeFormFields?) -> string

Convert a single page to Markdown.

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| pageIndex | number | | Zero-based page number |
| detectHeadings | boolean | true | Detect headings from font size |
| includeImages | boolean | true | Include images |
| includeFormFields | boolean | true | Include form field values |
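Positional boolean flags are easy to mix up at the call site. One option is a thin wrapper that takes named options and forwards them in the documented order. This sketch assumes that passing undefined for an optional parameter selects the default listed in the table; pageToMarkdown is a hypothetical helper, not part of the package.

```javascript
// Hypothetical helper: named options for toMarkdown().
// Assumption: undefined arguments fall back to the documented defaults.
function pageToMarkdown(doc, pageIndex, options = {}) {
  const { detectHeadings, includeImages, includeFormFields } = options;
  return doc.toMarkdown(pageIndex, detectHeadings, includeImages, includeFormFields);
}

// Usage: Markdown for page 0 without images, other flags left at defaults.
//   const md = pageToMarkdown(doc, 0, { includeImages: false });
```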

toMarkdownAll(detectHeadings?, includeImages?, includeFormFields?) -> string

Convert all pages to Markdown.

toHtml(pageIndex, preserveLayout?, detectHeadings?, includeFormFields?) -> string

Convert a single page to HTML.

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| pageIndex | number | | Zero-based page number |
| preserveLayout | boolean | false | Preserve visual layout |
| detectHeadings | boolean | true | Detect headings |
| includeFormFields | boolean | true | Include form field values |

toHtmlAll(preserveLayout?, detectHeadings?, includeFormFields?) -> string

Convert all pages to HTML.

toPlainText(pageIndex) -> string

Convert a single page to plain text.

toPlainTextAll() -> string

Convert all pages to plain text.


Search

search(pattern, caseInsensitive?, literal?, wholeWord?, maxResults?) -> Array

Search for text across all pages.

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| pattern | string | | Search pattern (string or regex) |
| caseInsensitive | boolean | false | Case-insensitive search |
| literal | boolean | false | Treat pattern as literal string |
| wholeWord | boolean | false | Match whole words only |
| maxResults | number | | Maximum results to return |

Returns: Array of objects with fields:

| Field | Type | Description |
| --- | --- | --- |
| page | number | Page number |
| text | string | Matched text |
| bbox | object | Bounding box |
| startIndex | number | Start index in page text |
| endIndex | number | End index in page text |
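Results come back as one flat array across all pages, so a common follow-up step is grouping them by the page field. A minimal sketch using only the documented result fields; groupMatchesByPage is a hypothetical helper name.

```javascript
// Hypothetical helper: group search results by their `page` field.
function groupMatchesByPage(matches) {
  const byPage = new Map();
  for (const match of matches) {
    if (!byPage.has(match.page)) byPage.set(match.page, []);
    byPage.get(match.page).push(match);
  }
  return byPage;
}

// Usage: a literal, case-insensitive search (arguments are positional).
//   const matches = doc.search("total (USD)", true, true);
//   for (const [page, hits] of groupMatchesByPage(matches)) {
//     console.log(`page ${page}: ${hits.length} match(es)`);
//   }
```

Setting literal to true in the usage line keeps the parentheses in the pattern from being read as regex syntax.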

searchPage(pageIndex, pattern, caseInsensitive?, literal?, wholeWord?, maxResults?) -> Array

Search for text within a single page.


Image Info

extractImages(pageIndex) -> Array

Get image metadata for a page.

| Field | Type | Description |
| --- | --- | --- |
| width | number | Image width in pixels |
| height | number | Image height in pixels |
| colorSpace | string | Color space (e.g. DeviceRGB) |
| bitsPerComponent | number | Bits per color channel |
| bbox | object | Position on page |

extractImageBytes(pageIndex) -> Array

Extract raw image bytes from a page. Returns an array of objects:

| Field | Type | Description |
| --- | --- | --- |
| width | number | Image width in pixels |
| height | number | Image height in pixels |
| data | Uint8Array | Raw image bytes |
| format | string | Image format |

pageImages(pageIndex) -> Array

Get image names and bounds for positioning operations.

| Field | Type | Description |
| --- | --- | --- |
| name | string | XObject name |
| bounds | number[] | [x, y, width, height] |
| matrix | number[] | Transform matrix [a, b, c, d, e, f] |

Document Structure

getOutline() -> Array | null

Get document bookmarks / table of contents. Returns null if no outline exists.

getAnnotations(pageIndex) -> Array

Get annotation metadata (type, rect, contents, etc.) for a page.

extractPaths(pageIndex) -> Array

Get vector paths (lines, curves, shapes) from a page.

pageLabels() -> Array

Get page label ranges. Returns an array of objects:

| Field | Type | Description |
| --- | --- | --- |
| startPage | number | First page in this range |
| style | string | Numbering style |
| prefix | string | Label prefix |
| startValue | number | Starting number |

xmpMetadata() -> object | null

Get XMP metadata. Returns null if not present. Object fields include:

| Field | Type | Description |
| --- | --- | --- |
| dcTitle | string \| null | Document title |
| dcCreator | string[] \| null | Creator list |
| dcDescription | string \| null | Description |
| xmpCreatorTool | string \| null | Creator tool |
| xmpCreateDate | string \| null | Creation date |
| xmpModifyDate | string \| null | Modification date |
| pdfProducer | string \| null | PDF producer |

Form Fields

getFormFields() -> Array

Get all form fields with name, type, value, and flags.

| Field | Type | Description |
| --- | --- | --- |
| name | string | Field name |
| fieldType | string | Field type (text, checkbox, etc.) |
| value | string | Current value |
| flags | number | Field flags |

const fields = doc.getFormFields();
for (const f of fields) {
  console.log(`${f.name} (${f.fieldType}) = ${f.value}`);
}

hasXfa() -> boolean

Check if the document contains XFA forms.

getFormFieldValue(name) -> any

Get a form field value by name. Returns a string, boolean, or null depending on the field type.

| Parameter | Type | Description |
| --- | --- | --- |
| name | string | Field name |

setFormFieldValue(name, value) -> void

Set a form field value by name.

| Parameter | Type | Description |
| --- | --- | --- |
| name | string | Field name |
| value | string \| boolean | New field value |
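Filling several fields at once can be sketched as a loop over a plain object of values. The version below checks each name against getFormFields() first and reports names that do not exist, since the behavior of setFormFieldValue for an unknown name is not described here. fillForm is a hypothetical helper name.

```javascript
// Hypothetical helper: set many form fields from a { name: value } object.
// Returns the names that were not found in the document's form fields.
function fillForm(doc, values) {
  const known = new Set(doc.getFormFields().map((field) => field.name));
  const missing = [];
  for (const [name, value] of Object.entries(values)) {
    if (known.has(name)) {
      doc.setFormFieldValue(name, value);
    } else {
      missing.push(name);
    }
  }
  return missing;
}

// Usage (field names are placeholders):
//   const missing = fillForm(doc, { fullName: "Ada Lovelace", subscribe: true });
//   if (missing.length > 0) console.warn(`Unknown fields: ${missing.join(", ")}`);
```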

exportFormData(format?) -> Uint8Array

Export form data as FDF (default) or XFDF.

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| format | string | "fdf" | Export format: "fdf" or "xfdf" |

Editing

Metadata

| Method | Parameters | Description |
| --- | --- | --- |
| setTitle(title) | string | Set document title |
| setAuthor(author) | string | Set document author |
| setSubject(subject) | string | Set document subject |
| setKeywords(keywords) | string | Set document keywords |

Page Rotation

| Method | Parameters | Description |
| --- | --- | --- |
| pageRotation(pageIndex) | number | Get current rotation (0, 90, 180, 270) |
| setPageRotation(pageIndex, degrees) | number, number | Set absolute rotation |
| rotatePage(pageIndex, degrees) | number, number | Add to current rotation |
| rotateAllPages(degrees) | number | Rotate all pages |

Page Dimensions

| Method | Parameters | Description |
| --- | --- | --- |
| pageMediaBox(pageIndex) | number | Get MediaBox [llx, lly, urx, ury] |
| setPageMediaBox(pageIndex, llx, lly, urx, ury) | number, ... | Set MediaBox |
| pageCropBox(pageIndex) | number | Get CropBox (may be null) |
| setPageCropBox(pageIndex, llx, lly, urx, ury) | number, ... | Set CropBox |
| cropMargins(left, right, top, bottom) | number, ... | Crop all page margins |
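A box in [llx, lly, urx, ury] form gives the lower-left and upper-right corners, so the page size is the difference between them. A minimal sketch that also converts to inches, assuming the default PDF user unit of 1/72 inch; boxSize is a hypothetical helper name.

```javascript
const POINTS_PER_INCH = 72; // default PDF user space unit: 1 pt = 1/72 inch

// Hypothetical helper: width and height of a [llx, lly, urx, ury] box.
function boxSize([llx, lly, urx, ury]) {
  const width = urx - llx;
  const height = ury - lly;
  return {
    width,
    height,
    widthInches: width / POINTS_PER_INCH,
    heightInches: height / POINTS_PER_INCH,
  };
}

// Usage: prefer the CropBox when present, else fall back to the MediaBox.
//   const box = doc.pageCropBox(0) ?? doc.pageMediaBox(0);
//   console.log(boxSize(box));
```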

Erase / Whiteout

| Method | Parameters | Description |
| --- | --- | --- |
| eraseRegion(pageIndex, llx, lly, urx, ury) | number, ... | Erase a region |
| eraseRegions(pageIndex, rects) | number, Float32Array | Erase multiple regions |
| clearEraseRegions(pageIndex) | number | Clear pending erases |
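eraseRegions takes its rectangles as a single Float32Array. The layout is not spelled out here; the sketch below assumes four consecutive values per rectangle in the same llx, lly, urx, ury order that eraseRegion uses, which should be checked against the package's type definitions. packRects is a hypothetical helper name.

```javascript
// Hypothetical helper: flatten rectangles into one Float32Array.
// Assumption: eraseRegions() expects [llx, lly, urx, ury, llx, lly, ...].
function packRects(rects) {
  const packed = new Float32Array(rects.length * 4);
  rects.forEach((rect, i) => {
    packed.set([rect.llx, rect.lly, rect.urx, rect.ury], i * 4);
  });
  return packed;
}

// Usage (coordinates are placeholders):
//   doc.eraseRegions(0, packRects([
//     { llx: 72, lly: 700, urx: 300, ury: 720 },
//     { llx: 72, lly: 650, urx: 300, ury: 670 },
//   ]));
```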

Annotations & Redaction

| Method | Parameters | Description |
| --- | --- | --- |
| flattenPageAnnotations(pageIndex) | number | Flatten annotations on page |
| flattenAllAnnotations() | | Flatten all annotations |
| applyPageRedactions(pageIndex) | number | Apply redactions on page |
| applyAllRedactions() | | Apply all redactions |

Form Flattening

| Method | Parameters | Description |
| --- | --- | --- |
| flattenForms() | | Flatten all form fields into page content |
| flattenFormsOnPage(pageIndex) | number | Flatten forms on a specific page |

Merge & Embed

mergeFrom(data) -> number

Merge pages from another PDF. Returns the number of pages merged.

| Parameter | Type | Description |
| --- | --- | --- |
| data | Uint8Array | The source PDF file bytes |
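Since mergeFrom returns the number of pages it merged, combining several files can be sketched as a loop that accumulates the count. mergeAll is a hypothetical helper name; doc is the target document and sources is an array of Uint8Array values.

```javascript
// Hypothetical helper: merge several PDFs into `doc`, in order.
// Returns the total number of pages reported by mergeFrom().
function mergeAll(doc, sources) {
  let totalPages = 0;
  for (const bytes of sources) {
    totalPages += doc.mergeFrom(bytes);
  }
  return totalPages;
}

// Usage:
//   const added = mergeAll(doc, [appendixBytes, coverLetterBytes]);
//   console.log(`Merged ${added} pages; document now has ${doc.pageCount()}`);
```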

embedFile(name, data) -> void

Attach a file to the PDF.

| Parameter | Type | Description |
| --- | --- | --- |
| name | string | Filename for the attachment |
| data | Uint8Array | File contents |

Image Manipulation

| Method | Parameters | Description |
| --- | --- | --- |
| repositionImage(pageIndex, name, x, y) | number, string, number, number | Move image |
| resizeImage(pageIndex, name, w, h) | number, string, number, number | Resize image |
| setImageBounds(pageIndex, name, x, y, w, h) | number, string, ... | Set image bounds |

Rendering

| Method | Parameters | Returns | Description |
| --- | --- | --- | --- |
| renderPage(pageIndex, dpi?) | number, number | Uint8Array | Render a page to PNG bytes |
| flattenToImages(dpi?) | number | Uint8Array | Flatten all pages to image-based PDF |

Save

save() -> Uint8Array

Save the edited PDF as bytes. saveToBytes() is available as an alias.

saveEncryptedToBytes(password, ownerPassword?, allowPrint?, allowCopy?, allowModify?, allowAnnotate?) -> Uint8Array

Save with AES-256 encryption.

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| password | string | | User password |
| ownerPassword | string | user password | Owner password |
| allowPrint | boolean | true | Allow printing |
| allowCopy | boolean | true | Allow copying |
| allowModify | boolean | false | Allow modification |
| allowAnnotate | boolean | true | Allow annotations |

free()

Release WASM memory. Always call this when done with the document.
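One way to make sure free() always runs, even when processing throws, is to scope the document to a callback with try/finally. A minimal sketch for synchronous callbacks only; withDocument is a hypothetical helper, and the document class is passed in as a parameter so the sketch does not depend on how the package is imported.

```javascript
// Hypothetical helper: open a document, run `fn`, and always free it.
// Only suitable for synchronous callbacks: an async `fn` would still be
// running when free() is called.
function withDocument(DocClass, bytes, fn) {
  const doc = new DocClass(bytes);
  try {
    return fn(doc);
  } finally {
    doc.free(); // runs on both success and failure
  }
}

// Usage:
//   const text = withDocument(WasmPdfDocument, bytes, (doc) => doc.extractAllText());
```

The callback should return plain values such as strings or byte arrays rather than the document itself, because the document is no longer usable once free() has run.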


WasmPdf

Factory class for creating new PDFs.

import { WasmPdf } from "pdf-oxide-wasm";

Static Methods

WasmPdf.fromMarkdown(content, title?, author?) -> WasmPdf

Create a PDF from Markdown text.

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| content | string | | Markdown content |
| title | string | | Document title |
| author | string | | Document author |

WasmPdf.fromHtml(content, title?, author?) -> WasmPdf

Create a PDF from HTML.

WasmPdf.fromText(content, title?, author?) -> WasmPdf

Create a PDF from plain text.

WasmPdf.fromImageBytes(data) -> WasmPdf

Create a single-page PDF from image bytes.

| Parameter | Type | Description |
| --- | --- | --- |
| data | Uint8Array | Image file bytes (JPEG, PNG) |

WasmPdf.fromMultipleImageBytes(imagesArray) -> WasmPdf

Create a multi-page PDF from multiple images, one page per image.

| Parameter | Type | Description |
| --- | --- | --- |
| imagesArray | Uint8Array[] | Array of image file bytes |

Instance Methods

toBytes() -> Uint8Array

Get the PDF as bytes.

size -> number

PDF size in bytes (readonly property).

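import { writeFileSync } from "node:fs";

// Build a PDF from Markdown, report its size, and write it to disk.
const pdf = WasmPdf.fromMarkdown("# Hello World\n\nThis is a PDF.");
console.log(`PDF size: ${pdf.size} bytes`);
writeFileSync("output.pdf", pdf.toBytes());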

Feature Availability

Some features require native dependencies and are not available in WebAssembly:

| Feature | WASM | Notes |
| --- | --- | --- |
| Text extraction | Yes | Full support |
| Structured extraction | Yes | Chars, spans |
| PDF creation | Yes | Markdown, HTML, text, images |
| PDF editing | Yes | Metadata, rotation, dimensions, erase |
| Form fields | Yes | Read, write, export, flatten |
| Search | Yes | Full regex support |
| Encryption | Yes | AES-256 read and write |
| Annotations | Yes | Read, flatten, redact |
| Merge PDFs | Yes | Merge pages from another PDF |
| Embedded files | Yes | Attach files to PDFs |
| Page labels | Yes | Read page label ranges |
| XMP metadata | Yes | Read XMP metadata |
| OCR | No | Requires native ONNX Runtime |
| Digital signatures | No | Requires native crypto libraries |
| Page rendering | No | Requires native tiny-skia |
| Barcodes | No | Requires native rendering |
| Office conversion | No | Requires native LibreOffice |

Error Handling

All methods that can fail throw JavaScript Error objects:

try {
  const doc = new WasmPdfDocument(new Uint8Array([0, 1, 2]));
} catch (e) {
  console.error(`Failed to open: ${e.message}`);
}
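For user-supplied uploads, a cheap check before constructing the document can reject obviously wrong files with a clearer message. The sketch below tests for the "%PDF-" signature that PDF files begin with. It is a pre-filter only, since the parser may accept or reject files on other grounds; looksLikePdf is a hypothetical helper name.

```javascript
// The ASCII bytes of "%PDF-", the signature at the start of a PDF file.
const PDF_MAGIC = [0x25, 0x50, 0x44, 0x46, 0x2d];

// Hypothetical helper: true if the bytes start with the PDF signature.
function looksLikePdf(bytes) {
  return (
    bytes.length >= PDF_MAGIC.length &&
    PDF_MAGIC.every((byte, i) => bytes[i] === byte)
  );
}

// Usage:
//   if (!looksLikePdf(bytes)) {
//     throw new Error("This file does not look like a PDF");
//   }
//   const doc = new WasmPdfDocument(bytes);
```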

TypeScript

Full type definitions are included in the package:

import { WasmPdfDocument, WasmPdf } from "pdf-oxide-wasm";

const doc: WasmPdfDocument = new WasmPdfDocument(bytes);
const text: string = doc.extractText(0);
const pdf: WasmPdf = WasmPdf.fromMarkdown("# Hello");