JavaScript API Reference
PDF Oxide provides WebAssembly bindings for JavaScript and TypeScript. The npm package pdf-oxide-wasm works in both Node.js and browsers.
```bash
npm install pdf-oxide-wasm
```
For the Rust API, see the Rust API Reference. For the Python API, see the Python API Reference. For type details, see Types & Enums.
WasmPdfDocument
The primary class for opening, extracting, editing, and saving PDFs.
```javascript
import { WasmPdfDocument } from "pdf-oxide-wasm";
```
Constructor
new WasmPdfDocument(data)
Load a PDF document from raw bytes.
| Parameter | Type | Description |
|---|---|---|
| data | Uint8Array | The PDF file contents |

Throws: Error if the PDF is invalid or cannot be parsed.
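```javascript
import { readFileSync } from "node:fs";
import { WasmPdfDocument } from "pdf-oxide-wasm";

// Node.js: read the file and pass its contents as a Uint8Array
const bytes = new Uint8Array(readFileSync("document.pdf"));
const doc = new WasmPdfDocument(bytes);
```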
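In a browser there is no readFileSync; bytes usually arrive as an ArrayBuffer from `await response.arrayBuffer()` or `await file.arrayBuffer()`. The sketch below uses two hypothetical helpers (toPdfBytes and looksLikePdf are not part of pdf-oxide-wasm) to normalize common binary inputs to the Uint8Array the constructor expects and to run a cheap header check before parsing. The 1024-byte scan window is an assumption modeled on a common reader tolerance, not a library rule.

```javascript
// Hypothetical helper (not part of pdf-oxide-wasm): normalize an ArrayBuffer,
// a Node Buffer, or another typed-array view into a Uint8Array.
function toPdfBytes(input) {
  if (input instanceof Uint8Array) return input; // Node Buffers are Uint8Arrays too
  if (input instanceof ArrayBuffer) return new Uint8Array(input);
  if (ArrayBuffer.isView(input)) {
    // Wrap the same memory without copying it
    return new Uint8Array(input.buffer, input.byteOffset, input.byteLength);
  }
  throw new TypeError("Expected an ArrayBuffer or a typed array");
}

// Hypothetical pre-check for the "%PDF-" header. Scanning the first 1024 bytes
// is an assumed tolerance for leading junk; the parser still has the final say.
function looksLikePdf(bytes) {
  const marker = [0x25, 0x50, 0x44, 0x46, 0x2d]; // "%PDF-"
  const last = Math.min(bytes.length - marker.length, 1024);
  for (let i = 0; i <= last; i++) {
    if (marker.every((b, j) => bytes[i + j] === b)) return true;
  }
  return false;
}
```

A failed check lets you show a friendly message before the constructor throws; a passing check does not guarantee the file parses.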
Core Read-Only
pageCount() -> number
Get the number of pages in the document.
version() -> Uint8Array
Get the PDF version as [major, minor].
```javascript
const [major, minor] = doc.version();
console.log(`PDF ${major}.${minor}`);
```
authenticate(password) -> boolean
Decrypt an encrypted PDF. Returns true if authentication succeeded.
| Parameter | Type | Description |
|---|---|---|
| password | string | The password string |

hasStructureTree() -> boolean
Check if the document is a Tagged PDF with a structure tree.
Text Extraction
extractText(pageIndex) -> string
Extract plain text from a single page.
| Parameter | Type | Description |
|---|---|---|
| pageIndex | number | Zero-based page number |

```javascript
const text = doc.extractText(0);
```
extractAllText() -> string
Extract plain text from all pages, separated by form feed characters.
extractChars(pageIndex) -> Array
Extract individual characters with precise positioning and font metadata.
| Parameter | Type | Description |
|---|---|---|
| pageIndex | number | Zero-based page number |

Returns: Array of objects with fields:
| Field | Type | Description |
|---|---|---|
| char | string | The character |
| bbox | {x, y, width, height} | Bounding box |
| fontName | string | Font name |
| fontSize | number | Font size in points |
| fontWeight | string | Weight (Normal, Bold, etc.) |
| isItalic | boolean | Italic flag |
| color | {r, g, b} | RGB color (0.0–1.0) |

```javascript
const chars = doc.extractChars(0);
for (const c of chars) {
  console.log(`'${c.char}' at (${c.bbox.x}, ${c.bbox.y})`);
}
```
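extractChars() returns characters individually. If you need lines, one approach is to cluster characters by vertical position. The sketch below is hypothetical (groupCharsIntoLines is not a library function) and uses only the char and bbox fields listed above. It ASSUMES bbox.y grows upward as in PDF user space (pass yUp = false otherwise), and it does not insert spaces between words, which may or may not already appear as characters in the output.

```javascript
// Hypothetical post-processing for extractChars() output: characters whose
// bbox.y values lie within `tolerance` points are treated as one line, and each
// line is ordered left to right. ASSUMPTION: y grows upward unless yUp = false.
function groupCharsIntoLines(chars, tolerance = 2, yUp = true) {
  const sorted = [...chars].sort((a, b) =>
    yUp ? b.bbox.y - a.bbox.y : a.bbox.y - b.bbox.y);
  const lines = [];
  for (const c of sorted) {
    const current = lines[lines.length - 1];
    if (current && Math.abs(current.y - c.bbox.y) <= tolerance) {
      current.chars.push(c); // close enough vertically: same line
    } else {
      lines.push({ y: c.bbox.y, chars: [c] }); // start a new line
    }
  }
  return lines.map((line) =>
    line.chars
      .sort((a, b) => a.bbox.x - b.bbox.x)
      .map((c) => c.char)
      .join(""));
}
```

For plain reading-order text, extractText() is simpler; this kind of grouping is mainly useful when you also need the per-character positions.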
extractPageText(pageIndex) -> object
Get spans, characters, and page dimensions from a single extraction pass. More efficient than calling extractSpans() + extractChars() separately.
| Parameter | Type | Description |
|---|---|---|
| pageIndex | number | Zero-based page number |

Returns: An object with fields:
| Field | Type | Description |
|---|---|---|
| spans | Array | Array of span objects |
| chars | Array | Array of character objects |
| pageWidth | number | Page width in PDF points |
| pageHeight | number | Page height in PDF points |
| text | string | Full text content |

```javascript
const result = doc.extractPageText(0);
console.log(`Page: ${result.pageWidth}x${result.pageHeight} pt`);
for (const span of result.spans) {
  console.log(`'${span.text}' font=${span.fontName} size=${span.fontSize}`);
}
```
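pageWidth and pageHeight are in PDF points, where 1 point is 1/72 inch (a US Letter page is 612 x 792 pt). A small conversion sketch; the helper names are illustrative and not part of the package:

```javascript
// 1 PDF point = 1/72 inch; 1 inch = 25.4 mm.
const POINTS_PER_INCH = 72;
const MM_PER_INCH = 25.4;

function pointsToInches(pt) {
  return pt / POINTS_PER_INCH;
}

function pointsToMillimetres(pt) {
  return (pt / POINTS_PER_INCH) * MM_PER_INCH;
}

// Format a pageWidth/pageHeight pair, e.g. from extractPageText()
function describePageSize(widthPt, heightPt) {
  const w = pointsToMillimetres(widthPt).toFixed(1);
  const h = pointsToMillimetres(heightPt).toFixed(1);
  return `${w} x ${h} mm`;
}
```

For example, `describePageSize(result.pageWidth, result.pageHeight)` on a Letter page yields "215.9 x 279.4 mm".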
extractSpans(pageIndex, config?, readingOrder?) -> Array
Extract styled text spans with font metadata. Pass "column_aware" as readingOrder for multi-column PDFs.
| Parameter | Type | Description |
|---|---|---|
| pageIndex | number | Zero-based page number |
| config | object \| undefined | Optional span merging config |
| readingOrder | string \| undefined | Reading order: "column_aware" or undefined for default |

Returns: Array of objects with fields:
| Field | Type | Description |
|---|---|---|
| text | string | The text content |
| bbox | {x, y, width, height} | Bounding box |
| fontName | string | Font name |
| fontSize | number | Font size in points |
| fontWeight | string | Weight (Normal, Bold, etc.) |
| isItalic | boolean | Italic flag |
| isMonospace | boolean | Whether the font is fixed-width |
| charWidths | number[] | Per-glyph advance widths |
| color | {r, g, b} | RGB color (0.0–1.0) |

```javascript
const spans = doc.extractSpans(0);
for (const span of spans) {
  console.log(`"${span.text}" size=${span.fontSize}`);
}
```
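toMarkdown() can already detect headings from font size (see Format Conversion). If you build your own pipeline on extractSpans(), a simple heuristic is to treat the font size covering the most characters as body text and flag noticeably larger spans. This is a hypothetical sketch (findHeadingCandidates is not a library function), and the 1.2 ratio is an arbitrary starting point rather than a library value.

```javascript
// Hypothetical heuristic over extractSpans() output. The body font size is the
// size that covers the most characters; spans at least `ratio` times larger are
// returned as heading candidates. Sizes are bucketed to 0.1 pt.
function findHeadingCandidates(spans, ratio = 1.2) {
  const charsBySize = new Map();
  for (const s of spans) {
    const size = Math.round(s.fontSize * 10) / 10;
    charsBySize.set(size, (charsBySize.get(size) ?? 0) + s.text.length);
  }
  let bodySize = 0;
  let mostChars = -1;
  for (const [size, count] of charsBySize) {
    if (count > mostChars) {
      mostChars = count;
      bodySize = size;
    }
  }
  return spans.filter((s) => s.fontSize >= bodySize * ratio && s.text.trim() !== "");
}
```

Bold body-size headings are missed by this rule; the fontWeight field could be added as a second signal.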
Format Conversion
toMarkdown(pageIndex, detectHeadings?, includeImages?, includeFormFields?) -> string
Convert a single page to Markdown.
| Parameter | Type | Default | Description |
|---|---|---|---|
| pageIndex | number | – | Zero-based page number |
| detectHeadings | boolean | true | Detect headings from font size |
| includeImages | boolean | true | Include images |
| includeFormFields | boolean | true | Include form field values |

toMarkdownAll(detectHeadings?, includeImages?, includeFormFields?) -> string
Convert all pages to Markdown.
toHtml(pageIndex, preserveLayout?, detectHeadings?, includeFormFields?) -> string
Convert a single page to HTML.
| Parameter | Type | Default | Description |
|---|---|---|---|
| pageIndex | number | – | Zero-based page number |
| preserveLayout | boolean | false | Preserve visual layout |
| detectHeadings | boolean | true | Detect headings |
| includeFormFields | boolean | true | Include form field values |

toHtmlAll(preserveLayout?, detectHeadings?, includeFormFields?) -> string
Convert all pages to HTML.
toPlainText(pageIndex) -> string
Convert a single page to plain text.
toPlainTextAll() -> string
Convert all pages to plain text.
Search
search(pattern, caseInsensitive?, literal?, wholeWord?, maxResults?) -> Array
Search for text across all pages.
| Parameter | Type | Default | Description |
|---|---|---|---|
| pattern | string | – | Search pattern, interpreted as a regular expression unless literal is true |
| caseInsensitive | boolean | false | Case-insensitive search |
| literal | boolean | false | Treat pattern as a literal string instead of a regex |
| wholeWord | boolean | false | Match whole words only |
| maxResults | number | – | Maximum number of results to return |

Returns: Array of objects with fields:
| Field | Type | Description |
|---|---|---|
| page | number | Page number |
| text | string | Matched text |
| bbox | object | Bounding box |
| startIndex | number | Start index in page text |
| endIndex | number | End index in page text |

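startIndex and endIndex make it possible to show each match in context. Because pattern is a regex string, backslashes must be doubled in a JavaScript literal, e.g. `doc.search("total:\\s*\\d+", true)`. The helper below is hypothetical and rests on ASSUMPTIONS this reference does not confirm: that the indices point into the string extractText() returns for result.page, that endIndex is exclusive, and that both count UTF-16 code units as JavaScript strings do. If the binding reports UTF-8 byte offsets instead, non-ASCII text will misalign.

```javascript
// Hypothetical helper: render a search() hit with up to `radius` characters of
// surrounding text, marking the match with brackets. See the assumptions above
// about what startIndex/endIndex refer to.
function matchContext(pageText, result, radius = 20) {
  const start = result.startIndex;
  const end = result.endIndex; // assumed exclusive
  const from = Math.max(0, start - radius);
  const to = Math.min(pageText.length, end + radius);
  const prefix = from > 0 ? "..." : "";
  const suffix = to < pageText.length ? "..." : "";
  return `${prefix}${pageText.slice(from, start)}[${pageText.slice(start, end)}]${pageText.slice(end, to)}${suffix}`;
}
```

Checking that `pageText.slice(r.startIndex, r.endIndex) === r.text` on a few results is a quick way to confirm the index assumptions for your documents.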
searchPage(pageIndex, pattern, caseInsensitive?, literal?, wholeWord?, maxResults?) -> Array
Search for text within a single page.
Image Info
extractImages(pageIndex) -> Array
Get image metadata for a page.
Returns: Array of objects with fields:
| Field | Type | Description |
|---|---|---|
| width | number | Image width in pixels |
| height | number | Image height in pixels |
| colorSpace | string | Color space (e.g. DeviceRGB) |
| bitsPerComponent | number | Bits per color channel |
| bbox | object | Position on page |

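Combining the pixel size with the displayed size gives an image's effective resolution, which is useful for spotting images that will print poorly. This sketch ASSUMES bbox has width and height in PDF points, like the {x, y, width, height} boxes returned by the text extraction methods; effectiveDpi and lowResolutionImages are hypothetical names, and the 150 dpi threshold is only an example.

```javascript
// Effective resolution = pixels / displayed size in inches (72 points per inch).
// ASSUMPTION: image.bbox.width/height are in points. Returns null when the
// image has no displayed area.
function effectiveDpi(image) {
  const { width: wPt, height: hPt } = image.bbox;
  if (!(wPt > 0) || !(hPt > 0)) return null;
  return {
    x: image.width / (wPt / 72),
    y: image.height / (hPt / 72),
  };
}

// Example filter: images whose lower resolution axis falls below minDpi
function lowResolutionImages(images, minDpi = 150) {
  return images.filter((img) => {
    const dpi = effectiveDpi(img);
    return dpi !== null && Math.min(dpi.x, dpi.y) < minDpi;
  });
}
```

A typical call would be `lowResolutionImages(doc.extractImages(0))`. For rotated images an axis-aligned bbox overstates one dimension, so treat the result as an estimate.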
extractImageBytes(pageIndex) -> Array
Extract raw image bytes from a page. Returns an array of objects:
| Field | Type | Description |
|---|---|---|
| width | number | Image width in pixels |
| height | number | Image height in pixels |
| data | Uint8Array | Raw image bytes |
| format | string | Image format |

pageImages(pageIndex) -> Array
Get image names and bounds for use with the image positioning methods.
Returns: Array of objects with fields:
| Field | Type | Description |
|---|---|---|
| name | string | XObject name |
| bounds | number[] | [x, y, width, height] |
| matrix | number[] | Transform matrix [a, b, c, d, e, f] |

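In PDF, an image is painted into the unit square, which the transformation matrix [a, b, c, d, e, f] maps to the page via (x, y) → (a·x + c·y + e, b·x + d·y + f). Transforming the four corners therefore gives the axis-aligned area an image covers, even when it is rotated or flipped. This sketch ASSUMES the matrix from pageImages() is the complete transformation in effect when the image is drawn; imageBoundsFromMatrix is a hypothetical name.

```javascript
// Map the unit square through [a, b, c, d, e, f] and return the axis-aligned
// bounding box as [x, y, width, height], the same layout as `bounds` above.
function imageBoundsFromMatrix([a, b, c, d, e, f]) {
  const corners = [[0, 0], [1, 0], [0, 1], [1, 1]].map(([x, y]) => [
    a * x + c * y + e,
    b * x + d * y + f,
  ]);
  const xs = corners.map((p) => p[0]);
  const ys = corners.map((p) => p[1]);
  const minX = Math.min(...xs);
  const minY = Math.min(...ys);
  return [minX, minY, Math.max(...xs) - minX, Math.max(...ys) - minY];
}
```

Comparing this result with the reported bounds is a quick consistency check before calling repositionImage() or setImageBounds().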
Document Structure
getOutline() -> Array | null
Get document bookmarks / table of contents. Returns null if no outline exists.
getAnnotations(pageIndex) -> Array
Get annotation metadata (type, rect, contents, etc.) for a page.
extractPaths(pageIndex) -> Array
Get vector paths (lines, curves, shapes) from a page.
pageLabels() -> Array
Get page label ranges. Returns an array of objects:
| Field | Type | Description |
|---|---|---|
| startPage | number | First page in this range |
| style | string | Numbering style |
| prefix | string | Label prefix |
| startValue | number | Starting number |

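A page's label comes from the last range that starts at or before it: its number is startValue plus the offset into the range, formatted in the range's style and preceded by the prefix. The sketch below follows the PDF specification's page-label rules but rests on ASSUMPTIONS about this binding: that startPage is zero-based like other page indices here, and that style uses the specification's codes ("D", "R", "r", "A", "a"). pageLabel, toRoman and toLetters are hypothetical helpers.

```javascript
// ASSUMPTIONS: zero-based startPage; style codes "D" decimal, "R"/"r"
// upper/lower Roman, "A"/"a" upper/lower letters. Adjust the switch if the
// binding reports other style names.
const ROMAN = [
  [1000, "M"], [900, "CM"], [500, "D"], [400, "CD"], [100, "C"], [90, "XC"],
  [50, "L"], [40, "XL"], [10, "X"], [9, "IX"], [5, "V"], [4, "IV"], [1, "I"],
];

function toRoman(n) {
  let out = "";
  for (const [value, symbol] of ROMAN) {
    while (n >= value) {
      out += symbol;
      n -= value;
    }
  }
  return out;
}

// Letter labels run A..Z, then AA..ZZ, then AAA..., repeating a single letter.
function toLetters(n) {
  const letter = String.fromCharCode(65 + ((n - 1) % 26));
  return letter.repeat(Math.floor((n - 1) / 26) + 1);
}

function pageLabel(ranges, pageIndex) {
  // The applicable range has the greatest startPage <= pageIndex
  let range = null;
  for (const r of ranges) {
    if (r.startPage <= pageIndex && (range === null || r.startPage > range.startPage)) {
      range = r;
    }
  }
  if (range === null) return String(pageIndex + 1); // no labels: 1-based number

  const n = (range.startValue ?? 1) + (pageIndex - range.startPage);
  let number;
  switch (range.style) {
    case "D": number = String(n); break;
    case "R": number = toRoman(n); break;
    case "r": number = toRoman(n).toLowerCase(); break;
    case "A": number = toLetters(n); break;
    case "a": number = toLetters(n).toLowerCase(); break;
    default: number = ""; // no numbering style: the label is just the prefix
  }
  return (range.prefix ?? "") + number;
}
```

With front matter in lowercase Roman numerals followed by decimal pages, `pageLabel(doc.pageLabels(), i)` would produce labels like "iv" and "1".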
xmpMetadata() -> object | null
Get XMP metadata. Returns null if not present. Object fields include:
| Field | Type | Description |
|---|---|---|
| dcTitle | string \| null | Document title |
| dcCreator | string[] \| null | Creator list |
| dcDescription | string \| null | Description |
| xmpCreatorTool | string \| null | Creator tool |
| xmpCreateDate | string \| null | Creation date |
| xmpModifyDate | string \| null | Modification date |
| pdfProducer | string \| null | PDF producer |

Form Fields
getFormFields() -> Array
Get all form fields with their name, type, value, and flags.
Returns: Array of objects with fields:
| Field | Type | Description |
|---|---|---|
| name | string | Field name |
| fieldType | string | Field type (text, checkbox, etc.) |
| value | string | Current value |
| flags | number | Field flags |

```javascript
const fields = doc.getFormFields();
for (const f of fields) {
  console.log(`${f.name} (${f.fieldType}) = ${f.value}`);
}
```
hasXfa() -> boolean
Check if the document contains XFA forms.
getFormFieldValue(name) -> any
Get a form field value by name. Returns a string, boolean, or null depending on the field type.
| Parameter | Type | Description |
|---|---|---|
| name | string | Field name |

setFormFieldValue(name, value) -> void
Set a form field value by name.
| Parameter | Type | Description |
|---|---|---|
| name | string | Field name |
| value | string \| boolean | New field value |

exportFormData(format?) -> Uint8Array
Export form data as FDF (default) or XFDF.
| Parameter | Type | Default | Description |
|---|---|---|---|
| format | string | "fdf" | Export format: "fdf" or "xfdf" |

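Filling several fields and exporting the result can be wrapped in one call. The sketch below is hypothetical (fillAndExport is not a library function) and uses only setFormFieldValue() and exportFormData(). It ASSUMES that setFormFieldValue() throws for a field it cannot set, consistent with the Error Handling section, and collects those failures instead of stopping at the first one.

```javascript
// Hypothetical helper: apply { fieldName: value } pairs, collect fields that
// fail, then export the form data in the requested format ("fdf" or "xfdf").
function fillAndExport(doc, values, format = "xfdf") {
  const failed = [];
  for (const [name, value] of Object.entries(values)) {
    try {
      doc.setFormFieldValue(name, value); // string for text fields, boolean for checkboxes
    } catch (e) {
      failed.push({ name, error: String(e?.message ?? e) });
    }
  }
  return { data: doc.exportFormData(format), failed };
}
```

A call might look like `fillAndExport(doc, { name: "Ada", subscribe: true })`, where the field names are placeholders for those reported by getFormFields().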
Editing
Metadata
| Method | Parameters | Description |
|---|---|---|
| setTitle(title) | string | Set document title |
| setAuthor(author) | string | Set document author |
| setSubject(subject) | string | Set document subject |
| setKeywords(keywords) | string | Set document keywords |

Page Rotation
| Method | Parameters | Description |
|---|---|---|
| pageRotation(pageIndex) | number | Get current rotation (0, 90, 180, 270) |
| setPageRotation(pageIndex, degrees) | number, number | Set absolute rotation |
| rotatePage(pageIndex, degrees) | number, number | Add to current rotation |
| rotateAllPages(degrees) | number | Rotate all pages |

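A PDF page's rotation must be a multiple of 90 degrees. The hypothetical helpers below validate an angle and map it into 0, 90, 180 or 270 (for example before calling setPageRotation()), and decide whether a page displays as landscape once rotation is applied. Whether the library itself normalizes values such as -90 or 450 is not stated here, so this sketch does it explicitly.

```javascript
// Validate a rotation and map it into the range 0..270.
function normalizeRotation(degrees) {
  if (!Number.isInteger(degrees) || degrees % 90 !== 0) {
    throw new RangeError(`Rotation must be a multiple of 90, got ${degrees}`);
  }
  return ((degrees % 360) + 360) % 360;
}

// A quarter turn (90 or 270) swaps the displayed width and height.
function isLandscape(widthPt, heightPt, rotation) {
  const quarterTurn = normalizeRotation(rotation) % 180 === 90;
  return quarterTurn ? heightPt > widthPt : widthPt > heightPt;
}
```

For instance, `isLandscape(612, 792, doc.pageRotation(0))` is true for a portrait Letter page rotated by 90 degrees.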
Page Dimensions
| Method | Parameters | Description |
|---|---|---|
| pageMediaBox(pageIndex) | number | Get MediaBox [llx, lly, urx, ury] |
| setPageMediaBox(pageIndex, llx, lly, urx, ury) | number, ... | Set MediaBox |
| pageCropBox(pageIndex) | number | Get CropBox (may be null) |
| setPageCropBox(pageIndex, llx, lly, urx, ury) | number, ... | Set CropBox |
| cropMargins(left, right, top, bottom) | number, ... | Crop all page margins |

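Page boxes are [llx, lly, urx, ury]: lower-left and upper-right corners in points, with y increasing upward. Width and height are the corner differences, and insetting a box by margins gives the area a margin crop keeps. The helpers below are hypothetical and only model this geometry, not cropMargins() itself; the per-page usage shown afterwards ASSUMES pageMediaBox() returns the four numbers as an array, as the table describes.

```javascript
// Width and height of a [llx, lly, urx, ury] box.
function boxSize([llx, lly, urx, ury]) {
  return { width: urx - llx, height: ury - lly };
}

// Shrink a box by margins in points. With y pointing up, "top" lowers ury and
// "bottom" raises lly.
function insetBox([llx, lly, urx, ury], { left = 0, right = 0, top = 0, bottom = 0 }) {
  const box = [llx + left, lly + bottom, urx - right, ury - top];
  if (box[0] >= box[2] || box[1] >= box[3]) {
    throw new RangeError("Margins leave no visible area");
  }
  return box;
}
```

To crop a single page instead of all pages, something like `doc.setPageCropBox(i, ...insetBox(doc.pageMediaBox(i), { left: 36, right: 36 }))` applies the same geometry.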
Erase / Whiteout
| Method | Parameters | Description |
|---|---|---|
| eraseRegion(pageIndex, llx, lly, urx, ury) | number, ... | Erase a region |
| eraseRegions(pageIndex, rects) | number, Float32Array | Erase multiple regions |
| clearEraseRegions(pageIndex) | number | Clear pending erases |

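eraseRegions() takes its rectangles as one Float32Array, but this reference does not spell out the layout. The sketch below ASSUMES four floats per rectangle in eraseRegion()'s [llx, lly, urx, ury] order, packed back to back; check the package's type definitions before relying on it. packRects is a hypothetical name.

```javascript
// Hypothetical helper: pack { llx, lly, urx, ury } rectangles into a flat
// Float32Array. ASSUMED layout: [llx0, lly0, urx0, ury0, llx1, lly1, ...].
function packRects(rects) {
  const out = new Float32Array(rects.length * 4);
  rects.forEach((r, i) => {
    out.set([r.llx, r.lly, r.urx, r.ury], i * 4);
  });
  return out;
}
```

Under that assumption, `doc.eraseRegions(0, packRects(boxes))` would be equivalent to calling eraseRegion() once per box.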
Annotations & Redaction
| Method | Parameters | Description |
|---|---|---|
| flattenPageAnnotations(pageIndex) | number | Flatten annotations on page |
| flattenAllAnnotations() | – | Flatten all annotations |
| applyPageRedactions(pageIndex) | number | Apply redactions on page |
| applyAllRedactions() | – | Apply all redactions |

Form Flattening
| Method | Parameters | Description |
|---|---|---|
| flattenForms() | – | Flatten all form fields into page content |
| flattenFormsOnPage(pageIndex) | number | Flatten forms on a specific page |

Merge & Embed
mergeFrom(data) -> number
Merge pages from another PDF. Returns the number of pages merged.
| Parameter | Type | Description |
|---|---|---|
| data | Uint8Array | The source PDF file bytes |

embedFile(name, data) -> void
Attach a file to the PDF.
| Parameter | Type | Description |
|---|---|---|
| name | string | Filename for the attachment |
| data | Uint8Array | File contents |

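mergeFrom() and embedFile() combine naturally when assembling a document from several sources. The hypothetical helper below appends each source in order and then attaches a small JSON manifest recording how many pages came from each file. The manifest and its attachment name are illustrative choices, not library conventions; if mergeFrom() throws, earlier sources stay merged.

```javascript
// Hypothetical helper: `sources` is an array of { name, bytes } objects, where
// bytes is a Uint8Array holding a PDF file.
function mergeWithManifest(doc, sources) {
  const manifest = [];
  for (const { name, bytes } of sources) {
    const pages = doc.mergeFrom(bytes); // number of pages merged from this source
    manifest.push({ name, pages });
  }
  const json = JSON.stringify(manifest, null, 2);
  doc.embedFile("merge-manifest.json", new TextEncoder().encode(json));
  return manifest;
}
```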
Image Manipulation
| Method | Parameters | Description |
|---|---|---|
| repositionImage(pageIndex, name, x, y) | number, string, number, number | Move image |
| resizeImage(pageIndex, name, w, h) | number, string, number, number | Resize image |
| setImageBounds(pageIndex, name, x, y, w, h) | number, string, ... | Set image bounds |

Rendering
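| Method | Parameters | Returns | Description |
|---|---|---|---|
| renderPage(pageIndex, dpi?) | number, number | Uint8Array | Render a page to PNG bytes |
| flattenToImages(dpi?) | number | Uint8Array | Flatten all pages to an image-based PDF |

Note: the Feature Availability table lists page rendering as unavailable in WebAssembly, so confirm these methods are supported by your build before relying on them.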
Save
save() -> Uint8Array
Save the edited PDF as bytes. saveToBytes() is available as an alias.
saveEncryptedToBytes(password, ownerPassword?, allowPrint?, allowCopy?, allowModify?, allowAnnotate?) -> Uint8Array
Save with AES-256 encryption.
| Parameter | Type | Default | Description |
|---|---|---|---|
| password | string | – | User password |
| ownerPassword | string | Same as password | Owner password |
| allowPrint | boolean | true | Allow printing |
| allowCopy | boolean | true | Allow copying |
| allowModify | boolean | false | Allow modification |
| allowAnnotate | boolean | true | Allow annotations |

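Four positional booleans are easy to pass in the wrong order. A small wrapper can accept an options object instead and forward the values positionally, using the defaults from the table above. saveEncrypted is a hypothetical name, and the empty-string user password is allowed on purpose because PDFs may be encrypted with an empty user password.

```javascript
// Hypothetical wrapper around saveEncryptedToBytes() with named options.
// Defaults mirror the parameter table; ownerPassword falls back to userPassword.
function saveEncrypted(doc, options) {
  const {
    userPassword,
    ownerPassword = userPassword,
    allowPrint = true,
    allowCopy = true,
    allowModify = false,
    allowAnnotate = true,
  } = options;
  if (typeof userPassword !== "string") {
    throw new TypeError("userPassword must be a string");
  }
  return doc.saveEncryptedToBytes(
    userPassword, ownerPassword, allowPrint, allowCopy, allowModify, allowAnnotate,
  );
}
```

For example, `saveEncrypted(doc, { userPassword: "reader", ownerPassword: "admin", allowCopy: false })` reads unambiguously at the call site.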
free()
Release WASM memory. Always call this when done with the document.
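Because WASM memory is not reclaimed by JavaScript garbage collection here, pairing every document with free() in a finally block keeps memory from leaking when extraction throws. withFreed is a hypothetical helper that works with any object exposing free(), such as one created by `() => new WasmPdfDocument(bytes)`.

```javascript
// Hypothetical helper: create a resource, pass it to `use`, and always call
// free() afterwards, whether `use` returns normally or throws.
function withFreed(create, use) {
  const resource = create();
  try {
    return use(resource);
  } finally {
    resource.free();
  }
}
```

Usage might look like `const text = withFreed(() => new WasmPdfDocument(bytes), (doc) => doc.extractAllText());`.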
WasmPdf
Factory class for creating new PDFs.
```javascript
import { WasmPdf } from "pdf-oxide-wasm";
```
Static Methods
WasmPdf.fromMarkdown(content, title?, author?) -> WasmPdf
Create a PDF from Markdown text.
| Parameter | Type | Default | Description |
|---|---|---|---|
| content | string | – | Markdown content |
| title | string | – | Document title |
| author | string | – | Document author |

WasmPdf.fromHtml(content, title?, author?) -> WasmPdf
Create a PDF from HTML.
WasmPdf.fromText(content, title?, author?) -> WasmPdf
Create a PDF from plain text.
WasmPdf.fromImageBytes(data) -> WasmPdf
Create a single-page PDF from image bytes.
| Parameter | Type | Description |
|---|---|---|
| data | Uint8Array | Image file bytes (JPEG, PNG) |

WasmPdf.fromMultipleImageBytes(imagesArray) -> WasmPdf
Create a multi-page PDF from multiple images, one page per image.
| Parameter | Type | Description |
|---|---|---|
| imagesArray | Uint8Array[] | Array of image file bytes |

Instance Methods
toBytes() -> Uint8Array
Get the PDF as bytes.
size -> number
PDF size in bytes (readonly property).
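```javascript
import { writeFileSync } from "node:fs";
import { WasmPdf } from "pdf-oxide-wasm";

const pdf = WasmPdf.fromMarkdown("# Hello World\n\nThis is a PDF.");
console.log(`PDF size: ${pdf.size} bytes`);
writeFileSync("output.pdf", pdf.toBytes());
```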
Feature Availability
Some features require native dependencies and are not available in WebAssembly:
| Feature | WASM | Notes |
|---|---|---|
| Text extraction | Yes | Full support |
| Structured extraction | Yes | Chars, spans |
| PDF creation | Yes | Markdown, HTML, text, images |
| PDF editing | Yes | Metadata, rotation, dimensions, erase |
| Form fields | Yes | Read, write, export, flatten |
| Search | Yes | Full regex support |
| Encryption | Yes | AES-256 read and write |
| Annotations | Yes | Read, flatten, redact |
| Merge PDFs | Yes | Merge pages from another PDF |
| Embedded files | Yes | Attach files to PDFs |
| Page labels | Yes | Read page label ranges |
| XMP metadata | Yes | Read XMP metadata |
| OCR | No | Requires native ONNX Runtime |
| Digital signatures | No | Requires native crypto libraries |
| Page rendering | No | Requires native tiny-skia |
| Barcodes | No | Requires native rendering |
| Office conversion | No | Requires native LibreOffice |
Error Handling
All methods that can fail throw JavaScript Error objects:
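```javascript
import { WasmPdfDocument } from "pdf-oxide-wasm";

try {
  // Three bytes are not a valid PDF, so the constructor throws
  const doc = new WasmPdfDocument(new Uint8Array([0, 1, 2]));
} catch (e) {
  console.error(`Failed to open: ${e.message}`);
}
```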
TypeScript
Full type definitions are included in the package:
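```typescript
import { readFileSync } from "node:fs";
import { WasmPdfDocument, WasmPdf } from "pdf-oxide-wasm";

const bytes = new Uint8Array(readFileSync("document.pdf"));
const doc: WasmPdfDocument = new WasmPdfDocument(bytes);
const text: string = doc.extractText(0);
const pdf: WasmPdf = WasmPdf.fromMarkdown("# Hello");
```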