Getting Started with PDF Oxide (WASM)

PDF Oxide compiles to WebAssembly for browsers, Deno, Bun, and edge runtimes (Cloudflare Workers, Vercel Edge). The same Rust core that powers the Python, Rust, Node.js, Go, and C# bindings runs directly in any JavaScript environment with near-native performance.

Using Node.js? For server-side Node.js prefer the native pdf-oxide N-API addon — it’s faster and supports OCR, rendering, and signatures. The WASM build on this page is the right choice for browsers and edge runtimes where native addons can’t load.

Installation

npm install pdf-oxide-wasm

import { WasmPdfDocument, WasmPdf } from "pdf-oxide-wasm";

Quick Start

Node.js

import { readFileSync } from "fs";
import { WasmPdfDocument } from "pdf-oxide-wasm";

const bytes = new Uint8Array(readFileSync("document.pdf"));
const doc = new WasmPdfDocument(bytes);

console.log(`Pages: ${doc.pageCount()}`);
console.log(doc.extractText(0));

doc.free();

Browser

<script type="module">
import init, { WasmPdfDocument } from "pdf-oxide-wasm";

await init();

const response = await fetch("document.pdf");
const bytes = new Uint8Array(await response.arrayBuffer());
const doc = new WasmPdfDocument(bytes);

console.log(`Pages: ${doc.pageCount()}`);
console.log(doc.extractText(0));
doc.free();
</script>

Browser with File Input

<input type="file" id="pdfInput" accept=".pdf" />
<pre id="output"></pre>

<script type="module">
import init, { WasmPdfDocument } from "pdf-oxide-wasm";
await init();

document.getElementById("pdfInput").addEventListener("change", async (e) => {
  const file = e.target.files[0];
  const bytes = new Uint8Array(await file.arrayBuffer());
  const doc = new WasmPdfDocument(bytes);

  let result = `Pages: ${doc.pageCount()}\n\n`;
  for (let i = 0; i < doc.pageCount(); i++) {
    result += `--- Page ${i + 1} ---\n`;
    result += doc.extractText(i) + "\n\n";
  }

  document.getElementById("output").textContent = result;
  doc.free();
});
</script>

Text Extraction

Single Page

const doc = new WasmPdfDocument(bytes);
const text = doc.extractText(0);

All Pages

const allText = doc.extractAllText();
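
When page boundaries matter (for example, to build a per-page index), a loop over pageCount() and extractText(i) keeps each page's text separate. The following is a minimal sketch; extractPages is a hypothetical helper name, not part of pdf-oxide-wasm, and it relies only on the two methods shown earlier on this page.

```javascript
// Hypothetical helper (not part of pdf-oxide-wasm): collect text per page.
// Uses only pageCount() and extractText(i), as shown in the Quick Start.
function extractPages(doc) {
  const pages = [];
  const count = doc.pageCount();
  for (let i = 0; i < count; i++) {
    // Page indices are zero-based; store a one-based number for display.
    pages.push({ page: i + 1, text: doc.extractText(i) });
  }
  return pages;
}
```

Usage would look like `const pages = extractPages(doc);` followed by `doc.free()` once the results are no longer needed.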

Structured Extraction

Get character-level and span-level data with positions and font metadata:

// Character-level data
const chars = doc.extractChars(0);
for (const c of chars) {
  console.log(`'${c.char}' at (${c.bbox.x}, ${c.bbox.y}) font=${c.fontName}`);
}

// Span-level data
const spans = doc.extractSpans(0);
for (const span of spans) {
  console.log(`"${span.text}" size=${span.fontSize}`);
}
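
Span font sizes can serve as a rough signal for headings. The sketch below returns the text of the spans set in the largest font on a page. It assumes only the text and fontSize properties used in the loop above; largestSpans is a hypothetical name, and treating the largest font as a heading is a heuristic, not something the library guarantees.

```javascript
// Hypothetical helper: text of the spans that use the largest font size.
// Assumes each span exposes `text` and `fontSize`, as in the loop above.
function largestSpans(spans) {
  let maxSize = -Infinity;
  for (const span of spans) {
    if (span.fontSize > maxSize) maxSize = span.fontSize;
  }
  // An empty input yields an empty result because nothing matches -Infinity.
  return spans.filter((span) => span.fontSize === maxSize).map((span) => span.text);
}
```

For example, `largestSpans(doc.extractSpans(0))` would list candidate title text for the first page.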

Markdown Conversion

const markdown = doc.toMarkdown(0);

// With options
const md = doc.toMarkdown(0, true, true); // detect_headings, include_images

// All pages
const allMarkdown = doc.toMarkdownAll();

HTML Conversion

const html = doc.toHtml(0);

// All pages
const allHtml = doc.toHtmlAll();

PDF Creation

Create new PDFs from Markdown, HTML, or plain text using WasmPdf:

import { WasmPdf } from "pdf-oxide-wasm";

// From Markdown
const pdf = WasmPdf.fromMarkdown("# Hello World\n\nThis is a PDF.");
const pdfBytes = pdf.toBytes(); // Uint8Array

// From HTML
const invoice = WasmPdf.fromHtml("<h1>Invoice</h1><p>Amount: $42</p>");

// From plain text
const notes = WasmPdf.fromText("Plain text content.");

// Save to file (Node.js)
import { writeFileSync } from "fs";
writeFileSync("output.pdf", pdf.toBytes());
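
In a browser there is no filesystem to write to, so the bytes from toBytes() are usually wrapped in a Blob instead. This is a minimal sketch using the standard Blob API; pdfBytesToBlob is a hypothetical helper name.

```javascript
// Hypothetical helper: wrap PDF bytes (a Uint8Array) in a Blob
// so the browser can download or preview them.
function pdfBytesToBlob(pdfBytes) {
  return new Blob([pdfBytes], { type: "application/pdf" });
}

// In a browser, the Blob can back a download link:
//   const url = URL.createObjectURL(pdfBytesToBlob(pdf.toBytes()));
//   const link = document.createElement("a");
//   link.href = url;
//   link.download = "output.pdf";
//   link.click();
//   URL.revokeObjectURL(url);
```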

Form Fields

const fields = doc.getFormFields();
for (const f of fields) {
  console.log(`${f.name} (${f.fieldType}) = ${f.value}`);
}

// Export form data
const fdfBytes = doc.exportFormData();        // FDF format
const xfdfBytes = doc.exportFormData("xfdf"); // XFDF format
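
For application code it is often handier to look up field values by name. The sketch below builds a plain object from the field list. It assumes only the name and value properties used in the loop above; formFieldsToObject is a hypothetical helper name.

```javascript
// Hypothetical helper: map field names to their values.
// Assumes each field exposes `name` and `value`, as in the loop above.
// If two fields share a name, the later one wins.
function formFieldsToObject(fields) {
  return Object.fromEntries(fields.map((f) => [f.name, f.value]));
}
```

For example, `formFieldsToObject(doc.getFormFields())` gives an object that can be serialized with JSON.stringify.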

Search

// Search all pages
const results = doc.search("configuration", true); // case_insensitive
for (const r of results) {
  console.log(`Found "${r.text}" on page ${r.page}`);
}

// Search single page
const pageResults = doc.searchPage(0, "configuration", true);
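
Search results can be summarized per page before display. The sketch below counts hits by page. It assumes only the page property used in the loop above; countHitsByPage is a hypothetical helper name.

```javascript
// Hypothetical helper: count search hits per page.
// Assumes each result exposes `page`, as in the loop above.
function countHitsByPage(results) {
  const counts = new Map();
  for (const r of results) {
    counts.set(r.page, (counts.get(r.page) ?? 0) + 1);
  }
  return counts;
}
```

A Map keeps the pages in the order they first appear in the results.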

Opening from Bytes

The WasmPdfDocument constructor already takes Uint8Array bytes directly — no separate from_bytes method is needed:

// Already works — WasmPdfDocument takes bytes
const doc = new WasmPdfDocument(uint8Array);

Encrypted PDFs

const doc = new WasmPdfDocument(encryptedBytes);
const success = doc.authenticate("password");
if (success) {
  console.log(doc.extractText(0));
}
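
The example above silently does nothing when the password is wrong. A stricter variant, sketched here, frees the document and throws instead. openWithPassword is a hypothetical helper, and it assumes authenticate() returns a falsy value on failure, as the example implies. The document class is passed in as a parameter so the sketch does not depend on how the module is loaded.

```javascript
// Hypothetical helper: open an encrypted PDF or fail loudly.
// `DocClass` is expected to be WasmPdfDocument.
function openWithPassword(DocClass, bytes, password) {
  const doc = new DocClass(bytes);
  if (!doc.authenticate(password)) {
    doc.free(); // release the WASM-side memory before giving up
    throw new Error("PDF authentication failed");
  }
  return doc; // the caller is responsible for calling doc.free()
}
```

Usage would look like `const doc = openWithPassword(WasmPdfDocument, encryptedBytes, "password");`.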

Editing

const doc = new WasmPdfDocument(bytes);

// Metadata
doc.setTitle("Updated Title");
doc.setAuthor("Jane Doe");

// Page rotation
doc.rotatePage(0, 90);

// Save with changes
const edited = doc.save();

// Save with encryption
const encrypted = doc.saveEncryptedToBytes(
  "user-password",
  "owner-password",
  true,   // allow_print
  true,   // allow_copy
  false,  // allow_modify
  true    // allow_annotate
);
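
Four positional booleans are easy to mix up. One option is a thin wrapper that takes named permissions and forwards them in the order shown above. saveEncrypted is a hypothetical helper, and its defaults simply mirror the values in the example; they are not documented library defaults.

```javascript
// Hypothetical helper: named permissions instead of positional booleans.
// Forwards to saveEncryptedToBytes in the argument order shown above.
function saveEncrypted(doc, userPassword, ownerPassword, permissions = {}) {
  const {
    allowPrint = true,
    allowCopy = true,
    allowModify = false,
    allowAnnotate = true,
  } = permissions;
  return doc.saveEncryptedToBytes(
    userPassword,
    ownerPassword,
    allowPrint,
    allowCopy,
    allowModify,
    allowAnnotate
  );
}
```

For example, `saveEncrypted(doc, "user-password", "owner-password", { allowCopy: false })` disables copying and leaves the other flags at the values used in the example.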

Memory Management

WASM objects hold Rust memory that must be freed explicitly:

const doc = new WasmPdfDocument(bytes);
try {
  const text = doc.extractText(0);
} finally {
  doc.free();
}
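
The try/finally pattern can be wrapped once and reused. The sketch below opens a document, runs a callback, and always frees the document afterwards. withDocument is a hypothetical helper, not part of pdf-oxide-wasm, and the document class is passed in as a parameter so the sketch does not depend on how the module is loaded.

```javascript
// Hypothetical helper: run `fn` with an open document, then free it.
// `DocClass` is expected to be WasmPdfDocument.
// `fn` must be synchronous: if it returned a promise, free() would
// run before that promise settles.
function withDocument(DocClass, bytes, fn) {
  const doc = new DocClass(bytes);
  try {
    return fn(doc);
  } finally {
    doc.free();
  }
}
```

Usage would look like `const text = withDocument(WasmPdfDocument, bytes, (doc) => doc.extractText(0));`. The returned string is a plain JavaScript value, so it remains valid after the document is freed.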

Feature Availability

Some features require native dependencies and are not available in WebAssembly builds:

Feature             | WASM | Notes
Text extraction     | Yes  | Full support
PDF creation        | Yes  | Markdown, HTML, text
PDF editing         | Yes  | Full support
Encryption          | Yes  | AES-256
OCR                 | No   | Requires native ONNX Runtime
Digital signatures  | No   | Requires native crypto libraries
Page rendering      | No   | Requires native tiny-skia

For OCR, page rendering, or digital signatures, use a native binding such as Python, Rust, or the Node.js N-API addon.

Next Steps