Getting Started with PDF Oxide (WASM)
PDF Oxide compiles to WebAssembly for browsers, Deno, Bun, and edge runtimes (Cloudflare Workers, Vercel Edge). The same Rust core that powers the Python, Rust, Node.js, Go, and C# bindings runs directly in any JavaScript environment with near-native performance.
Using Node.js? For server-side Node.js, prefer the native pdf-oxide N-API addon: it's faster and supports OCR, rendering, and signatures. The WASM build on this page is the right choice for browsers and edge runtimes where native addons can't load.
Installation
npm install pdf-oxide-wasm
import { WasmPdfDocument, WasmPdf } from "pdf-oxide-wasm";
Quick Start
Node.js
import { readFileSync } from "fs";
import { WasmPdfDocument } from "pdf-oxide-wasm";
const bytes = new Uint8Array(readFileSync("document.pdf"));
const doc = new WasmPdfDocument(bytes);
console.log(`Pages: ${doc.pageCount()}`);
console.log(doc.extractText(0));
doc.free();
Browser
<script type="module">
import init, { WasmPdfDocument } from "pdf-oxide-wasm";
await init();
const response = await fetch("document.pdf");
const bytes = new Uint8Array(await response.arrayBuffer());
const doc = new WasmPdfDocument(bytes);
console.log(`Pages: ${doc.pageCount()}`);
console.log(doc.extractText(0));
doc.free();
</script>
Browser with File Input
<input type="file" id="pdfInput" accept=".pdf" />
<pre id="output"></pre>
<script type="module">
import init, { WasmPdfDocument } from "pdf-oxide-wasm";
await init();
document.getElementById("pdfInput").addEventListener("change", async (e) => {
  const file = e.target.files[0];
  if (!file) return; // the picker was dismissed without selecting a file
  const bytes = new Uint8Array(await file.arrayBuffer());
  const doc = new WasmPdfDocument(bytes);
  let result = `Pages: ${doc.pageCount()}\n\n`;
  for (let i = 0; i < doc.pageCount(); i++) {
    result += `--- Page ${i + 1} ---\n`;
    result += doc.extractText(i) + "\n\n";
  }
  document.getElementById("output").textContent = result;
  doc.free();
});
</script>
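The per-page loop in the file-input example can be factored into a small reusable helper. The sketch below is not part of PDF Oxide: `extractPages` and `formatPages` are hypothetical names, and they rely only on the `pageCount()` and `extractText(i)` calls shown above.

```javascript
// Hypothetical helpers (not part of pdf-oxide-wasm).
// extractPages works with any object exposing pageCount() and
// extractText(index), as WasmPdfDocument does in the examples above.
function extractPages(doc) {
  const count = doc.pageCount(); // read once rather than on every iteration
  const pages = [];
  for (let i = 0; i < count; i++) {
    pages.push({ page: i + 1, text: doc.extractText(i) });
  }
  return pages;
}

// Render the collected pages in the same layout as the file-input example.
function formatPages(pages) {
  const header = `Pages: ${pages.length}\n\n`;
  const body = pages
    .map((p) => `--- Page ${p.page} ---\n${p.text}\n\n`)
    .join("");
  return header + body;
}
```

Inside the `change` handler, the loop would then reduce to `formatPages(extractPages(doc))`, and the intermediate array stays available for other uses such as per-page search or indexing.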
Text Extraction
Single Page
const doc = new WasmPdfDocument(bytes);
const text = doc.extractText(0);
All Pages
const allText = doc.extractAllText();
Structured Extraction
Get character-level and span-level data with positions and font metadata:
// Character-level data
const chars = doc.extractChars(0);
for (const c of chars) {
  console.log(`'${c.char}' at (${c.bbox.x}, ${c.bbox.y}) font=${c.fontName}`);
}
// Span-level data
const spans = doc.extractSpans(0);
for (const span of spans) {
  console.log(`"${span.text}" size=${span.fontSize}`);
}
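Span-level font data is useful for simple layout analysis, for example finding which font size carries most of the text. The following is a minimal sketch, assuming only the `text` and `fontSize` span fields used above; `summarizeFontSizes` is a hypothetical helper, not a PDF Oxide function.

```javascript
// Hypothetical helper: count how many characters of text use each font size.
// Expects objects with `text` and `fontSize`, the two span fields shown above.
function summarizeFontSizes(spans) {
  const totals = new Map();
  for (const span of spans) {
    const previous = totals.get(span.fontSize) ?? 0;
    totals.set(span.fontSize, previous + span.text.length);
  }
  // Most-used size first; on many documents this is the body text size.
  return [...totals.entries()]
    .map(([fontSize, chars]) => ({ fontSize, chars }))
    .sort((a, b) => b.chars - a.chars);
}
```

Calling `summarizeFontSizes(doc.extractSpans(0))` gives a quick picture of a page's typography; sizes larger than the first entry are plausible heading candidates.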
Markdown Conversion
const markdown = doc.toMarkdown(0);
// With options
const md = doc.toMarkdown(0, true, true); // detect_headings, include_images
// All pages
const allMarkdown = doc.toMarkdownAll();
HTML Conversion
const html = doc.toHtml(0);
// All pages
const allHtml = doc.toHtmlAll();
PDF Creation
Create new PDFs from Markdown, HTML, or plain text using WasmPdf:
import { WasmPdf } from "pdf-oxide-wasm";
// From Markdown
const pdf = WasmPdf.fromMarkdown("# Hello World\n\nThis is a PDF.");
const pdfBytes = pdf.toBytes(); // Uint8Array
// From HTML
const invoice = WasmPdf.fromHtml("<h1>Invoice</h1><p>Amount: $42</p>");
// From plain text
const notes = WasmPdf.fromText("Plain text content.");
// Save to file (Node.js)
import { writeFileSync } from "fs";
writeFileSync("output.pdf", pdf.toBytes());
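In a browser there is no file system to write to, so the bytes from `toBytes()` are typically wrapped in a `Blob` and offered as a download. The sketch below uses only standard web APIs; `toPdfBlob` and `downloadPdf` are hypothetical helper names, not part of PDF Oxide.

```javascript
// Wrap PDF bytes (for example, the Uint8Array returned by pdf.toBytes())
// in a Blob tagged with the PDF MIME type.
function toPdfBlob(pdfBytes) {
  return new Blob([pdfBytes], { type: "application/pdf" });
}

// Browser-only: offer the bytes as a download via a temporary object URL.
// `filename` is an illustrative parameter, not part of the PDF Oxide API.
function downloadPdf(pdfBytes, filename) {
  const url = URL.createObjectURL(toPdfBlob(pdfBytes));
  const link = document.createElement("a");
  link.href = url;
  link.download = filename;
  document.body.appendChild(link);
  link.click();
  link.remove();
  // Release the object URL once the click has been dispatched.
  setTimeout(() => URL.revokeObjectURL(url), 0);
}
```

With the `pdf` object from the example above, a download could be triggered with `downloadPdf(pdf.toBytes(), "output.pdf")`.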
Form Fields
const fields = doc.getFormFields();
for (const f of fields) {
  console.log(`${f.name} (${f.fieldType}) = ${f.value}`);
}
// Export form data
const fdfBytes = doc.exportFormData(); // FDF format
const xfdfBytes = doc.exportFormData("xfdf"); // XFDF format
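When form values need to be passed to application code rather than exported as FDF or XFDF, the field list can be flattened into a plain object. This is a minimal sketch, assuming the `name` and `value` properties shown in the loop above; `formFieldsToObject` is a hypothetical helper.

```javascript
// Hypothetical helper: turn the field list into a { name: value } object.
// If two fields share a name, the later one wins.
function formFieldsToObject(fields) {
  return Object.fromEntries(fields.map((f) => [f.name, f.value]));
}
```

For example, `JSON.stringify(formFieldsToObject(doc.getFormFields()))` would produce a JSON payload of the form data.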
Search
// Search all pages
const results = doc.search("configuration", true); // case_insensitive
for (const r of results) {
  console.log(`Found "${r.text}" on page ${r.page}`);
}
// Search single page
const pageResults = doc.searchPage(0, "configuration", true);
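Search results can be aggregated to show where a term is concentrated. The sketch below assumes only the `page` property used in the loop above; `countHitsByPage` is a hypothetical helper, and it makes no assumption about how pages are numbered.

```javascript
// Hypothetical helper: count search hits per page.
// Expects result objects with a `page` property, as in the loop above.
function countHitsByPage(results) {
  const counts = new Map();
  for (const r of results) {
    counts.set(r.page, (counts.get(r.page) ?? 0) + 1);
  }
  return counts; // Map of page -> number of hits
}
```

Iterating the returned `Map` with `for (const [page, hits] of countHitsByPage(results))` lists each page that contains a match along with its hit count.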
Opening from Bytes
The WasmPdfDocument constructor already takes Uint8Array bytes directly — no separate from_bytes method is needed:
// Already works — WasmPdfDocument takes bytes
const doc = new WasmPdfDocument(uint8Array);
Encrypted PDFs
const doc = new WasmPdfDocument(encryptedBytes);
const success = doc.authenticate("password");
if (success) {
  console.log(doc.extractText(0));
}
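The example above does not handle a rejected password. One way to do so is sketched below: a hypothetical `openProtected` helper that frees the document and throws when `authenticate` returns `false`. The document class is passed in as a parameter only so that the sketch is self-contained; in practice it would be `WasmPdfDocument`.

```javascript
// Hypothetical helper: open an encrypted PDF or fail cleanly.
// DocumentClass is expected to behave like WasmPdfDocument: constructed from
// bytes, with authenticate(password) returning a boolean and a free() method.
function openProtected(DocumentClass, bytes, password) {
  const doc = new DocumentClass(bytes);
  if (!doc.authenticate(password)) {
    doc.free(); // release WASM memory before reporting the failure
    throw new Error("PDF password was rejected");
  }
  return doc; // the caller is responsible for calling doc.free()
}
```

A call such as `openProtected(WasmPdfDocument, encryptedBytes, "password")` then either returns a usable document or throws without leaking the allocation.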
Editing
const doc = new WasmPdfDocument(bytes);
// Metadata
doc.setTitle("Updated Title");
doc.setAuthor("Jane Doe");
// Page rotation
doc.rotatePage(0, 90);
// Save with changes
const edited = doc.save();
// Save with encryption
const encrypted = doc.saveEncryptedToBytes(
  "user-password",
  "owner-password",
  true,  // allow_print
  true,  // allow_copy
  false, // allow_modify
  true   // allow_annotate
);
Memory Management
WASM objects hold Rust memory that must be freed explicitly:
const doc = new WasmPdfDocument(bytes);
try {
  const text = doc.extractText(0);
} finally {
  doc.free();
}
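The `try`/`finally` pattern can be wrapped once so that callers cannot forget to free the document. This is a minimal sketch, not a PDF Oxide API: `withDocument` is a hypothetical helper that relies only on the `free()` method shown above.

```javascript
// Hypothetical helper: run a callback against a document and always free it,
// even when the callback throws. Intended for synchronous callbacks only;
// an async callback would still be running when free() is called.
function withDocument(doc, fn) {
  try {
    return fn(doc);
  } finally {
    doc.free();
  }
}
```

Usage would look like `const text = withDocument(new WasmPdfDocument(bytes), (doc) => doc.extractText(0));`, which also makes the extracted value available outside the `try` block.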
Feature Availability
Some features require native dependencies and are not available in WebAssembly builds:
| Feature | WASM | Notes |
|---|---|---|
| Text extraction | Yes | Full support |
| PDF creation | Yes | Markdown, HTML, text |
| PDF editing | Yes | Full support |
| Encryption | Yes | AES-256 |
| OCR | No | Requires native ONNX Runtime |
| Digital signatures | No | Requires native crypto libraries |
| Page rendering | No | Requires native tiny-skia |
For OCR or rendering support, use the Python or Rust bindings, or the native Node.js N-API addon.
Next Steps
- Python Getting Started – using PDF Oxide from Python
- Rust Getting Started – using PDF Oxide from Rust
- JavaScript API Reference – full WASM API documentation
- Text Extraction – detailed extraction options
- PDF Creation – advanced creation