Node.js API Reference
The pdf-oxide npm package provides a native N-API addon with full TypeScript type definitions. Prebuilt platform binaries ship via per-platform subpackages.
npm install pdf-oxide
For the WASM build targeting browsers / Deno / Bun / edge runtimes, see WASM API Reference. For other languages see Python, Go, C#, or Rust.
Package exports
import {
  PdfDocument,
  Pdf,
  DocumentEditor,
  OcrEngine,
  SearchManager,
  SearchStream,
} from "pdf-oxide";
Where appropriate, classes implement Symbol.dispose, so a using declaration (Node.js 22+) cleans them up automatically.
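Where using declarations are not available in a given runtime or build setup, the same cleanup can be done by hand through close(). The sketch below is a minimal, hypothetical helper (withResource and Closable are illustrative names, not part of pdf-oxide). It is typed against a local interface so it runs without the package installed.

```typescript
// Hypothetical helper, not part of pdf-oxide: runs fn and always calls close().
interface Closable {
  close(): void;
}

function withResource<T extends Closable, R>(resource: T, fn: (r: T) => R): R {
  try {
    return fn(resource);
  } finally {
    // Runs whether fn returned normally or threw.
    resource.close();
  }
}

// Usage sketch with the real package (assumes a local file named report.pdf):
//   const count = withResource(new PdfDocument("report.pdf"), (doc) => doc.getPageCount());
```

The try/finally form guarantees the native handle is released even when extraction throws, which is the same guarantee a using declaration is meant to give.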
PdfDocument
Read-only access to PDF documents.
Constructors
new PdfDocument(path: string)
PdfDocument.openFromBytes(data: Buffer | Uint8Array): PdfDocument
PdfDocument.openWithPassword(path: string, password: string): PdfDocument
Document info
getPageCount(): number
getVersion(): { major: number; minor: number }
hasStructureTree(): boolean
authenticate(password: string): boolean
close(): void
Pages (v0.3.34)
page(index: number): PdfPage // negative indexing supported
[Symbol.iterator](): Iterator<PdfPage> // for (const p of doc) { ... }
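The listing above says only that negative indexing is supported. One plausible reading, assumed here and not confirmed by this page, is the Python-style convention where -1 is the last page. The hypothetical resolvePageIndex helper below sketches that mapping so the intent is concrete; check the shipped type definitions for the actual behavior.

```typescript
// Hypothetical helper illustrating an ASSUMED convention for negative indices:
// -1 is the last page, -2 the one before it, and so on.
function resolvePageIndex(index: number, pageCount: number): number {
  if (!Number.isInteger(index)) {
    throw new RangeError(`page index must be an integer, got ${index}`);
  }
  const resolved = index < 0 ? pageCount + index : index;
  if (resolved < 0 || resolved >= pageCount) {
    throw new RangeError(`page index ${index} out of range for ${pageCount} pages`);
  }
  return resolved;
}
```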
PdfPage is a lightweight handle that caches width, height, and rotation and dispatches extraction calls to the parent document:
class PdfPage {
  readonly index: number;
  readonly width: number;
  readonly height: number;
  readonly rotation: number;
  text(): string;
  markdown(): string;
  html(): string;
  plainText(): string;
  words(): Word[];
  lines(): TextLine[];
  tables(): Table[];
  images(): ImageInfo[];
  paths(): Path[];
  annotations(): AnnotationInfo[];
  fonts(): FontInfo[];
  search(query: string, caseSensitive?: boolean): SearchResult[];
}
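Because a document is iterable and each page exposes text(), per-page processing can be written as a plain for...of loop. The sketch below is a hypothetical helper (collectPageText and PageLike are illustrative names) typed against a local structural interface that mirrors only the index and text() members listed above, so it runs without the package.

```typescript
// Local structural type mirroring the parts of PdfPage used here.
interface PageLike {
  readonly index: number;
  text(): string;
}

// Collects trimmed, non-empty page text keyed by page index.
function collectPageText(pages: Iterable<PageLike>): Map<number, string> {
  const out = new Map<number, string>();
  for (const page of pages) {
    const text = page.text().trim();
    if (text.length > 0) {
      out.set(page.index, text);
    }
  }
  return out;
}

// Usage sketch with the real package (assumes a local file named report.pdf):
//   const textByPage = collectPageText(new PdfDocument("report.pdf"));
```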
Text extraction
extractText(pageIndex: number): string
extractTextAsync(pageIndex: number): Promise<string>
extractAllText(): string
toMarkdown(pageIndex: number, options?: { detectHeadings?: boolean; includeImages?: boolean }): string
toMarkdownAll(): string
toHtml(pageIndex: number): string
toHtmlAll(): string
toPlainText(pageIndex: number): string
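In the signatures above, toMarkdown accepts options while toMarkdownAll does not. If the same options should apply to every page, one approach is to loop over pages and join the results. The sketch below is a hypothetical helper (pagesToMarkdown, MarkdownSource, and the separator default are illustrative choices, not library behavior) typed against a local interface that mirrors the two documented methods it calls.

```typescript
// Local structural types mirroring the documented methods used here.
interface MarkdownOptions {
  detectHeadings?: boolean;
  includeImages?: boolean;
}

interface MarkdownSource {
  getPageCount(): number;
  toMarkdown(pageIndex: number, options?: MarkdownOptions): string;
}

// Converts each page separately and joins the results with a separator.
function pagesToMarkdown(
  doc: MarkdownSource,
  options: MarkdownOptions = {},
  separator = "\n\n---\n\n",
): string {
  const parts: string[] = [];
  const count = doc.getPageCount();
  for (let i = 0; i < count; i++) {
    parts.push(doc.toMarkdown(i, options));
  }
  return parts.join(separator);
}
```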
Structured extraction
extractChars(pageIndex: number): Char[]
extractSpans(pageIndex: number): Span[]
extractWords(pageIndex: number, options?: WordOptions): Word[]
extractTextLines(pageIndex: number, options?: LineOptions): TextLine[]
extractTables(pageIndex: number, config?: TableDetectionConfig): Table[]
extractPaths(pageIndex: number): Path[]
pageLayoutParams(pageIndex: number): ExtractionProfile
WordOptions / LineOptions accept wordGapThreshold, lineGapThreshold, and a profile string (see text extraction).
Region-based
extractTextInRect(pageIndex: number, x: number, y: number, width: number, height: number): string
extractWordsInRect(pageIndex: number, x: number, y: number, width: number, height: number): Word[]
extractImagesInRect(pageIndex: number, x: number, y: number, width: number, height: number): ImageInfo[]
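The region methods take x, y, width, and height as separate arguments. A common need is to derive such a region from words already extracted, for example to re-query the area around a heading. The sketch below is a hypothetical helper (boundingRect and Region are illustrative names). It assumes (x, y) is the corner with the smallest coordinates and that width and height are non-negative; this page does not state the coordinate origin.

```typescript
// Local copy of the documented Word shape.
interface Word {
  text: string;
  x: number;
  y: number;
  width: number;
  height: number;
}

// Illustrative shape matching the x/y/width/height arguments of the *InRect methods.
interface Region {
  x: number;
  y: number;
  width: number;
  height: number;
}

// Smallest rectangle enclosing every word, grown by padding; null for an empty list.
function boundingRect(words: Word[], padding = 0): Region | null {
  if (words.length === 0) return null;
  let minX = Infinity;
  let minY = Infinity;
  let maxX = -Infinity;
  let maxY = -Infinity;
  for (const w of words) {
    minX = Math.min(minX, w.x);
    minY = Math.min(minY, w.y);
    maxX = Math.max(maxX, w.x + w.width);
    maxY = Math.max(maxY, w.y + w.height);
  }
  return {
    x: minX - padding,
    y: minY - padding,
    width: maxX - minX + 2 * padding,
    height: maxY - minY + 2 * padding,
  };
}

// Usage sketch with the real package:
//   const r = boundingRect(doc.extractWords(0));
//   if (r) doc.extractTextInRect(0, r.x, r.y, r.width, r.height);
```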
Images & resources
extractImages(pageIndex: number): ImageInfo[]
getFonts(pageIndex: number): FontInfo[]
getAnnotations(pageIndex: number): AnnotationInfo[]
getFormFields(): FormField[]
getPageElements(pageIndex: number): Element[]
pageInfo(pageIndex: number): PageInfo
Search
searchPage(pageIndex: number, query: string, options?: { caseSensitive?: boolean }): SearchResult[]
searchAll(query: string, options?: { caseSensitive?: boolean }): SearchResult[]
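searchAll returns a flat array of hits across the document. Grouping those hits by their page field is often the first post-processing step. The sketch below is a hypothetical helper (groupByPage is an illustrative name) that uses a local copy of the SearchResult shape from the Types section so it runs without the package.

```typescript
// Local copy of the documented SearchResult shape.
interface SearchResult {
  text: string;
  page: number;
  x: number;
  y: number;
  width: number;
  height: number;
}

// Groups hits by page, preserving the order in which hits were returned.
function groupByPage(results: SearchResult[]): Map<number, SearchResult[]> {
  const byPage = new Map<number, SearchResult[]>();
  for (const hit of results) {
    const bucket = byPage.get(hit.page);
    if (bucket) {
      bucket.push(hit);
    } else {
      byPage.set(hit.page, [hit]);
    }
  }
  return byPage;
}

// Usage sketch with the real package:
//   const byPage = groupByPage(doc.searchAll("invoice", { caseSensitive: false }));
```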
Rendering (optional feature)
renderPage(pageIndex: number, format: "png" | "jpeg"): Buffer
renderPageZoom(pageIndex: number, zoom: number, format: "png" | "jpeg"): Buffer
renderThumbnail(pageIndex: number, width: number, format: "png" | "jpeg"): Buffer
Pdf — creation
Pdf.fromMarkdown(markdown: string): Pdf
Pdf.fromHtml(html: string): Pdf
Pdf.fromText(text: string): Pdf
Pdf.fromImage(path: string): Pdf
Pdf.fromImageBytes(data: Buffer | Uint8Array): Pdf
save(path: string): void
saveAsync(path: string): Promise<void>
toBytes(): Buffer
close(): void
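Pdf.fromMarkdown takes a single Markdown string, so structured input usually needs to be rendered to Markdown first. The sketch below is a hypothetical helper (buildReportMarkdown and Section are illustrative names) that emits only headings and paragraphs. Which Markdown features fromMarkdown supports is not stated on this page, so the example avoids anything beyond those basics.

```typescript
// Illustrative input shape, not part of pdf-oxide.
interface Section {
  heading: string;
  body: string;
}

// Renders a title and sections as basic Markdown (headings and paragraphs only).
function buildReportMarkdown(title: string, sections: Section[]): string {
  const lines: string[] = [`# ${title}`, ""];
  for (const section of sections) {
    lines.push(`## ${section.heading}`, "", section.body.trim(), "");
  }
  return lines.join("\n");
}

// Usage sketch with the real package:
//   const pdf = Pdf.fromMarkdown(buildReportMarkdown("Q1 Report", sections));
//   pdf.save("q1-report.pdf");
//   pdf.close();
```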
DocumentEditor
DocumentEditor.open(path: string): DocumentEditor
DocumentEditor.openFromBytes(data: Buffer | Uint8Array): DocumentEditor
Metadata
setTitle(title: string): void
setAuthor(author: string): void
setSubject(subject: string): void
setKeywords(keywords: string): void
applyMetadata(meta: Metadata): void
Page operations
rotatePage(pageIndex: number, degrees: 0 | 90 | 180 | 270): void
deletePage(pageIndex: number): void
movePage(from: number, to: number): void
cropMargins(left: number, bottom: number, right: number, top: number): void
eraseRegion(pageIndex: number, x: number, y: number, width: number, height: number): void
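rotatePage accepts only 0, 90, 180, or 270, so values coming from user input or arithmetic (for example -90 or 450) need normalizing before the call type-checks. The sketch below is a hypothetical helper (normalizeRotation and Rotation are illustrative names). Whether rotatePage treats the value as absolute or relative to the current rotation is not stated here, and the helper makes no assumption about it.

```typescript
type Rotation = 0 | 90 | 180 | 270;

// Maps any multiple of 90 into the 0..270 range accepted by rotatePage.
function normalizeRotation(degrees: number): Rotation {
  if (!Number.isInteger(degrees) || degrees % 90 !== 0) {
    throw new RangeError(`rotation must be a multiple of 90, got ${degrees}`);
  }
  // ((d % 360) + 360) % 360 wraps negative values into 0..359.
  const wrapped = ((degrees % 360) + 360) % 360;
  return wrapped as Rotation;
}

// Usage sketch with the real package:
//   editor.rotatePage(0, normalizeRotation(-90));
```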
Annotations & forms
flattenAnnotations(pageIndex: number): void
flattenAllAnnotations(): void
flattenForms(): void
setFormFieldValue(name: string, value: string): void
Merging
mergeFrom(path: string): number
Save
save(path: string): void
saveAsync(path: string): Promise<void>
saveEncrypted(path: string, userPassword: string, ownerPassword: string): void
toBytes(): Buffer
close(): void
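A typical editing pass combines several of the methods above: set metadata, flatten forms, save, then release the handle. The sketch below is a hypothetical helper (finalizeDocument, EditorLike, and MetadataLike are illustrative names) typed against a local interface that mirrors only the documented methods it calls. Closing in a finally block is a design choice of this sketch, not a documented requirement.

```typescript
// Local structural types mirroring the documented members used here.
interface MetadataLike {
  title?: string;
  author?: string;
  subject?: string;
  keywords?: string;
}

interface EditorLike {
  applyMetadata(meta: MetadataLike): void;
  flattenForms(): void;
  save(path: string): void;
  close(): void;
}

// Applies metadata, flattens forms, saves, and always closes the editor.
function finalizeDocument(editor: EditorLike, meta: MetadataLike, outputPath: string): void {
  try {
    editor.applyMetadata(meta);
    editor.flattenForms();
    editor.save(outputPath);
  } finally {
    editor.close();
  }
}

// Usage sketch with the real package:
//   finalizeDocument(DocumentEditor.open("form.pdf"), { title: "Signed form" }, "form-final.pdf");
```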
OcrEngine (feature ocr)
new OcrEngine()
pageNeedsOcr(doc: PdfDocument, pageIndex: number): boolean
extractText(doc: PdfDocument, pageIndex: number): string
close(): void
Build with --features ocr (see OCR guide).
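pageNeedsOcr and extractText together suggest a fallback pattern: use the native text layer when it exists and run OCR only on pages that need it. The sketch below is a hypothetical helper (extractTextWithOcrFallback, TextDoc, and OcrLike are illustrative names) typed against local interfaces that mirror the documented signatures, so it runs without the package or the ocr feature.

```typescript
// Local structural types mirroring the documented signatures used here.
interface TextDoc {
  extractText(pageIndex: number): string;
}

interface OcrLike<D> {
  pageNeedsOcr(doc: D, pageIndex: number): boolean;
  extractText(doc: D, pageIndex: number): string;
}

// Uses OCR only when the engine reports that the page needs it.
function extractTextWithOcrFallback<D extends TextDoc>(
  doc: D,
  ocr: OcrLike<D>,
  pageIndex: number,
): string {
  return ocr.pageNeedsOcr(doc, pageIndex)
    ? ocr.extractText(doc, pageIndex)
    : doc.extractText(pageIndex);
}

// Usage sketch with the real package built with the ocr feature:
//   const text = extractTextWithOcrFallback(doc, new OcrEngine(), 0);
```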
Streams
new SearchManager(doc: PdfDocument)
new SearchStream(manager: SearchManager, query: string, options?: { caseSensitive?: boolean })
// SearchStream is a standard Node.js Readable in object mode.
See the Node.js streams guide for SearchStream, PageIteratorStream, and TableStream patterns.
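Since SearchStream is described as a standard Node.js Readable in object mode, it can be consumed with for await like any other Readable. The sketch below is a hypothetical helper (takeFromStream is an illustrative name) that collects the first few items from any async iterable. It is typed against AsyncIterable so it runs without the package.

```typescript
// Collects at most `limit` items from an async iterable such as a Readable.
async function takeFromStream<T>(stream: AsyncIterable<T>, limit: number): Promise<T[]> {
  const items: T[] = [];
  if (limit <= 0) return items;
  for await (const item of stream) {
    items.push(item);
    // Breaking out of for await on a Node.js Readable destroys the stream.
    if (items.length >= limit) break;
  }
  return items;
}

// Usage sketch with the real package:
//   const manager = new SearchManager(doc);
//   const firstTen = await takeFromStream(new SearchStream(manager, "invoice"), 10);
```

Stopping early this way avoids scanning the rest of the document when only the first few hits are needed.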
Types
interface Char {
  char: string;
  x: number;
  y: number;
  fontSize: number;
  fontName: string;
  bbox: [number, number, number, number];
}
interface Span {
  text: string;
  fontName: string;
  fontSize: number;
  bbox: [number, number, number, number];
}
interface Word {
  text: string;
  x: number;
  y: number;
  width: number;
  height: number;
}
interface TextLine {
  text: string;
  y: number;
  spans: Span[];
}
interface ImageInfo {
  width: number;
  height: number;
  format: "png" | "jpeg" | "tiff";
  colorspace: "rgb" | "gray" | "cmyk" | "indexed";
  bitsPerComponent: number;
  data: Buffer;
}
interface FontInfo {
  name: string;
  type: string;
  encoding: string;
  isEmbedded: boolean;
  isSubset: boolean;
  size: number;
}
interface AnnotationInfo {
  type: string;
  subtype: string;
  content: string;
  x: number; y: number;
  width: number; height: number;
  author: string;
  linkUri?: string;
}
interface FormField {
  name: string;
  fieldType: string;
  value: string;
  pageIndex: number;
}
interface SearchResult {
  text: string;
  page: number;
  x: number; y: number;
  width: number; height: number;
}
interface PageInfo {
  width: number;
  height: number;
  rotation: 0 | 90 | 180 | 270;
  mediaBox: Rect;
  cropBox: Rect;
}
interface Metadata {
  title?: string;
  author?: string;
  subject?: string;
  keywords?: string;
  producer?: string;
  creationDate?: string;
}
interface ExtractionProfile {
  wordGapThreshold: number;
  lineGapThreshold: number;
  columnCount: number;
}
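These interfaces are plain data, so results can be filtered and summarized with ordinary array code. As one illustration, the sketch below lists fonts that are not embedded, a common preflight check. unembeddedFontNames is a hypothetical helper, and the FontInfo shape is copied locally from the listing above so the example runs without the package.

```typescript
// Local copy of the documented FontInfo shape.
interface FontInfo {
  name: string;
  type: string;
  encoding: string;
  isEmbedded: boolean;
  isSubset: boolean;
  size: number;
}

// Returns the sorted, de-duplicated names of fonts that are not embedded.
function unembeddedFontNames(fonts: FontInfo[]): string[] {
  const names = new Set<string>();
  for (const font of fonts) {
    if (!font.isEmbedded) {
      names.add(font.name);
    }
  }
  return Array.from(names).sort();
}

// Usage sketch with the real package:
//   const missing = unembeddedFontNames(doc.getFonts(0));
```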
Error handling
All methods throw on failure. Catch with try/catch and inspect err.message:
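try {
  const text = doc.extractText(0);
} catch (err) {
  // Under strict TypeScript the caught value is typed unknown, so narrow it first.
  const message = err instanceof Error ? err.message : String(err);
  console.error(`Extraction failed: ${message}`);
}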
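When many calls need the same handling, wrapping the try/catch once keeps call sites short. The sketch below is a hypothetical helper (attempt and Result are illustrative names, not part of pdf-oxide) that converts a throwing call into a tagged result and narrows the caught value before reading its message.

```typescript
// Tagged union: either a value or an error message.
type Result<T> = { ok: true; value: T } | { ok: false; error: string };

// Runs fn and converts any thrown value into a Result.
function attempt<T>(fn: () => T): Result<T> {
  try {
    return { ok: true, value: fn() };
  } catch (err) {
    // Thrown values are not guaranteed to be Error instances.
    const error = err instanceof Error ? err.message : String(err);
    return { ok: false, error };
  }
}

// Usage sketch with the real package:
//   const r = attempt(() => doc.extractText(0));
//   if (!r.ok) console.error(`Extraction failed: ${r.error}`);
```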
Async method suffix
Every sync method listed above has an *Async variant that returns a Promise, for example extractText → extractTextAsync and save → saveAsync. The async variants run on the libuv thread pool and do not block the event loop. See the async guide.
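Because the async variants return Promises, per-page work can be started for all pages and awaited together. The sketch below is a hypothetical helper (extractAllPagesAsync and AsyncTextSource are illustrative names) typed against a local interface that mirrors getPageCount and extractTextAsync. Whether concurrent calls on a single document actually run in parallel is not stated on this page, so treat the concurrency as an assumption.

```typescript
// Local structural type mirroring the documented methods used here.
interface AsyncTextSource {
  getPageCount(): number;
  extractTextAsync(pageIndex: number): Promise<string>;
}

// Starts extraction for every page, then waits for all of them.
async function extractAllPagesAsync(doc: AsyncTextSource): Promise<string[]> {
  const count = doc.getPageCount();
  const tasks: Promise<string>[] = [];
  for (let i = 0; i < count; i++) {
    tasks.push(doc.extractTextAsync(i));
  }
  // Promise.all preserves input order, so results[i] belongs to page i.
  return Promise.all(tasks);
}
```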
Thread safety
PdfDocument is Send + Sync on the Rust side — safe to share across Node.js Worker threads. See the concurrency guide.
Generated types
TypeScript definitions ship with the package at node_modules/pdf-oxide/index.d.ts — the canonical source of truth for types, including any fields added after this page was last updated.