Node.js API Reference

The pdf-oxide npm package provides a native N-API addon with full TypeScript type definitions. Prebuilt platform binaries ship via per-platform subpackages.

npm install pdf-oxide

For the WASM build targeting browsers / Deno / Bun / edge runtimes, see WASM API Reference. For other languages see Python, Go, C#, or Rust.


Package exports

import {
  PdfDocument,
  Pdf,
  DocumentEditor,
  OcrEngine,
  SearchManager,
  SearchStream,
} from "pdf-oxide";

Classes implement Symbol.dispose where appropriate. Declare instances with the using keyword (Node.js 22+) for automatic cleanup.
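Where using declarations are not available, the same cleanup can be done explicitly. This is a minimal sketch that assumes only the close() method documented for each class; the Closable interface and the withResource helper are hypothetical names introduced for illustration and are not part of pdf-oxide.

```typescript
// Hypothetical structural type: anything exposing the documented close() method.
interface Closable {
  close(): void;
}

// Runs `fn` with the resource and always calls close(), even if `fn` throws.
function withResource<T extends Closable, R>(resource: T, fn: (r: T) => R): R {
  try {
    return fn(resource);
  } finally {
    resource.close();
  }
}
```

With the real package this would presumably be called as withResource(new PdfDocument("report.pdf"), (doc) => doc.getPageCount()), where the file name is only an example.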


PdfDocument

Read-only access to PDF documents.

Constructors

new PdfDocument(path: string)
PdfDocument.openFromBytes(data: Buffer | Uint8Array): PdfDocument
PdfDocument.openWithPassword(path: string, password: string): PdfDocument

Document info

getPageCount(): number
getVersion(): { major: number; minor: number }
hasStructureTree(): boolean
authenticate(password: string): boolean
close(): void

Pages (v0.3.34)

page(index: number): PdfPage           // negative indexing supported
[Symbol.iterator](): Iterator<PdfPage> // for (const p of doc) { ... }

PdfPage is a lightweight handle that caches width, height, and rotation and dispatches extraction calls to the parent document:

class PdfPage {
  readonly index: number;
  readonly width: number;
  readonly height: number;
  readonly rotation: number;

  text(): string;
  markdown(): string;
  html(): string;
  plainText(): string;
  words(): Word[];
  lines(): TextLine[];
  tables(): Table[];
  images(): ImageInfo[];
  paths(): Path[];
  annotations(): AnnotationInfo[];
  fonts(): FontInfo[];
  search(query: string, caseSensitive?: boolean): SearchResult[];
}
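Page iteration can be sketched as follows. The PageLike and PageSummary types and the summarizePages helper are hypothetical; PageLike mirrors only the PdfPage members used here so that the example runs without the native addon. The landscape check ignores rotation, which is a simplification.

```typescript
// Hypothetical structural type mirroring the documented PdfPage members used here.
interface PageLike {
  readonly index: number;
  readonly width: number;
  readonly height: number;
  text(): string;
}

interface PageSummary {
  index: number;
  landscape: boolean;
  charCount: number;
}

// Accepts any iterable of pages, which is what `for (const p of doc)` relies on.
function summarizePages(pages: Iterable<PageLike>): PageSummary[] {
  const out: PageSummary[] = [];
  for (const page of pages) {
    out.push({
      index: page.index,
      landscape: page.width > page.height, // rotation is ignored in this sketch
      charCount: page.text().length,
    });
  }
  return out;
}
```

Since PdfDocument is documented as iterable, summarizePages(doc) should accept a document directly.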

Text extraction

extractText(pageIndex: number): string
extractTextAsync(pageIndex: number): Promise<string>
extractAllText(): string
toMarkdown(pageIndex: number, options?: { detectHeadings?: boolean; includeImages?: boolean }): string
toMarkdownAll(): string
toHtml(pageIndex: number): string
toHtmlAll(): string
toPlainText(pageIndex: number): string
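A per-page conversion loop might look like the sketch below. It assumes zero-based page indices, as in the extractText(0) call under Error handling. MarkdownSource and pagesToMarkdown are hypothetical names; the interface copies only the two documented signatures it needs.

```typescript
// Hypothetical structural type copying two documented PdfDocument signatures.
interface MarkdownSource {
  getPageCount(): number;
  toMarkdown(
    pageIndex: number,
    options?: { detectHeadings?: boolean; includeImages?: boolean },
  ): string;
}

// Converts every page separately so callers can post-process pages one by one.
function pagesToMarkdown(doc: MarkdownSource): string[] {
  const pages: string[] = [];
  for (let i = 0; i < doc.getPageCount(); i++) {
    pages.push(doc.toMarkdown(i, { detectHeadings: true, includeImages: false }));
  }
  return pages;
}
```

When per-page granularity is not needed, toMarkdownAll() covers the whole document in one call.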

Structured extraction

extractChars(pageIndex: number): Char[]
extractSpans(pageIndex: number): Span[]
extractWords(pageIndex: number, options?: WordOptions): Word[]
extractTextLines(pageIndex: number, options?: LineOptions): TextLine[]
extractTables(pageIndex: number, config?: TableDetectionConfig): Table[]
extractPaths(pageIndex: number): Path[]
pageLayoutParams(pageIndex: number): ExtractionProfile

WordOptions / LineOptions accept wordGapThreshold, lineGapThreshold, and a profile string (see text extraction).
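As an illustration of working with the returned Word records, the sketch below computes the box that encloses a set of words. It assumes that (x, y) is one corner of a word and that width and height extend in the positive direction; this page does not state that convention. The boundingBox helper is hypothetical.

```typescript
// Same shape as the Word interface listed under Types.
interface Word {
  text: string;
  x: number;
  y: number;
  width: number;
  height: number;
}

interface Box {
  x: number;
  y: number;
  width: number;
  height: number;
}

// Returns the smallest box enclosing all words, or null for an empty list.
function boundingBox(words: Word[]): Box | null {
  if (words.length === 0) return null;
  let minX = Infinity;
  let minY = Infinity;
  let maxX = -Infinity;
  let maxY = -Infinity;
  for (const w of words) {
    minX = Math.min(minX, w.x);
    minY = Math.min(minY, w.y);
    maxX = Math.max(maxX, w.x + w.width);
    maxY = Math.max(maxY, w.y + w.height);
  }
  return { x: minX, y: minY, width: maxX - minX, height: maxY - minY };
}
```

A typical input would be the result of doc.extractWords(0), optionally with a WordOptions argument.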

Region-based

extractTextInRect(pageIndex: number, x: number, y: number, width: number, height: number): string
extractWordsInRect(pageIndex: number, x: number, y: number, width: number, height: number): Word[]
extractImagesInRect(pageIndex: number, x: number, y: number, width: number, height: number): ImageInfo[]
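Region extraction combines naturally with pageInfo. The sketch below splits a page into a left and a right half, which avoids depending on the direction of the y axis. It assumes the page area starts at x = 0 and that pageInfo reports sizes in the same units the rectangle arguments expect; neither is confirmed on this page. RegionSource and extractTwoColumns are hypothetical names.

```typescript
// Hypothetical structural type copying two documented PdfDocument signatures.
interface RegionSource {
  pageInfo(pageIndex: number): { width: number; height: number };
  extractTextInRect(
    pageIndex: number,
    x: number,
    y: number,
    width: number,
    height: number,
  ): string;
}

// Returns [leftColumnText, rightColumnText] for a two-column page.
function extractTwoColumns(doc: RegionSource, pageIndex: number): [string, string] {
  const { width, height } = doc.pageInfo(pageIndex);
  const half = width / 2;
  const left = doc.extractTextInRect(pageIndex, 0, 0, half, height);
  const right = doc.extractTextInRect(pageIndex, half, 0, half, height);
  return [left, right];
}
```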

Images & resources

extractImages(pageIndex: number): ImageInfo[]
getFonts(pageIndex: number): FontInfo[]
getAnnotations(pageIndex: number): AnnotationInfo[]
getFormFields(): FormField[]
getPageElements(pageIndex: number): Element[]
pageInfo(pageIndex: number): PageInfo
searchPage(pageIndex: number, query: string, options?: { caseSensitive?: boolean }): SearchResult[]
searchAll(query: string, options?: { caseSensitive?: boolean }): SearchResult[]
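Results from searchAll carry a page field, so they can be grouped per page. A minimal sketch; groupByPage is a hypothetical helper and the SearchResult shape is copied from the Types section.

```typescript
// Same shape as the SearchResult interface listed under Types.
interface SearchResult {
  text: string;
  page: number;
  x: number;
  y: number;
  width: number;
  height: number;
}

// Groups hits by page index, preserving the order in which hits were returned.
function groupByPage(results: SearchResult[]): Map<number, SearchResult[]> {
  const byPage = new Map<number, SearchResult[]>();
  for (const hit of results) {
    const bucket = byPage.get(hit.page);
    if (bucket) {
      bucket.push(hit);
    } else {
      byPage.set(hit.page, [hit]);
    }
  }
  return byPage;
}
```

For example, groupByPage(doc.searchAll("invoice", { caseSensitive: false })) would give one entry per page that contains a match; the query string is illustrative.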

Rendering (optional feature)

renderPage(pageIndex: number, format: "png" | "jpeg"): Buffer
renderPageZoom(pageIndex: number, zoom: number, format: "png" | "jpeg"): Buffer
renderThumbnail(pageIndex: number, width: number, format: "png" | "jpeg"): Buffer
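Rendering every page can be sketched as below. The write callback keeps the example independent of the file system; Renderer, ImageFormat, renderAllPages, and the page-001.png naming scheme are all hypothetical choices made for illustration. The real methods return a Buffer, which is a Uint8Array subclass.

```typescript
type ImageFormat = "png" | "jpeg";

// Hypothetical structural type copying two documented PdfDocument signatures.
interface Renderer {
  getPageCount(): number;
  renderPage(pageIndex: number, format: ImageFormat): Uint8Array;
}

// Renders each page and hands the bytes to `write`; returns the generated names.
function renderAllPages(
  doc: Renderer,
  format: ImageFormat,
  write: (fileName: string, bytes: Uint8Array) => void,
): string[] {
  const ext = format === "jpeg" ? "jpg" : "png";
  const names: string[] = [];
  for (let i = 0; i < doc.getPageCount(); i++) {
    const name = `page-${String(i + 1).padStart(3, "0")}.${ext}`;
    write(name, doc.renderPage(i, format));
    names.push(name);
  }
  return names;
}
```

In Node.js the callback could be (name, bytes) => fs.writeFileSync(path.join(outDir, name), bytes), with outDir chosen by the caller.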

Pdf — creation

Pdf.fromMarkdown(markdown: string): Pdf
Pdf.fromHtml(html: string): Pdf
Pdf.fromText(text: string): Pdf
Pdf.fromImage(path: string): Pdf
Pdf.fromImageBytes(data: Buffer | Uint8Array): Pdf

save(path: string): void
saveAsync(path: string): Promise<void>
toBytes(): Buffer
close(): void
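A creation flow might be sketched as follows: build a Markdown string, convert it, and return the bytes. PdfFactory, PdfLike, and renderReport are hypothetical names. Which Markdown constructs fromMarkdown renders is not specified on this page, so the heading and bullet list are assumptions. The bytes are copied before close() as a defensive measure, since this page does not say whether the returned buffer outlives the native object.

```typescript
// Hypothetical structural types mirroring the documented Pdf members used here.
interface PdfLike {
  toBytes(): Uint8Array;
  close(): void;
}

interface PdfFactory {
  fromMarkdown(markdown: string): PdfLike;
}

// Builds a small Markdown report and returns the generated PDF bytes.
function renderReport(factory: PdfFactory, title: string, items: string[]): Uint8Array {
  const lines = ["# " + title, ""].concat(items.map((item) => "- " + item));
  const pdf = factory.fromMarkdown(lines.join("\n"));
  try {
    // Copy so the result does not depend on the native object staying open.
    return new Uint8Array(pdf.toBytes());
  } finally {
    pdf.close();
  }
}
```

With the real package, the Pdf class itself would be passed as the factory: renderReport(Pdf, "Weekly report", ["item one", "item two"]).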

DocumentEditor

DocumentEditor.open(path: string): DocumentEditor
DocumentEditor.openFromBytes(data: Buffer | Uint8Array): DocumentEditor

Metadata

setTitle(title: string): void
setAuthor(author: string): void
setSubject(subject: string): void
setKeywords(keywords: string): void
applyMetadata(meta: Metadata): void

Page operations

rotatePage(pageIndex: number, degrees: 0 | 90 | 180 | 270): void
deletePage(pageIndex: number): void
movePage(from: number, to: number): void
cropMargins(left: number, bottom: number, right: number, top: number): void
eraseRegion(pageIndex: number, x: number, y: number, width: number, height: number): void
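When deleting several pages, the order of calls matters if indices shift after each deletion. The sketch below assumes that deletePage moves all later pages down by one, which this page does not state explicitly. PageDeleter and deletePages are hypothetical names.

```typescript
// Hypothetical structural type copying the documented deletePage signature.
interface PageDeleter {
  deletePage(pageIndex: number): void;
}

// Deletes pages from the highest index down so earlier deletions
// cannot change the position of pages still waiting to be removed.
function deletePages(editor: PageDeleter, indices: number[]): void {
  const unique = Array.from(new Set(indices)).sort((a, b) => b - a);
  for (const index of unique) {
    editor.deletePage(index);
  }
}
```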

Annotations & forms

flattenAnnotations(pageIndex: number): void
flattenAllAnnotations(): void
flattenForms(): void
setFormFieldValue(name: string, value: string): void
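Filling a form from a plain object can be sketched as below. FormEditor and fillForm are hypothetical names; the interface copies the two documented DocumentEditor signatures it uses. Flattening is optional because it presumably makes the fields non-editable.

```typescript
// Hypothetical structural type copying two documented DocumentEditor signatures.
interface FormEditor {
  setFormFieldValue(name: string, value: string): void;
  flattenForms(): void;
}

// Sets each field by name, then optionally flattens the form.
function fillForm(
  editor: FormEditor,
  values: Record<string, string>,
  flatten: boolean = false,
): void {
  for (const name of Object.keys(values)) {
    editor.setFormFieldValue(name, values[name]);
  }
  if (flatten) {
    editor.flattenForms();
  }
}
```

Field names can be discovered beforehand with getFormFields() on a PdfDocument.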

Merging

mergeFrom(path: string): number

Save

save(path: string): void
saveAsync(path: string): Promise<void>
saveEncrypted(path: string, userPassword: string, ownerPassword: string): void
toBytes(): Buffer
close(): void

OcrEngine (feature ocr)

new OcrEngine()
pageNeedsOcr(doc: PdfDocument, pageIndex: number): boolean
extractText(doc: PdfDocument, pageIndex: number): string
close(): void

Build with --features ocr (see OCR guide).
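A common pattern is to use OCR only for pages that need it. The sketch below follows the documented pageNeedsOcr and extractText signatures; TextDoc, Ocr, and extractWithOcrFallback are hypothetical names used so the example runs without the native addon.

```typescript
// Hypothetical structural types mirroring the documented members used here.
interface TextDoc {
  getPageCount(): number;
  extractText(pageIndex: number): string;
}

interface Ocr<D> {
  pageNeedsOcr(doc: D, pageIndex: number): boolean;
  extractText(doc: D, pageIndex: number): string;
}

// Uses native text extraction where possible and OCR only where required.
function extractWithOcrFallback<D extends TextDoc>(doc: D, ocr: Ocr<D>): string[] {
  const pages: string[] = [];
  for (let i = 0; i < doc.getPageCount(); i++) {
    pages.push(ocr.pageNeedsOcr(doc, i) ? ocr.extractText(doc, i) : doc.extractText(i));
  }
  return pages;
}
```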


Streams

new SearchManager(doc: PdfDocument)
new SearchStream(manager: SearchManager, query: string, options?: { caseSensitive?: boolean })
// SearchStream is a standard Node.js Readable in object mode.

See the Node.js streams guide for SearchStream, PageIteratorStream, and TableStream patterns.
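Because Node.js Readable streams are async iterable, a SearchStream can presumably be consumed with for await. The sketch below is written against AsyncIterable so it runs without the native addon; firstMatches is a hypothetical helper. Breaking out of the loop early destroys a Readable stream, which stops further work.

```typescript
// Same shape as the SearchResult interface listed under Types.
interface SearchResult {
  text: string;
  page: number;
  x: number;
  y: number;
  width: number;
  height: number;
}

// Collects at most `limit` hits, then stops consuming the stream.
async function firstMatches(
  stream: AsyncIterable<SearchResult>,
  limit: number,
): Promise<SearchResult[]> {
  const hits: SearchResult[] = [];
  if (limit <= 0) return hits;
  for await (const hit of stream) {
    hits.push(hit);
    if (hits.length >= limit) break;
  }
  return hits;
}
```

A call would look like firstMatches(new SearchStream(new SearchManager(doc), "total"), 10), where the query and limit are illustrative.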


Types

interface Char {
  char: string;
  x: number;
  y: number;
  fontSize: number;
  fontName: string;
  bbox: [number, number, number, number];
}

interface Span {
  text: string;
  fontName: string;
  fontSize: number;
  bbox: [number, number, number, number];
}

interface Word {
  text: string;
  x: number;
  y: number;
  width: number;
  height: number;
}

interface TextLine {
  text: string;
  y: number;
  spans: Span[];
}

interface ImageInfo {
  width: number;
  height: number;
  format: "png" | "jpeg" | "tiff";
  colorspace: "rgb" | "gray" | "cmyk" | "indexed";
  bitsPerComponent: number;
  data: Buffer;
}

interface FontInfo {
  name: string;
  type: string;
  encoding: string;
  isEmbedded: boolean;
  isSubset: boolean;
  size: number;
}

interface AnnotationInfo {
  type: string;
  subtype: string;
  content: string;
  x: number; y: number;
  width: number; height: number;
  author: string;
  linkUri?: string;
}

interface FormField {
  name: string;
  fieldType: string;
  value: string;
  pageIndex: number;
}

interface SearchResult {
  text: string;
  page: number;
  x: number; y: number;
  width: number; height: number;
}

interface PageInfo {
  width: number;
  height: number;
  rotation: 0 | 90 | 180 | 270;
  mediaBox: Rect;
  cropBox: Rect;
}

interface Metadata {
  title?: string;
  author?: string;
  subject?: string;
  keywords?: string;
  producer?: string;
  creationDate?: string;
}

interface ExtractionProfile {
  wordGapThreshold: number;
  lineGapThreshold: number;
  columnCount: number;
}

Error handling

All methods throw on failure. Catch with try/catch and inspect err.message:

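try {
  const text = doc.extractText(0);
  console.log(text);
} catch (err) {
  // A caught value is typed `unknown` in strict TypeScript; narrow it before reading .message.
  const message = err instanceof Error ? err.message : String(err);
  console.error(`Extraction failed: ${message}`);
}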

Async method suffix

Every sync method listed above has an *Async variant returning a Promise: extractText → extractTextAsync, save → saveAsync, etc. The async variants dispatch to the libuv thread pool and do not block the event loop. See the async guide.
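The async variants can be combined with Promise.all to extract several pages concurrently. A minimal sketch; AsyncTextSource and extractAllPagesAsync are hypothetical names, and the interface copies only the documented signatures. Actual parallelism is bounded by the libuv thread pool, which has four threads by default.

```typescript
// Hypothetical structural type copying two documented PdfDocument signatures.
interface AsyncTextSource {
  getPageCount(): number;
  extractTextAsync(pageIndex: number): Promise<string>;
}

// Starts extraction for every page, then waits for all results in page order.
async function extractAllPagesAsync(doc: AsyncTextSource): Promise<string[]> {
  const tasks: Promise<string>[] = [];
  for (let i = 0; i < doc.getPageCount(); i++) {
    tasks.push(doc.extractTextAsync(i));
  }
  return Promise.all(tasks);
}
```

Promise.all rejects as soon as any page fails; Promise.allSettled is an alternative when partial results are acceptable.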

Thread safety

PdfDocument is Send + Sync on the Rust side — safe to share across Node.js Worker threads. See the concurrency guide.

Generated types

TypeScript definitions ship with the package at node_modules/pdf-oxide/index.d.ts — the canonical source of truth for types, including any fields added after this page was last updated.