What is the fastest Python PDF library?

PDF Oxide is the fastest Python PDF library, with a mean text extraction time of 0.8ms: 5.8× faster than PyMuPDF (4.6ms) and 15× faster than pypdf (12.1ms). It was benchmarked on 3,830 real-world PDFs with a 100% pass rate.

Is PDF Oxide free for commercial use?

Yes. PDF Oxide is MIT licensed — free for all uses including commercial products, SaaS, and proprietary software. No license fees, no sales calls, no AGPL restrictions.

Can PDF Oxide handle scanned PDFs with OCR?

Yes. PDF Oxide includes built-in OCR via PaddleOCR and ONNX Runtime. No Tesseract installation needed — just pip install pdf_oxide and use extract_text_ocr(). Supports PP-OCRv3, v4, and v5 models.

Does PDF Oxide support XFA forms?

Yes. PDF Oxide is the only Python PDF library that can detect, analyze, and extract data from XFA forms (XML Forms Architecture). PyMuPDF, pypdf, pdfplumber, and pdfminer cannot read XFA form data.

How does PDF Oxide compare to PyMuPDF?

PDF Oxide is 5.8× faster than PyMuPDF (0.8ms vs 4.6ms mean), has a 100% pass rate vs 99.3%, and is MIT licensed vs PyMuPDF's AGPL-3.0. PDF Oxide also has built-in Markdown/HTML output and XFA form support that PyMuPDF lacks.

Can PDF Oxide convert PDF to Markdown?

Yes. PDF Oxide has built-in PDF to Markdown conversion with heading detection, table preservation, and list formatting, which suits LLM and RAG pipelines. No separate package is needed, unlike PyMuPDF, which requires pymupdf4llm (69× slower).

Getting Started with PDF Oxide (Rust)

PDF Oxide is the fastest Rust PDF crate with built-in text extraction — 0.8ms mean, 100% pass rate on 3,830 PDFs. One library for extracting, creating, and editing PDFs.

Installation

Add pdf_oxide to your Cargo.toml:

[dependencies]
pdf_oxide = "0.3"

Feature Flags

Enable only the capabilities you need:

# Default -- text extraction, creation, editing
pdf_oxide = "0.3"

# Page rendering to images
pdf_oxide = { version = "0.3", features = ["rendering"] }

# Barcode generation
pdf_oxide = { version = "0.3", features = ["barcodes"] }

# Digital signatures
pdf_oxide = { version = "0.3", features = ["signatures"] }

# Office document conversion (DOCX, XLSX, PPTX)
pdf_oxide = { version = "0.3", features = ["office"] }

# Everything
pdf_oxide = { version = "0.3", features = ["full"] }

Opening a PDF

Use PdfDocument::open() to load a file and inspect its metadata.

use pdf_oxide::PdfDocument;

let doc = PdfDocument::open("research-paper.pdf")?;
println!("Pages: {}", doc.page_count());
println!("PDF version: {}", doc.version());

Text Extraction

Plain Text

use pdf_oxide::PdfDocument;

let doc = PdfDocument::open("report.pdf")?;
let text = doc.extract_text(0)?;
println!("{text}");

Text Spans

extract_spans() returns a Vec<TextSpan> with font metadata for each run of identically-styled text.

use pdf_oxide::PdfDocument;

let doc = PdfDocument::open("paper.pdf")?;
let spans = doc.extract_spans(0)?;

for span in &spans {
    println!("'{}' at ({:.1}, {:.1}) font={} size={:.1}",
        span.text, span.x, span.y, span.font_name, span.font_size);
}

TextSpan fields:

Field	Type	Description
`text`	`String`	The text content
`x`	`f64`	Horizontal position in points
`y`	`f64`	Vertical position in points
`font_name`	`String`	PostScript font name
`font_size`	`f64`	Font size in points
`bbox`	`Rect`	Bounding rectangle
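
Spans arrive as individual styled runs, so reading order often has to be rebuilt from their coordinates. The sketch below groups spans into visual lines by their `y` value. It uses a local `Span` struct as a stand-in for the documented `text`, `x`, and `y` fields, and it assumes that a larger `y` sits higher on the page (the usual PDF convention). `group_into_lines` and `tolerance` are illustrative names, not part of PDF Oxide.

```rust
/// Local stand-in for the documented `TextSpan` fields (hypothetical type,
/// not the crate's own struct).
#[derive(Debug, Clone)]
struct Span {
    text: String,
    x: f64,
    y: f64,
}

/// Group spans into lines: a span joins the current line when its `y` is
/// within `tolerance` points of the line's first span. Lines come back top
/// to bottom, each joined left to right with single spaces.
fn group_into_lines(mut spans: Vec<Span>, tolerance: f64) -> Vec<String> {
    // Highest spans first (assumes y grows upward), ties broken by x.
    spans.sort_by(|a, b| b.y.total_cmp(&a.y).then(a.x.total_cmp(&b.x)));

    let mut lines: Vec<(f64, Vec<Span>)> = Vec::new();
    for span in spans {
        let same_line = lines
            .last()
            .map_or(false, |(line_y, _)| (line_y - span.y).abs() < tolerance);
        if same_line {
            lines.last_mut().unwrap().1.push(span);
        } else {
            lines.push((span.y, vec![span]));
        }
    }

    lines
        .into_iter()
        .map(|(_, mut items)| {
            // y values within a line may differ slightly, so re-sort by x.
            items.sort_by(|a, b| a.x.total_cmp(&b.x));
            items
                .iter()
                .map(|s| s.text.as_str())
                .collect::<Vec<_>>()
                .join(" ")
        })
        .collect()
}
```

A tolerance of a point or two is a reasonable starting value for body text; multi-column layouts need an additional split on `x` that this sketch does not attempt.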

Character-Level Extraction

extract_chars() returns a Vec<TextChar> with precise positioning for every character.

use pdf_oxide::PdfDocument;

let doc = PdfDocument::open("paper.pdf")?;
let chars = doc.extract_chars(0)?;

for ch in chars.iter().take(10) {
    println!("'{}' at ({:.1}, {:.1}) size={:.1} font={}",
        ch.char, ch.x, ch.y, ch.font_size, ch.font_name);
}

TextChar fields:

Field	Type	Description
`char`	`char`	The Unicode character
`x`	`f64`	Horizontal position in points
`y`	`f64`	Vertical position in points
`font_size`	`f64`	Font size in points
`font_name`	`String`	PostScript font name
`bbox`	`Rect`	Bounding rectangle
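
Character-level output is useful when word boundaries have to be decided by the caller. As a rough sketch, the function below splits one left-to-right line of characters into words, starting a new word at whitespace or when the horizontal step between neighbours is unusually large. `Glyph` is a local stand-in for the documented `char`, `x`, and `font_size` fields, and `chars_to_words` and `gap_factor` are hypothetical names chosen for this example.

```rust
/// Local stand-in for the documented `TextChar` fields (hypothetical type).
#[derive(Debug, Clone)]
struct Glyph {
    ch: char,
    x: f64,
    font_size: f64,
}

/// Split one line of glyphs (already ordered left to right) into words.
/// A word ends at a whitespace character, or when the step from the previous
/// glyph's `x` to this one exceeds `gap_factor * font_size`.
fn chars_to_words(glyphs: &[Glyph], gap_factor: f64) -> Vec<String> {
    let mut words = Vec::new();
    let mut current = String::new();
    let mut prev_x: Option<f64> = None;

    for g in glyphs {
        let wide_gap = prev_x.map_or(false, |px| g.x - px > gap_factor * g.font_size);
        if (g.ch.is_whitespace() || wide_gap) && !current.is_empty() {
            words.push(std::mem::take(&mut current));
        }
        if !g.ch.is_whitespace() {
            current.push(g.ch);
        }
        prev_x = Some(g.x);
    }
    if !current.is_empty() {
        words.push(current);
    }
    words
}
```

The step is measured between glyph origins rather than between bounding boxes, because the fields of `Rect` are not described here; a `gap_factor` near 1.0 treats any step wider than one font size as a word break.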

Markdown Conversion

Convert a page to Markdown with configurable options.

use pdf_oxide::PdfDocument;
use pdf_oxide::converters::ConversionOptions;

let doc = PdfDocument::open("paper.pdf")?;
let options = ConversionOptions { detect_headings: true, ..Default::default() };
let md = doc.to_markdown(0, &options)?;
println!("{md}");
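
For LLM or RAG pipelines, the Markdown string is usually chunked before indexing. One possible post-processing step, sketched here with the standard library only, splits the text at ATX heading lines. `split_by_headings` is a hypothetical helper, not a PDF Oxide function, and it does not special-case `#` characters inside fenced code.

```rust
/// Split Markdown into (heading, body) sections at ATX heading lines
/// ("# ", "## ", ... up to six hashes). Text before the first heading is
/// returned under an empty heading.
fn split_by_headings(md: &str) -> Vec<(String, String)> {
    let mut sections: Vec<(String, String)> = Vec::new();
    let mut heading = String::new();
    let mut body = String::new();

    for line in md.lines() {
        let hashes = line.chars().take_while(|&c| c == '#').count();
        let is_heading = (1..=6).contains(&hashes) && line[hashes..].starts_with(' ');
        if is_heading {
            // Close the section collected so far, if it has any content.
            if !heading.is_empty() || !body.trim().is_empty() {
                sections.push((heading, body.trim().to_string()));
            }
            heading = line[hashes..].trim().to_string();
            body = String::new();
        } else {
            body.push_str(line);
            body.push('\n');
        }
    }
    if !heading.is_empty() || !body.trim().is_empty() {
        sections.push((heading, body.trim().to_string()));
    }
    sections
}
```

Each returned pair can then be embedded or indexed on its own, with the heading kept as context for the body.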

HTML Conversion

use pdf_oxide::PdfDocument;

let doc = PdfDocument::open("paper.pdf")?;
let html = doc.to_html(0)?;
println!("{html}");

Image Extraction

extract_images() returns metadata and raw data for every image on a page, including images in content streams and nested Form XObjects.

use pdf_oxide::PdfDocument;

let doc = PdfDocument::open("brochure.pdf")?;
let images = doc.extract_images(0)?;

for (i, img) in images.iter().enumerate() {
    println!("Image {i}: {}x{} {} {}bpc ({} bytes)",
        img.width, img.height, img.color_space,
        img.bits_per_component, img.data.len());
}

Write images directly to disk with extract_images_to_files():

let doc = PdfDocument::open("brochure.pdf")?;
let paths = doc.extract_images_to_files(0, "output_dir")?;
for path in &paths {
    println!("Saved: {}", path.display());
}

PDF Creation

Factory Methods

The Pdf type provides high-level factory methods for creating a PDF from Markdown, HTML, plain text, or an image.

use pdf_oxide::api::Pdf;

let mut pdf = Pdf::from_markdown("# Hello World\n\nThis is a PDF.")?;
pdf.save("output.pdf")?;

let mut pdf = Pdf::from_html("<h1>Invoice</h1><p>Amount: $42</p>")?;
pdf.save("invoice.pdf")?;

let mut pdf = Pdf::from_text("Plain text content.")?;
pdf.save("notes.pdf")?;

let mut pdf = Pdf::from_image("scan.jpg")?;
pdf.save("scan.pdf")?;

PdfBuilder Fluent API

For full control over metadata, page size, and margins:

use pdf_oxide::api::PdfBuilder;
use pdf_oxide::writer::PageSize;

let mut pdf = PdfBuilder::new()
    .title("Annual Report")
    .author("Acme Corp")
    .page_size(PageSize::A4)
    .margins(72.0, 72.0, 72.0, 72.0)
    .font_size(11.0)
    .from_markdown("# Annual Report\n\n...")?;

pdf.save("annual-report.pdf")?;
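
The margin values of 72.0 above are consistent with PDF points, where 72 points make one inch. Assuming the builder takes points (the text does not state the unit), small conversion helpers keep layout code readable. The function names below are illustrative only.

```rust
/// PDF user space uses 72 points per inch; one inch is 25.4 mm.
const POINTS_PER_INCH: f64 = 72.0;
const MM_PER_INCH: f64 = 25.4;

/// Convert inches to PDF points.
fn inches_to_points(inches: f64) -> f64 {
    inches * POINTS_PER_INCH
}

/// Convert millimetres to PDF points.
fn mm_to_points(mm: f64) -> f64 {
    mm / MM_PER_INCH * POINTS_PER_INCH
}
```

With these, a 20 mm margin is `mm_to_points(20.0)`, and an A4 page (210 mm by 297 mm) comes out at roughly 595.28 by 841.89 points.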

DocumentBuilder Low-Level API

For precise, coordinate-level placement of text, shapes, and images (the 612.0 × 792.0 page below is US Letter measured in PDF points):

use pdf_oxide::writer::DocumentBuilder;

let mut builder = DocumentBuilder::new();
builder.add_page(612.0, 792.0)
    .text("Hello, world!", 72.0, 720.0, 12.0)
    .rect(100.0, 600.0, 200.0, 50.0)
    .image_at("logo.png", 400.0, 700.0, 100.0, 50.0)?;

builder.save("custom.pdf")?;
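
PDF's default coordinate system puts the origin at the bottom-left corner of the page, which is why text at y = 720.0 on a 792.0-point page lands one inch below the top edge. Assuming DocumentBuilder follows that convention, a helper like the hypothetical `y_from_top` below converts a top-down measurement into a PDF y coordinate.

```rust
/// Convert a distance measured down from the top edge of the page into a
/// PDF y coordinate, assuming the origin is the bottom-left corner.
fn y_from_top(page_height: f64, distance_from_top: f64) -> f64 {
    page_height - distance_from_top
}
```

For a US Letter page, `y_from_top(792.0, 72.0)` gives 720.0, matching the text position used above.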

Search

Search for text across all pages of a document, either with the default settings or with fine-grained options.

use pdf_oxide::api::Pdf;

let pdf = Pdf::open("manual.pdf")?;

// Simple search across all pages
let results = pdf.search("configuration")?;
for r in &results {
    println!("Page {}: '{}' at ({:.0}, {:.0})", r.page, r.text, r.x, r.y);
}

// Search with fine-grained options (separate example)
use pdf_oxide::api::{Pdf, SearchOptions};

let pdf = Pdf::open("manual.pdf")?;

let opts = SearchOptions {
    case_sensitive: false,
    whole_word: true,
    max_results: Some(50),
    ..Default::default()
};
let results = pdf.search_with_options("configuration", &opts)?;
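
To make the three option fields concrete, here is a plain-string sketch of what case-insensitive, whole-word matching with a result cap can mean. It is not PDF Oxide's implementation: `MatchOptions` and `find_matches` are hypothetical stand-ins, case folding is ASCII-only, and a "word character" is taken to be an alphanumeric or underscore.

```rust
/// Local stand-in for the three documented option fields (hypothetical type).
struct MatchOptions {
    case_sensitive: bool,
    whole_word: bool,
    max_results: Option<usize>,
}

/// Return the byte offsets of non-overlapping matches of `needle` in `haystack`.
fn find_matches(haystack: &str, needle: &str, opts: &MatchOptions) -> Vec<usize> {
    if needle.is_empty() {
        return Vec::new();
    }
    // ASCII-only folding keeps byte offsets identical to the original text.
    let (hay, pat) = if opts.case_sensitive {
        (haystack.to_string(), needle.to_string())
    } else {
        (haystack.to_ascii_lowercase(), needle.to_ascii_lowercase())
    };
    let limit = opts.max_results.unwrap_or(usize::MAX);
    let is_word_char = |c: char| c.is_alphanumeric() || c == '_';

    let mut hits = Vec::new();
    for (start, _) in hay.match_indices(pat.as_str()) {
        if hits.len() >= limit {
            break;
        }
        let end = start + pat.len();
        // A whole-word match must not touch a word character on either side.
        let touches_before = hay[..start].chars().next_back().map_or(false, is_word_char);
        let touches_after = hay[end..].chars().next().map_or(false, is_word_char);
        if !opts.whole_word || (!touches_before && !touches_after) {
            hits.push(start);
        }
    }
    hits
}
```

With `whole_word` set, a query for "configuration" matches "Configuration:" and "configuration.toml" but skips "preconfiguration", since the latter has a letter directly before the match.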

Editing

DocumentEditor

Open an existing PDF for structural edits like page rotation and form field manipulation.

use pdf_oxide::api::Pdf;

let mut pdf = Pdf::open_editor("form-template.pdf")?;

// Rotate a page
pdf.rotate_page(0, 90)?;

// Add form fields (a text field and a checkbox)
pdf.add_text_field("name", [100.0, 700.0, 300.0, 720.0])?;
pdf.add_checkbox("agree", [100.0, 650.0, 120.0, 670.0], false)?;

pdf.save("modified.pdf")?;
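
The four-element arrays passed to the field methods look like corner-based rectangles in the style of PDF's `/Rect` entry: lower-left x and y, then upper-right x and y. That reading is an assumption, as is the `f64` element type. Under it, a small hypothetical helper makes it easier to think in terms of position plus size.

```rust
/// Build a `[x1, y1, x2, y2]` rectangle from a lower-left corner plus a
/// width and height, all in points.
fn rect_from_size(x: f64, y: f64, width: f64, height: f64) -> [f64; 4] {
    [x, y, x + width, y + height]
}

/// Recover `(width, height)` from a `[x1, y1, x2, y2]` rectangle.
fn rect_size(rect: [f64; 4]) -> (f64, f64) {
    (rect[2] - rect[0], rect[3] - rect[1])
}
```

Under this reading, the text field above is 200 points wide and 20 points tall, and the checkbox is a 20-point square.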

DOM-Like Page Editing

Navigate page elements and modify text in place.

use pdf_oxide::api::Pdf;

let mut pdf = Pdf::open("document.pdf")?;
let mut page = pdf.page(0)?;

// Find text elements
for t in page.find_text_containing("Draft") {
    println!("Found '{}' at {:?}", t.text(), t.bbox());
}

// Replace text
let matches = page.find_text_containing("Draft");
for t in &matches {
    page.set_text(t.id(), "Final")?;
}

pdf.save_page(page)?;
pdf.save("updated.pdf")?;

Error Handling

All fallible operations return Result<T, PdfError>. The PdfError enum covers the main failure modes.

use pdf_oxide::PdfDocument;
use pdf_oxide::PdfError;

fn extract(path: &str) -> Result<String, PdfError> {
    let doc = PdfDocument::open(path)?;
    doc.extract_text(0)
}

fn main() {
    // Handle the documented variants explicitly; the final arm catches the rest.
    match extract("file.pdf") {
        Ok(text) => println!("{text}"),
        Err(PdfError::Io(e)) => eprintln!("I/O error: {e}"),
        Err(PdfError::Parse(msg)) => eprintln!("Parse error: {msg}"),
        Err(PdfError::Password) => eprintln!("Password required"),
        Err(PdfError::PageOutOfRange { index, count }) => {
            eprintln!("Page {index} does not exist ({count} pages total)");
        }
        Err(e) => eprintln!("Error: {e}"),
    }
}

PdfError variants:

Variant	Description
`Io`	File system or I/O failure
`Parse`	Malformed PDF structure
`Password`	Document is encrypted and no password was given
`PageOutOfRange`	Requested page index exceeds page count

Next Steps

Python Getting Started – using PDF Oxide from Python
Text Extraction – detailed extraction options and recipes
PDF Creation – advanced creation with PdfBuilder, encryption, and metadata
Editing – modifying existing PDFs, annotations, and form fields
API Reference – full API documentation