What is the fastest Python PDF library?

PDF Oxide is the fastest Python PDF library, with a mean text extraction time of 0.8 ms: 5.8× faster than PyMuPDF (4.6 ms) and 15× faster than pypdf (12.1 ms). Benchmarked on 3,830 real-world PDFs with a 100% pass rate.
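The headline multipliers appear to be plain ratios of the mean times, rounded for display. A quick check using only the three figures quoted above (the `speedup` helper is illustrative, not part of PDF Oxide). `Fraction` keeps the division exact, so decimal inputs like 4.6 don't pick up floating-point noise:

```python
from fractions import Fraction

# Mean text-extraction times in milliseconds, as quoted above.
MEAN_MS = {"PDF Oxide": "0.8", "PyMuPDF": "4.6", "pypdf": "12.1"}


def speedup(baseline: str, candidate: str) -> Fraction:
    """Exact ratio baseline / candidate (how many times faster the candidate is)."""
    return Fraction(MEAN_MS[baseline]) / Fraction(MEAN_MS[candidate])


for name in ("PyMuPDF", "pypdf"):
    print(f"{name}: {float(speedup(name, 'PDF Oxide'))}x")
```

This gives 5.75 (quoted as 5.8×) and 15.125 (quoted as 15×).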

Is PDF Oxide free for commercial use?

Yes. PDF Oxide is MIT licensed — free for all uses including commercial products, SaaS, and proprietary software. No license fees, no sales calls, no AGPL restrictions.

Can PDF Oxide handle scanned PDFs with OCR?

Yes. PDF Oxide includes built-in OCR via PaddleOCR and ONNX Runtime. No Tesseract installation needed — just pip install pdf_oxide and use extract_text_ocr(). Supports PP-OCRv3, v4, and v5 models.

Does PDF Oxide support XFA forms?

Yes. PDF Oxide is the only Python PDF library that can detect, analyze, and extract data from XFA forms (XML Forms Architecture). PyMuPDF, pypdf, pdfplumber, and pdfminer cannot read XFA form data.

How does PDF Oxide compare to PyMuPDF?

PDF Oxide is 5.8× faster than PyMuPDF (0.8ms vs 4.6ms mean), has a 100% pass rate vs 99.3%, and is MIT licensed vs PyMuPDF's AGPL-3.0. PDF Oxide also has built-in Markdown/HTML output and XFA form support that PyMuPDF lacks.

Can PDF Oxide convert PDF to Markdown?

Yes. PDF Oxide has built-in PDF-to-Markdown conversion with heading detection, table preservation, and list formatting, which suits LLM and RAG pipelines. No separate package is needed, unlike PyMuPDF, which requires pymupdf4llm (69× slower).

Image Manipulation

PDF Oxide provides two levels of image manipulation: low-level operations via DocumentEditor for repositioning and resizing images by their XObject name, and DOM-level access via PdfPage for querying image metadata. Both approaches work with images already embedded in the PDF.

Binding coverage. Image extraction (ExtractImages, raw bytes + bounding boxes) is available in all five bindings. Image manipulation (reposition, resize, replace) is currently exposed in Python, Rust, and WASM. The Go and C# public APIs do not yet expose the low-level image editor operations — use the Rust CLI or call through Python/Rust/WASM.

Fixed in v0.3.23: images and form XObjects are no longer stripped on save. Earlier versions of write_full_to_writer serialized only the Font resources from the page Resources dictionary, silently dropping XObject (images, form XObjects) and ExtGState entries. The serializer now preserves XObject and ExtGState entries alongside Font. Previously affected examples (including create_pdf_with_images) now produce correct output.

Getting Page Images

DocumentEditor: Low-Level Image Info

Retrieve detailed placement information for all images on a page, including XObject names and transformation matrices.

from pdf_oxide import PdfDocument

doc = PdfDocument("brochure.pdf")
images = doc.page_images(0)

for img in images:
    print(f"Name: {img['name']}")
    print(f"Position: ({img['x']:.1f}, {img['y']:.1f})")
    print(f"Size: {img['width']:.1f} x {img['height']:.1f}")
    print(f"Matrix: {img['matrix']}")

import { WasmPdfDocument } from "pdf-oxide-wasm";

// bytes: a Uint8Array holding the PDF file contents
const doc = new WasmPdfDocument(bytes);
const images = doc.pageImages(0);

for (const img of images) {
  console.log(`Name: ${img.name}`);
  console.log(`Position: (${img.x}, ${img.y})`);
  console.log(`Size: ${img.width} x ${img.height}`);
}
doc.free();

use pdf_oxide::editor::DocumentEditor;

let mut editor = DocumentEditor::open("brochure.pdf")?;
let images = editor.get_page_images(0)?;

for img in &images {
    println!("Name: {}", img.name);
    println!("Bounds: {:?}", img.bounds);  // [x, y, width, height]
    println!("Matrix: {:?}", img.matrix);  // [a, b, c, d, e, f]
}

The returned ImageInfo struct contains:

Field	Type	Description
`name`	`String`	XObject name (e.g., `"Im0"`, `"Image1"`)
`bounds`	`[f32; 4]`	Position and size `[x, y, width, height]`
`matrix`	`[f32; 6]`	Full transformation matrix `[a, b, c, d, e, f]`
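For an image drawn with an unrotated, unskewed matrix, `bounds` and `matrix` carry the same information. A PDF image occupies the unit square in image space, and the matrix maps that square onto the page, so `x = e`, `y = f`, `width = a`, `height = d`. The sketch below computes the axis-aligned box of the transformed unit square, which also covers rotated images. It assumes `bounds` is derived this way; the library's exact computation isn't documented here, and `bounds_from_matrix` is a hypothetical helper:

```python
from typing import List, Sequence


def bounds_from_matrix(m: Sequence[float]) -> List[float]:
    """Axis-aligned [x, y, width, height] of the unit square under matrix m.

    m is a PDF transformation matrix [a, b, c, d, e, f]; a point (u, v)
    in image space maps to (a*u + c*v + e, b*u + d*v + f) on the page.
    """
    a, b, c, d, e, f = m
    corners = [(0, 0), (1, 0), (0, 1), (1, 1)]
    xs = [a * u + c * v + e for u, v in corners]
    ys = [b * u + d * v + f for u, v in corners]
    return [min(xs), min(ys), max(xs) - min(xs), max(ys) - min(ys)]


# Unrotated 200 x 150 pt image placed at (72, 500): bounds are just [e, f, a, d]
print(bounds_from_matrix([200, 0, 0, 150, 72, 500]))
```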

PdfPage: DOM-Level Image Info

The DOM API provides richer metadata about each image.

from pdf_oxide import PdfDocument

doc = PdfDocument("brochure.pdf")
page = doc.page(0)

for img in page.find_images():
    print(f"BBox: {img.bbox}")
    print(f"Format: {img.format}")
    print(f"Dimensions: {img.dimensions}")

let mut doc = Pdf::open("brochure.pdf")?;
let page = doc.page(0)?;

for img in page.find_images() {
    println!("BBox: {:?}", img.bbox());
    println!("Format: {:?}", img.format());
    println!("Dimensions: {:?}", img.dimensions());
    println!("Aspect ratio: {:.2}", img.aspect_ratio());
    println!("Grayscale: {}", img.is_grayscale());

    if let Some(alt) = img.alt_text() {
        println!("Alt text: {}", alt);
    }
    if let Some((h_dpi, v_dpi)) = img.resolution() {
        println!("Resolution: {:.0} x {:.0} DPI", h_dpi, v_dpi);
    }
}

Repositioning Images

Move an image to a new position on the page without changing its size.

from pdf_oxide import PdfDocument

doc = PdfDocument("input.pdf")
images = doc.page_images(0)

# Move the first image to position (100, 500)
doc.reposition_image(0, images[0]["name"], 100, 500)
doc.save("moved.pdf")

import { WasmPdfDocument } from "pdf-oxide-wasm";

// bytes: a Uint8Array holding the PDF file contents
const doc = new WasmPdfDocument(bytes);
const images = doc.pageImages(0);

// Move the first image to position (100, 500)
doc.repositionImage(0, images[0].name, 100, 500);
const output = doc.save();
doc.free();

use pdf_oxide::editor::DocumentEditor;

let mut editor = DocumentEditor::open("input.pdf")?;
let images = editor.get_page_images(0)?;

// Move the first image
editor.reposition_image(0, &images[0].name, 100.0, 500.0)?;
editor.save("moved.pdf")?;
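Conceptually, moving an image without resizing it only touches the translation terms `e` and `f` of its matrix. A minimal model of that, assuming the editor rewrites the image's matrix rather than, say, prepending an extra `cm` operator (the source doesn't say which); `reposition` is a hypothetical helper:

```python
from typing import List, Sequence


def reposition(m: Sequence[float], x: float, y: float) -> List[float]:
    """Copy of matrix [a, b, c, d, e, f] with its origin moved to (x, y).

    Only the translation terms (e, f) change, so the image keeps its
    size, rotation and skew.
    """
    a, b, c, d, _e, _f = m
    return [a, b, c, d, x, y]


print(reposition([200, 0, 0, 150, 72, 500], 100, 500))
```

For an unrotated image, (e, f) is the lower-left corner, which matches `[x, y]` in `bounds`; for a rotated image the matrix origin is not necessarily the lower-left corner of its bounding box.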

Resizing Images

Change the dimensions of an image without moving its position.

from pdf_oxide import PdfDocument

doc = PdfDocument("input.pdf")
images = doc.page_images(0)

# Resize the first image to 300x200 points
doc.resize_image(0, images[0]["name"], 300, 200)
doc.save("resized.pdf")

import { WasmPdfDocument } from "pdf-oxide-wasm";

// bytes: a Uint8Array holding the PDF file contents
const doc = new WasmPdfDocument(bytes);
const images = doc.pageImages(0);

// Resize the first image to 300x200 points
doc.resizeImage(0, images[0].name, 300, 200);
const output = doc.save();
doc.free();

use pdf_oxide::editor::DocumentEditor;

let mut editor = DocumentEditor::open("input.pdf")?;
let images = editor.get_page_images(0)?;

editor.resize_image(0, &images[0].name, 300.0, 200.0)?;
editor.save("resized.pdf")?;
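`resize_image` takes an explicit width and height, so a fixed 300×200 target stretches any image whose aspect ratio isn't 3:2. To scale proportionally, derive the height from the current size first. A sketch, with `size_for_width` as a hypothetical helper whose inputs would be the image's current width and height (e.g. `img["width"]` and `img["height"]` in Python):

```python
from typing import Tuple


def size_for_width(orig_w: float, orig_h: float, new_w: float) -> Tuple[float, float]:
    """Width and height that keep the original aspect ratio at a new width."""
    if orig_w <= 0:
        raise ValueError("original width must be positive")
    return new_w, orig_h * new_w / orig_w


print(size_for_width(400, 300, 300))
```

A 400×300 image resized to 300 points wide becomes 300×225, and those two values are what you would pass to `resize_image`.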

Setting Full Image Bounds

Set both position and size in a single operation.

from pdf_oxide import PdfDocument

doc = PdfDocument("input.pdf")
images = doc.page_images(0)

# Set position (72, 600) and size (468, 200)
doc.set_image_bounds(0, images[0]["name"], 72, 600, 468, 200)
doc.save("adjusted.pdf")

use pdf_oxide::editor::DocumentEditor;

let mut editor = DocumentEditor::open("input.pdf")?;
let images = editor.get_page_images(0)?;

// x, y, width, height
editor.set_image_bounds(0, &images[0].name, 72.0, 600.0, 468.0, 200.0)?;
editor.save("adjusted.pdf")?;
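`set_image_bounds` places the image into exactly the rectangle given, so a 468×200 box distorts images with other proportions. One common alternative is a "contain" fit: scale by the smaller of the two ratios and centre the result in the box. This sketch computes such bounds. `fit_in_box` is hypothetical, and its four results would be passed as the x, y, width and height arguments:

```python
from typing import Tuple


def fit_in_box(img_w: float, img_h: float,
               box_x: float, box_y: float, box_w: float, box_h: float
               ) -> Tuple[float, float, float, float]:
    """Largest aspect-preserving (x, y, w, h) inside the box, centred in it."""
    if img_w <= 0 or img_h <= 0:
        raise ValueError("image dimensions must be positive")
    # The smaller ratio is the one that makes the image fit on both axes.
    scale = min(box_w / img_w, box_h / img_h)
    w, h = img_w * scale, img_h * scale
    # Split the leftover space evenly on each side.
    return box_x + (box_w - w) / 2, box_y + (box_h - h) / 2, w, h


print(fit_in_box(100, 50, 0, 0, 200, 200))
```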

Managing Image Modifications

Clear Modifications

Discard all pending image modifications for a page before saving.

doc.clear_image_modifications(0)

editor.clear_image_modifications(0);

Check for Pending Modifications

if doc.has_image_modifications(0):
    print("Page 0 has pending image changes")

if editor.has_image_modifications(0) {
    println!("Page 0 has pending image changes");
}

PdfImage DOM API (Rust)

The DOM-level PdfImage provides rich metadata for each image found on a page.

Method	Returns	Description
`id()`	`ElementId`	Unique element identifier
`bbox()`	`Rect`	Position and size on the page
`format()`	`ImageFormat`	Image format (JPEG, PNG, etc.)
`dimensions()`	`(u32, u32)`	Width and height in pixels
`aspect_ratio()`	`f32`	Width / height ratio
`is_grayscale()`	`bool`	True if grayscale image
`alt_text()`	`Option<&str>`	Accessibility alt text
`set_alt_text(text)`	`()`	Set accessibility alt text
`resolution()`	`Option<(f32, f32)>`	DPI as (horizontal, vertical)
`horizontal_dpi()`	`Option<f32>`	Horizontal DPI
`vertical_dpi()`	`Option<f32>`	Vertical DPI
`is_high_resolution()`	`bool`	Resolution >= 300 DPI
`is_medium_resolution()`	`bool`	Resolution >= 150 and < 300 DPI
`is_low_resolution()`	`bool`	Resolution < 150 DPI
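The table doesn't spell out how `resolution()` is derived. A common definition, assumed here, is pixels per inch at the displayed size, with 72 PDF points to the inch. The sketch below computes that and applies the thresholds from the table. Classifying on the lower of the two axes is also an assumption, as are the helper names:

```python
from typing import Tuple


def effective_dpi(px_w: int, px_h: int, pt_w: float, pt_h: float) -> Tuple[float, float]:
    """(horizontal, vertical) pixels per inch at the displayed size (72 pt = 1 in)."""
    if pt_w <= 0 or pt_h <= 0:
        raise ValueError("displayed size must be positive")
    return px_w / (pt_w / 72.0), px_h / (pt_h / 72.0)


def resolution_class(dpi: float) -> str:
    """Bucket a DPI value: high >= 300, medium >= 150 and < 300, low < 150."""
    if dpi >= 300:
        return "high"
    if dpi >= 150:
        return "medium"
    return "low"


def classify_image(px_w: int, px_h: int, pt_w: float, pt_h: float) -> str:
    """Classify on the lower of the two axes, so either axis can downgrade it."""
    return resolution_class(min(effective_dpi(px_w, px_h, pt_w, pt_h)))


# A 1200 x 900 px photo shown at 4 x 3 inches (288 x 216 pt) is exactly 300 DPI.
print(classify_image(1200, 900, 288, 216))
```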

Find Images in a Region

use pdf_oxide::geometry::Rect;

let page = doc.page(0)?;

// Find images in the top half of a US Letter page (612 x 792 pt):
// Rect::new(x, y, width, height), with y measured up from the bottom edge
let region = Rect::new(0.0, 396.0, 612.0, 396.0);
let top_images = page.find_images_in_region(region);
println!("Found {} images in top half", top_images.len());
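Whether `find_images_in_region` returns images that merely overlap the region or only those fully inside it isn't stated. The two predicates below show the difference for the top-half region above. The `Rect` here is a plain Python stand-in with an assumed (x, y, width, height) layout, not `pdf_oxide::geometry::Rect`:

```python
from typing import NamedTuple


class Rect(NamedTuple):
    """Axis-aligned rectangle: lower-left (x, y) plus width and height, in points."""
    x: float
    y: float
    width: float
    height: float


def intersects(a: Rect, b: Rect) -> bool:
    """True if the rectangles overlap with non-zero area."""
    return (a.x < b.x + b.width and b.x < a.x + a.width
            and a.y < b.y + b.height and b.y < a.y + a.height)


def contains(outer: Rect, inner: Rect) -> bool:
    """True if inner lies completely inside outer."""
    return (outer.x <= inner.x and inner.x + inner.width <= outer.x + outer.width
            and outer.y <= inner.y and inner.y + inner.height <= outer.y + outer.height)


top_half = Rect(0, 396, 612, 396)
straddling = Rect(72, 300, 200, 150)  # spans y = 300 to 450
print(intersects(top_half, straddling), contains(top_half, straddling))
```

An image spanning y = 300 to 450 straddles the midline: it overlaps the region but is not contained in it, so the two interpretations would give different results for it.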

Set Accessibility Alt Text

let mut page = doc.page(0)?;
let images = page.find_images();

for img in &images {
    if img.alt_text().is_none() {
        page.set_image_alt_text(img.id(), "Descriptive alt text")?;
    }
}

doc.save_page(page)?;

Full API Reference

DocumentEditor Image Methods

Method	Returns	Description
`get_page_images(page)`	`Result<Vec<ImageInfo>>`	List all images on a page
`reposition_image(page, name, x, y)`	`Result<()>`	Move image to new position
`resize_image(page, name, w, h)`	`Result<()>`	Change image dimensions
`set_image_bounds(page, name, x, y, w, h)`	`Result<()>`	Set position and size
`clear_image_modifications(page)`	`()`	Discard pending changes
`has_image_modifications(page)`	`bool`	Check for pending changes

Advanced Example: Center All Images

use pdf_oxide::editor::DocumentEditor;

let mut editor = DocumentEditor::open("input.pdf")?;
let count = editor.current_page_count();

for page_idx in 0..count {
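    let media_box = editor.get_page_media_box(page_idx)?;
    // MediaBox is [llx, lly, urx, ury]; its origin is not always (0, 0)
    let page_width = media_box[2] - media_box[0];

    let images = editor.get_page_images(page_idx)?;
    for img in &images {
        let img_width = img.bounds[2];
        // Offset by the MediaBox's left edge so centering works for any origin
        let centered_x = media_box[0] + (page_width - img_width) / 2.0;

        // Keep the current y (bounds[1]); only the horizontal position changes
        editor.reposition_image(page_idx, &img.name, centered_x, img.bounds[1])?;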
    }
}

editor.save("centered.pdf")?;

Advanced Example: Scale to Fit Width

from pdf_oxide import PdfDocument

doc = PdfDocument("photos.pdf")

for page_idx in range(doc.page_count()):
    media_box = doc.page_media_box(page_idx)
    page_width = media_box[2] - media_box[0]
    margin = 72  # 1 inch on each side
    target_width = page_width - 2 * margin

    for img in doc.page_images(page_idx):
        # Scale the height by the same factor as the width to keep the aspect ratio
        scale = target_width / img["width"]
        new_height = img["height"] * scale

        # Offset by the MediaBox origin in case it is not at (0, 0)
        doc.set_image_bounds(
            page_idx, img["name"],
            media_box[0] + margin, img["y"],
            target_width, new_height
        )

doc.save("fitted.pdf")

Editing Overview – opening, metadata, and save workflow
Text Editing – modify text around images
Page Operations – crop and resize pages
Annotation Editing – add captions or notes near images