What is the fastest Python PDF library?

PDF Oxide is the fastest Python PDF library, with 0.8ms mean text extraction time — 5.8× faster than PyMuPDF (4.6ms) and 15× faster than pypdf (12.1ms). Benchmarked on 3,830 real-world PDFs with 100% pass rate.

Is PDF Oxide free for commercial use?

Yes. PDF Oxide is MIT licensed — free for all uses including commercial products, SaaS, and proprietary software. No license fees, no sales calls, no AGPL restrictions.

Can PDF Oxide handle scanned PDFs with OCR?

Yes. PDF Oxide includes built-in OCR via PaddleOCR and ONNX Runtime. No Tesseract installation needed — just pip install pdf_oxide and use extract_text_ocr(). Supports PP-OCRv3, v4, and v5 models.
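A common way to use OCR is as a fallback for pages whose embedded text layer is empty. The sketch below shows that pattern. This page names extract_text_ocr() but does not give its signature, so both extractors are passed in as plain callables; the helper name text_with_ocr_fallback, the min_chars threshold, and the stand-in extractors are all hypothetical.

```python
from typing import Callable


def text_with_ocr_fallback(
    extract_text: Callable[[int], str],
    extract_text_ocr: Callable[[int], str],
    page_index: int,
    min_chars: int = 20,
) -> str:
    """Return the embedded text layer, or OCR output when that layer is near-empty."""
    text = extract_text(page_index)
    # Scanned pages usually carry little or no embedded text.
    if len(text.strip()) >= min_chars:
        return text
    return extract_text_ocr(page_index)


# Stand-in extractors so the sketch runs without any PDF library installed.
def fake_text(page_index: int) -> str:
    return "" if page_index == 1 else "Digital page with a real text layer."


def fake_ocr(page_index: int) -> str:
    return "Text recovered by OCR."


digital = text_with_ocr_fallback(fake_text, fake_ocr, 0)
scanned = text_with_ocr_fallback(fake_text, fake_ocr, 1)
```

In real use the two callables would wrap the library's own text and OCR methods; check the API reference for their exact arguments.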

Does PDF Oxide support XFA forms?

Yes. PDF Oxide is the only Python PDF library that can detect, analyze, and extract data from XFA forms (XML Forms Architecture). PyMuPDF, pypdf, pdfplumber, and pdfminer cannot read XFA form data.

How does PDF Oxide compare to PyMuPDF?

PDF Oxide is 5.8× faster than PyMuPDF (0.8ms vs 4.6ms mean), has a 100% pass rate vs 99.3%, and is MIT licensed vs PyMuPDF's AGPL-3.0. PDF Oxide also has built-in Markdown/HTML output and XFA form support that PyMuPDF lacks.

Can PDF Oxide convert PDF to Markdown?

Yes. PDF Oxide has built-in PDF-to-Markdown conversion with heading detection, table preservation, and list formatting, which suits LLM and RAG pipelines. No separate package is needed, unlike PyMuPDF, which requires pymupdf4llm (69× slower than PDF Oxide).

PDF Oxide vs PyMuPDF

PDF Oxide is a faster, MIT-licensed alternative to PyMuPDF. If you’re evaluating PyMuPDF for a commercial project or looking to replace it due to AGPL licensing, this page covers the key differences.

Why Developers Switch from PyMuPDF

Licensing. PyMuPDF wraps MuPDF, which is licensed under AGPL-3.0. If you distribute software that includes PyMuPDF, or let users interact with it over a network (SaaS, web apps, Docker containers), you must release your code under AGPL or buy a commercial license from Artifex. PDF Oxide is MIT licensed, with no copyleft restrictions.

Speed. PDF Oxide extracts text at 0.8ms mean vs PyMuPDF’s 4.6ms — 5.8× faster on 3,830 PDFs.

Reliability. PDF Oxide achieves 100% pass rate on the same corpus where PyMuPDF passes 99.3% (27 failures on valid PDFs).
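The headline ratios can be checked from the figures quoted on this page. The short calculation below is only arithmetic on those published numbers (the pass counts come from the Benchmark Details table); the variable names are illustrative.

```python
# Figures quoted on this page.
oxide_mean_ms = 0.8
pymupdf_mean_ms = 4.6
valid_pdfs = 3823
pymupdf_passed = 3796

speedup = pymupdf_mean_ms / oxide_mean_ms        # about 5.75, quoted as 5.8x
failures = valid_pdfs - pymupdf_passed           # files PyMuPDF did not pass
pass_rate = 100 * pymupdf_passed / valid_pdfs    # percentage of valid PDFs passed

print(f"speedup: {speedup:.2f}x")
print(f"failures: {failures}")
print(f"pass rate: {pass_rate:.1f}%")
```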

Quick Comparison

Feature	PDF Oxide	PyMuPDF
License	MIT	AGPL-3.0
Mean extraction time	0.8ms	4.6ms
Pass rate (3,830 PDFs)	100%	99.3%
Text extraction	Yes	Yes
Character positions	Yes	Yes
Image extraction	Yes	Yes
Form fields	Read + Write	Read + Write
PDF creation	Yes (Markdown/HTML)	Yes
Markdown output	Yes	No
HTML output	Yes	No
Encryption	Read + Write	Read + Write
Rendering	Yes	Yes
OCR	Built-in (PaddleOCR)	Tesseract
Install size	~5 MB	~20 MB
Python versions	3.8–3.14	3.8–3.12

Side-by-Side Code

Text Extraction

PDF Oxide:

from pdf_oxide import PdfDocument

doc = PdfDocument("report.pdf")
text = doc.extract_text(0)
print(text)

PyMuPDF:

import fitz

doc = fitz.open("report.pdf")
page = doc[0]
text = page.get_text()
print(text)

Markdown Conversion

PDF Oxide (built-in):

from pdf_oxide import PdfDocument

doc = PdfDocument("paper.pdf")
md = doc.to_markdown(0, detect_headings=True)
print(md)

PyMuPDF:

# PyMuPDF has no built-in Markdown conversion.
# Use pymupdf4llm (separate package, 69× slower than PDF Oxide):
import pymupdf4llm

md = pymupdf4llm.to_markdown("paper.pdf")
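For LLM and RAG pipelines, Markdown output is often split into chunks at heading boundaries before indexing. The sketch below shows one way to do that with the standard library only. It works on any Markdown string, whichever library produced it; the function name split_by_heading and the sample text are hypothetical.

```python
import re
from typing import List, Tuple

HEADING = re.compile(r"^(#{1,6})\s+(.*)$")


def split_by_heading(markdown: str) -> List[Tuple[str, str]]:
    """Split Markdown into (heading, body) chunks at ATX headings."""
    chunks: List[Tuple[str, str]] = []
    title, body = "", []
    for line in markdown.splitlines():
        match = HEADING.match(line)
        if match:
            # Close the previous chunk before starting a new one.
            if title or any(s.strip() for s in body):
                chunks.append((title, "\n".join(body).strip()))
            title, body = match.group(2).strip(), []
        else:
            body.append(line)
    # Flush whatever is left after the last heading.
    if title or any(s.strip() for s in body):
        chunks.append((title, "\n".join(body).strip()))
    return chunks


sample = "# Intro\n\nFirst paragraph.\n\n## Method\n\nSecond paragraph.\n"
chunks = split_by_heading(sample)
```

This simple version treats every line starting with `#` as a heading, so it would also split inside fenced code blocks that contain such lines.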

Image Extraction

PDF Oxide:

from pdf_oxide import PdfDocument

doc = PdfDocument("report.pdf")
images = doc.extract_image_bytes(0)
for i, img in enumerate(images):
    with open(f"image_{i}.{img['format']}", "wb") as f:
        f.write(img["data"])

PyMuPDF:

import fitz

doc = fitz.open("report.pdf")
page = doc[0]
for i, img in enumerate(page.get_images()):
    xref = img[0]
    base_image = doc.extract_image(xref)
    with open(f"image_{i}.{base_image['ext']}", "wb") as f:
        f.write(base_image["image"])

PDF Creation from Markdown

PDF Oxide:

from pdf_oxide import Pdf

pdf = Pdf.from_markdown("# Invoice\n\n| Item | Price |\n|------|-------|\n| Widget | $9.99 |")
pdf.save("invoice.pdf")

PyMuPDF:

import fitz

# PyMuPDF cannot create PDFs from Markdown.
# You must manually place text on pages:
doc = fitz.open()
page = doc.new_page()
page.insert_text(fitz.Point(72, 72), "Invoice", fontsize=24)
doc.save("invoice.pdf")
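When the table content comes from data rather than a literal string, the Markdown passed to Pdf.from_markdown can be assembled first. The helper below is a minimal sketch of that step; markdown_table and the second invoice row are hypothetical, and the PDF Oxide call is left as a comment so the block runs without the library installed.

```python
from typing import Iterable, Sequence


def markdown_table(headers: Sequence[str], rows: Iterable[Sequence[str]]) -> str:
    """Render headers and rows as a pipe-delimited Markdown table."""
    lines = [
        "| " + " | ".join(headers) + " |",
        "| " + " | ".join("---" for _ in headers) + " |",
    ]
    for row in rows:
        # Escape literal pipes so a cell cannot break the column layout.
        cells = [str(cell).replace("|", "\\|") for cell in row]
        lines.append("| " + " | ".join(cells) + " |")
    return "\n".join(lines)


invoice_md = "# Invoice\n\n" + markdown_table(
    ["Item", "Price"],
    [["Widget", "$9.99"], ["Gadget", "$24.50"]],  # illustrative rows
)
print(invoice_md)

# With pdf_oxide installed, the string can be passed to the call shown above:
# from pdf_oxide import Pdf
# Pdf.from_markdown(invoice_md).save("invoice.pdf")
```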

Benchmark Details

Benchmarked on 3,830 PDFs from three independent public test suites (veraPDF, Mozilla pdf.js, DARPA SafeDocs).

Metric	PDF Oxide	PyMuPDF
Mean extraction time	0.8ms	4.6ms
p99 extraction time	9ms	28ms
Pass rate (valid PDFs)	100% (3,823/3,823)	99.3% (3,796/3,823)
Text quality parity	99.5%	Baseline

See full benchmark methodology for corpus details and reproduction steps.
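To get comparable numbers on your own files, a small harness along these lines may help. It is a sketch, not the methodology behind the published figures: the function name benchmark, the nearest-rank p99, and the choice to exclude failed files from the timing statistics are all assumptions. The extractor is passed in as a callable, so the same harness can time either library.

```python
import math
import time
from typing import Callable, Dict, Iterable


def benchmark(extract: Callable[[str], str], paths: Iterable[str]) -> Dict[str, float]:
    """Time `extract` on each path; report mean ms, p99 ms, and pass rate."""
    timings_ms = []
    total = 0
    for path in paths:
        total += 1
        start = time.perf_counter()
        try:
            extract(path)
        except Exception:
            # Any exception counts as a failure for that file.
            continue
        timings_ms.append((time.perf_counter() - start) * 1000)
    if not timings_ms:
        return {"mean_ms": math.nan, "p99_ms": math.nan, "pass_rate": 0.0}
    timings_ms.sort()
    # Nearest-rank percentile: smallest value with at least 99% of samples at or below it.
    rank = max(1, math.ceil(0.99 * len(timings_ms)))
    return {
        "mean_ms": sum(timings_ms) / len(timings_ms),
        "p99_ms": timings_ms[rank - 1],
        "pass_rate": 100 * len(timings_ms) / total,
    }


# Stand-in extractor so the sketch runs without any PDF library installed.
def fake_extract(path: str) -> str:
    if path == "bad.pdf":
        raise ValueError("unreadable")
    return "text"


stats = benchmark(fake_extract, ["a.pdf", "b.pdf", "bad.pdf", "c.pdf"])
```

A real run would pass something like `lambda p: PdfDocument(p).extract_text(0)` or `lambda p: fitz.open(p)[0].get_text()` as the extractor.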

AGPL Licensing: What It Means for You

PyMuPDF wraps MuPDF, which is AGPL-3.0 licensed. This affects you if:

You distribute software that uses PyMuPDF (binaries, Docker images, Electron apps)
You run a SaaS where PyMuPDF processes user PDFs on your servers
You embed PyMuPDF in a product — even as a microservice behind an API

In all these cases, AGPL requires you to release your entire application’s source code under AGPL-3.0 — or purchase a commercial license from Artifex.

PDF Oxide is MIT licensed. Use it in any project — commercial, proprietary, SaaS, or open source — with no obligations.

Use Case	PDF Oxide (MIT)	PyMuPDF (AGPL)
Commercial product	Yes	Requires license
Closed-source SaaS	Yes	Requires license
Internal tools	Yes	Yes
Open-source project	Yes	Yes (if AGPL-compatible)
Docker distribution	Yes	Requires license

PyMuPDF Commercial License Pricing

Artifex (the company behind MuPDF and PyMuPDF) does not publish commercial license pricing. Based on industry reports:

Contact required — you must request a quote from Artifex sales
Per-application licensing — pricing varies by deployment type and scale
Annual fees — commercial licenses are typically renewed yearly
No free tier — there is no “community” or “startup” exception to AGPL

For teams evaluating PyMuPDF for commercial use, the licensing cost is an ongoing operational expense on top of development time.

PDF Oxide is MIT licensed — free for all uses, forever. No sales calls, no license audits, no compliance risk. Use it in SaaS, distribute in Docker containers, embed in commercial products — no restrictions.

Migration Guide

API Mapping

Task	PyMuPDF	PDF Oxide
Open PDF	`fitz.open("f.pdf")`	`PdfDocument("f.pdf")`
Page count	`doc.page_count`	`doc.page_count()`
Extract text	`doc[0].get_text()`	`doc.extract_text(0)`
Character data	`doc[0].get_text("dict")`	`doc.extract_chars(0)`
Extract images	`doc[0].get_images()` + `doc.extract_image(xref)`	`doc.extract_images(0)`
Search text	`doc[0].search_for("query")`	`doc.search_page(0, "query")`
Encrypted PDF	`doc.authenticate("pw")`	`PdfDocument("f.pdf", password="pw")`
To Markdown	pymupdf4llm (separate)	`doc.to_markdown(0)`
Create from text	Manual `insert_text()`	`Pdf.from_markdown("# Title")`

Step-by-Step

Install: pip install pdf_oxide
Replace imports: import fitz → from pdf_oxide import PdfDocument
Replace open: fitz.open(path) → PdfDocument(path)
Replace extraction: page.get_text() → doc.extract_text(page_index)
Replace images: Multi-step xref lookup → doc.extract_images(page_index)
Update password handling: Use PdfDocument(path, password="pw") or doc.authenticate("pw") after opening
Test: Run your pipeline on your existing test files
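The final test step can be partly automated by comparing the old and new extractor output file by file. The following is a minimal sketch using the standard library's difflib; the function names, the 0.95 threshold, and the dictionary-backed stand-in extractors are hypothetical choices, not part of either library.

```python
from difflib import SequenceMatcher
from typing import Callable, Dict, Iterable


def text_similarity(old: str, new: str) -> float:
    """Similarity in [0, 1] after collapsing whitespace differences."""
    a, b = " ".join(old.split()), " ".join(new.split())
    return SequenceMatcher(None, a, b).ratio()


def find_regressions(
    old_extract: Callable[[str], str],
    new_extract: Callable[[str], str],
    paths: Iterable[str],
    threshold: float = 0.95,
) -> Dict[str, float]:
    """Return {path: similarity} for files whose output diverges past the threshold."""
    flagged = {}
    for path in paths:
        score = text_similarity(old_extract(path), new_extract(path))
        if score < threshold:
            flagged[path] = score
    return flagged


# Stand-in extractors so the sketch runs without any PDF library installed.
old_output = {"a.pdf": "Hello  world\n", "b.pdf": "Quarterly report"}
new_output = {"a.pdf": "Hello world", "b.pdf": "Completely different text"}
flagged = find_regressions(old_output.get, new_output.get, ["a.pdf", "b.pdf"])

# Real extractors would wrap the calls from the mapping table, for example:
#   old_extract = lambda p: fitz.open(p)[0].get_text()
#   new_extract = lambda p: PdfDocument(p).extract_text(0)
```

Files that land in `flagged` are worth inspecting by hand, since differences in reading order or whitespace handling can lower the score without indicating lost text.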

When to Stay with PyMuPDF

You already have a commercial MuPDF license and depend on MuPDF-specific rendering
You need SVG export (PDF Oxide does not support SVG output)
Your project is already AGPL-licensed

Related Pages

Performance Benchmarks: full corpus results
vs Python PDF Libraries: all Python libraries compared
Getting Started with Python: installation and first extraction
