PDF Oxide vs PyMuPDF
PDF Oxide is a faster, MIT-licensed alternative to PyMuPDF. If you’re evaluating PyMuPDF for a commercial project or looking to replace it due to AGPL licensing, this page covers the key differences.
Why Developers Switch from PyMuPDF
Licensing. PyMuPDF uses MuPDF under AGPL-3.0. If you distribute software that includes PyMuPDF — including SaaS, web apps, and Docker containers — your code must be open-sourced under AGPL or you must buy a commercial license from Artifex. PDF Oxide is MIT licensed with no restrictions.
Speed. PDF Oxide extracts text at 0.8ms mean vs PyMuPDF’s 4.6ms — 5.8× faster on 3,830 PDFs.
Reliability. PDF Oxide achieves 100% pass rate on the same corpus where PyMuPDF passes 99.3% (27 failures on valid PDFs).
Quick Comparison
| PDF Oxide | PyMuPDF | |
|---|---|---|
| License | MIT | AGPL-3.0 |
| Mean extraction time | 0.8ms | 4.6ms |
| Pass rate (3,830 PDFs) | 100% | 99.3% |
| Text extraction | Yes | Yes |
| Character positions | Yes | Yes |
| Image extraction | Yes | Yes |
| Form fields | Read + Write | Read + Write |
| PDF creation | Yes (Markdown/HTML) | Yes |
| Markdown output | Yes | No |
| HTML output | Yes | No |
| Encryption | Read + Write | Read + Write |
| Rendering | Yes | Yes |
| OCR | Built-in (PaddleOCR) | Tesseract |
| Install size | ~5 MB | ~20 MB |
| Python versions | 3.8–3.14 | 3.8–3.12 |
Side-by-Side Code
Text Extraction
PDF Oxide:
from pdf_oxide import PdfDocument
doc = PdfDocument("report.pdf")
text = doc.extract_text(0)
print(text)
PyMuPDF:
import fitz
doc = fitz.open("report.pdf")
page = doc[0]
text = page.get_text()
print(text)
Markdown Conversion
PDF Oxide (built-in):
from pdf_oxide import PdfDocument
doc = PdfDocument("paper.pdf")
md = doc.to_markdown(0, detect_headings=True)
print(md)
PyMuPDF:
# PyMuPDF has no built-in Markdown conversion.
# Use pymupdf4llm (separate package, 69× slower than PDF Oxide):
import pymupdf4llm
md = pymupdf4llm.to_markdown("paper.pdf")
Image Extraction
PDF Oxide:
from pdf_oxide import PdfDocument
doc = PdfDocument("report.pdf")
images = doc.extract_image_bytes(0)
for i, img in enumerate(images):
with open(f"image_{i}.{img['format']}", "wb") as f:
f.write(img["data"])
PyMuPDF:
import fitz
doc = fitz.open("report.pdf")
page = doc[0]
for i, img in enumerate(page.get_images()):
xref = img[0]
base_image = doc.extract_image(xref)
with open(f"image_{i}.{base_image['ext']}", "wb") as f:
f.write(base_image["image"])
PDF Creation from Markdown
PDF Oxide:
from pdf_oxide import Pdf
pdf = Pdf.from_markdown("# Invoice\n\n| Item | Price |\n|------|-------|\n| Widget | $9.99 |")
pdf.save("invoice.pdf")
PyMuPDF:
import fitz
# PyMuPDF cannot create PDFs from Markdown.
# You must manually place text on pages:
doc = fitz.open()
page = doc.new_page()
page.insert_text(fitz.Point(72, 72), "Invoice", fontsize=24)
doc.save("invoice.pdf")
Benchmark Details
Benchmarked on 3,830 PDFs from three independent public test suites (veraPDF, Mozilla pdf.js, DARPA SafeDocs).
| Metric | PDF Oxide | PyMuPDF |
|---|---|---|
| Mean extraction time | 0.8ms | 4.6ms |
| p99 extraction time | 9ms | 28ms |
| Pass rate (valid PDFs) | 100% (3,823/3,823) | 99.3% (3,796/3,823) |
| Text quality parity | 99.5% | Baseline |
See full benchmark methodology for corpus details and reproduction steps.
AGPL Licensing: What It Means for You
PyMuPDF wraps MuPDF, which is AGPL-3.0 licensed. This affects you if:
- You distribute software that uses PyMuPDF (binaries, Docker images, Electron apps)
- You run a SaaS where PyMuPDF processes user PDFs on your servers
- You embed PyMuPDF in a product — even as a microservice behind an API
In all these cases, AGPL requires you to release your entire application’s source code under AGPL-3.0 — or purchase a commercial license from Artifex.
PDF Oxide is MIT licensed. Use it in any project — commercial, proprietary, SaaS, or open source — with no obligations.
| Use Case | PDF Oxide (MIT) | PyMuPDF (AGPL) |
|---|---|---|
| Commercial product | Yes | Requires license |
| Closed-source SaaS | Yes | Requires license |
| Internal tools | Yes | Yes |
| Open-source project | Yes | Yes (if AGPL-compatible) |
| Docker distribution | Yes | Requires license |
PyMuPDF Commercial License Pricing
Artifex (the company behind MuPDF and PyMuPDF) does not publish commercial license pricing publicly. Based on industry reports:
- Contact required — you must request a quote from Artifex sales
- Per-application licensing — pricing varies by deployment type and scale
- Annual fees — commercial licenses are typically renewed yearly
- No free tier — there is no “community” or “startup” exception to AGPL
For teams evaluating PyMuPDF for commercial use, the licensing cost is an ongoing operational expense on top of development time.
PDF Oxide is MIT licensed — free for all uses, forever. No sales calls, no license audits, no compliance risk. Use it in SaaS, distribute in Docker containers, embed in commercial products — no restrictions.
Migration Guide
API Mapping
| Task | PyMuPDF | PDF Oxide |
|---|---|---|
| Open PDF | fitz.open("f.pdf") |
PdfDocument("f.pdf") |
| Page count | doc.page_count |
doc.page_count() |
| Extract text | doc[0].get_text() |
doc.extract_text(0) |
| Character data | doc[0].get_text("dict") |
doc.extract_chars(0) |
| Extract images | doc[0].get_images() + doc.extract_image(xref) |
doc.extract_images(0) |
| Search text | doc[0].search_for("query") |
doc.search_page(0, "query") |
| Encrypted PDF | doc.authenticate("pw") |
PdfDocument("f.pdf", password="pw") |
| To Markdown | pymupdf4llm (separate) | doc.to_markdown(0) |
| Create from text | Manual insert_text() |
Pdf.from_markdown("# Title") |
Step-by-Step
- Install:
pip install pdf_oxide - Replace imports:
import fitz→from pdf_oxide import PdfDocument - Replace open:
fitz.open(path)→PdfDocument(path) - Replace extraction:
page.get_text()→doc.extract_text(page_index) - Replace images: Multi-step xref lookup →
doc.extract_images(page_index) - Update password handling: Use
PdfDocument(path, password="pw")ordoc.authenticate("pw")after opening - Test: Run your pipeline on your existing test files
When to Stay with PyMuPDF
- You already have a commercial MuPDF license and depend on MuPDF-specific rendering
- You need SVG export (PDF Oxide does not support SVG output)
- Your project is already AGPL-licensed
Related Pages
- Performance Benchmarks — full corpus results
- vs Python PDF Libraries — all Python libraries compared
- Getting Started with Python — installation and first extraction