Skip to content

PDF Oxide vs PyMuPDF

PDF Oxide is a faster, MIT-licensed alternative to PyMuPDF. If you’re evaluating PyMuPDF for a commercial project or looking to replace it due to AGPL licensing, this page covers the key differences.

Why Developers Switch from PyMuPDF

Licensing. PyMuPDF uses MuPDF under AGPL-3.0. If you distribute software that includes PyMuPDF — including SaaS, web apps, and Docker containers — your code must be open-sourced under AGPL or you must buy a commercial license from Artifex. PDF Oxide is MIT licensed with no restrictions.

Speed. PDF Oxide extracts text at 0.8ms mean vs PyMuPDF’s 4.6ms — 5.8× faster on 3,830 PDFs.

Reliability. PDF Oxide achieves 100% pass rate on the same corpus where PyMuPDF passes 99.3% (27 failures on valid PDFs).

Quick Comparison

PDF Oxide PyMuPDF
License MIT AGPL-3.0
Mean extraction time 0.8ms 4.6ms
Pass rate (3,830 PDFs) 100% 99.3%
Text extraction Yes Yes
Character positions Yes Yes
Image extraction Yes Yes
Form fields Read + Write Read + Write
PDF creation Yes (Markdown/HTML) Yes
Markdown output Yes No
HTML output Yes No
Encryption Read + Write Read + Write
Rendering Yes Yes
OCR Built-in (PaddleOCR) Tesseract
Install size ~5 MB ~20 MB
Python versions 3.8–3.14 3.8–3.12

Side-by-Side Code

Text Extraction

PDF Oxide:

from pdf_oxide import PdfDocument

doc = PdfDocument("report.pdf")
text = doc.extract_text(0)
print(text)

PyMuPDF:

import fitz

doc = fitz.open("report.pdf")
page = doc[0]
text = page.get_text()
print(text)

Markdown Conversion

PDF Oxide (built-in):

from pdf_oxide import PdfDocument

doc = PdfDocument("paper.pdf")
md = doc.to_markdown(0, detect_headings=True)
print(md)

PyMuPDF:

# PyMuPDF has no built-in Markdown conversion.
# Use pymupdf4llm (separate package, 69× slower than PDF Oxide):
import pymupdf4llm

md = pymupdf4llm.to_markdown("paper.pdf")

Image Extraction

PDF Oxide:

from pdf_oxide import PdfDocument

doc = PdfDocument("report.pdf")
images = doc.extract_image_bytes(0)
for i, img in enumerate(images):
    with open(f"image_{i}.{img['format']}", "wb") as f:
        f.write(img["data"])

PyMuPDF:

import fitz

doc = fitz.open("report.pdf")
page = doc[0]
for i, img in enumerate(page.get_images()):
    xref = img[0]
    base_image = doc.extract_image(xref)
    with open(f"image_{i}.{base_image['ext']}", "wb") as f:
        f.write(base_image["image"])

PDF Creation from Markdown

PDF Oxide:

from pdf_oxide import Pdf

pdf = Pdf.from_markdown("# Invoice\n\n| Item | Price |\n|------|-------|\n| Widget | $9.99 |")
pdf.save("invoice.pdf")

PyMuPDF:

import fitz

# PyMuPDF cannot create PDFs from Markdown.
# You must manually place text on pages:
doc = fitz.open()
page = doc.new_page()
page.insert_text(fitz.Point(72, 72), "Invoice", fontsize=24)
doc.save("invoice.pdf")

Benchmark Details

Benchmarked on 3,830 PDFs from three independent public test suites (veraPDF, Mozilla pdf.js, DARPA SafeDocs).

Metric PDF Oxide PyMuPDF
Mean extraction time 0.8ms 4.6ms
p99 extraction time 9ms 28ms
Pass rate (valid PDFs) 100% (3,823/3,823) 99.3% (3,796/3,823)
Text quality parity 99.5% Baseline

See full benchmark methodology for corpus details and reproduction steps.

AGPL Licensing: What It Means for You

PyMuPDF wraps MuPDF, which is AGPL-3.0 licensed. This affects you if:

  • You distribute software that uses PyMuPDF (binaries, Docker images, Electron apps)
  • You run a SaaS where PyMuPDF processes user PDFs on your servers
  • You embed PyMuPDF in a product — even as a microservice behind an API

In all these cases, AGPL requires you to release your entire application’s source code under AGPL-3.0 — or purchase a commercial license from Artifex.

PDF Oxide is MIT licensed. Use it in any project — commercial, proprietary, SaaS, or open source — with no obligations.

Use Case PDF Oxide (MIT) PyMuPDF (AGPL)
Commercial product Yes Requires license
Closed-source SaaS Yes Requires license
Internal tools Yes Yes
Open-source project Yes Yes (if AGPL-compatible)
Docker distribution Yes Requires license

PyMuPDF Commercial License Pricing

Artifex (the company behind MuPDF and PyMuPDF) does not publish commercial license pricing publicly. Based on industry reports:

  • Contact required — you must request a quote from Artifex sales
  • Per-application licensing — pricing varies by deployment type and scale
  • Annual fees — commercial licenses are typically renewed yearly
  • No free tier — there is no “community” or “startup” exception to AGPL

For teams evaluating PyMuPDF for commercial use, the licensing cost is an ongoing operational expense on top of development time.

PDF Oxide is MIT licensed — free for all uses, forever. No sales calls, no license audits, no compliance risk. Use it in SaaS, distribute in Docker containers, embed in commercial products — no restrictions.

Migration Guide

API Mapping

Task PyMuPDF PDF Oxide
Open PDF fitz.open("f.pdf") PdfDocument("f.pdf")
Page count doc.page_count doc.page_count()
Extract text doc[0].get_text() doc.extract_text(0)
Character data doc[0].get_text("dict") doc.extract_chars(0)
Extract images doc[0].get_images() + doc.extract_image(xref) doc.extract_images(0)
Search text doc[0].search_for("query") doc.search_page(0, "query")
Encrypted PDF doc.authenticate("pw") PdfDocument("f.pdf", password="pw")
To Markdown pymupdf4llm (separate) doc.to_markdown(0)
Create from text Manual insert_text() Pdf.from_markdown("# Title")

Step-by-Step

  1. Install: pip install pdf_oxide
  2. Replace imports: import fitzfrom pdf_oxide import PdfDocument
  3. Replace open: fitz.open(path)PdfDocument(path)
  4. Replace extraction: page.get_text()doc.extract_text(page_index)
  5. Replace images: Multi-step xref lookup → doc.extract_images(page_index)
  6. Update password handling: Use PdfDocument(path, password="pw") or doc.authenticate("pw") after opening
  7. Test: Run your pipeline on your existing test files

When to Stay with PyMuPDF

  • You already have a commercial MuPDF license and depend on MuPDF-specific rendering
  • You need SVG export (PDF Oxide does not support SVG output)
  • Your project is already AGPL-licensed