Migrate from PyMuPDF (fitz) to PDF Oxide

A complete guide to switching from PyMuPDF to PDF Oxide, covering every API you use today and how to replace it.

Why Switch from PyMuPDF?

There are four compelling reasons to migrate:

  1. 5.8x faster: PDF Oxide averages 0.8 ms per page vs PyMuPDF’s 4.6 ms. At scale, that difference compounds: a 1,000-page batch finishes in about 0.8 seconds instead of 4.6.
  2. MIT license: PyMuPDF is available under AGPL or a paid commercial license. Under AGPL, distributing your application or offering it over a network generally obliges you to release its source under the same terms. PDF Oxide is MIT, so you can use it in closed-source and commercial projects.
  3. 100% reliability: PDF Oxide passes 100% of the PDF test suite. PyMuPDF fails on 0.7% of files (99.3% pass rate), which means broken output on roughly 1 in 140 documents.
  4. Built-in features: Markdown conversion, HTML output, OCR, XFA form support, and PDF rendering are all included. PyMuPDF requires separate packages (pymupdf4llm) or external tools (Tesseract) for similar functionality.
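
Per-page timings depend heavily on the documents and the hardware, so it is worth measuring on your own files before relying on the figures above. A minimal timing sketch, assuming only that each library offers some per-page extraction call; `ms_per_page` is a hypothetical helper, not part of either library:

```python
import time


def ms_per_page(extract_page, page_count, repeats=3):
    """Return the best-of-`repeats` average time per page, in milliseconds.

    `extract_page` is any callable that takes a page index.
    """
    if page_count <= 0:
        raise ValueError("page_count must be positive")
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        for i in range(page_count):
            extract_page(i)
        # Keep the fastest run to reduce noise from other processes.
        best = min(best, time.perf_counter() - start)
    return best * 1000.0 / page_count
```

With a PDF Oxide document this could be called as `ms_per_page(doc.extract_text, doc.page_count())`, and with PyMuPDF as `ms_per_page(lambda i: doc[i].get_text(), doc.page_count)`.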

Step 1: Install

pip install pdf_oxide
pip uninstall pymupdf  # optional — remove when ready

Step 2: Replace Imports

# Before
import fitz

# After
from pdf_oxide import PdfDocument

If you used pymupdf4llm for Markdown conversion, you can remove that dependency entirely — PDF Oxide has it built in.

Step 3: API Mapping

| Task | PyMuPDF | PDF Oxide |
|---|---|---|
| Open PDF | fitz.open("file.pdf") | PdfDocument("file.pdf") |
| Page count | doc.page_count | doc.page_count() |
| Extract text | doc[0].get_text() | doc.extract_text(0) |
| Character positions | doc[0].get_text("dict") | doc.extract_chars(0) |
| Extract images | doc[0].get_images() + doc.extract_image(xref) | doc.extract_images(0) |
| Search text | doc[0].search_for("query") | doc.search_page(0, "query") |
| Form fields | doc[0].widgets() or doc.get_form_fields() | doc.get_form_fields() |
| Encrypted PDF | doc.authenticate("pw") | PdfDocument("f.pdf", password="pw") |
| To Markdown | pymupdf4llm.to_markdown("file.pdf") (separate package) | doc.to_markdown(0) (built-in) |
| To HTML | Not available | doc.to_html(0) |
| Create PDF | Manual insert_text() | Pdf.from_markdown("# Title") |
| Render to image | doc[0].get_pixmap() | doc.render_page(0) |
| XFA forms | Not supported | doc.has_xfa() |
| OCR | Requires Tesseract | Built-in PaddleOCR |
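
One way to apply this mapping to an existing codebase is to scan it for PyMuPDF calls first. The sketch below is a rough heuristic rather than a real parser: the regular expressions and the helper name `find_pymupdf_calls` are illustrative assumptions, and the suggestions are copied from the mapping above.

```python
import re

# PyMuPDF call pattern -> suggested PDF Oxide replacement (from the mapping above).
# These regexes are illustrative and will miss aliased or unusual call styles.
PATTERNS = [
    (re.compile(r"\bfitz\.open\("), "PdfDocument(path)"),
    (re.compile(r"\.get_text\(\s*[\"']dict[\"']"), "doc.extract_chars(page_index)"),
    (re.compile(r"\.get_text\((?!\s*[\"']dict[\"'])"), "doc.extract_text(page_index)"),
    (re.compile(r"\.get_images\("), "doc.extract_images(page_index)"),
    (re.compile(r"\.search_for\("), "doc.search_page(page_index, query)"),
    (re.compile(r"\.get_pixmap\("), "doc.render_page(page_index)"),
    (re.compile(r"\bpymupdf4llm\.to_markdown\("), "doc.to_markdown(page_index)"),
    # Attribute access only; the method call form page_count() is not flagged.
    (re.compile(r"\.page_count\b(?!\()"), "doc.page_count()"),
]


def find_pymupdf_calls(source):
    """Return (line_number, matched_text, suggestion) for each PyMuPDF-style call."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for pattern, suggestion in PATTERNS:
            match = pattern.search(line)
            if match:
                hits.append((lineno, match.group(0), suggestion))
    return hits


if __name__ == "__main__":
    sample = 'doc = fitz.open("report.pdf")\ntext = doc[0].get_text()\n'
    for lineno, found, suggestion in find_pymupdf_calls(sample):
        print(f"line {lineno}: {found} -> {suggestion}")
```

Treat the output as a checklist of places to review by hand, not as an automatic rewrite.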

Step 4: Common Pattern Changes

Text Extraction Loop

# PyMuPDF
import fitz
doc = fitz.open("report.pdf")
for page in doc:
    text = page.get_text()
    print(text)

# PDF Oxide
from pdf_oxide import PdfDocument
doc = PdfDocument("report.pdf")
for i in range(doc.page_count()):
    text = doc.extract_text(i)
    print(text)
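
If the index-based loop appears in many places, it can be wrapped once in a generator so call sites read like the old `for page in doc` loop. A minimal sketch, assuming only the `page_count()` and `extract_text(i)` calls shown above; `iter_page_texts` and the `FakeDoc` stand-in are hypothetical names used for illustration:

```python
def iter_page_texts(doc):
    """Yield (page_index, text) for each page, using the PDF Oxide calls shown above."""
    for i in range(doc.page_count()):
        yield i, doc.extract_text(i)


class FakeDoc:
    """Hypothetical stand-in so the sketch runs without a PDF; not part of pdf_oxide."""

    def __init__(self, pages):
        self._pages = pages

    def page_count(self):
        return len(self._pages)

    def extract_text(self, index):
        return self._pages[index]


if __name__ == "__main__":
    for index, text in iter_page_texts(FakeDoc(["first page", "second page"])):
        print(index, text)
```

In real code, pass a `PdfDocument` instead of `FakeDoc`.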

Image Extraction

PyMuPDF requires a multi-step xref lookup. PDF Oxide does it in one call:

# PyMuPDF — multi-step xref lookup
import fitz
doc = fitz.open("report.pdf")
page = doc[0]
for img in page.get_images():
    xref = img[0]
    base = doc.extract_image(xref)
    # Include the xref in the filename so images do not overwrite each other
    with open(f"img_{xref}.{base['ext']}", "wb") as f:
        f.write(base["image"])

# PDF Oxide — one step
from pdf_oxide import PdfDocument
doc = PdfDocument("report.pdf")
for i, img in enumerate(doc.extract_image_bytes(0)):
    with open(f"img_{i}.{img['format']}", "wb") as f:
        f.write(img["data"])
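
Both loops above handle a single page. To save images from several pages without filename collisions, the write step can be factored out. A sketch, assuming each image is a dict with `format` and `data` keys as in the PDF Oxide loop above; `save_images` is a hypothetical helper:

```python
from pathlib import Path


def save_images(images, out_dir, page_index):
    """Write image dicts (keys 'format' and 'data') to out_dir and return the paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for i, img in enumerate(images):
        # The page index in the name keeps files from different pages apart.
        path = out / f"page{page_index}_img{i}.{img['format']}"
        path.write_bytes(img["data"])
        written.append(path)
    return written
```

A whole-document export would then be a loop such as `for page in range(doc.page_count()): save_images(doc.extract_image_bytes(page), "images", page)`.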

Encrypted PDFs

PyMuPDF uses a two-step open-then-authenticate pattern. PDF Oxide supports both password= in the constructor and doc.authenticate() after opening:

# PyMuPDF
import fitz
doc = fitz.open("encrypted.pdf")
doc.authenticate("password")
text = doc[0].get_text()

# PDF Oxide — one step with password=
from pdf_oxide import PdfDocument
doc = PdfDocument("encrypted.pdf", password="password")
text = doc.extract_text(0)

Markdown Conversion

PyMuPDF requires the separate pymupdf4llm package. PDF Oxide has Markdown built in:

# PyMuPDF — requires extra package
import pymupdf4llm
md = pymupdf4llm.to_markdown("report.pdf")

# PDF Oxide — built-in
from pdf_oxide import PdfDocument
doc = PdfDocument("report.pdf")
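# to_markdown() takes a page index, so join every page to match the whole-document output above
md = "\n\n".join(doc.to_markdown(i) for i in range(doc.page_count()))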

Page Rendering

# PyMuPDF
import fitz
doc = fitz.open("report.pdf")
pix = doc[0].get_pixmap(dpi=150)  # match the DPI used in the PDF Oxide example
pix.save("page.png")

# PDF Oxide
from pdf_oxide import PdfDocument
doc = PdfDocument("report.pdf")
png_bytes = doc.render_page(0, dpi=150)
with open("page.png", "wb") as f:
    f.write(png_bytes)

Step 5: Testing Your Migration

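Run your existing test files through both libraries and compare the output. The script below covers the PDF Oxide side; check its results against what your PyMuPDF code produced for the same file: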

from pdf_oxide import PdfDocument

doc = PdfDocument("your-test-file.pdf")

# Verify text extraction
text = doc.extract_text(0)
print(text[:500])

# Verify page count
print(f"Pages: {doc.page_count()}")

# Verify form fields (if applicable)
fields = doc.get_form_fields()
for f in fields:
    print(f"{f.name}: {f.value}")
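
To compare the two libraries rather than inspect one by eye, the text extracted by each can be scored for similarity. A sketch using the standard library's `difflib`; the helper names and the 0.95 threshold are assumptions to tune, and the page texts are expected to come from `page.get_text()` and `doc.extract_text(i)` as shown earlier:

```python
import difflib


def normalize(text):
    """Collapse whitespace so layout-only differences do not count as mismatches."""
    return " ".join(text.split())


def similarity(old_text, new_text):
    """Return a 0.0-1.0 similarity ratio between two extracted page texts."""
    matcher = difflib.SequenceMatcher(None, normalize(old_text), normalize(new_text))
    return matcher.ratio()


def pages_below_threshold(old_pages, new_pages, threshold=0.95):
    """Return (page_index, ratio) for every page whose similarity is under threshold."""
    if len(old_pages) != len(new_pages):
        raise ValueError("page counts differ between the two extractions")
    flagged = []
    for index, (old, new) in enumerate(zip(old_pages, new_pages)):
        ratio = similarity(old, new)
        if ratio < threshold:
            flagged.append((index, ratio))
    return flagged
```

Pages returned by `pages_below_threshold` are the ones to review manually; a low score can also mean the two libraries simply order or space the text differently.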
