Migrate from PyMuPDF (fitz) to PDF Oxide
A complete guide to switching from PyMuPDF to PDF Oxide, covering the APIs you most likely use today and how to replace each one.
Why Switch from PyMuPDF?
There are four compelling reasons to migrate:
- 5.8x faster: PDF Oxide averages 0.8 ms per page vs PyMuPDF's 4.6 ms. At scale, that difference adds up: a 1,000-page batch finishes in about 0.8 seconds instead of about 4.6.
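- MIT license: PyMuPDF is distributed under the AGPL, which requires you to release the source code of applications that use it (including ones offered as a network service) under the AGPL, unless you buy a commercial license from Artifex. PDF Oxide is MIT-licensed, so you can use it in proprietary and commercial software; the only condition is keeping its copyright and license notice.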
- 100% reliability — PDF Oxide passes 100% of the PDF test suite. PyMuPDF fails on 0.7% of files (99.3% pass rate), which means broken output on roughly 1 in 140 documents.
- Built-in features: Markdown conversion, HTML output, OCR, XFA form support, and PDF rendering are all included. PyMuPDF needs a separate package (pymupdf4llm) for Markdown conversion and an external tool (Tesseract) for OCR.
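The headline numbers above follow from simple arithmetic on the quoted averages. This sketch only recomputes them; the 0.8 ms and 4.6 ms per-page figures and the 0.7% failure rate are copied from the text, not measured here.

```python
# Recompute the figures quoted above from the stated averages (not a benchmark).
OXIDE_MS_PER_PAGE = 0.8       # PDF Oxide average, from the text
PYMUPDF_MS_PER_PAGE = 4.6     # PyMuPDF average, from the text
PYMUPDF_FAILURE_RATE = 0.007  # 0.7% of files, from the text


def batch_seconds(pages: int, ms_per_page: float) -> float:
    """Total seconds for a batch at a fixed per-page cost."""
    return pages * ms_per_page / 1000


speedup = PYMUPDF_MS_PER_PAGE / OXIDE_MS_PER_PAGE          # about 5.75, quoted as 5.8x
oxide_batch = batch_seconds(1_000, OXIDE_MS_PER_PAGE)      # about 0.8 s
pymupdf_batch = batch_seconds(1_000, PYMUPDF_MS_PER_PAGE)  # about 4.6 s
docs_per_failure = 1 / PYMUPDF_FAILURE_RATE                # about 143 documents

print(f"speedup: {speedup:.2f}x")
print(f"1,000 pages: {oxide_batch:.1f} s vs {pymupdf_batch:.1f} s")
print(f"one failure per ~{docs_per_failure:.0f} documents")
```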
Step 1: Install
```bash
pip install pdf_oxide
pip uninstall pymupdf  # optional: remove when ready
```
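To confirm which of the two libraries is importable in the current environment (for example, before and after the uninstall step), a small standard-library check is enough. It assumes the import names used in this guide: pdf_oxide for PDF Oxide and fitz for PyMuPDF; installed is a hypothetical helper name.

```python
import importlib.util


def installed(*module_names: str) -> dict[str, bool]:
    """Report whether each top-level module can be found, without importing it."""
    return {name: importlib.util.find_spec(name) is not None for name in module_names}


if __name__ == "__main__":
    # pdf_oxide is PDF Oxide's import name; fitz is PyMuPDF's classic import name.
    for name, found in installed("pdf_oxide", "fitz").items():
        print(f"{name}: {'installed' if found else 'not installed'}")
```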
Step 2: Replace Imports
```python
# Before
import fitz

# After
from pdf_oxide import PdfDocument
```
If you used pymupdf4llm for Markdown conversion, you can remove that dependency entirely — PDF Oxide has it built in.
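During a gradual migration you may want code that prefers PDF Oxide but still runs where only PyMuPDF is installed. A minimal sketch of that import pattern follows; BACKEND is a hypothetical flag name for illustration, and the rest of your code would branch on it.

```python
# Prefer PDF Oxide; fall back to PyMuPDF while both are still in use.
try:
    from pdf_oxide import PdfDocument
    BACKEND = "pdf_oxide"
except ImportError:
    try:
        import fitz  # PyMuPDF
        BACKEND = "pymupdf"
    except ImportError:
        BACKEND = None  # neither library is installed

print(f"PDF backend: {BACKEND}")
```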
Step 3: API Mapping
| Task | PyMuPDF | PDF Oxide |
|---|---|---|
| Open PDF | `fitz.open("file.pdf")` | `PdfDocument("file.pdf")` |
| Page count | `doc.page_count` (property) | `doc.page_count()` (method) |
| Extract text | `doc[0].get_text()` | `doc.extract_text(0)` |
| Character positions | `doc[0].get_text("rawdict")` | `doc.extract_chars(0)` |
| Extract images | `doc[0].get_images()` + `doc.extract_image(xref)` | `doc.extract_images(0)` |
| Search text | `doc[0].search_for("query")` | `doc.search_page(0, "query")` |
| Form fields | `doc[0].widgets()` or `doc.get_form_fields()` | `doc.get_form_fields()` |
| Encrypted PDF | `doc.authenticate("pw")` | `PdfDocument("f.pdf", password="pw")` or `doc.authenticate("pw")` |
| To Markdown | `pymupdf4llm.to_markdown("file.pdf")` (separate package) | `doc.to_markdown(0)` (built-in) |
| To HTML | `doc[0].get_text("html")` | `doc.to_html(0)` |
| Create PDF | Manual `insert_text()` | `Pdf.from_markdown("# Title")` |
| Render to image | `doc[0].get_pixmap()` | `doc.render_page(0)` |
| XFA forms | Not supported | `doc.has_xfa()` |
| OCR | Requires Tesseract | Built-in PaddleOCR |
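If a large codebase calls PyMuPDF-style methods everywhere, a thin adapter can keep old call sites working while you migrate them one by one. This is a hypothetical sketch, not part of either library: FitzStyleDocument, FitzStylePage, and FakePdfDocument are illustrative names, and the adapter only forwards to the PDF Oxide methods listed in the table (page_count(), extract_text(i), search_page(i, query)). FakePdfDocument stands in for a real PdfDocument so the example runs without a PDF.

```python
from typing import Any, Iterator


class FitzStylePage:
    """Exposes PyMuPDF-like page methods on top of a PDF Oxide document."""

    def __init__(self, doc: Any, index: int) -> None:
        self._doc = doc
        self.number = index  # 0-based, like PyMuPDF's page.number

    def get_text(self) -> str:
        return self._doc.extract_text(self.number)

    def search_for(self, query: str) -> Any:
        # Returns whatever PDF Oxide's search_page returns,
        # not PyMuPDF's list of Rect objects.
        return self._doc.search_page(self.number, query)


class FitzStyleDocument:
    """Lets legacy code keep using len(doc), doc[i] and `for page in doc`."""

    def __init__(self, doc: Any) -> None:
        self._doc = doc

    @property
    def page_count(self) -> int:
        # PyMuPDF exposes page_count as a property; PDF Oxide as a method.
        return self._doc.page_count()

    def __len__(self) -> int:
        return self.page_count

    def __getitem__(self, index: int) -> FitzStylePage:
        if index < 0:
            index += len(self)
        if not 0 <= index < len(self):
            raise IndexError("page index out of range")
        return FitzStylePage(self._doc, index)

    def __iter__(self) -> Iterator[FitzStylePage]:
        return (FitzStylePage(self._doc, i) for i in range(len(self)))


class FakePdfDocument:
    """Stand-in for PdfDocument so this sketch runs without a real PDF."""

    def __init__(self, pages: list[str]) -> None:
        self._pages = pages

    def page_count(self) -> int:
        return len(self._pages)

    def extract_text(self, i: int) -> str:
        return self._pages[i]

    def search_page(self, i: int, query: str) -> list[int]:
        # Toy behaviour: character offsets of each match on the page.
        text = self._pages[i]
        hits, start = [], text.find(query)
        while start != -1:
            hits.append(start)
            start = text.find(query, start + 1)
        return hits


doc = FitzStyleDocument(FakePdfDocument(["Hello world", "Second page"]))
for page in doc:  # legacy PyMuPDF-style loop, unchanged
    print(page.number, page.get_text())
```

With a real file you would wrap FitzStyleDocument(PdfDocument("report.pdf")) instead of the fake.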
Step 4: Common Pattern Changes
Text Extraction Loop
```python
# PyMuPDF: iterate over page objects
import fitz

doc = fitz.open("report.pdf")
for page in doc:
    text = page.get_text()
    print(text)

# PDF Oxide: iterate over 0-based page indices
from pdf_oxide import PdfDocument

doc = PdfDocument("report.pdf")
for i in range(doc.page_count()):
    text = doc.extract_text(i)
    print(text)
```
Image Extraction
PyMuPDF requires a multi-step xref lookup. PDF Oxide does it in one call:
```python
# PyMuPDF: multi-step xref lookup
import fitz

doc = fitz.open("report.pdf")
page = doc[0]
for img in page.get_images():
    xref = img[0]  # first tuple element is the image's xref
    base = doc.extract_image(xref)
    # include the xref in the name so images don't overwrite each other
    with open(f"img_{xref}.{base['ext']}", "wb") as f:
        f.write(base["image"])

# PDF Oxide: one step
from pdf_oxide import PdfDocument

doc = PdfDocument("report.pdf")
for i, img in enumerate(doc.extract_image_bytes(0)):
    with open(f"img_{i}.{img['format']}", "wb") as f:
        f.write(img["data"])
```
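If you extract images from many pages, file names should also include the page number so they never collide. A sketch, assuming each image is a dict with 'format' and 'data' keys as in the PDF Oxide example above; image_filename, save_images, and the naming scheme are hypothetical.

```python
from pathlib import Path
from typing import Any, Iterable, Mapping


def image_filename(page: int, index: int, fmt: str) -> str:
    """Build a name like page3_img0.png, normalising "PNG" or ".png" to "png"."""
    ext = fmt.lower().lstrip(".")
    return f"page{page}_img{index}.{ext}"


def save_images(images: Iterable[Mapping[str, Any]], out_dir: str, page: int) -> list[Path]:
    """Write each image's bytes into out_dir and return the paths written."""
    target = Path(out_dir)
    target.mkdir(parents=True, exist_ok=True)
    written = []
    for i, img in enumerate(images):
        path = target / image_filename(page, i, str(img["format"]))
        path.write_bytes(img["data"])
        written.append(path)
    return written
```

For example, inside a loop over range(doc.page_count()), you would call save_images(doc.extract_image_bytes(page_no), "images", page_no).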
Encrypted PDFs
PyMuPDF uses a two-step open-then-authenticate pattern. PDF Oxide supports both password= in the constructor and doc.authenticate() after opening:
```python
# PyMuPDF: open, then authenticate
import fitz

doc = fitz.open("encrypted.pdf")
if not doc.authenticate("password"):  # returns 0 on failure
    raise ValueError("wrong password")
text = doc[0].get_text()

# PDF Oxide: one step with password=
from pdf_oxide import PdfDocument

doc = PdfDocument("encrypted.pdf", password="password")
text = doc.extract_text(0)
```
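Hardcoded passwords are fine for a demo, but real code usually reads them from configuration. A minimal sketch that reads a hypothetical PDF_PASSWORD environment variable and forwards it through the password= keyword shown above; open_encrypted is an illustrative helper, and opener is where you would pass PdfDocument.

```python
import os
from typing import Any, Callable


def open_encrypted(opener: Callable[..., Any], path: str, env_var: str = "PDF_PASSWORD") -> Any:
    """Open an encrypted PDF with a password taken from the environment."""
    password = os.environ.get(env_var)
    if not password:
        raise RuntimeError(f"set {env_var} to open {path}")
    # With PDF Oxide this becomes PdfDocument(path, password=password)
    return opener(path, password=password)
```

Usage: doc = open_encrypted(PdfDocument, "encrypted.pdf").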
Markdown Conversion
PyMuPDF requires the separate pymupdf4llm package. PDF Oxide has Markdown built in:
```python
# PyMuPDF: requires the extra pymupdf4llm package; converts the whole document
import pymupdf4llm

md = pymupdf4llm.to_markdown("report.pdf")

# PDF Oxide: built in; converts one page (page 0 here)
from pdf_oxide import PdfDocument

doc = PdfDocument("report.pdf")
md = doc.to_markdown(0)
```
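To reproduce the whole-document output of pymupdf4llm, loop over the pages and join the results. A sketch, assuming to_markdown(i) returns a string for page i as the snippet above suggests; document_to_markdown is a hypothetical helper, and the blank-line separator is an arbitrary choice.

```python
from typing import Any


def document_to_markdown(doc: Any, separator: str = "\n\n") -> str:
    """Concatenate per-page Markdown from a PDF Oxide-style document."""
    pages = [doc.to_markdown(i) for i in range(doc.page_count())]
    # Trim trailing whitespace per page so separators stay consistent
    return separator.join(page.rstrip() for page in pages)
```

Usage: md = document_to_markdown(PdfDocument("report.pdf")).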
Page Rendering
```python
# PyMuPDF
import fitz

doc = fitz.open("report.pdf")
pix = doc[0].get_pixmap(dpi=150)  # default is 72 dpi
pix.save("page.png")

# PDF Oxide: render_page returns PNG bytes
from pdf_oxide import PdfDocument

doc = PdfDocument("report.pdf")
png_bytes = doc.render_page(0, dpi=150)
with open("page.png", "wb") as f:
    f.write(png_bytes)
```
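Since render_page hands back raw bytes in the snippet above, a cheap sanity check before writing is to confirm they start with the standard 8-byte PNG signature. This sketch assumes PNG output, as the page.png file name implies; looks_like_png and write_png are hypothetical helpers.

```python
from pathlib import Path

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # first 8 bytes of every PNG file


def looks_like_png(data: bytes) -> bool:
    return data[:8] == PNG_SIGNATURE


def write_png(data: bytes, path: str) -> None:
    """Write rendered page bytes to disk, refusing anything that isn't PNG."""
    if not looks_like_png(data):
        raise ValueError(f"expected PNG data, got {data[:8]!r}")
    Path(path).write_bytes(data)
```

Usage: write_png(doc.render_page(0, dpi=150), "page.png").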
Step 5: Testing Your Migration
Run your existing test files through PDF Oxide and compare the output with what PyMuPDF produced for the same files:
```python
from pdf_oxide import PdfDocument

doc = PdfDocument("your-test-file.pdf")

# Verify text extraction
text = doc.extract_text(0)
print(text[:500])

# Verify page count
print(f"Pages: {doc.page_count()}")

# Verify form fields (if applicable)
fields = doc.get_form_fields()
for field in fields:
    print(f"{field.name}: {field.value}")
```
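To make the comparison quantitative, you can normalise whitespace in both libraries' output and score their similarity with Python's difflib. This is a sketch: normalise, text_similarity, flag_differences, and the 0.95 threshold are illustrative choices rather than recommendations. You would feed it per-page text from PyMuPDF (page.get_text()) and from PDF Oxide (doc.extract_text(i)).

```python
import difflib
import re


def normalise(text: str) -> str:
    """Collapse runs of whitespace so layout differences don't dominate."""
    return re.sub(r"\s+", " ", text).strip()


def text_similarity(a: str, b: str) -> float:
    """Similarity ratio in [0, 1] between two extracted texts."""
    return difflib.SequenceMatcher(None, normalise(a), normalise(b)).ratio()


def flag_differences(old_pages: list[str], new_pages: list[str], threshold: float = 0.95) -> list[int]:
    """Return 0-based indices of pages whose similarity falls below threshold."""
    if len(old_pages) != len(new_pages):
        raise ValueError("page counts differ")
    return [
        i
        for i, (old, new) in enumerate(zip(old_pages, new_pages))
        if text_similarity(old, new) < threshold
    ]
```

Pages returned by flag_differences are the ones worth inspecting by hand.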
Related Pages
- PDF Oxide vs PyMuPDF — detailed comparison
- Getting Started with Python — installation guide
- Extract Text from PDF — text extraction guide