XFA Forms: Python, Rust, Node.js, Go, C#
Detect and analyze XFA forms. PDF Oxide is the only library that reads the XML templates natively instead of relying on the AcroForm fallback:
from pdf_oxide import PdfDocument
doc = PdfDocument("government-form.pdf")
xfa = doc.analyze_xfa()
if xfa:
    print(f"XFA form with {len(xfa.fields)} fields")
    for field in xfa.fields:
        print(f"  {field.name}: {field.field_type}")
XFA (XML Forms Architecture) is a legacy form format used by many government agencies, financial institutions, and enterprise systems. Most Python PDF libraries cannot handle XFA forms. PDF Oxide can detect, analyze, and extract data from them.
What Is XFA?
XFA forms use XML-based templates embedded inside a PDF instead of standard AcroForm fields. They were created by Adobe and are common in:
- Government forms — IRS, immigration, state agency documents
- Financial forms — loan applications, insurance claims
- Enterprise forms — HR onboarding, procurement, compliance
XFA was deprecated in PDF 2.0 (ISO 32000-2:2020), but millions of existing XFA documents remain in circulation.
XFA vs AcroForm
| Feature | AcroForm | XFA |
|---|---|---|
| Format | PDF objects | XML templates |
| Supported by | All PDF libraries | Few PDF libraries |
| Dynamic layouts | No | Yes |
| PDF 2.0 status | Supported | Deprecated |
| Typical source | Most form creators | Adobe LiveCycle, Adobe Designer |
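To make the "XML templates" row concrete, here is a minimal sketch of how fields nest inside subforms in an XFA template. The XML fragment is hand-written for illustration (it is not taken from a real form), and the walk_fields helper is hypothetical; only the standard library is used.

```python
import xml.etree.ElementTree as ET

# Illustrative XFA template fragment, hand-written for this sketch.
TEMPLATE_XML = """
<template xmlns="http://www.xfa.org/schema/xfa-template/3.3/">
  <subform name="form1">
    <subform name="applicant">
      <field name="fullName"><ui><textEdit/></ui></field>
      <field name="birthDate"><ui><dateTimeEdit/></ui></field>
    </subform>
    <field name="agree"><ui><checkButton/></ui></field>
  </subform>
</template>
"""

def local_name(tag: str) -> str:
    """Strip the '{namespace}' prefix that ElementTree puts on tags."""
    return tag.rsplit("}", 1)[-1]

def walk_fields(node, path=()):
    """Yield (dotted_path, ui_widget) for every <field> nested under node."""
    for child in node:
        kind = local_name(child.tag)
        name = child.get("name")
        if kind == "subform":
            # Named subforms extend the path; unnamed ones are transparent.
            new_path = path + (name,) if name else path
            yield from walk_fields(child, new_path)
        elif kind == "field":
            widget = None
            for ui in child:
                if local_name(ui.tag) == "ui" and len(ui):
                    widget = local_name(ui[0].tag)
            yield ".".join(path + (name or "<unnamed>",)), widget

root = ET.fromstring(TEMPLATE_XML)
fields = list(walk_fields(root))
for dotted, widget in fields:
    print(f"{dotted}: {widget}")
```

Because the layout lives in nested XML rather than in flat PDF objects, a reader that only understands AcroForm objects has nothing to iterate over when no fallback layer exists.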
Why PyMuPDF and pypdf Cannot Handle XFA Forms
If you have tried reading XFA forms with popular Python PDF libraries, you have likely seen empty results with no error or warning. This is because PyMuPDF, pypdf, pdfplumber, and pdfminer have no XFA support.
PyMuPDF (fitz) — silently returns empty
PyMuPDF’s doc.get_form_fields() and page.widgets() only read AcroForm fields. When a PDF uses XFA-only forms (common with IRS, immigration, and state agency documents), PyMuPDF returns empty results without any warning:
# PyMuPDF — silently misses XFA data
import fitz
doc = fitz.open("government-form.pdf")
fields = doc[0].widgets() # Returns [] on XFA-only forms
form_data = doc.get_form_fields() # Returns {} on XFA-only forms
If the XFA form includes an AcroForm fallback layer, PyMuPDF may return a partial subset of fields — but the actual XFA data (dynamic layouts, calculated values, nested subforms) is invisible.
pypdf — also returns empty on XFA forms
pypdf’s form field reading hits the same limitation. It can only access AcroForm fields and has no XFA support:
# pypdf — cannot read XFA content
from pypdf import PdfReader
reader = PdfReader("government-form.pdf")
fields = reader.get_form_text_fields() # Returns {} on XFA-only forms
pdfplumber and pdfminer — no XFA support at all
pdfplumber and pdfminer do not attempt to read form fields from XFA forms. They have no API for XFA detection or extraction.
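Since these libraries give no warning, an empty result is ambiguous: the PDF may have no form, or it may have an XFA-only form. One rough pre-check is to look for the /XFA key, which lives in the PDF's AcroForm dictionary. The sketch below is a heuristic under stated assumptions, not a parser: the function names are hypothetical, and a negative result proves nothing when the dictionary is stored inside a compressed object stream.

```python
from pathlib import Path

def may_contain_xfa(pdf_bytes: bytes) -> bool:
    """Heuristic: True if the raw bytes contain an /XFA key.

    False does not prove absence: the AcroForm dictionary may sit inside a
    compressed object stream, where the key is not visible as plain text.
    """
    return b"/XFA" in pdf_bytes

def warn_if_xfa(path: str) -> bool:
    """Print a warning when the file at path appears to carry XFA content."""
    found = may_contain_xfa(Path(path).read_bytes())
    if found:
        print(f"{path}: /XFA key found; AcroForm-only readers may return empty results")
    return found
```

Calling warn_if_xfa("government-form.pdf") before trusting an empty field list turns a silent miss into a visible one.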
PDF Oxide — reads XFA natively
PDF Oxide parses the XFA XML templates directly, extracting all fields, values, and form structure:
# PDF Oxide — reads XFA natively
from pdf_oxide import PdfDocument
doc = PdfDocument("government-form.pdf")
xfa = doc.analyze_xfa()
print(f"{len(xfa.fields)} fields found") # All XFA fields extracted
This works on government forms, IRS documents, insurance applications, and any other XFA-based PDF — including forms with no AcroForm fallback layer.
Installation
pip install pdf_oxide
Detecting XFA Forms
Check if a PDF contains XFA content:
Python
from pdf_oxide import PdfDocument
doc = PdfDocument("form.pdf")
xfa = doc.analyze_xfa()
if xfa:
    print("This PDF uses XFA forms")
    print(f"  Fields: {len(xfa.fields)}")
    print(f"  Has template: {xfa.has_template}")
    print(f"  Has datasets: {xfa.has_datasets}")
else:
    print("Standard AcroForm (or no forms)")
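The has_template and has_datasets flags presumably correspond to named packets in the XFA entry. In a PDF, the /XFA entry of the AcroForm dictionary is either a single stream or an array of alternating packet names and streams. The following is a minimal sketch, assuming the array form has already been read into a Python list. The packet_flags helper and the placeholder stream values are hypothetical, and this is not a description of how PDF Oxide computes its flags.

```python
def packet_flags(xfa_array):
    """Derive template/datasets flags from an alternating [name, stream, ...] list."""
    names = set(xfa_array[0::2])  # even positions hold the packet names
    return {
        "has_template": "template" in names,
        "has_datasets": "datasets" in names,
    }

# Placeholder strings stand in for the stream objects a PDF parser would return.
template_only = ["preamble", "<stream>", "template", "<stream>", "postamble", "<stream>"]
with_datasets = ["preamble", "<stream>", "template", "<stream>",
                 "datasets", "<stream>", "postamble", "<stream>"]

print(packet_flags(template_only))
print(packet_flags(with_datasets))
```

The template packet describes the form's structure, while the datasets packet carries the data entered into it, so a form without datasets has a layout but no stored values.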
WASM
In WASM, you can detect XFA forms and fall back to reading AcroForm fields:
import { WasmPdfDocument } from "pdf-oxide-wasm";
const doc = new WasmPdfDocument(bytes);
if (doc.hasXfa()) {
  console.log("This PDF uses XFA forms");
  // Read any AcroForm fallback fields
  const fields = doc.getFormFields();
  console.log(`AcroForm fallback fields: ${fields.length}`);
}
doc.free();
Analyzing XFA Fields
Get details about every field in the XFA form:
from pdf_oxide import PdfDocument
doc = PdfDocument("tax-form.pdf")
xfa = doc.analyze_xfa()
if xfa:
    for field in xfa.fields:
        print(f"Name: {field.name}")
        print(f"  Type: {field.field_type}")
        print(f"  Value: {field.value}")
        print()
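On large forms, a per-type count is often easier to scan than a full field listing. The sketch below uses stand-in objects that mimic the name, field_type, and value attributes shown above. The summarize_field_types helper and the sample type names ("text", "checkbox") are assumptions made for illustration, not values confirmed by the library.

```python
from collections import Counter
from types import SimpleNamespace

def summarize_field_types(fields):
    """Count fields per type; str() keeps this working if field_type is not a plain string."""
    return Counter(str(f.field_type) for f in fields)

# Stand-in objects for illustration; with PDF Oxide you would pass xfa.fields instead.
sample = [
    SimpleNamespace(name="fullName", field_type="text", value="Ada"),
    SimpleNamespace(name="address", field_type="text", value=None),
    SimpleNamespace(name="agree", field_type="checkbox", value="1"),
]

summary = summarize_field_types(sample)
for field_type, count in summary.most_common():
    print(f"{field_type}: {count}")
```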
Reading XFA Data
Extract current field values from the XFA datasets:
from pdf_oxide import PdfDocument
doc = PdfDocument("filled-xfa.pdf")
xfa = doc.analyze_xfa()
if xfa and xfa.has_datasets:
    data = {}
    for field in xfa.fields:
        if field.value:
            data[field.name] = field.value
    # Print inside the guard: data is only defined when datasets exist
    print(data)
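For background, the datasets packet is itself an XML document whose element hierarchy mirrors the form's structure. The sketch below flattens a hand-written datasets fragment into a dictionary using only the standard library. The sample XML and the flatten helper are illustrative assumptions, and this is not how PDF Oxide is implemented.

```python
import xml.etree.ElementTree as ET

# Illustrative datasets packet, hand-written for this sketch.
DATASETS_XML = """
<xfa:datasets xmlns:xfa="http://www.xfa.org/schema/xfa-data/1.0/">
  <xfa:data>
    <form1>
      <applicant>
        <fullName>Ada Lovelace</fullName>
        <birthDate>1815-12-10</birthDate>
      </applicant>
      <agree>1</agree>
    </form1>
  </xfa:data>
</xfa:datasets>
"""

XFA_DATA_NS = "{http://www.xfa.org/schema/xfa-data/1.0/}"

def flatten(node, path=()):
    """Yield (dotted_path, text) for each leaf element that has non-empty text."""
    here = path + (node.tag,)
    children = list(node)
    if not children:
        text = (node.text or "").strip()
        if text:
            yield ".".join(here), text
        return
    for child in children:
        yield from flatten(child, here)

root = ET.fromstring(DATASETS_XML)
data_node = root.find(f"{XFA_DATA_NS}data")
values = {}
for record in data_node:
    values.update(flatten(record))
print(values)
```

Empty leaves are skipped here, which matches the intent of the `if field.value:` filter above.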
Batch Processing XFA Forms
Scan a directory to identify which PDFs use XFA:
from pdf_oxide import PdfDocument, PdfError
from pathlib import Path
pdf_dir = Path("government-forms/")
xfa_files = []
acroform_files = []
for pdf_path in pdf_dir.glob("*.pdf"):
    try:
        doc = PdfDocument(str(pdf_path))
        xfa = doc.analyze_xfa()
        if xfa:
            xfa_files.append(pdf_path.name)
        else:
            acroform_files.append(pdf_path.name)
    except PdfError as e:
        print(f"Error: {pdf_path.name}: {e}")
print(f"XFA forms: {len(xfa_files)}")
print(f"Standard forms: {len(acroform_files)}")
Rust API
use pdf_oxide::PdfDocument;
use pdf_oxide::xfa::analyze_xfa_document;
// The ? operator needs a function that returns Result, so wrap the calls in main.
fn main() -> Result<(), Box<dyn std::error::Error>> {
    let mut doc = PdfDocument::open("xfa-form.pdf")?;
    let analysis = analyze_xfa_document(&mut doc)?;
    println!("XFA form detected: {} fields", analysis.fields.len());
    for field in &analysis.fields {
        println!("  {} ({:?}): {:?}", field.name, field.field_type, field.value);
    }
    Ok(())
}
Why XFA Matters
Most Python PDF libraries silently ignore XFA content — extract_text() and form field APIs only see the AcroForm fallback layer (if one exists). Many XFA-only forms have no AcroForm fallback, making them invisible to other tools:
- PyMuPDF (pymupdf) XFA forms: get_form_fields() and .widgets() return empty on XFA-only PDFs. PyMuPDF has no XFA support and no plans to add it.
- pypdf XFA support: pypdf’s get_form_text_fields() cannot read XFA content. Only AcroForm fallback fields are visible, if they exist at all.
- pdfplumber: no XFA support. Form extraction is limited to AcroForm fields.
- pdfminer: no XFA support. Cannot detect or extract XFA form data.
PDF Oxide is the only Python PDF library that reads XFA XML templates directly, giving you access to form structure and data that PyMuPDF, pypdf, pdfplumber, and pdfminer cannot see.
Related Pages
- Form Data Extraction — AcroForm extraction API
- Fill PDF Forms — form filling guide
- Form Field Editing — advanced form operations