What is the fastest Python PDF library?

PDF Oxide is the fastest Python PDF library, with a mean text extraction time of 0.8ms: 5.8× faster than PyMuPDF (4.6ms) and 15× faster than pypdf (12.1ms). Benchmarked on 3,830 real-world PDFs with a 100% pass rate.

Is PDF Oxide free for commercial use?

Yes. PDF Oxide is MIT licensed — free for all uses including commercial products, SaaS, and proprietary software. No license fees, no sales calls, no AGPL restrictions.

Can PDF Oxide handle scanned PDFs with OCR?

Yes. PDF Oxide includes built-in OCR via PaddleOCR and ONNX Runtime. No Tesseract installation needed — just pip install pdf_oxide and use extract_text_ocr(). Supports PP-OCRv3, v4, and v5 models.

Does PDF Oxide support XFA forms?

Yes. PDF Oxide is the only Python PDF library that can detect, analyze, and extract data from XFA forms (XML Forms Architecture). PyMuPDF, pypdf, pdfplumber, and pdfminer cannot read XFA form data.

How does PDF Oxide compare to PyMuPDF?

PDF Oxide is 5.8× faster than PyMuPDF (0.8ms vs 4.6ms mean), has a 100% pass rate vs 99.3%, and is MIT licensed vs PyMuPDF's AGPL-3.0. PDF Oxide also has built-in Markdown/HTML output and XFA form support that PyMuPDF lacks.

Can PDF Oxide convert PDF to Markdown?

Yes. PDF Oxide has built-in PDF to Markdown conversion with heading detection, table preservation, and list formatting — ideal for LLM and RAG pipelines. No separate package needed, unlike PyMuPDF which requires pymupdf4llm (69× slower).

Rust API Reference

This page documents every public struct and method in pdf_oxide. For the Python bindings, see the Python API Reference. For details on types and enums, see Types and Enums.

PdfDocument

Low-level document handle. Opens PDF files and extracts text, images, and metadata.

use pdf_oxide::PdfDocument;

Opening and Authentication

Method	Signature	Description
`open`	`fn open(path: impl AsRef<Path>) -> Result<Self>`	Open a PDF file from disk
`open_from_bytes`	`fn open_from_bytes(data: Vec<u8>) -> Result<Self>`	Open a PDF from in-memory bytes
`open_with_config`	`fn open_with_config(path: impl AsRef<Path>, config: impl Any) -> Result<Self>`	Open with a parser configuration
`authenticate`	`fn authenticate(&mut self, password: &[u8]) -> Result<bool>`	Authenticate with the user or owner password

Metadata

Method	Signature	Description
`page_count`	`fn page_count(&mut self) -> Result<usize>`	Number of pages in the document
`page_count_u32`	`fn page_count_u32(&mut self) -> u32`	Page count as u32 (0 on error)
`version`	`fn version(&self) -> (u8, u8)`	PDF version as `(major, minor)`
`trailer`	`fn trailer(&self) -> &Object`	Raw trailer dictionary
`catalog`	`fn catalog(&mut self) -> Result<Object>`	Document catalog dictionary

Text Extraction

Method	Signature	Description
`extract_text`	`fn extract_text(&mut self, page_index: usize) -> Result<String>`	Plain text of a single page
`extract_all_text`	`fn extract_all_text(&mut self) -> Result<String>`	Plain text of all pages
`extract_spans`	`fn extract_spans(&mut self, page_index: usize) -> Result<Vec<TextSpan>>`	Text spans with font metadata
`extract_spans_with_config`	`fn extract_spans_with_config(&mut self, page_index: usize, config: &TextConfig) -> Result<Vec<TextSpan>>`	Text spans using a custom configuration
`extract_chars`	`fn extract_chars(&mut self, page_index: usize) -> Result<Vec<TextChar>>`	Per-character positions and metadata
`extract_text_with_ocr`	`fn extract_text_with_ocr(&mut self, page_index: usize) -> Result<String>`	Text with OCR fallback for scanned pages
`extract_spans_with_ocr`	`fn extract_spans_with_ocr(&mut self, page_index: usize) -> Result<Vec<TextSpan>>`	Text spans with OCR fallback
`apply_intelligent_text_processing`	`fn apply_intelligent_text_processing(&self, spans: Vec<TextSpan>) -> Vec<TextSpan>`	Ligature expansion, hyphenation reconstruction, OCR cleanup
`extract_hierarchical_content`	`fn extract_hierarchical_content(&mut self, page_index: usize) -> Result<Vec<ContentElement>>`	Structured content tree for tagged PDFs
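The description of `apply_intelligent_text_processing` names ligature expansion and hyphenation reconstruction. The standalone sketch below illustrates what those two steps mean on plain strings. It does not use pdf_oxide, it works on `&str` rather than `TextSpan`, and the helper names `expand_ligatures` and `join_hyphenated` are hypothetical; this is not the library's implementation.

```rust
/// Hypothetical helper: expand the Latin ligature code points U+FB00..U+FB04
/// into their component letters.
fn expand_ligatures(text: &str) -> String {
    let mut out = String::with_capacity(text.len());
    for c in text.chars() {
        match c {
            '\u{FB00}' => out.push_str("ff"),
            '\u{FB01}' => out.push_str("fi"),
            '\u{FB02}' => out.push_str("fl"),
            '\u{FB03}' => out.push_str("ffi"),
            '\u{FB04}' => out.push_str("ffl"),
            other => out.push(other),
        }
    }
    out
}

/// Hypothetical helper: rejoin a word split by a trailing hyphen at the end
/// of one line and continued at the start of the next.
fn join_hyphenated(text: &str) -> String {
    let mut out = String::new();
    let mut lines = text.lines().peekable();
    while let Some(line) = lines.next() {
        let trimmed = line.trim_end();
        // Treat "-" as a line-break hyphen only when a letter precedes it.
        let is_break_hyphen = trimmed
            .strip_suffix('-')
            .and_then(|head| head.chars().last())
            .map_or(false, char::is_alphabetic);
        if is_break_hyphen && lines.peek().is_some() {
            // Drop the hyphen and glue the next line directly onto this one.
            out.push_str(&trimmed[..trimmed.len() - 1]);
        } else {
            out.push_str(line);
            if lines.peek().is_some() {
                out.push('\n');
            }
        }
    }
    out
}
```

A naive join like this also merges genuine compound words that happen to break at their hyphen (for example "well-" / "known"), so a production pipeline needs a dictionary or similar heuristic to decide which hyphens to keep.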

Format Conversion

Method	Signature	Description
`to_markdown`	`fn to_markdown(&mut self, page_index: usize, options: &ConversionOptions) -> Result<String>`	Convert a page to Markdown
`to_html`	`fn to_html(&mut self, page_index: usize, options: &ConversionOptions) -> Result<String>`	Convert a page to HTML
`to_plain_text`	`fn to_plain_text(&mut self, page_index: usize) -> Result<String>`	Convert a page to plain text
`to_markdown_all`	`fn to_markdown_all(&mut self, options: &ConversionOptions) -> Result<String>`	Convert all pages to Markdown
`to_html_all`	`fn to_html_all(&mut self, options: &ConversionOptions) -> Result<String>`	Convert all pages to HTML
`to_plain_text_all`	`fn to_plain_text_all(&mut self) -> Result<String>`	Convert all pages to plain text
`to_markdown_with_ocr`	`fn to_markdown_with_ocr(&mut self, page_index: usize, options: &ConversionOptions) -> Result<String>`	Markdown with OCR fallback

Image Extraction

Method	Signature	Description
`extract_images`	`fn extract_images(&mut self, page_index: usize) -> Result<Vec<ImageInfo>>`	Image metadata and raw data from a page
`extract_images_to_files`	`fn extract_images_to_files(&mut self, page_index: usize, output_dir: &str) -> Result<Vec<PathBuf>>`	Extract images and save them to disk

Path and Graphics Extraction

Method	Signature	Description
`extract_paths`	`fn extract_paths(&mut self, page_index: usize) -> Result<Vec<PathContent>>`	Vector graphics on a page
`extract_paths_in_rect`	`fn extract_paths_in_rect(&mut self, page_index: usize, rect: Rect) -> Result<Vec<PathContent>>`	Paths within a rectangular region

Page Information

Method	Signature	Description
`get_page_info`	`fn get_page_info(&mut self, page_index: usize) -> Result<PageInfo>`	Page dimensions, rotation, and regions
`get_page_resources`	`fn get_page_resources(&mut self, page_index: usize) -> Result<Object>`	Raw resource dictionary
`get_page_content_data`	`fn get_page_content_data(&mut self, page_index: usize) -> Result<Vec<u8>>`	Raw content stream bytes

Structure and Accessibility

Method	Signature	Description
`structure_tree`	`fn structure_tree(&mut self) -> Result<Option<StructTreeRoot>>`	Tagged structure tree
`mark_info`	`fn mark_info(&mut self) -> Result<MarkInfo>`	MarkInfo dictionary (Marked, Suspects)

Low-Level

Method	Signature	Description
`load_object`	`fn load_object(&mut self, obj_ref: ObjectRef) -> Result<Object>`	Load a PDF object by reference
`resolve_object`	`fn resolve_object(&mut self, obj: &Object) -> Result<Object>`	Resolve an indirect reference
`resolve_references`	`fn resolve_references(&mut self, obj: &Object, max_depth: usize) -> Result<Object>`	Recursively resolve all references
`check_for_circular_references`	`fn check_for_circular_references(&mut self) -> Vec<(ObjectRef, ObjectRef)>`	Detect circular reference chains
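`check_for_circular_references` returns pairs of object references that take part in cycles. A standard way to find such pairs is a depth-first search that records back edges, that is, references that point to an object still on the current search path. The sketch below is an illustration under assumptions: objects are identified by plain `u32` object numbers (generation numbers are ignored), the reference graph is a hypothetical `HashMap`, and pdf_oxide's actual detection strategy is not documented on this page.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical reference graph: object number -> object numbers it references.
type RefGraph = HashMap<u32, Vec<u32>>;

/// Return every back edge (from, to) found by depth-first search.
/// Each back edge closes a cycle in the reference graph.
fn find_back_edges(graph: &RefGraph) -> Vec<(u32, u32)> {
    let mut visited = HashSet::new();
    let mut on_stack = HashSet::new();
    let mut back_edges = Vec::new();
    // Visit roots in sorted order so the output is deterministic.
    let mut roots: Vec<u32> = graph.keys().copied().collect();
    roots.sort_unstable();
    for root in roots {
        visit(root, graph, &mut visited, &mut on_stack, &mut back_edges);
    }
    back_edges
}

fn visit(
    node: u32,
    graph: &RefGraph,
    visited: &mut HashSet<u32>,
    on_stack: &mut HashSet<u32>,
    back_edges: &mut Vec<(u32, u32)>,
) {
    if !visited.insert(node) {
        return; // already explored from an earlier root or branch
    }
    on_stack.insert(node);
    for &child in graph.get(&node).map(Vec::as_slice).unwrap_or(&[]) {
        if on_stack.contains(&child) {
            // The child is an ancestor on the current path: this edge closes a cycle.
            back_edges.push((node, child));
        } else {
            visit(child, graph, visited, on_stack, back_edges);
        }
    }
    on_stack.remove(&node);
}
```

The recursion depth equals the longest reference chain, so a version meant for hostile input would replace the recursion with an explicit stack.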

Pdf

Unified high-level API: a single type for extraction, creation, editing, search, and compliance.

use pdf_oxide::api::Pdf;

Constructors

Method	Signature	Description
`new`	`fn new() -> Self`	Create an empty Pdf instance
`open`	`fn open(path: impl AsRef<Path>) -> Result<Self>`	Open an existing PDF for reading
`open_editor`	`fn open_editor(path: impl AsRef<Path>) -> Result<DocumentEditor>`	Open for structural editing
`from_markdown`	`fn from_markdown(content: &str) -> Result<Self>`	Create a PDF from Markdown
`from_html`	`fn from_html(content: &str) -> Result<Self>`	Create a PDF from HTML
`from_text`	`fn from_text(content: &str) -> Result<Self>`	Create a PDF from plain text
`from_image`	`fn from_image(path: impl AsRef<Path>) -> Result<Self>`	Create a PDF from an image file
`from_image_bytes`	`fn from_image_bytes(data: &[u8]) -> Result<Self>`	Create a PDF from image bytes
`from_images`	`fn from_images<P: AsRef<Path>>(paths: &[P]) -> Result<Self>`	Create a multi-page PDF from multiple images
`from_qrcode`	`fn from_qrcode(data: &str) -> Result<Self>`	PDF containing a QR code
`from_qrcode_with_options`	`fn from_qrcode_with_options(data: &str, size: f32, ecl: &str) -> Result<Self>`	QR code with custom size and error correction level
`from_barcode`	`fn from_barcode(data: &str, barcode_type: BarcodeType) -> Result<Self>`	PDF containing a barcode
`from_barcode_with_options`	`fn from_barcode_with_options(data: &str, barcode_type: BarcodeType, width: f32, height: f32) -> Result<Self>`	Barcode with custom dimensions

Extraction

Method	Signature	Description
`page_count`	`fn page_count(&mut self) -> Result<usize>`	Number of pages
`page`	`fn page(&mut self, index: usize) -> Result<PdfPage>`	DOM-style page handle
`to_markdown`	`fn to_markdown(&mut self, page: usize) -> Result<String>`	Convert a page to Markdown
`to_html`	`fn to_html(&mut self, page: usize) -> Result<String>`	Convert a page to HTML
`to_text`	`fn to_text(&mut self, page: usize) -> Result<String>`	Convert a page to plain text

Search

Method	Signature	Description
`search`	`fn search(&mut self, pattern: &str) -> Result<Vec<SearchResult>>`	Search all pages
`search_with_options`	`fn search_with_options(&mut self, pattern: &str, opts: &SearchOptions) -> Result<Vec<SearchResult>>`	Search with options
`search_page`	`fn search_page(&mut self, page: usize, pattern: &str) -> Result<Vec<SearchResult>>`	Search a single page
`highlight_matches`	`fn highlight_matches(&mut self, pattern: &str) -> Result<usize>`	Add highlight annotations for matches

Metadata

Method	Signature	Description
`set_title`	`fn set_title(&mut self, title: impl Into<String>) -> Result<()>`	Set the document title
`set_author`	`fn set_author(&mut self, author: impl Into<String>) -> Result<()>`	Set the author
`set_subject`	`fn set_subject(&mut self, subject: impl Into<String>) -> Result<()>`	Set the subject
`set_keywords`	`fn set_keywords(&mut self, keywords: impl Into<String>) -> Result<()>`	Set the keywords

XMP Metadata

Method	Signature	Description
`xmp_metadata`	`fn xmp_metadata(&mut self) -> Result<Option<XmpMetadata>>`	Full XMP metadata
`has_xmp_metadata`	`fn has_xmp_metadata(&mut self) -> Result<bool>`	Check whether XMP metadata is present
`xmp_title`	`fn xmp_title(&mut self) -> Result<Option<String>>`	XMP dc:title
`xmp_creators`	`fn xmp_creators(&mut self) -> Result<Vec<String>>`	XMP dc:creator list
`xmp_description`	`fn xmp_description(&mut self) -> Result<Option<String>>`	XMP dc:description
`xmp_creator_tool`	`fn xmp_creator_tool(&mut self) -> Result<Option<String>>`	XMP xmp:CreatorTool
`xmp_create_date`	`fn xmp_create_date(&mut self) -> Result<Option<String>>`	XMP xmp:CreateDate
`xmp_modify_date`	`fn xmp_modify_date(&mut self) -> Result<Option<String>>`	XMP xmp:ModifyDate
`xmp_producer`	`fn xmp_producer(&mut self) -> Result<Option<String>>`	XMP pdf:Producer

Page Labels

Method	Signature	Description
`page_labels`	`fn page_labels(&mut self) -> Result<Vec<PageLabelRange>>`	Page label ranges
`page_label`	`fn page_label(&mut self, page: usize) -> Result<String>`	Label for a specific page
`all_page_labels`	`fn all_page_labels(&mut self) -> Result<Vec<String>>`	Labels for all pages

Page Operations

Method	Signature	Description
`page_rotation`	`fn page_rotation(&mut self, page: usize) -> Result<i32>`	Current rotation angle
`set_page_rotation`	`fn set_page_rotation(&mut self, page: usize, degrees: i32) -> Result<()>`	Set absolute rotation
`rotate_page`	`fn rotate_page(&mut self, page: usize, degrees: i32) -> Result<()>`	Add relative rotation
`rotate_all_pages`	`fn rotate_all_pages(&mut self, degrees: i32) -> Result<()>`	Rotate all pages
`page_media_box`	`fn page_media_box(&mut self, page: usize) -> Result<[f32; 4]>`	MediaBox dimensions
`set_page_media_box`	`fn set_page_media_box(&mut self, page: usize, rect: [f32; 4]) -> Result<()>`	Set the MediaBox
`page_crop_box`	`fn page_crop_box(&mut self, page: usize) -> Result<Option<[f32; 4]>>`	CropBox dimensions
`set_page_crop_box`	`fn set_page_crop_box(&mut self, page: usize, rect: [f32; 4]) -> Result<()>`	Set the CropBox
`crop_margins`	`fn crop_margins(&mut self, left: f32, right: f32, top: f32, bottom: f32) -> Result<()>`	Crop all pages by margins
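The page operations above combine absolute and relative rotation with margin-based cropping. The PDF specification requires a page's /Rotate value to be a multiple of 90, and page boxes are rectangles written as [llx, lly, urx, ury] in points. The sketch below shows the arithmetic under two assumptions: that the `[f32; 4]` boxes in these signatures follow that PDF ordering, and that a relative rotation is normalized into the range 0..360. The helper names are hypothetical and not part of pdf_oxide.

```rust
/// Hypothetical helper: combine the current /Rotate value with a relative
/// rotation. Assumes deltas must be multiples of 90, as /Rotate values are.
fn apply_relative_rotation(current: i32, delta: i32) -> Result<i32, String> {
    if delta % 90 != 0 {
        return Err(format!("rotation must be a multiple of 90, got {delta}"));
    }
    // rem_euclid keeps the result in 0..360 even when the sum is negative.
    Ok((current + delta).rem_euclid(360))
}

/// Hypothetical helper: shrink a box [llx, lly, urx, ury] (in points) by the
/// given margins. Returns None if the margins would leave no visible area.
fn inset_box(b: [f32; 4], left: f32, right: f32, top: f32, bottom: f32) -> Option<[f32; 4]> {
    // PDF's y axis points up, so "bottom" raises lly and "top" lowers ury.
    let r = [b[0] + left, b[1] + bottom, b[2] - right, b[3] - top];
    if r[0] < r[2] && r[1] < r[3] {
        Some(r)
    } else {
        None
    }
}
```

For example, insetting a US Letter MediaBox `[0, 0, 612, 792]` by half an inch left and right and one inch top and bottom yields `[36, 72, 576, 720]`.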

Content Editing

Method	Signature	Description
`save_page`	`fn save_page(&mut self, page: PdfPage) -> Result<()>`	Save a modified page
`erase_region`	`fn erase_region(&mut self, page: usize, rect: [f32; 4]) -> Result<()>`	White out a rectangular region
`erase_regions`	`fn erase_regions(&mut self, page: usize, rects: &[[f32; 4]]) -> Result<()>`	White out multiple regions
`clear_erase_regions`	`fn clear_erase_regions(&mut self, page: usize)`	Clear pending erase operations

Annotations

Method	Signature	Description
`flatten_page_annotations`	`fn flatten_page_annotations(&mut self, page: usize) -> Result<()>`	Flatten annotations on a page
`flatten_all_annotations`	`fn flatten_all_annotations(&mut self) -> Result<()>`	Flatten all annotations
`is_page_marked_for_flatten`	`fn is_page_marked_for_flatten(&self, page: usize) -> bool`	Check whether a page is marked for flattening
`unmark_page_for_flatten`	`fn unmark_page_for_flatten(&mut self, page: usize)`	Unmark a page for flattening

Forms

Method	Signature	Description
`flatten_forms_on_page`	`fn flatten_forms_on_page(&mut self, page: usize) -> Result<()>`	Flatten forms on a page
`flatten_forms`	`fn flatten_forms(&mut self) -> Result<()>`	Flatten all form fields
`is_page_marked_for_form_flatten`	`fn is_page_marked_for_form_flatten(&self, page: usize) -> bool`	Check whether a page's forms are marked for flattening
`will_remove_acroform`	`fn will_remove_acroform(&self) -> bool`	Check whether the AcroForm will be removed on save
`export_form_data_fdf`	`fn export_form_data_fdf(&mut self, output_path: impl AsRef<Path>) -> Result<()>`	Export form data as FDF
`export_form_data_xfdf`	`fn export_form_data_xfdf(&mut self, output_path: impl AsRef<Path>) -> Result<()>`	Export form data as XFDF
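XFDF, the target format of `export_form_data_xfdf`, is an XML representation of form field values in the `http://ns.adobe.com/xfdf/` namespace. The minimal std-only writer below shows the general shape of such a file for flat field names. It is a hypothetical illustration, not pdf_oxide's exporter, and it skips details such as nesting hierarchical names (`a.b.c`) into nested `<field>` elements or referencing the source PDF.

```rust
/// Escape the five XML special characters.
fn xml_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '&' => out.push_str("&amp;"),
            '<' => out.push_str("&lt;"),
            '>' => out.push_str("&gt;"),
            '"' => out.push_str("&quot;"),
            '\'' => out.push_str("&apos;"),
            _ => out.push(c),
        }
    }
    out
}

/// Hypothetical helper: serialize (name, value) pairs as a flat XFDF document.
fn fields_to_xfdf(fields: &[(&str, &str)]) -> String {
    let mut xml = String::from(
        "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n\
         <xfdf xmlns=\"http://ns.adobe.com/xfdf/\" xml:space=\"preserve\">\n  <fields>\n",
    );
    for (name, value) in fields {
        // Both the attribute and the element text must be escaped.
        xml.push_str(&format!(
            "    <field name=\"{}\"><value>{}</value></field>\n",
            xml_escape(name),
            xml_escape(value)
        ));
    }
    xml.push_str("  </fields>\n</xfdf>\n");
    xml
}
```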

Redaction

Method	Signature	Description
`apply_page_redactions`	`fn apply_page_redactions(&mut self, page: usize) -> Result<()>`	Apply redactions on a page
`apply_all_redactions`	`fn apply_all_redactions(&mut self) -> Result<()>`	Apply all pending redactions
`is_page_marked_for_redaction`	`fn is_page_marked_for_redaction(&self, page: usize) -> bool`	Check whether a page has pending redactions
`unmark_page_for_redaction`	`fn unmark_page_for_redaction(&mut self, page: usize)`	Remove pending redactions from a page

Images

Method	Signature	Description
`page_images`	`fn page_images(&mut self, page: usize) -> Result<Vec<ImageInfo>>`	List images on a page
`reposition_image`	`fn reposition_image(&mut self, page: usize, image_index: usize, x: f32, y: f32) -> Result<()>`	Move an image
`resize_image`	`fn resize_image(&mut self, page: usize, image_index: usize, width: f32, height: f32) -> Result<()>`	Resize an image
`set_image_bounds`	`fn set_image_bounds(&mut self, page: usize, image_index: usize, rect: [f32; 4]) -> Result<()>`	Set an image's bounding box
`clear_image_modifications`	`fn clear_image_modifications(&mut self, page: usize)`	Clear pending image modifications
`has_image_modifications`	`fn has_image_modifications(&self, page: usize) -> bool`	Check for pending image modifications

Embedded Files

Method	Signature	Description
`embed_file`	`fn embed_file(&mut self, name: &str, data: Vec<u8>) -> Result<()>`	Attach a file
`embed_file_with_options`	`fn embed_file_with_options(&mut self, file: EmbeddedFile) -> Result<()>`	Attach a file with full configuration
`pending_embedded_files`	`fn pending_embedded_files(&self) -> &[EmbeddedFile]`	List pending file attachments
`clear_embedded_files`	`fn clear_embedded_files(&mut self)`	Clear pending file attachments

Rendering (requires the `rendering` feature)

Method	Signature	Description
`render_page`	`fn render_page(&mut self, page: usize) -> Result<RenderedImage>`	Render a page to an image
`render_page_with_options`	`fn render_page_with_options(&mut self, page: usize, opts: &RenderOptions) -> Result<RenderedImage>`	Render with options
`render_page_to_file`	`fn render_page_to_file(&mut self, page: usize, path: impl AsRef<Path>) -> Result<()>`	Render and save to a file
`render_page_to_file_with_dpi`	`fn render_page_to_file_with_dpi(&mut self, page: usize, path: impl AsRef<Path>, dpi: f32) -> Result<()>`	Render with a custom DPI
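PDF page sizes are measured in points (1/72 inch), so the pixel size of a render at a given DPI is the page size multiplied by dpi / 72. The helper below, `pixel_size`, is a hypothetical illustration of that arithmetic and is not part of pdf_oxide; it ignores page rotation, which would swap width and height.

```rust
/// Hypothetical helper: pixel dimensions of a page rendered at `dpi`.
/// `page_pt` is (width, height) in PDF points; rotation is not applied.
fn pixel_size(page_pt: (f32, f32), dpi: f32) -> (u32, u32) {
    // Multiply before dividing to keep exact results for common sizes.
    let to_px = |pt: f32| (pt * dpi / 72.0).round() as u32;
    (to_px(page_pt.0), to_px(page_pt.1))
}
```

For example, a US Letter page (612 × 792 pt) rendered at 150 DPI comes out at 1275 × 1650 pixels.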

Saving

Method	Signature	Description
`save`	`fn save(&mut self, path: impl AsRef<Path>) -> Result<()>`	Save to a file
`save_as`	`fn save_as(&mut self, path: impl AsRef<Path>) -> Result<()>`	Save as a different file
`save_encrypted`	`fn save_encrypted(&mut self, path: impl AsRef<Path>, user_password: &str, owner_password: &str) -> Result<()>`	Save with password protection
`save_with_encryption`	`fn save_with_encryption(&mut self, path: impl AsRef<Path>, config: EncryptionConfig) -> Result<()>`	Save with a full encryption configuration
`as_bytes`	`fn as_bytes(&self) -> &[u8]`	PDF bytes (creation mode)
`into_bytes`	`fn into_bytes(mut self) -> Vec<u8>`	Consume the instance and return the PDF bytes
`to_bytes`	`fn to_bytes(&mut self) -> Result<Vec<u8>>`	Generate PDF bytes
`to_markdown_file`	`fn to_markdown_file(&mut self, path: impl AsRef<Path>) -> Result<()>`	Save all pages as a Markdown file

Accessors

Method	Signature	Description
`source_path`	`fn source_path(&self) -> Option<&Path>`	Path of the opened file
`editor`	`fn editor(&mut self) -> Option<&mut DocumentEditor>`	Access the underlying editor
`config`	`fn config(&self) -> &PdfConfig`	Current configuration
`is_modified`	`fn is_modified(&self) -> bool`	Whether the document has unsaved changes

PdfBuilder

Fluent builder for creating PDFs with metadata and layout configuration.

use pdf_oxide::api::PdfBuilder;
use pdf_oxide::writer::PageSize;

Method	Signature	Description
`new`	`fn new() -> Self`	Create a new builder
`title`	`fn title(self, title: impl Into<String>) -> Self`	Set the title
`author`	`fn author(self, author: impl Into<String>) -> Self`	Set the author
`subject`	`fn subject(self, subject: impl Into<String>) -> Self`	Set the subject
`keywords`	`fn keywords(self, keywords: impl Into<String>) -> Self`	Set the keywords
`page_size`	`fn page_size(self, size: PageSize) -> Self`	Set the page size
`margin`	`fn margin(self, margin: f32) -> Self`	Set a uniform margin
`margins`	`fn margins(self, left: f32, right: f32, top: f32, bottom: f32) -> Self`	Set individual margins
`font_size`	`fn font_size(self, size: f32) -> Self`	Set the font size
`line_height`	`fn line_height(self, height: f32) -> Self`	Set the line height
`from_markdown`	`fn from_markdown(self, content: &str) -> Result<Pdf>`	Build from Markdown
`from_html`	`fn from_html(self, content: &str) -> Result<Pdf>`	Build from HTML
`from_text`	`fn from_text(self, content: &str) -> Result<Pdf>`	Build from plain text
`from_image`	`fn from_image(self, path: impl AsRef<Path>) -> Result<Pdf>`	Build from an image file
`from_image_bytes`	`fn from_image_bytes(self, data: &[u8]) -> Result<Pdf>`	Build from image bytes
`from_images`	`fn from_images<P: AsRef<Path>>(self, paths: &[P]) -> Result<Pdf>`	Build from multiple images
`from_qrcode`	`fn from_qrcode(self, data: &str) -> Result<Pdf>`	Build from QR code data
`from_barcode`	`fn from_barcode(self, data: &str, barcode_type: BarcodeType) -> Result<Pdf>`	Build from barcode data

DocumentBuilder

Low-level builder for pixel-precise page layout.

use pdf_oxide::writer::DocumentBuilder;

Method	Signature	Description
`new`	`fn new() -> Self`	Create a new builder
`metadata`	`fn metadata(self, metadata: DocumentMetadata) -> Self`	Set document metadata
`page`	`fn page(&mut self, size: PageSize) -> FluentPageBuilder`	Add a page of the given size
`letter_page`	`fn letter_page(&mut self) -> FluentPageBuilder`	Add a US Letter page
`a4_page`	`fn a4_page(&mut self) -> FluentPageBuilder`	Add an A4 page
`build`	`fn build(self) -> Result<Vec<u8>>`	Generate PDF bytes
`save`	`fn save(self, path: impl AsRef<Path>) -> Result<()>`	Save to a file

FluentPageBuilder

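Returned by DocumentBuilder::page(). Chain calls to add content to the page:

Method	Description
`text(text, x, y, size)`	Place text at exact coordinates
`heading(level, text)`	Add a heading (H1-H6)
`paragraph(text)`	Add an auto-wrapped paragraph
`space(points)`	Add vertical spacing
`horizontal_rule()`	Draw a horizontal line
`link_url(url)`	Add a URL link annotation
`link_page(page)`	Add an internal page link
`highlight(color)`	Add a highlight annotation
`underline(color)`	Add an underline annotation
`strikeout(color)`	Add a strikeout annotation
`sticky_note(text)`	Add a sticky note
`stamp(stamp_type)`	Add a stamp annotation
`freetext(rect, text)`	Add a free-text annotation
`watermark(text)`	Add a watermark overlay
`add_annotation(annotation)`	Add an annotation of any type
`done()`	Finish the page and return the builder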
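`paragraph(text)` is described as adding an auto-wrapped paragraph. A common technique for this is greedy line breaking: keep adding words to the current line until the next word would exceed the available width. The sketch below assumes a caller-supplied width function in place of real font metrics, and `wrap_greedy` is a hypothetical name; how pdf_oxide actually measures and breaks text is not documented on this page.

```rust
/// Hypothetical greedy word wrap. `advance` returns the rendered width of a
/// string in points; `max_width` is the available line width in points.
fn wrap_greedy(text: &str, max_width: f32, advance: impl Fn(&str) -> f32) -> Vec<String> {
    let mut lines = Vec::new();
    let mut current = String::new();
    for word in text.split_whitespace() {
        let candidate = if current.is_empty() {
            word.to_string()
        } else {
            format!("{current} {word}")
        };
        // An overlong single word is accepted on its own line rather than dropped.
        if advance(&candidate) <= max_width || current.is_empty() {
            current = candidate;
        } else {
            // Close the current line and start a new one with this word.
            lines.push(std::mem::replace(&mut current, word.to_string()));
        }
    }
    if !current.is_empty() {
        lines.push(current);
    }
    lines
}
```

With a crude fixed advance of 6 pt per character and a 60 pt line, "the quick brown fox jumps over" wraps into "the quick", "brown fox", and "jumps over". Real layout would use per-glyph widths from the font, and kerning if supported.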

DocumentEditor

Opens an existing PDF for structural modification.

use pdf_oxide::editor::DocumentEditor;

Core

Method	Signature	Description
`open`	`fn open(path: impl AsRef<Path>) -> Result<Self>`	Open a file for editing
`is_modified`	`fn is_modified(&self) -> bool`	Check for unsaved changes
`source_path`	`fn source_path(&self) -> &str`	Original file path
`source`	`fn source(&self) -> &PdfDocument`	Underlying document (read-only)
`source_mut`	`fn source_mut(&mut self) -> &mut PdfDocument`	Underlying document (mutable)
`version`	`fn version(&self) -> (u8, u8)`	PDF version
`current_page_count`	`fn current_page_count(&self) -> usize`	Number of pages

Metadata

Method	Description
`title()` / `set_title()`	Get/set the document title
`author()` / `set_author()`	Get/set the author
`subject()` / `set_subject()`	Get/set the subject
`keywords()` / `set_keywords()`	Get/set the keywords

Page Operations

Method	Description
`get_page_rotation()` / `set_page_rotation()`	Get/set rotation
`rotate_page_by()`	Add relative rotation
`rotate_all_pages()`	Rotate all pages
`get_page_media_box()` / `set_page_media_box()`	Get/set the MediaBox
`get_page_crop_box()` / `set_page_crop_box()`	Get/set the CropBox
`crop_margins()`	Crop all pages by margins
`erase_region()` / `erase_regions()`	White out content
`extract_pages()`	Extract pages to a separate file
`merge_from()` / `merge_pages_from()`	Merge pages from another PDF

DOM-Style Editing

Method	Signature	Description
`get_page`	`fn get_page(&mut self, page_index: usize) -> Result<PdfPage>`	Get a DOM page handle
`save_page`	`fn save_page(&mut self, page: PdfPage) -> Result<()>`	Save a modified page
`edit_page`	`fn edit_page<F>(&mut self, page_index: usize, f: F) -> Result<()>`	Edit a page with a closure
`page_editor`	`fn page_editor(&mut self, page_index: usize) -> Result<PageEditor>`	Get a page editor
`get_page_content`	`fn get_page_content(&mut self, page_index: usize) -> Result<Option<StructureElement>>`	Get the page structure
`set_page_content`	`fn set_page_content(&mut self, page_index: usize, content: StructureElement) -> Result<()>`	Set the page structure
`modify_structure`	`fn modify_structure<F>(&mut self, page_index: usize, f: F) -> Result<()>`	Modify the structure with a closure

Form Fields

Method	Description
`get_form_fields()`	List all form fields
`get_form_field_value(name)`	Get a field value by name
`set_form_field_value(name, value: FormFieldValue)`	Set a field value by name
`has_form_field(name)`	Check whether a field exists
`add_form_field(widget)`	Add a new form field
`flatten_forms_on_page(page)`	Flatten forms on a page
`flatten_forms()`	Flatten all form fields
`export_form_data_fdf(path)`	Export as FDF
`export_form_data_xfdf(path)`	Export as XFDF
`has_xfa()`	Check for XFA forms
`analyze_xfa()`	Analyze XFA form data
`convert_xfa_to_acroform()`	Convert XFA to AcroForm

Annotations and Flattening

Method	Description
`flatten_page_annotations(page)`	Flatten annotations on a page
`flatten_all_annotations()`	Flatten all annotations
`get_page_annotations(page)`	List annotations on a page

Embedded Files

Method	Description
`embed_file(name, data)`	Attach a file
`embed_file_with_options(file)`	Attach a file with full configuration
`pending_embedded_files()`	List pending attachments
`clear_embedded_files()`	Clear pending attachments

DOM Types

PdfPage

Represents a single page with queryable and editable elements.

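Method	Description
`elements()`	All elements on the page
`text_elements()`	Text elements only
`image_elements()`	Image elements only
`path_elements()`	Path/graphics elements only
`table_elements()`	Table elements only
`find_text_containing(needle)`	Find text elements containing a substring
`set_text(id, new_text)`	Replace text by element ID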

PdfText

Method	Returns	Description
`id()`	`ElementId`	Unique element identifier
`text()`	`&str`	Text content
`bbox()`	`Rect`	Bounding rectangle
`font_name()`	`&str`	Font name
`font_size()`	`f32`	Font size in points
`is_bold()`	`bool`	Whether the text is bold
`is_italic()`	`bool`	Whether the text is italic
`color()`	`Color`	Text color
`set_text(new)`		Replace the text
`append(text)`		Append text
`replace(old, new)`	`usize`	Replace occurrences
`clear()`		Clear the text

PdfImage

Method	Returns	Description
`id()`	`ElementId`	Unique identifier
`bbox()`	`Rect`	Bounding rectangle
`format()`	`ImageFormat`	Image format
`dimensions()`	`(u32, u32)`	Width and height in pixels
`aspect_ratio()`	`f32`	Width-to-height ratio
`is_grayscale()`	`bool`	Whether the image is grayscale
`alt_text()`	`Option<&str>`	Alt text for accessibility
`resolution()`	`Option<(f32, f32)>`	DPI as (horizontal, vertical)
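`resolution()` reports DPI as (horizontal, vertical). One common way to derive an effective DPI is to divide an image's pixel dimensions by its placed size in inches (points / 72). The sketch below shows that calculation alongside the width-to-height ratio. Whether `resolution()` and `aspect_ratio()` compute their values exactly this way is an assumption, and `effective_dpi` and `width_to_height` are hypothetical names.

```rust
/// Hypothetical helper: effective DPI of an image whose pixel grid is drawn
/// at `placed_pt` = (width, height) in points (1 point = 1/72 inch).
fn effective_dpi(pixels: (u32, u32), placed_pt: (f32, f32)) -> Option<(f32, f32)> {
    let (w_pt, h_pt) = placed_pt;
    if w_pt <= 0.0 || h_pt <= 0.0 {
        return None; // degenerate placement: DPI is undefined
    }
    // pixels / inches, where inches = points / 72
    Some((pixels.0 as f32 * 72.0 / w_pt, pixels.1 as f32 * 72.0 / h_pt))
}

/// Width-to-height ratio of the pixel grid; None when the height is zero.
fn width_to_height(pixels: (u32, u32)) -> Option<f32> {
    if pixels.1 == 0 {
        None
    } else {
        Some(pixels.0 as f32 / pixels.1 as f32)
    }
}
```

Horizontal and vertical DPI are reported separately because an image can be stretched non-uniformly when placed on the page; a 600 × 300 pixel image drawn into a 144 × 72 pt box comes out at 300 DPI in both directions.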

PdfPath

Method	Returns	Description
`id()`	`ElementId`	Unique identifier
`bbox()`	`Rect`	Bounding rectangle
`operations()`	`&[PathOperation]`	Path drawing operations
`stroke_color()`	`Option<Color>`	Stroke color
`fill_color()`	`Option<Color>`	Fill color
`stroke_width()`	`f32`	Line width
`line_cap()`	`LineCap`	Line cap style
`line_join()`	`LineJoin`	Line join style
`is_closed()`	`bool`	Whether the path is closed
`to_svg()`	`String`	Convert to SVG path data
`to_svg_document()`	`String`	Convert to a standalone SVG document
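`to_svg()` converts path operations to SVG path data. PDF user space has its origin at the bottom-left with y increasing upward, while SVG's y axis points down, so a converter has to flip y against the page height (or emit an equivalent transform). The sketch below uses a hypothetical `Op` enum in place of `PathOperation`, whose actual variants are not listed on this page, and is not pdf_oxide's converter.

```rust
/// Hypothetical stand-in for path operations, coordinates in PDF points.
#[derive(Debug, Clone, Copy)]
enum Op {
    MoveTo(f32, f32),
    LineTo(f32, f32),
    /// Cubic Bezier: two control points, then the end point.
    CurveTo(f32, f32, f32, f32, f32, f32),
    ClosePath,
}

/// Build an SVG `d` attribute, flipping y so the drawing is not upside down.
fn to_svg_path(ops: &[Op], page_height: f32) -> String {
    let fy = |y: f32| page_height - y;
    let mut parts = Vec::with_capacity(ops.len());
    for op in ops {
        match *op {
            Op::MoveTo(x, y) => parts.push(format!("M {} {}", x, fy(y))),
            Op::LineTo(x, y) => parts.push(format!("L {} {}", x, fy(y))),
            Op::CurveTo(x1, y1, x2, y2, x, y) => parts.push(format!(
                "C {} {} {} {} {} {}",
                x1, fy(y1), x2, fy(y2), x, fy(y)
            )),
            Op::ClosePath => parts.push("Z".to_string()),
        }
    }
    parts.join(" ")
}
```

PDF's `re` (rectangle) operator would map to a move, three lines, and a close in this scheme.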

PdfTable

Method	Returns	Description
`id()`	`ElementId`	Unique identifier
`bbox()`	`Rect`	Bounding rectangle
`row_count()`	`usize`	Number of rows
`column_count()`	`usize`	Number of columns
`has_header()`	`bool`	Whether the first row is a header
`get_cell(row, col)`	`Option<&TableCellContent>`	Cell content
`caption()`	`Option<&str>`	Table caption
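A `PdfTable` exposes its cells through `row_count()`, `column_count()`, and `get_cell(row, col)`. Once cell text has been collected into rows, rendering it as a GitHub Flavored Markdown table is straightforward. The sketch below assumes the rows were already gathered as `Vec<Vec<String>>` (a hypothetical intermediate form, not a pdf_oxide type) and that the first row is the header, as `has_header()` would indicate; ragged rows are padded with empty cells.

```rust
/// Escape pipes so cell text cannot break the table layout.
fn cell_text(row: &[String], col: usize) -> String {
    row.get(col).map(|s| s.replace('|', "\\|")).unwrap_or_default()
}

/// Render rows as a GFM table, treating the first row as the header.
fn rows_to_gfm(rows: &[Vec<String>]) -> String {
    // Width of the widest row; shorter rows are padded with empty cells.
    let cols = rows.iter().map(Vec::len).max().unwrap_or(0);
    if cols == 0 {
        return String::new();
    }
    let render = |row: &[String]| {
        let cells: Vec<String> = (0..cols).map(|c| cell_text(row, c)).collect();
        format!("| {} |", cells.join(" | "))
    };
    let mut lines = vec![render(rows[0].as_slice())];
    // GFM requires a delimiter row right after the header.
    lines.push(format!("|{}", " --- |".repeat(cols)));
    for row in &rows[1..] {
        lines.push(render(row.as_slice()));
    }
    lines.join("\n")
}
```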

TextSearcher

use pdf_oxide::search::{TextSearcher, SearchOptions, SearchResult};

SearchOptions

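Field	Type	Default	Description
`case_sensitive`	`bool`	`true`	Case-sensitive matching
`literal`	`bool`	`false`	Treat the pattern as a literal (not a regex)
`whole_word`	`bool`	`false`	Match whole words only
`max_results`	`Option<usize>`	`None`	Limit the number of results
`page_range`	`Option<(usize, usize)>`	`None`	Restrict the search to a page range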
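The defaults in the table above translate naturally into a `Default` implementation. The sketch below mirrors the documented fields in a local `SearchOptions` struct (a stand-in, not an import of pdf_oxide's type) and implements only literal matching with optional case folding, whole-word filtering, and a result cap. The helper `find_literal` is hypothetical: it treats every pattern as a literal regardless of the `literal` flag (regex matching would need an external crate such as `regex`), it ignores `page_range` because it searches a single string, and its case folding is ASCII-only so that returned byte offsets stay valid in the original text.

```rust
/// Local mirror of the documented SearchOptions fields.
#[derive(Debug, Clone)]
struct SearchOptions {
    case_sensitive: bool,
    literal: bool,
    whole_word: bool,
    max_results: Option<usize>,
    page_range: Option<(usize, usize)>,
}

impl Default for SearchOptions {
    /// Defaults as listed in the table above.
    fn default() -> Self {
        Self {
            case_sensitive: true,
            literal: false,
            whole_word: false,
            max_results: None,
            page_range: None,
        }
    }
}

/// Hypothetical helper: byte offsets of non-overlapping literal matches.
fn find_literal(text: &str, pattern: &str, opts: &SearchOptions) -> Vec<usize> {
    if pattern.is_empty() {
        return Vec::new();
    }
    // ASCII-only folding keeps byte offsets identical to `text`.
    let (hay, pat) = if opts.case_sensitive {
        (text.to_string(), pattern.to_string())
    } else {
        (text.to_ascii_lowercase(), pattern.to_ascii_lowercase())
    };
    let is_word = |c: char| c.is_alphanumeric() || c == '_';
    let mut hits = Vec::new();
    for (start, matched) in hay.match_indices(pat.as_str()) {
        let end = start + matched.len();
        if opts.whole_word {
            // Reject matches glued to a word character on either side.
            let before_ok = hay[..start].chars().next_back().map_or(true, |c| !is_word(c));
            let after_ok = hay[end..].chars().next().map_or(true, |c| !is_word(c));
            if !(before_ok && after_ok) {
                continue;
            }
        }
        if opts.max_results.map_or(false, |max| hits.len() >= max) {
            break;
        }
        hits.push(start);
    }
    hits
}
```

Struct update syntax keeps call sites short, for example `SearchOptions { case_sensitive: false, whole_word: true, ..SearchOptions::default() }`.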

SearchResult

Field	Type	Description
`page`	`usize`	Page index
`text`	`String`	Matched text
`x`	`f64`	X position in points
`y`	`f64`	Y position in points

FormField and XmpExtractor

FormField (Read)

Field	Type	Description
`name`	`String`	Fully qualified field name
`field_type`	`FieldType`	Text, Button, Choice, Signature
`value`	`Option<String>`	Current value
`rect`	`Option<Rect>`	Widget bounds
`flags`	`u32`	Field flags

XmpExtractor

use pdf_oxide::extractors::xmp::XmpExtractor;

Operates on a PdfDocument:

Method	Returns	Description
`extract(doc)`	`Result<Option<XmpMetadata>>`	Extract XMP metadata

XmpMetadata

Field	Type	Description
`title`	`Option<String>`	dc:title
`creators`	`Vec<String>`	dc:creator
`description`	`Option<String>`	dc:description
`creator_tool`	`Option<String>`	xmp:CreatorTool
`create_date`	`Option<String>`	xmp:CreateDate
`modify_date`	`Option<String>`	xmp:ModifyDate
`producer`	`Option<String>`	pdf:Producer

PageLabelExtractor

use pdf_oxide::extractors::page_labels::PageLabelExtractor;

Method	Returns	Description
`extract(doc)`	`Result<Vec<PageLabelRange>>`	Extract page label definitions
`label_for_page(doc, page)`	`Result<String>`	Compute the label for a specific page
`all_labels(doc)`	`Result<Vec<String>>`	Compute labels for all pages
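In the PDF specification, page labels come from the document's /PageLabels number tree, where each range has a numbering style (decimal, upper or lower Roman, upper or lower letters), an optional prefix, and a start value. The sketch below shows how a single page's label is assembled from those parts, following the specification's letter rule (A to Z, then AA to ZZ, and so on). The enum and function names are hypothetical and do not represent PageLabelExtractor's internals.

```rust
/// Numbering styles defined for /PageLabels ranges (names are hypothetical).
#[derive(Clone, Copy)]
enum LabelStyle {
    Decimal,
    UpperRoman,
    LowerRoman,
    UpperLetters,
    LowerLetters,
    NoNumber, // the label is the prefix alone
}

fn to_roman(mut n: u32) -> String {
    const TABLE: [(u32, &str); 13] = [
        (1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
        (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
        (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I"),
    ];
    let mut out = String::new();
    for &(value, symbol) in TABLE.iter() {
        while n >= value {
            out.push_str(symbol);
            n -= value;
        }
    }
    out
}

/// PDF letter numbering: A..Z, then AA..ZZ, then AAA..ZZZ. `n` must be >= 1.
fn to_letters(n: u32) -> String {
    let letter = (b'A' + ((n - 1) % 26) as u8) as char;
    let repeat = ((n - 1) / 26 + 1) as usize;
    std::iter::repeat(letter).take(repeat).collect()
}

/// Label for the page at `offset` within a range whose numbering starts at `start`.
fn page_label(style: LabelStyle, prefix: &str, start: u32, offset: u32) -> String {
    let n = start + offset;
    let number = match style {
        LabelStyle::Decimal => n.to_string(),
        LabelStyle::UpperRoman => to_roman(n),
        LabelStyle::LowerRoman => to_roman(n).to_lowercase(),
        LabelStyle::UpperLetters => to_letters(n),
        LabelStyle::LowerLetters => to_letters(n).to_lowercase(),
        LabelStyle::NoNumber => String::new(),
    };
    format!("{prefix}{number}")
}
```

For instance, the fourth page of a front-matter range using lowercase Roman numerals starting at 1 is labeled "iv".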

Standalone Functions

use pdf_oxide::document::{parse_header, parse_trailer};

Function	Signature	Description
`parse_header`	`fn parse_header<R: Read + Seek>(reader: &mut R, lenient: bool) -> Result<(u8, u8, u64)>`	Parse the PDF header, returning (major, minor, byte_offset)
`parse_trailer`	`fn parse_trailer<R: Read>(reader: &mut R) -> Result<Object>`	Parse the trailer dictionary
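`parse_header` takes a `lenient` flag and returns the version plus the header's byte offset. The sketch below shows one plausible interpretation: strict mode requires `%PDF-` at offset 0, while lenient mode accepts it anywhere in the first 1024 bytes, a tolerance many PDF readers apply to files with leading junk. The function name `sketch_parse_header`, the 1024-byte window, the single-digit version format, and the error handling are assumptions, not pdf_oxide's documented behavior.

```rust
use std::io::{self, Read, Seek, SeekFrom};

/// Hypothetical sketch: locate "%PDF-M.m" and return (major, minor, offset).
fn sketch_parse_header<R: Read + Seek>(reader: &mut R, lenient: bool) -> io::Result<(u8, u8, u64)> {
    reader.seek(SeekFrom::Start(0))?;
    // Read at most the first 1024 bytes (the assumed search window).
    // `take` consumes this `&mut` borrow, which is fine: `reader` is not used again.
    let mut buf = Vec::new();
    reader.take(1024).read_to_end(&mut buf)?;

    let marker: &[u8] = b"%PDF-";
    let offset = match buf.windows(marker.len()).position(|w| w == marker) {
        Some(0) => 0,
        Some(pos) if lenient => pos, // tolerate leading junk
        _ => return Err(io::Error::new(io::ErrorKind::InvalidData, "no %PDF- header")),
    };

    // Expect a single-digit major version, a dot, and a single-digit minor version.
    match &buf[offset + marker.len()..] {
        [major @ b'0'..=b'9', b'.', minor @ b'0'..=b'9', ..] => {
            Ok((*major - b'0', *minor - b'0', offset as u64))
        }
        _ => Err(io::Error::new(io::ErrorKind::InvalidData, "malformed version")),
    }
}
```

Returning the offset matters because byte positions in the cross-reference table are usually relative to the start of the header, so a reader that accepts leading junk has to shift them accordingly.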

Next Steps

Python API Reference – Python bindings reference
Types and Enums – All types, enums, and configuration structs
Rust Getting Started – Tutorial with examples
