What is the fastest Python PDF library?

PDF Oxide is the fastest Python PDF library, with 0.8ms mean text extraction time — 5.8× faster than PyMuPDF (4.6ms) and 15× faster than pypdf (12.1ms). Benchmarked on 3,830 real-world PDFs with 100% pass rate.

Is PDF Oxide free for commercial use?

Yes. PDF Oxide is MIT licensed — free for all uses including commercial products, SaaS, and proprietary software. No license fees, no sales calls, no AGPL restrictions.

Can PDF Oxide handle scanned PDFs with OCR?

Yes. PDF Oxide includes built-in OCR via PaddleOCR and ONNX Runtime. No Tesseract installation needed — just pip install pdf_oxide and use extract_text_ocr(). Supports PP-OCRv3, v4, and v5 models.

Does PDF Oxide support XFA forms?

Yes. PDF Oxide is the only Python PDF library that can detect, analyze, and extract data from XFA forms (XML Forms Architecture). PyMuPDF, pypdf, pdfplumber, and pdfminer cannot read XFA form data.

How does PDF Oxide compare to PyMuPDF?

PDF Oxide is 5.8× faster than PyMuPDF (0.8ms vs 4.6ms mean), has a 100% pass rate vs 99.3%, and is MIT licensed vs PyMuPDF's AGPL-3.0. PDF Oxide also has built-in Markdown/HTML output and XFA form support that PyMuPDF lacks.

Can PDF Oxide convert PDF to Markdown?

Yes. PDF Oxide has built-in PDF to Markdown conversion with heading detection, table preservation, and list formatting — ideal for LLM and RAG pipelines. No separate package needed, unlike PyMuPDF which requires pymupdf4llm (69× slower).

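Merge and Split PDFs in Python, Rust, and Go

Binding support. PDF merging is available in Python, Rust, and Go. Splitting to a file with extract_pages is supported in Python and Rust; the in-memory variant extract_pages_to_bytes is additionally supported in Swift and the C ABI (but not WASM). Bookmark-based split planning (plan_split_by_bookmarks) is available in Python, Rust, Swift, WASM, and the C ABI. The C# binding does not yet expose these editing operations; use the Rust CLI (pdf-oxide merge, pdf-oxide split) instead, or call through one of the supported bindings.

Merge two PDFs into one: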

Python

from pdf_oxide import PdfDocument

doc = PdfDocument("main.pdf")
doc.merge_from("appendix.pdf")
doc.save("combined.pdf")

WASM

import { WasmPdfDocument } from "pdf-oxide-wasm";

// Load both PDFs as Uint8Array
const mainDoc = new WasmPdfDocument(mainBytes);
const appendixDoc = new WasmPdfDocument(appendixBytes);
// Note: this does not write a merged PDF; it concatenates the text extracted from both documents
const allText = mainDoc.extractAllText() + "\n" + appendixDoc.extractAllText();
mainDoc.free();
appendixDoc.free();

Rust

use pdf_oxide::editor::DocumentEditor;

let mut editor = DocumentEditor::open("main.pdf")?;
editor.merge_from("appendix.pdf")?;
editor.save("combined.pdf")?;

Go

package main

import (
    "log"
    pdfoxide "github.com/yfedoseev/pdf_oxide/go"
)

func main() {
    editor, err := pdfoxide.OpenEditor("main.pdf")
    if err != nil { log.Fatal(err) }
    defer editor.Close()

    if _, err := editor.MergeFrom("appendix.pdf"); err != nil { log.Fatal(err) }
    if err := editor.Save("combined.pdf"); err != nil { log.Fatal(err) }
}

C++

#include <pdf_oxide/pdf_oxide.hpp>

auto editor = pdf_oxide::DocumentEditor::open("main.pdf");
editor.merge_from("appendix.pdf");
editor.save("combined.pdf");

Swift

import PdfOxide

let editor = try DocumentEditor.open("main.pdf")
try editor.mergeFrom("appendix.pdf")
try editor.save("combined.pdf")

Dart

import 'package:pdf_oxide/pdf_oxide.dart';

final editor = DocumentEditor.open('main.pdf');
editor.mergeFrom('appendix.pdf');
editor.save('combined.pdf');

R

library(pdfoxide)

editor <- pdf_editor_open("main.pdf")
pdf_editor_merge_from(editor, "appendix.pdf")
pdf_editor_save(editor, "combined.pdf")

Julia

using PdfOxide

editor = open_editor("main.pdf")
merge_from(editor, "appendix.pdf")
save(editor, "combined.pdf")

Zig

const pdf_oxide = @import("pdf_oxide");

var editor = try pdf_oxide.DocumentEditor.openEditor("main.pdf");
defer editor.deinit();
try editor.mergeFrom("appendix.pdf");
try editor.save("combined.pdf");

Objective-C

#import "POXPdfOxide.h"
NSError *err = nil;

POXDocumentEditor *editor = [POXDocumentEditor openEditor:@"main.pdf" error:&err];
[editor mergeFrom:@"appendix.pdf" error:&err];
[editor saveToPath:@"combined.pdf" error:&err];

Elixir

{:ok, editor} = PdfOxide.open_editor("main.pdf")
:ok = PdfOxide.merge_from(editor, "appendix.pdf")
:ok = PdfOxide.editor_save(editor, "combined.pdf")

PDF Oxide handles page merging at the PDF object level, so fonts, images, and annotations are fully preserved across documents.

Installation

pip install pdf_oxide

Merging PDFs

Merge all pages

Append all pages of a second PDF to the first:

Python

from pdf_oxide import PdfDocument

doc = PdfDocument("report.pdf")
doc.merge_from("charts.pdf")
doc.save("full-report.pdf")

WASM

// WASM API: load and process multiple documents
const report = new WasmPdfDocument(reportBytes);
const charts = new WasmPdfDocument(chartsBytes);
// Note: no merged PDF is written; the extracted text of both documents is concatenated
const fullText = report.extractAllText() + "\n" + charts.extractAllText();
report.free();
charts.free();

Rust

let mut editor = DocumentEditor::open("report.pdf")?;
let pages_added = editor.merge_from("charts.pdf")?;
println!("Added {} pages", pages_added);
editor.save("full-report.pdf")?;

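Go

editor, err := pdfoxide.OpenEditor("report.pdf")
if err != nil { log.Fatal(err) }
defer editor.Close()

// MergeFrom returns the number of pages appended
added, err := editor.MergeFrom("charts.pdf")
if err != nil { log.Fatal(err) }
fmt.Printf("Added %d pages\n", added)
if err := editor.Save("full-report.pdf"); err != nil { log.Fatal(err) }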

C++

auto editor = pdf_oxide::DocumentEditor::open("report.pdf");
editor.merge_from("charts.pdf");
editor.save("full-report.pdf");

Swift

let editor = try DocumentEditor.open("report.pdf")
try editor.mergeFrom("charts.pdf")
try editor.save("full-report.pdf")

Dart

final editor = DocumentEditor.open('report.pdf');
editor.mergeFrom('charts.pdf');
editor.save('full-report.pdf');

R

editor <- pdf_editor_open("report.pdf")
pdf_editor_merge_from(editor, "charts.pdf")
pdf_editor_save(editor, "full-report.pdf")

Julia

editor = open_editor("report.pdf")
merge_from(editor, "charts.pdf")
save(editor, "full-report.pdf")

Zig

var editor = try pdf_oxide.DocumentEditor.openEditor("report.pdf");
defer editor.deinit();
try editor.mergeFrom("charts.pdf");
try editor.save("full-report.pdf");

Objective-C

POXDocumentEditor *editor = [POXDocumentEditor openEditor:@"report.pdf" error:&err];
[editor mergeFrom:@"charts.pdf" error:&err];
[editor saveToPath:@"full-report.pdf" error:&err];

Elixir

{:ok, editor} = PdfOxide.open_editor("report.pdf")
:ok = PdfOxide.merge_from(editor, "charts.pdf")
:ok = PdfOxide.editor_save(editor, "full-report.pdf")

Merge multiple files

Use the static method Pdf.merge() to merge several PDFs in one call:

Python

from pdf_oxide import Pdf

pdf = Pdf.merge(["intro.pdf", "chapter1.pdf", "chapter2.pdf", "appendix.pdf"])
pdf.save("book.pdf")

You can also call merge_from() repeatedly on an existing document:

from pdf_oxide import PdfDocument

doc = PdfDocument("intro.pdf")
for f in ["chapter1.pdf", "chapter2.pdf", "appendix.pdf"]:
    doc.merge_from(f)
doc.save("book.pdf")

WASM

// Load multiple PDFs one after another and collect their text (no merged PDF is written)
const files = [introBytes, ch1Bytes, ch2Bytes, appendixBytes];
const allText = [];
for (const bytes of files) {
    const doc = new WasmPdfDocument(bytes);
    allText.push(doc.extractAllText());
    doc.free();
}
console.log(allText.join("\n"));

Rust

let files = ["intro.pdf", "chapter1.pdf", "chapter2.pdf", "appendix.pdf"];
let mut editor = DocumentEditor::open(files[0])?;
for f in &files[1..] {
    editor.merge_from(f)?;
}
editor.save("book.pdf")?;

Go

// Top-level Merge returns the combined PDF bytes in one call
bytes, err := pdfoxide.Merge([]string{
    "intro.pdf", "chapter1.pdf", "chapter2.pdf", "appendix.pdf",
})
if err != nil { log.Fatal(err) }
if err := os.WriteFile("book.pdf", bytes, 0644); err != nil { log.Fatal(err) }

C++

// Top-level merge returns the combined PDF bytes in one call
auto bytes = pdf_oxide::merge({"intro.pdf", "chapter1.pdf", "chapter2.pdf", "appendix.pdf"});
std::ofstream("book.pdf", std::ios::binary)
    .write(reinterpret_cast<const char*>(bytes.data()), bytes.size());

Swift

// Top-level merge returns the combined PDF bytes in one call
let bytes = try merge(["intro.pdf", "chapter1.pdf", "chapter2.pdf", "appendix.pdf"])
try Data(bytes).write(to: URL(fileURLWithPath: "book.pdf"))

Dart

// Top-level pdfMerge returns the combined PDF bytes in one call
final bytes = pdfMerge(['intro.pdf', 'chapter1.pdf', 'chapter2.pdf', 'appendix.pdf']);
File('book.pdf').writeAsBytesSync(bytes);

R

# Top-level pdf_merge returns the combined PDF bytes in one call
bytes <- pdf_merge(c("intro.pdf", "chapter1.pdf", "chapter2.pdf", "appendix.pdf"))
writeBin(bytes, "book.pdf")

Julia

# Top-level merge_pdfs returns the combined PDF bytes in one call
bytes = merge_pdfs(["intro.pdf", "chapter1.pdf", "chapter2.pdf", "appendix.pdf"])
write("book.pdf", bytes)

Zig

const a = std.heap.page_allocator;
const paths = [_][*:0]const u8{ "intro.pdf", "chapter1.pdf", "chapter2.pdf", "appendix.pdf" };
const bytes = try pdf_oxide.merge(a, &paths); // combined PDF bytes
defer a.free(bytes);
const out = try std.fs.cwd().createFile("book.pdf", .{});
defer out.close();
try out.writeAll(bytes);

Objective-C

// Top-level merge returns the combined PDF bytes in one call
NSData *bytes = [POXTools merge:@[@"intro.pdf", @"chapter1.pdf", @"chapter2.pdf", @"appendix.pdf"]
                          error:&err];
[bytes writeToFile:@"book.pdf" atomically:YES];

Elixir

# Top-level merge returns the combined PDF bytes in one call
{:ok, bytes} = PdfOxide.merge(["intro.pdf", "chapter1.pdf", "chapter2.pdf", "appendix.pdf"])
File.write!("book.pdf", bytes)

Merge selected pages

Select which pages of the source document to merge:

Python

from pdf_oxide import PdfDocument

doc = PdfDocument("main.pdf")
# Merge only pages 0, 2, and 4 from source
doc.merge_pages_from("source.pdf", [0, 2, 4])
doc.save("selected.pdf")

Rust

let mut editor = DocumentEditor::open("main.pdf")?;
editor.merge_pages_from("source.pdf", &[0, 2, 4])?;
editor.save("selected.pdf")?;
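
The page lists in the examples above are 0-based, while page numbers shown to users are usually 1-based. As a minimal sketch of bridging the two, the hypothetical helper below (parse_page_spec is not part of PDF Oxide) turns a spec such as "1,3,5-7" into the 0-based list that merge_pages_from expects:

```python
def parse_page_spec(spec: str) -> list[int]:
    """Convert a 1-based spec such as "1,3,5-7" into 0-based page indices."""
    indices: list[int] = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            first, last = (int(p) for p in part.split("-", 1))
        else:
            first = last = int(part)
        if first < 1 or last < first:
            raise ValueError(f"invalid page range: {part!r}")
        # Shift from 1-based page numbers to 0-based indices.
        indices.extend(range(first - 1, last))
    return indices


print(parse_page_spec("1,3,5"))  # [0, 2, 4]
```

The result can then be passed as the second argument of doc.merge_pages_from("source.pdf", ...), as in the Python example above.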

Splitting PDFs

Extract pages to a new file

Extract specific pages from a large document:

Python

from pdf_oxide import PdfDocument

doc = PdfDocument("book.pdf")
doc.extract_pages([0, 1, 2, 3, 4], "chapter1.pdf")

WASM

// Extract text from specific pages
const doc = new WasmPdfDocument(bytes);
const pages = [0, 1, 2, 3, 4];
for (const i of pages) {
    const text = doc.extractText(i);
    console.log(`Page ${i + 1}: ${text.slice(0, 80)}...`);
}
doc.free();

Rust

let mut editor = DocumentEditor::open("book.pdf")?;
editor.extract_pages(&[0, 1, 2, 3, 4], "chapter1.pdf")?;

Split into individual pages

Save each page as a separate file:

Python

from pdf_oxide import PdfDocument

doc = PdfDocument("document.pdf")
for i in range(doc.page_count()):
    doc.extract_pages([i], f"page_{i + 1}.pdf")

Rust

let mut editor = DocumentEditor::open("document.pdf")?;
let page_count = editor.page_count()?;
for i in 0..page_count {
    editor.extract_pages(&[i], &format!("page_{}.pdf", i + 1))?;
}

Split into chunks

Split a large PDF into smaller files of N pages each:

Python

from pdf_oxide import PdfDocument

doc = PdfDocument("large.pdf")
chunk_size = 10

for start in range(0, doc.page_count(), chunk_size):
    end = min(start + chunk_size, doc.page_count())
    pages = list(range(start, end))
    doc.extract_pages(pages, f"chunk_{start // chunk_size + 1}.pdf")

Rust

let mut editor = DocumentEditor::open("large.pdf")?;
let page_count = editor.page_count()?;
let chunk_size = 10;

for start in (0..page_count).step_by(chunk_size) {
    let end = (start + chunk_size).min(page_count);
    let pages: Vec<usize> = (start..end).collect();
    editor.extract_pages(&pages, &format!("chunk_{}.pdf", start / chunk_size + 1))?;
}
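
The boundary arithmetic in the loops above can be checked without opening a PDF. The sketch below isolates it in a hypothetical helper, chunk_ranges (an illustration, not a PDF Oxide function), that returns half-open (start, end) pairs for a given page count:

```python
def chunk_ranges(page_count: int, chunk_size: int) -> list[tuple[int, int]]:
    """Return half-open (start, end) page ranges covering page_count pages."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    return [
        (start, min(start + chunk_size, page_count))
        for start in range(0, page_count, chunk_size)
    ]


# A 25-page document split every 10 pages gives two full chunks and one short one.
for number, (start, end) in enumerate(chunk_ranges(25, 10), start=1):
    print(f"chunk_{number}.pdf: pages {start}..{end - 1}")
```

Each pair expands to list(range(start, end)), which is the page list passed to extract_pages in the Python example.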

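Split to in-memory bytes (no temporary files)

When the split chunks go straight to S3, an HTTP response, or the next step in the same process, extract_pages_to_bytes skips the disk write entirely. It returns the new PDF as bytes and leaves the source document unchanged.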

Python

from pdf_oxide import PdfDocument

doc = PdfDocument("large.pdf")
chunk_size = 10
chunks = []

for start in range(0, doc.page_count(), chunk_size):
    end = min(start + chunk_size, doc.page_count())
    pages = list(range(start, end))
    chunk_bytes = doc.extract_pages_to_bytes(pages)  # bytes, not a file
    chunks.append(chunk_bytes)

print(f"Produced {len(chunks)} in-memory chunks")

Rust

let mut editor = DocumentEditor::open("large.pdf")?;
let page_count = editor.page_count()?;
let chunk_size = 10;
let mut chunks: Vec<Vec<u8>> = Vec::new();

for start in (0..page_count).step_by(chunk_size) {
    let end = (start + chunk_size).min(page_count);
    let pages: Vec<usize> = (start..end).collect();
    chunks.push(editor.extract_pages_to_bytes(&pages)?); // Vec<u8>, no file written
}

Swift

let editor = try DocumentEditor(path: "large.pdf")
let pageCount = try editor.pageCount()
let chunkSize = 10
var chunks: [[UInt8]] = []

for start in stride(from: 0, to: pageCount, by: chunkSize) {
    let end = min(start + chunkSize, pageCount)
    chunks.append(try editor.extractPagesToBytes(Array(start..<end)))
}

C++

auto editor = pdf_oxide::DocumentEditor::open("large.pdf");
int page_count = editor.page_count();
const int chunk_size = 10;
std::vector<std::vector<std::uint8_t>> chunks;

for (int start = 0; start < page_count; start += chunk_size) {
    int end = std::min(start + chunk_size, page_count);
    std::vector<int32_t> pages;
    for (int i = start; i < end; ++i) pages.push_back(i);
    chunks.push_back(editor.extract_pages_to_bytes(pages)); // bytes, no file written
}

Dart

final editor = DocumentEditor.open('large.pdf');
final pageCount = editor.pageCount;
const chunkSize = 10;
final chunks = <Uint8List>[];

for (var start = 0; start < pageCount; start += chunkSize) {
  final end = (start + chunkSize).clamp(0, pageCount);
  final pages = [for (var i = start; i < end; i++) i];
  chunks.add(editor.extractPagesToBytes(pages)); // bytes, no file written
}

R

editor <- pdf_editor_open("large.pdf")
page_count <- pdf_editor_page_count(editor)
chunk_size <- 10
chunks <- list()

for (start in seq(0, page_count - 1, by = chunk_size)) {
  end <- min(start + chunk_size, page_count)
  pages <- seq(start, end - 1)
  chunks[[length(chunks) + 1]] <- pdf_editor_extract_pages_to_bytes(editor, pages)
}

Julia

editor = open_editor("large.pdf")
n = page_count(editor)
chunk_size = 10
chunks = Vector{Vector{UInt8}}()

for start in 0:chunk_size:(n - 1)
    stop = min(start + chunk_size, n)
    pages = collect(start:(stop - 1))
    push!(chunks, extract_pages_to_bytes(editor, pages))  # bytes, no file written
end

Zig

const a = std.heap.page_allocator;
var editor = try pdf_oxide.DocumentEditor.openEditor("large.pdf");
defer editor.deinit();
const page_count = try editor.pageCount();
const chunk_size: i32 = 10;

var start: i32 = 0;
while (start < page_count) : (start += chunk_size) {
    const end = @min(start + chunk_size, page_count);
    var pages = std.ArrayList(i32).init(a);
    defer pages.deinit();
    var i = start;
    while (i < end) : (i += 1) try pages.append(i);
    const chunk = try editor.extractPagesToBytes(a, pages.items); // bytes, no file written
    a.free(chunk);
}

Objective-C

POXDocumentEditor *editor = [POXDocumentEditor openEditor:@"large.pdf" error:&err];
NSInteger pageCount = [editor pageCountError:&err];
NSInteger chunkSize = 10;
NSMutableArray<NSData*> *chunks = [NSMutableArray array];

for (NSInteger start = 0; start < pageCount; start += chunkSize) {
    NSInteger end = MIN(start + chunkSize, pageCount);
    NSMutableArray<NSNumber*> *pages = [NSMutableArray array];
    for (NSInteger i = start; i < end; i++) [pages addObject:@(i)];
    [chunks addObject:[editor extractPagesToBytes:pages error:&err]]; // bytes, no file written
}

Elixir

{:ok, editor} = PdfOxide.open_editor("large.pdf")
{:ok, n} = PdfOxide.editor_page_count(editor)
chunk_size = 10

chunks =
  0..(n - 1)
  |> Enum.take_every(chunk_size)
  |> Enum.map(fn start ->
    stop = min(start + chunk_size, n)
    {:ok, bytes} = PdfOxide.extract_pages_to_bytes(editor, Enum.to_list(start..(stop - 1)))
    bytes
  end)

extract_pages_to_bytes is available in Python, Rust, Swift, C++, Dart, R, Julia, Zig, Objective-C, Elixir, and the C ABI. The WASM build does not expose it.
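
One way to use in-memory chunks without touching disk is to bundle them into a single archive for an HTTP response or an upload. The sketch below uses only the Python standard library; zip_chunks and the placeholder payloads are illustrative assumptions, and in practice the list would hold the bytes returned by extract_pages_to_bytes:

```python
import io
import zipfile


def zip_chunks(chunks: list[bytes], stem: str = "chunk") -> bytes:
    """Pack in-memory PDF chunks into one ZIP archive, also held in memory."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for number, data in enumerate(chunks, start=1):
            archive.writestr(f"{stem}_{number}.pdf", data)
    return buffer.getvalue()


# Placeholder payloads standing in for real chunk bytes.
archive_bytes = zip_chunks([b"%PDF-1.7 first", b"%PDF-1.7 second"])
names = zipfile.ZipFile(io.BytesIO(archive_bytes)).namelist()
print(names)  # ['chunk_1.pdf', 'chunk_2.pdf']
```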

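Split by bookmarks

For documents with an outline (table of contents), you can plan a split at bookmark boundaries, for example one PDF per top-level chapter, without computing page ranges by hand. plan_split_by_bookmarks is a dry-run planner: it returns the segment plan (page ranges, titles, filesystem-safe file stems) without producing any PDF bytes, so you can preview, filter, or adjust the output before writing anything.

Plan the split (dry run)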

Python

import pdf_oxide

with open("manual.pdf", "rb") as f:
    src = f.read()

# level=1 -> split at top-level bookmarks only (0 = every depth, n = up to depth n)
segments = pdf_oxide.plan_split_by_bookmarks(src, level=1)

for seg in segments:
    # keys: index, start_page, end_page, title, file_stem, page_label
    print(f"#{seg['index']}: pages {seg['start_page']}-{seg['end_page'] - 1} "
          f"=> {seg['file_stem']}.pdf  ({seg['title']})")

Rust

use pdf_oxide::PdfDocument;
use pdf_oxide::split_bookmarks::{plan_split_by_bookmarks, SplitByBookmarksOptions, BookmarkLevel};

let doc = PdfDocument::open("manual.pdf")?;

let opts = SplitByBookmarksOptions {
    level: BookmarkLevel::TopLevel,  // top-level bookmarks only
    ..Default::default()
};

// Cheap: returns Vec<BookmarkSegment>, no PDF bytes produced
let segments = plan_split_by_bookmarks(&doc, &opts)?;
for seg in &segments {
    println!("#{}: pages {}..{} => {}.pdf ({:?})",
        seg.index, seg.start_page, seg.end_page, seg.file_stem, seg.title);
}

Swift

import PdfOxide

let doc = try PdfDocument(path: "manual.pdf")

// Returns a JSON array of segment objects (index, startPage, endPage, ...)
let planJson = try doc.planSplitByBookmarks(optionsJson: #"{"level": 1}"#)
print(planJson)

WASM

import { planSplitByBookmarks } from "pdf-oxide-wasm";

// level 1 = top-level bookmarks; returns an array of segment objects
const segments = planSplitByBookmarks(bytes, null, false, 1, true);
for (const seg of segments) {
  console.log(`#${seg.index}: ${seg.startPage}-${seg.endPage} => ${seg.fileStem}.pdf`);
}

Java

import fyi.oxide.pdf.Pdf;
import java.nio.file.*;

byte[] src = Files.readAllBytes(Path.of("manual.pdf"));

// level 1 = top-level bookmarks; returns the number of segments the split would produce
int segmentCount = Pdf.planSplitByBookmarksCount(src, 1);
System.out.println(segmentCount + " segments");

Ruby

require 'pdf_oxide'

src = File.binread('manual.pdf')

# level 1 = top-level bookmarks; returns the number of segments the split would produce
segment_count = PdfOxide::Pdf.plan_split_by_bookmarks_count(src, 1)
puts "#{segment_count} segments"

C++

auto doc = pdf_oxide::Document::open("manual.pdf");

// Returns a JSON array of segment objects (index, start_page, end_page, ...)
std::string planJson = doc.plan_split_by_bookmarks(R"({"level": 1})");
std::cout << planJson << "\n";

Dart

final doc = PdfDocument.open('manual.pdf');

// Returns a JSON array of segment objects (index, startPage, endPage, ...)
final planJson = doc.planSplitByBookmarks('{"level": 1}');
print(planJson);

R

doc <- pdf_open("manual.pdf")

# Returns a JSON array of segment objects (index, start_page, end_page, ...)
plan_json <- pdf_plan_split_by_bookmarks(doc, '{"level": 1}')
cat(plan_json, "\n")

Julia

doc = open_document("manual.pdf")

# Returns a JSON array of segment objects (index, start_page, end_page, ...)
plan_json = plan_split_by_bookmarks(doc, """{"level": 1}""")
println(plan_json)

Zig

const a = std.heap.page_allocator;
var doc = try pdf_oxide.Document.open("manual.pdf");
defer doc.deinit();

// Returns a JSON array of segment objects (index, start_page, end_page, ...)
const plan_json = try doc.planSplitByBookmarks(a, "{\"level\": 1}");
defer a.free(plan_json);
std.debug.print("{s}\n", .{plan_json});

Objective-C

POXDocument *doc = [POXDocument openPath:@"manual.pdf" error:&err];

// Returns a JSON array of segment objects (index, start_page, end_page, ...)
NSString *planJson = [doc planSplitByBookmarks:@"{\"level\": 1}" error:&err];
NSLog(@"%@", planJson);

Elixir

{:ok, doc} = PdfOxide.open("manual.pdf")

# Returns a JSON array of segment objects (index, start_page, end_page, ...)
{:ok, plan_json} = PdfOxide.plan_split_by_bookmarks(doc, ~s({"level": 1}))
IO.puts(plan_json)

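Each segment contains: index (1-based ordinal), start_page (inclusive, 0-based), end_page (exclusive, 0-based, so the range is start_page..end_page), title (the source bookmark title; null for the front-matter segment), file_stem (a deduplicated, filesystem-safe stem without extension), and page_label.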
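
Because end_page is exclusive, an off-by-one error is easy to make when turning a segment into a page list. The sketch below works on a plain dict with the keys listed above; the sample values (including the file stem) are made up for illustration, and segment_pages and describe are hypothetical helpers rather than library functions:

```python
def segment_pages(segment: dict) -> list[int]:
    """Expand a segment's half-open range into 0-based page indices."""
    return list(range(segment["start_page"], segment["end_page"]))


def describe(segment: dict) -> str:
    """Summarise a segment, using 1-based page numbers for display."""
    pages = segment_pages(segment)
    title = segment["title"] or "(front matter)"
    return (f"{segment['file_stem']}.pdf: {len(pages)} pages "
            f"({pages[0] + 1} to {pages[-1] + 1}), {title}")


# Made-up sample segment; a real one would come from plan_split_by_bookmarks.
sample = {"index": 1, "start_page": 0, "end_page": 3, "title": None,
          "file_stem": "front-matter", "page_label": None}
print(segment_pages(sample))  # [0, 1, 2]
print(describe(sample))
```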

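Split options

Option	Default	Description
`title_prefix`	none	Split only at bookmarks whose title starts with this prefix
`ignore_case`	`false`	Ignore case when matching the prefix
`level`	`1` (top level)	`0` = every depth, `1` = top level only, `n` = up to depth n
`include_front_matter`	`true`	Emit the pages before the first split point as a front-matter segment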
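
To make the title_prefix and ignore_case rows concrete, here is a rough sketch of the prefix test they describe. It is an illustration of the matching rule only, written as a hypothetical standalone function; it is not the library's implementation and does not recompute segment boundaries:

```python
from typing import Optional


def title_matches(title: Optional[str], prefix: str, ignore_case: bool = False) -> bool:
    """Return True when a bookmark title starts with the given prefix."""
    if title is None:
        # An entry without a title can never match a prefix.
        return False
    if ignore_case:
        return title.casefold().startswith(prefix.casefold())
    return title.startswith(prefix)


print(title_matches("Chapter 1: Setup", "chapter"))                    # False
print(title_matches("Chapter 1: Setup", "chapter", ignore_case=True))  # True
```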

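Plan first, then extract

Because the plan is just a set of page ranges, each segment can be fed straight into the in-memory extractor without recomputing boundaries.

Python

import pdf_oxide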
from pdf_oxide import PdfDocument

with open("manual.pdf", "rb") as f:
    src = f.read()

doc = PdfDocument.from_bytes(src)
for seg in pdf_oxide.plan_split_by_bookmarks(src, level=1):
    pages = list(range(seg["start_page"], seg["end_page"]))
    chunk = doc.extract_pages_to_bytes(pages)
    with open(f"{seg['file_stem']}.pdf", "wb") as out:
        out.write(chunk)

Binding support. plan_split_by_bookmarks is exposed in Python (module-level function pdf_oxide.plan_split_by_bookmarks), Rust (pdf_oxide::split_bookmarks::plan_split_by_bookmarks), Swift (planSplitByBookmarks), WASM (planSplitByBookmarks), and the C ABI (pdf_document_plan_split_by_bookmarks). It raises an error when the document has no outline (RuntimeError in Python); in that case, fall back to per-page or chunked splitting.
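
The fallback described above could be wired up roughly as follows. This is a sketch under assumptions: plan_page_groups is a hypothetical helper, and the planner is passed in as a callable so the logic can run without a real PDF. In practice the callable might be lambda: pdf_oxide.plan_split_by_bookmarks(src, level=1).

```python
from typing import Callable


def plan_page_groups(planner: Callable[[], list[dict]],
                     page_count: int,
                     chunk_size: int = 10) -> list[list[int]]:
    """Use bookmark segments when available, else fixed-size chunks."""
    try:
        segments = planner()
    except RuntimeError:
        # No outline: fall back to chunked splitting.
        return [
            list(range(start, min(start + chunk_size, page_count)))
            for start in range(0, page_count, chunk_size)
        ]
    return [list(range(seg["start_page"], seg["end_page"])) for seg in segments]


def no_outline() -> list[dict]:
    # Stand-in for a planner call on a PDF without bookmarks.
    raise RuntimeError("document has no outline")


groups = plan_page_groups(no_outline, page_count=5, chunk_size=2)
print(groups)  # [[0, 1], [2, 3], [4]]
```

Each inner list can be passed to extract_pages_to_bytes. Catching RuntimeError this broadly may also hide unrelated failures, so a real pipeline would probably want to inspect the error message first.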

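Frequently asked questions

What is the difference between extract_pages and extract_pages_to_bytes? extract_pages(pages, output) writes the result to a file path; extract_pages_to_bytes(pages) returns the new PDF as in-memory bytes. Both use 0-based page indices and leave the source document unmodified. Choose the in-memory variant when the output needs to be streamed or stored without being written to disk.

Does plan_split_by_bookmarks create PDF files? No. It is a lightweight planner that returns only segment metadata (page ranges, titles, file stems). To actually produce the chunks, pair it with extract_pages_to_bytes, or use the one-shot helper split_by_bookmarks (Python/Rust/WASM), which returns each segment paired with its bytes.

How do I split a PDF into separate files by chapter? If the PDF has an outline, call plan_split_by_bookmarks(src, level=1) to get one segment per top-level bookmark, then extract each segment's start_page..end_page range with extract_pages_to_bytes. Set level=0 to split at every outline depth.

Why is splitting so fast? Splitting runs at the PDF object level in PDF Oxide's pure-Rust core, the same engine that averages 0.8ms per extraction with a 100% pass rate in the benchmarks. Planning a split only needs the outline and the page count, so it is nearly instantaneous even on large documents.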
