What is the fastest Python PDF library?

PDF Oxide is the fastest Python PDF library, with 0.8ms mean text extraction time — 5.8× faster than PyMuPDF (4.6ms) and 15× faster than pypdf (12.1ms). Benchmarked on 3,830 real-world PDFs with 100% pass rate.

Is PDF Oxide free for commercial use?

Yes. PDF Oxide is MIT licensed — free for all uses including commercial products, SaaS, and proprietary software. No license fees, no sales calls, no AGPL restrictions.

Can PDF Oxide handle scanned PDFs with OCR?

Yes. PDF Oxide includes built-in OCR via PaddleOCR and ONNX Runtime. No Tesseract installation needed — just pip install pdf_oxide and use extract_text_ocr(). Supports PP-OCRv3, v4, and v5 models.

Does PDF Oxide support XFA forms?

Yes. PDF Oxide is the only Python PDF library that can detect, analyze, and extract data from XFA forms (XML Forms Architecture). PyMuPDF, pypdf, pdfplumber, and pdfminer cannot read XFA form data.

How does PDF Oxide compare to PyMuPDF?

PDF Oxide is 5.8× faster than PyMuPDF (0.8ms vs 4.6ms mean), has a 100% pass rate vs 99.3%, and is MIT licensed vs PyMuPDF's AGPL-3.0. PDF Oxide also has built-in Markdown/HTML output and XFA form support that PyMuPDF lacks.

Can PDF Oxide convert PDF to Markdown?

Yes. PDF Oxide has built-in PDF to Markdown conversion with heading detection, table preservation, and list formatting — ideal for LLM and RAG pipelines. No separate package needed, unlike PyMuPDF which requires pymupdf4llm (69× slower).

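Redaction and Sanitization

True redaction is destructive: the underlying glyphs, images, and paths must be physically removed from the content stream, not merely covered with a black rectangle. PDF Oxide v0.3.69 implements the ISO 32000-1:2008 §12.5.6.23 redaction specification: redaction_apply returns the number of glyphs physically removed, and when a page uses composite/Type0 fonts that cannot be safely rewritten, it fails closed (it refuses the operation rather than silently missing part of the redaction).

This page covers the canonical destructive redaction family (redaction_add / redaction_apply / redaction_count / redaction_scrub_metadata) and the header/footer/artifact removal family (remove_headers / remove_footers / remove_artifacts, plus the per-page erase_header / erase_footer / erase_artifacts).

Two redaction APIs. v0.3.69 still keeps the legacy annotation-flattening path (apply_page_redactions / apply_all_redactions), which burns /Redact annotations in as a visual overlay. When you need the content truly removed, prefer the destructive family described here. The destructive redaction_apply also processes /Redact annotations already present in the source document, so the two APIs can be combined.

Binding coverage. The destructive redaction family is exposed in the Rust, Python, Go, C#, and WASM/JavaScript builds. The header/footer/artifact remove_* family is exposed in Rust, Python, Go, and WASM/JavaScript; the per-page erase_* family is exposed in Rust, Python, and WASM/JavaScript. C# does not yet expose the remove_* / erase_* page-furniture families.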

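How do I redact text from a PDF?

The destructive workflow has three steps:

1. Queue redaction rectangles with redaction_add (page user-space coordinates; optional DeviceRGB overlay fill color). /Redact annotations already present in the source document are included automatically.
2. Apply with redaction_apply: it physically removes the glyphs/images/paths in the covered regions, paints the opaque overlay, optionally scrubs document metadata, and returns the number of glyphs removed.
3. Save the rewritten PDF. The original /Contents objects are hard-deleted (G6), so secret content does not linger as orphaned objects missed by GC.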
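
The difference between painting an overlay and destructively removing content can be sketched with a toy model. This is not PDF Oxide's implementation: `Glyph`, `intersects`, and `redact` are hypothetical names, and a real content stream is far more involved than a list of boxes. The sketch only shows the idea that covered glyphs are dropped and counted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Glyph:
    """A glyph and its bounding box in page user space (hypothetical model)."""
    char: str
    x0: float
    y0: float
    x1: float
    y1: float


def intersects(glyph, rect):
    """Return True if the glyph's box overlaps rect = (x0, y0, x1, y1)."""
    rx0, ry0, rx1, ry1 = rect
    return glyph.x0 < rx1 and glyph.x1 > rx0 and glyph.y0 < ry1 and glyph.y1 > ry0


def redact(glyphs, rects):
    """Drop every glyph that overlaps any rectangle.

    Returns (kept_glyphs, removed_count). Removed glyphs are absent from the
    returned list, not merely hidden behind a box.
    """
    kept = [g for g in glyphs if not any(intersects(g, r) for r in rects)]
    return kept, len(glyphs) - len(kept)


page = [
    Glyph("S", 100, 700, 108, 714),
    Glyph("N", 108, 700, 116, 714),
    Glyph("x", 400, 700, 408, 714),  # lies outside the redaction box
]
kept, removed = redact(page, [(100.0, 700.0, 300.0, 714.0)])
print(removed, "".join(g.char for g in kept))
```

An overlay-only approach would leave all three glyphs in `page` and add a rectangle on top, which is why text extraction can still recover them.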

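Queuing a redaction rectangle

redaction_add(page, rect, fill) queues one destructive rectangle. Coordinates are page user space (x0, y0, x1, y1); fill is an optional DeviceRGB [r, g, b] overlay color (black by default).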
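
Search results on this page report boxes as (x, y, width, height), while redaction_add expects corner coordinates. A minimal sketch of the conversion follows. The helper names `rect_from_bbox` and `normalized` are hypothetical and not part of PDF Oxide, and whether the library itself requires ordered corners is an assumption not confirmed here.

```python
def rect_from_bbox(x, y, width, height):
    """Convert an (x, y, width, height) box to corner form (x0, y0, x1, y1)."""
    return (x, y, x + width, y + height)


def normalized(rect):
    """Order the corners so that x0 <= x1 and y0 <= y1."""
    x0, y0, x1, y1 = rect
    return (min(x0, x1), min(y0, y1), max(x0, x1), max(y0, y1))


# A 200 x 14 point box whose lower-left corner is at (100, 700).
rect = rect_from_bbox(100.0, 700.0, 200.0, 14.0)
print(rect)

# Corners given in the wrong order are swapped into place.
print(normalized((300.0, 714.0, 100.0, 700.0)))
```

In default PDF user space the origin is the lower-left corner of the page and y grows upward, so the box above sits near the top of a US Letter page.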

Rust

use pdf_oxide::editor::{DocumentEditor, EditableDocument};
use pdf_oxide::redaction::RedactionOptions;

let mut editor = DocumentEditor::open("confidential.pdf")?;

// Queue a black redaction box on page 0 (x0, y0, x1, y1 in points).
editor.add_redaction(0, [100.0, 700.0, 300.0, 714.0], None)?;

// Apply destructively, then save. Returns a RedactionReport.
let report = editor.apply_redactions_destructive(RedactionOptions::default())?;
println!("glyphs removed: {}", report.glyphs_removed);

editor.save("redacted.pdf")?;

Python

from pdf_oxide import PdfDocument

doc = PdfDocument("confidential.pdf")

# Queue a black redaction box on page 0 (x0, y0, x1, y1 in points).
doc.add_redaction(0, (100.0, 700.0, 300.0, 714.0))

# Apply destructively (returns a report dict), then save.
report = doc.apply_redactions_destructive()
print("glyphs removed:", report["glyphs_removed"])

doc.save("redacted.pdf")

Go

editor, _ := pdfoxide.OpenEditor("confidential.pdf")
defer editor.Close()

// Queue a redaction box on page 0; pass nil fill for the default black.
_ = editor.AddRedaction(0, [4]float64{100, 700, 300, 714}, nil)

// Apply destructively (scrubMetadata = true). Returns glyphs removed.
glyphs, _ := editor.ApplyRedactions(true)
fmt.Printf("glyphs removed: %d\n", glyphs)

_ = editor.Save("redacted.pdf")

C#

using PdfOxide;

using var editor = DocumentEditor.Open("confidential.pdf");

// Queue a redaction box on page 0 (x1, y1, x2, y2; r, g, b default to black).
editor.AddRedaction(0, 100, 700, 300, 714);

// Apply destructively (scrubMetadata defaults to true). Returns glyphs removed.
int glyphs = editor.ApplyRedactions(scrubMetadata: true);
Console.WriteLine($"glyphs removed: {glyphs}");

editor.Save("redacted.pdf");

JavaScript (WASM)

import { WasmPdfDocument } from "pdf-oxide-wasm";

const doc = new WasmPdfDocument(bytes);

// Queue a redaction box on page 0; pass an [r, g, b] array for a custom fill.
doc.addRedaction(0, 100, 700, 300, 714);

// Apply destructively. Returns a RedactionReport object.
const report = doc.applyRedactionsDestructive(true);
console.log("glyphs removed:", report.glyphs_removed);

const output = doc.save();
doc.free();

Java

import fyi.oxide.pdf.DocumentEditor;
import fyi.oxide.pdf.geometry.BBox;

try (DocumentEditor editor = DocumentEditor.open("confidential.pdf")) {
    // Queue a black redaction box on page 0 (x0, y0, x1, y1 in points).
    editor.addRedaction(0, new BBox(100.0, 700.0, 300.0, 714.0));

    // Apply destructively (scrubs metadata), then save.
    editor.applyRedactionsDestructive();
    editor.saveTo(java.nio.file.Path.of("redacted.pdf"));
}

Kotlin

import fyi.oxide.pdf.DocumentEditor
import fyi.oxide.pdf.geometry.BBox

DocumentEditor.open("confidential.pdf").use { editor ->
    // Queue a black redaction box on page 0 (x0, y0, x1, y1 in points).
    editor.addRedaction(0, BBox(100.0, 700.0, 300.0, 714.0))

    // Apply destructively (scrubs metadata), then save.
    editor.applyRedactionsDestructive()
    editor.saveTo(java.nio.file.Path.of("redacted.pdf"))
}

Scala

import fyi.oxide.pdf.DocumentEditor
import fyi.oxide.pdf.geometry.BBox
import scala.util.Using

Using.resource(DocumentEditor.open("confidential.pdf")) { editor =>
  // Queue a black redaction box on page 0 (x0, y0, x1, y1 in points).
  editor.addRedaction(0, BBox(100.0, 700.0, 300.0, 714.0))

  // Apply destructively (scrubs metadata), then save.
  editor.applyRedactionsDestructive()
  editor.saveTo(java.nio.file.Path.of("redacted.pdf"))
}

Clojure

(require '[pdf-oxide.core :as pdf])
(import '[fyi.oxide.pdf.geometry BBox])

(with-open [ed (pdf/editor "confidential.pdf")]
  ;; Queue a black redaction box on page 0 (x0, y0, x1, y1 in points).
  (pdf/add-redaction ed 0 (BBox. 100.0 700.0 300.0 714.0))
  ;; Apply destructively (scrubs metadata), then save.
  (pdf/apply-redactions ed)
  (java.nio.file.Files/write
    (java.nio.file.Path/of "redacted.pdf" (into-array String []))
    (pdf/editor-save ed)))

PHP

use PdfOxide\DocumentEditor;

$editor = DocumentEditor::open('confidential.pdf');

// Queue a black redaction box on page 0 (x1, y1, x2, y2 in points).
$editor->addRedaction(0, 100.0, 700.0, 300.0, 714.0);

// Apply destructively (scrubMetadata defaults to true). Returns glyphs removed.
$glyphs = $editor->applyRedactionsDestructive(true);
echo "glyphs removed: $glyphs\n";

$editor->saveTo('redacted.pdf');

Ruby

require 'pdf_oxide'

PdfOxide::DocumentEditor.open('confidential.pdf') do |ed|
  # Queue a black redaction box on page 0 (x1, y1, x2, y2 in points).
  ed.add_redaction(page: 0, rect: [100.0, 700.0, 300.0, 714.0])

  # Apply destructively, scrubbing metadata (raises on fail-closed), then save.
  ed.apply_redactions!(scrub_metadata: true)
  ed.save_to('redacted.pdf')
end

C++

#include <iostream>
#include <pdf_oxide/pdf_oxide.hpp>

auto editor = pdf_oxide::DocumentEditor::open("confidential.pdf");

// Queue a black redaction box on page 0 (x1, y1, x2, y2; r, g, b in 0..1).
editor.redaction_add(0, 100.0, 700.0, 300.0, 714.0, 0.0, 0.0, 0.0);

// Apply destructively (scrub_metadata = true). Returns glyphs removed.
int glyphs = editor.redaction_apply(/*scrub_metadata=*/true, 0.0, 0.0, 0.0);
std::cout << "glyphs removed: " << glyphs << "\n";

editor.save("redacted.pdf");

Swift

import PdfOxide

let editor = try DocumentEditor.open("confidential.pdf")

// Queue a black redaction box on page 0 (x1, y1, x2, y2; r, g, b in 0..1).
try editor.redactionAdd(0, x1: 100, y1: 700, x2: 300, y2: 714, r: 0, g: 0, b: 0)

// Apply destructively (scrub metadata). Returns glyphs removed.
let glyphs = try editor.redactionApply(scrubMetadata: true, r: 0, g: 0, b: 0)
print("glyphs removed: \(glyphs)")

try editor.save("redacted.pdf")

Dart

import 'package:pdf_oxide/pdf_oxide.dart';

final editor = DocumentEditor.open('confidential.pdf');

// Queue a black redaction box on page 0 (x1, y1, x2, y2; r, g, b in 0..1).
editor.redactionAdd(0, 100, 700, 300, 714);

// Apply destructively, scrubbing metadata. Returns glyphs removed.
final glyphs = editor.redactionApply(scrubMetadata: true);
print('glyphs removed: $glyphs');

editor.save('redacted.pdf');

R

library(pdfoxide)

editor <- pdf_editor_open("confidential.pdf")

# Queue a black redaction box on page 0 (x1, y1, x2, y2; r, g, b in 0..1).
pdf_redaction_add(editor, 0, 100, 700, 300, 714, 0, 0, 0)

# Apply destructively, scrubbing metadata. Returns glyphs removed.
glyphs <- pdf_redaction_apply(editor, scrub_metadata = TRUE, 0, 0, 0)
cat("glyphs removed:", glyphs, "\n")

pdf_editor_save(editor, "redacted.pdf")

Julia

using PdfOxide

editor = open_editor("confidential.pdf")

# Queue a black redaction box on page 0 (x1, y1, x2, y2; r, g, b in 0..1).
redaction_add(editor, 0, 100, 700, 300, 714, 0, 0, 0)

# Apply destructively, scrubbing metadata. Returns glyphs removed.
glyphs = redaction_apply(editor, true, 0, 0, 0)
println("glyphs removed: ", glyphs)

save(editor, "redacted.pdf")

Zig

const std = @import("std");
const pdf_oxide = @import("pdf_oxide");

var editor = try pdf_oxide.DocumentEditor.openEditor("confidential.pdf");
defer editor.deinit();

// Queue a black redaction box on page 0 (x1, y1, x2, y2; r, g, b in 0..1).
try editor.redactionAdd(0, 100, 700, 300, 714, 0, 0, 0);

// Apply destructively, scrubbing metadata. Returns glyphs removed.
const glyphs = try editor.redactionApply(true, 0, 0, 0);
std.debug.print("glyphs removed: {d}\n", .{glyphs});

try editor.save("redacted.pdf");

Objective-C

#import "POXPdfOxide.h"
NSError *err = nil;

POXDocumentEditor *editor = [POXDocumentEditor openEditor:@"confidential.pdf" error:&err];

// Queue a black redaction box on page 0 (x1, y1, x2, y2; r, g, b in 0..1).
[editor redactionAddPage:0 x1:100 y1:700 x2:300 y2:714 r:0 g:0 b:0 error:&err];

// Apply destructively, scrubbing metadata. Returns glyphs removed.
int32_t glyphs = [editor redactionApplyScrubMetadata:YES r:0 g:0 b:0 error:&err];
NSLog(@"glyphs removed: %d", glyphs);

[editor saveToPath:@"redacted.pdf" error:&err];

Elixir

{:ok, editor} = PdfOxide.open_editor("confidential.pdf")

# Queue a black redaction box on page 0 (x1, y1, x2, y2; r, g, b in 0..1).
:ok = PdfOxide.redaction_add(editor, 0, 100, 700, 300, 714, 0.0, 0.0, 0.0)

# Apply destructively, scrubbing metadata. Returns {:ok, glyphs_removed}.
{:ok, glyphs} = PdfOxide.redaction_apply(editor, true, 0.0, 0.0, 0.0)
IO.puts("glyphs removed: #{glyphs}")

PdfOxide.editor_save(editor, "redacted.pdf")

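Counting queued regions

redaction_count(page) returns the number of regions queued for a page: the /Redact annotations in the source document plus the rectangles added programmatically through redaction_add. Use it before applying to assert that there is actually something to redact.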
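
A pre-flight check across all pages might look like the sketch below. It relies only on the `redaction_count(page)` method shown on this page; `queued_by_page` is a hypothetical helper, and `FakeDocument` is a stand-in used purely to make the example runnable without a PDF.

```python
def queued_by_page(doc, page_count):
    """Return {page_index: count} for pages with at least one queued region."""
    counts = {}
    for page in range(page_count):
        n = doc.redaction_count(page)
        if n > 0:
            counts[page] = n
    return counts


class FakeDocument:
    """Stand-in for a real document; not part of PDF Oxide."""

    def __init__(self, queued):
        self._queued = queued

    def redaction_count(self, page):
        return self._queued.get(page, 0)


queued = queued_by_page(FakeDocument({0: 2, 3: 1}), page_count=4)
if not queued:
    # Refuse to continue rather than save an unredacted file by mistake.
    raise ValueError("no redaction regions queued")
print(queued)
```

Failing early here mirrors the fail-closed behavior of the apply step: a run that queued nothing is more likely a bug in the matching logic than a clean document.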

Rust

let mut editor = DocumentEditor::open("marked.pdf")?;
editor.add_redaction(0, [100.0, 700.0, 300.0, 714.0], None)?;
assert_eq!(editor.redaction_count(0)?, 1);

Python

doc = PdfDocument("marked.pdf")
doc.add_redaction(0, (100.0, 700.0, 300.0, 714.0))
assert doc.redaction_count(0) == 1

Go

editor, _ := pdfoxide.OpenEditor("marked.pdf")
defer editor.Close()

_ = editor.AddRedaction(0, [4]float64{100, 700, 300, 714}, nil)
n, _ := editor.RedactionCount(0)  // 1

C#

using var editor = DocumentEditor.Open("marked.pdf");
editor.AddRedaction(0, 100, 700, 300, 714);
int n = editor.RedactionCount(0);  // 1

JavaScript (WASM)

const doc = new WasmPdfDocument(bytes);
doc.addRedaction(0, 100, 700, 300, 714);
const n = doc.redactionCount(0);  // 1

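What is the difference between redaction and sanitization?

Geometric redaction removes the content under a rectangular region. Sanitization removes document-level secrets that a redaction box can never cover: the /Info dictionary, the catalog XMP /Metadata stream, document JavaScript (/OpenAction, /AA, /Names/JavaScript), and /Names/EmbeddedFiles. The removed object subtrees are hard-excluded from the output (G6).

When the scrub_metadata flag is set (the default in every binding), redaction_apply runs sanitization automatically. redaction_scrub_metadata runs the same sanitization pass on its own, with no geometric redaction, for cases where you only need to clean the document. It returns the number of top-level constructs removed.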
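
To make the list of sanitized locations concrete, here is a toy model that treats a PDF's trailer and catalog as nested dictionaries and strips the entries named above. It is a sketch only: `SCRUB_PATHS` and `scrub` are hypothetical names, and the way this sketch counts removed entries is an assumption, not a description of how PDF Oxide counts top-level constructs.

```python
# Paths from the trailer to each entry named in the text above.
SCRUB_PATHS = [
    ("Info",),
    ("Root", "Metadata"),
    ("Root", "OpenAction"),
    ("Root", "AA"),
    ("Root", "Names", "JavaScript"),
    ("Root", "Names", "EmbeddedFiles"),
]


def scrub(trailer):
    """Delete each listed entry if present; return how many were removed."""
    removed = 0
    for path in SCRUB_PATHS:
        node = trailer
        # Walk down to the parent dictionary of the entry to delete.
        for key in path[:-1]:
            node = node.get(key)
            if not isinstance(node, dict):
                node = None
                break
        if node is not None and path[-1] in node:
            del node[path[-1]]
            removed += 1
    return removed


doc = {
    "Info": {"Author": "Jane"},
    "Root": {
        "Pages": {"Count": 1},
        "OpenAction": {"S": "JavaScript"},
        "Names": {"JavaScript": {}, "Dests": {}},
    },
}
removed = scrub(doc)
print(removed, doc)
```

Page content under /Pages and unrelated name trees such as /Dests are left alone, which is the point of keeping sanitization separate from geometric redaction.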

Rust

use pdf_oxide::editor::{DocumentEditor, EditableDocument};
use pdf_oxide::redaction::RedactionOptions;

let mut editor = DocumentEditor::open("input.pdf")?;
let report = editor.sanitize_document(RedactionOptions::default())?;
println!("constructs removed: {}", report.annotations_removed);
editor.save("sanitized.pdf")?;

Python

doc = PdfDocument("input.pdf")
report = doc.sanitize_document()
print("constructs removed:", report["annotations_removed"])
doc.save("sanitized.pdf")

Go

editor, _ := pdfoxide.OpenEditor("input.pdf")
defer editor.Close()

removed, _ := editor.SanitizeDocument()  // top-level constructs removed
_ = editor.Save("sanitized.pdf")

C#

using var editor = DocumentEditor.Open("input.pdf");
int removed = editor.SanitizeDocument();  // top-level constructs removed
editor.Save("sanitized.pdf");

JavaScript (WASM)

const doc = new WasmPdfDocument(bytes);
const report = doc.sanitizeDocument(true, true, true);  // scrub, removeJS, removeEmbedded
const output = doc.save();
doc.free();

Java

import fyi.oxide.pdf.DocumentEditor;

try (DocumentEditor editor = DocumentEditor.open("input.pdf")) {
    editor.scrubMetadata();  // strip /Info, XMP, JS, embedded files — no geometry
    editor.saveTo(java.nio.file.Path.of("sanitized.pdf"));
}

Kotlin

import fyi.oxide.pdf.DocumentEditor

DocumentEditor.open("input.pdf").use { editor ->
    editor.scrubMetadata()  // strip /Info, XMP, JS, embedded files — no geometry
    editor.saveTo(java.nio.file.Path.of("sanitized.pdf"))
}

Scala

import fyi.oxide.pdf.DocumentEditor
import scala.util.Using

Using.resource(DocumentEditor.open("input.pdf")) { editor =>
  editor.scrubMetadata()  // strip /Info, XMP, JS, embedded files — no geometry
  editor.saveTo(java.nio.file.Path.of("sanitized.pdf"))
}

Clojure

(require '[pdf-oxide.core :as pdf])

(with-open [ed (pdf/editor "input.pdf")]
  (pdf/scrub-metadata ed)  ; strip /Info, XMP, JS, embedded files — no geometry
  (java.nio.file.Files/write
    (java.nio.file.Path/of "sanitized.pdf" (into-array String []))
    (pdf/editor-save ed)))

PHP

use PdfOxide\DocumentEditor;

$editor = DocumentEditor::open('input.pdf');
$editor->scrubMetadata();  // strip /Info, XMP, JS, embedded files — no geometry
$editor->saveTo('sanitized.pdf');

Ruby

require 'pdf_oxide'

PdfOxide::DocumentEditor.open('input.pdf') do |ed|
  removed = ed.scrub_metadata  # top-level constructs removed (no geometry)
  puts "constructs removed: #{removed}"
  ed.save_to('sanitized.pdf')
end

C++

#include <iostream>
#include <pdf_oxide/pdf_oxide.hpp>

auto editor = pdf_oxide::DocumentEditor::open("input.pdf");
int removed = editor.redaction_scrub_metadata();  // constructs removed (no geometry)
std::cout << "constructs removed: " << removed << "\n";
editor.save("sanitized.pdf");

Swift

import PdfOxide

let editor = try DocumentEditor.open("input.pdf")
let removed = try editor.redactionScrubMetadata()  // constructs removed (no geometry)
print("constructs removed: \(removed)")
try editor.save("sanitized.pdf")

Dart

import 'package:pdf_oxide/pdf_oxide.dart';

final editor = DocumentEditor.open('input.pdf');
final removed = editor.redactionScrubMetadata();  // constructs removed (no geometry)
print('constructs removed: $removed');
editor.save('sanitized.pdf');

R

library(pdfoxide)

editor <- pdf_editor_open("input.pdf")
removed <- pdf_redaction_scrub_metadata(editor)  # constructs removed (no geometry)
cat("constructs removed:", removed, "\n")
pdf_editor_save(editor, "sanitized.pdf")

Julia

using PdfOxide

editor = open_editor("input.pdf")
removed = redaction_scrub_metadata(editor)  # constructs removed (no geometry)
println("constructs removed: ", removed)
save(editor, "sanitized.pdf")

Zig

const std = @import("std");
const pdf_oxide = @import("pdf_oxide");

var editor = try pdf_oxide.DocumentEditor.openEditor("input.pdf");
defer editor.deinit();

const removed = try editor.redactionScrubMetadata();  // constructs removed (no geometry)
std.debug.print("constructs removed: {d}\n", .{removed});

try editor.save("sanitized.pdf");

Objective-C

#import "POXPdfOxide.h"
NSError *err = nil;

POXDocumentEditor *editor = [POXDocumentEditor openEditor:@"input.pdf" error:&err];
int32_t removed = [editor redactionScrubMetadataWithError:&err];  // constructs removed
NSLog(@"constructs removed: %d", removed);
[editor saveToPath:@"sanitized.pdf" error:&err];

Elixir

{:ok, editor} = PdfOxide.open_editor("input.pdf")
{:ok, removed} = PdfOxide.redaction_scrub_metadata(editor)  # constructs removed
IO.puts("constructs removed: #{removed}")
PdfOxide.editor_save(editor, "sanitized.pdf")

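How do I remove repeated headers and footers?

Headers, footers, and page-furniture artifacts repeat across pages and usually need to be removed before extraction or republishing. PDF Oxide detects them in two ways: it prefers the ISO 32000-compliant /Artifact tags (100% accurate when present) and otherwise falls back to a heuristic that flags text repeating in the top or bottom 15% of the page as furniture.

remove_headers(threshold) / remove_footers(threshold) / remove_artifacts(threshold) operate on the whole document and return the number of items removed. threshold is the fraction of pages (0.0 to 1.0) on which text must repeat to count as furniture in heuristic mode (default 0.8). remove_artifacts is a convenience method that removes both headers and footers.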
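
The heuristic fallback can be illustrated with a short sketch. This is an assumption-laden approximation rather than PDF Oxide's detector: `repeated_furniture` is a hypothetical function, pages are modeled as lists of (text, y) pairs with y measured from the bottom edge, and matching is by exact string equality.

```python
from collections import Counter


def repeated_furniture(pages, page_height, threshold=0.8, band=0.15):
    """Return texts found in the top or bottom band on enough pages.

    pages: list of pages, each a list of (text, y) pairs.
    threshold: fraction of pages on which a text must appear.
    band: fraction of the page height treated as header/footer zone.
    """
    if not pages:
        return set()
    top_cut = page_height * (1.0 - band)
    bottom_cut = page_height * band
    seen = Counter()
    for lines in pages:
        # A set counts each candidate at most once per page.
        in_band = {text for text, y in lines if y >= top_cut or y <= bottom_cut}
        seen.update(in_band)
    return {text for text, n in seen.items() if n / len(pages) >= threshold}


pages = [
    [("ACME Quarterly Report", 760.0), (f"Body of page {i}", 400.0), (f"Page {i}", 30.0)]
    for i in range(1, 6)
]
furniture = repeated_furniture(pages, page_height=792.0)
print(furniture)
```

The running title is flagged because it appears in the top band on every page. The page numbers are not, since "Page 1" and "Page 2" differ as strings; a practical detector would need some normalization for that case, and raising the threshold makes detection stricter.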

Rust

use pdf_oxide::PdfDocument;

let doc = PdfDocument::open("report.pdf")?;

let headers = doc.remove_headers(0.8)?;   // count removed
let footers = doc.remove_footers(0.8)?;
// Or both at once:
let total = doc.remove_artifacts(0.8)?;   // headers + footers
println!("removed {} furniture items", total);

Python

from pdf_oxide import PdfDocument

doc = PdfDocument("report.pdf")

doc.remove_headers(0.8)        # threshold defaults to 0.8
doc.remove_footers(0.8)
total = doc.remove_artifacts(0.8)   # headers + footers, returns count
print("removed", total, "furniture items")

doc.save("clean.pdf")

Go

doc, _ := pdfoxide.Open("report.pdf")
defer doc.Close()

headers, _ := doc.RemoveHeaders(0.8)   // count removed
footers, _ := doc.RemoveFooters(0.8)
total, _ := doc.RemoveArtifacts(0.8)   // headers + footers
fmt.Printf("removed %d furniture items\n", total)

JavaScript (WASM)

import { WasmPdfDocument } from "pdf-oxide-wasm";

const doc = new WasmPdfDocument(bytes);

doc.removeHeaders(0.8);   // returns count removed
doc.removeFooters(0.8);
const total = doc.removeArtifacts(0.8);  // headers + footers

const output = doc.save();
doc.free();

C++

#include <iostream>
#include <pdf_oxide/pdf_oxide.hpp>

auto doc = pdf_oxide::Document::open("report.pdf");

int headers = doc.remove_headers(0.8f);   // count removed
int footers = doc.remove_footers(0.8f);
int total   = doc.remove_artifacts(0.8f); // headers + footers
std::cout << "removed " << total << " furniture items\n";

Swift

import PdfOxide

let doc = try Document.open("report.pdf")

let headers = try doc.removeHeaders(threshold: 0.8)   // count removed
let footers = try doc.removeFooters(threshold: 0.8)
let total   = try doc.removeArtifacts(threshold: 0.8) // headers + footers
print("removed \(total) furniture items")

Dart

import 'package:pdf_oxide/pdf_oxide.dart';

final doc = PdfDocument.open('report.pdf');

doc.removeHeaders(0.8);   // returns count removed
doc.removeFooters(0.8);
final total = doc.removeArtifacts(0.8);  // headers + footers
print('removed $total furniture items');

R

library(pdfoxide)

doc <- pdf_open("report.pdf")

headers <- pdf_remove_headers(doc, 0.8)   # count removed
footers <- pdf_remove_footers(doc, 0.8)
total   <- pdf_remove_artifacts(doc, 0.8) # headers + footers
cat("removed", total, "furniture items\n")

Julia

using PdfOxide

doc = open_document("report.pdf")

headers = remove_headers(doc, 0.8)   # count removed
footers = remove_footers(doc, 0.8)
total   = remove_artifacts(doc, 0.8) # headers + footers
println("removed ", total, " furniture items")

Zig

const std = @import("std");
const pdf_oxide = @import("pdf_oxide");

var doc = try pdf_oxide.Document.open("report.pdf");
defer doc.deinit();

const headers = try doc.removeHeaders(0.8);   // count removed
const footers = try doc.removeFooters(0.8);
const total   = try doc.removeArtifacts(0.8); // headers + footers
std.debug.print("removed {d} furniture items\n", .{total});

Objective-C

#import "POXPdfOxide.h"
NSError *err = nil;

POXDocument *doc = [POXDocument openPath:@"report.pdf" error:&err];

int32_t headers = [doc removeHeaders:0.8 error:&err];   // count removed
int32_t footers = [doc removeFooters:0.8 error:&err];
int32_t total   = [doc removeArtifacts:0.8 error:&err]; // headers + footers
NSLog(@"removed %d furniture items", total);

Elixir

{:ok, doc} = PdfOxide.open("report.pdf")

{:ok, headers} = PdfOxide.remove_headers(doc, 0.8)   # count removed
{:ok, footers} = PdfOxide.remove_footers(doc, 0.8)
{:ok, total}   = PdfOxide.remove_artifacts(doc, 0.8) # headers + footers
IO.puts("removed #{total} furniture items")

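Erasing furniture on a single page

When you know exactly which page the furniture appears on, use the per-page erase_* family. It skips the cross-page repetition analysis and directly marks the header zone (top 15%), the footer zone (bottom 15%), or both for erasure. These methods take a zero-based page index.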
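
For a sense of how large those zones are, the sketch below computes them from a page box. `header_zone` and `footer_zone` are hypothetical helpers, and measuring against the media box is an assumption; this page does not say which page box PDF Oxide uses.

```python
def header_zone(media_box, band=0.15):
    """Top `band` fraction of the page as (x0, y0, x1, y1)."""
    x0, y0, x1, y1 = media_box
    return (x0, y1 - (y1 - y0) * band, x1, y1)


def footer_zone(media_box, band=0.15):
    """Bottom `band` fraction of the page as (x0, y0, x1, y1)."""
    x0, y0, x1, y1 = media_box
    return (x0, y0, x1, y0 + (y1 - y0) * band)


letter = (0.0, 0.0, 612.0, 792.0)  # US Letter, in points
print(header_zone(letter))
print(footer_zone(letter))
```

On a US Letter page each zone is roughly 119 points tall, or about 1.65 inches, so body text that starts unusually high or ends unusually low on the page can fall inside it.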

Rust

use pdf_oxide::PdfDocument;

let doc = PdfDocument::open("report.pdf")?;

doc.erase_header(0)?;     // erase the top 15% of page 0
doc.erase_footer(0)?;     // erase the bottom 15% of page 0
doc.erase_artifacts(0)?;  // erase both header and footer of page 0

Python

doc = PdfDocument("report.pdf")

doc.erase_header(0)      # erase the top 15% of page 0
doc.erase_footer(0)      # erase the bottom 15% of page 0
doc.erase_artifacts(0)   # erase both header and footer of page 0

doc.save("clean.pdf")

JavaScript (WASM)

const doc = new WasmPdfDocument(bytes);

doc.eraseHeader(0);     // erase the top 15% of page 0
doc.eraseFooter(0);     // erase the bottom 15% of page 0
doc.eraseArtifacts(0);  // erase both header and footer of page 0

const output = doc.save();
doc.free();

C++

#include <pdf_oxide/pdf_oxide.hpp>

auto doc = pdf_oxide::Document::open("report.pdf");

doc.erase_header(0);     // erase the top 15% of page 0
doc.erase_footer(0);     // erase the bottom 15% of page 0
doc.erase_artifacts(0);  // erase both header and footer of page 0

Swift

import PdfOxide

let doc = try Document.open("report.pdf")

try doc.eraseHeader(0)     // erase the top 15% of page 0
try doc.eraseFooter(0)     // erase the bottom 15% of page 0
try doc.eraseArtifacts(0)  // erase both header and footer of page 0

Dart

import 'package:pdf_oxide/pdf_oxide.dart';

final doc = PdfDocument.open('report.pdf');

doc.eraseHeader(0);     // erase the top 15% of page 0
doc.eraseFooter(0);     // erase the bottom 15% of page 0
doc.eraseArtifacts(0);  // erase both header and footer of page 0

R

library(pdfoxide)

doc <- pdf_open("report.pdf")

pdf_erase_header(doc, 0)     # erase the top 15% of page 0
pdf_erase_footer(doc, 0)     # erase the bottom 15% of page 0
pdf_erase_artifacts(doc, 0)  # erase both header and footer of page 0

Julia

using PdfOxide

doc = open_document("report.pdf")

erase_header(doc, 0)     # erase the top 15% of page 0
erase_footer(doc, 0)     # erase the bottom 15% of page 0
erase_artifacts(doc, 0)  # erase both header and footer of page 0

Zig

const pdf_oxide = @import("pdf_oxide");

var doc = try pdf_oxide.Document.open("report.pdf");
defer doc.deinit();

_ = try doc.eraseHeader(0);     // erase the top 15% of page 0
_ = try doc.eraseFooter(0);     // erase the bottom 15% of page 0
_ = try doc.eraseArtifacts(0);  // erase both header and footer of page 0

Objective-C

#import "POXPdfOxide.h"
NSError *err = nil;

POXDocument *doc = [POXDocument openPath:@"report.pdf" error:&err];

[doc eraseHeader:0 error:&err];     // erase the top 15% of page 0
[doc eraseFooter:0 error:&err];     // erase the bottom 15% of page 0
[doc eraseArtifacts:0 error:&err];  // erase both header and footer of page 0

Elixir

{:ok, doc} = PdfOxide.open("report.pdf")

{:ok, _} = PdfOxide.erase_header(doc, 0)     # erase the top 15% of page 0
{:ok, _} = PdfOxide.erase_footer(doc, 0)     # erase the bottom 15% of page 0
{:ok, _} = PdfOxide.erase_artifacts(doc, 0)  # erase both header and footer of page 0

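Complete redaction workflow

This example locates sensitive regions, queues destructive redactions, applies them and scrubs metadata, and finally writes out a clean file.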

Python

from pdf_oxide import PdfDocument

doc = PdfDocument("sensitive-report.pdf")

# Step 1: locate sensitive text and queue a destructive box for each match.
for i in range(doc.page_count()):
    page = doc.page(i)
    for t in page.find_text_containing("SSN"):
        x, y, w, h = t.bbox            # (x, y, width, height)
        doc.add_redaction(i, (x, y, x + w, y + h))

# Step 2: apply destructively + scrub metadata (the default).
report = doc.apply_redactions_destructive(scrub_metadata=True)
print("regions:", report["regions"], "glyphs removed:", report["glyphs_removed"])

# Step 3: save the rewritten document.
doc.save("report-redacted.pdf")

Rust

use pdf_oxide::editor::{DocumentEditor, EditableDocument};
use pdf_oxide::redaction::RedactionOptions;

let mut editor = DocumentEditor::open("sensitive-report.pdf")?;

// Step 1: queue destructive boxes (x0, y0, x1, y1 in points).
editor.add_redaction(0, [100.0, 700.0, 300.0, 714.0], None)?;
editor.add_redaction(0, [100.0, 680.0, 300.0, 694.0], None)?;

// Step 2: apply destructively with safe defaults (scrubs metadata).
let report = editor.apply_redactions_destructive(RedactionOptions::default())?;
println!(
    "regions: {}, glyphs removed: {}",
    report.regions, report.glyphs_removed
);

// Step 3: save.
editor.save("report-redacted.pdf")?;

Java

import fyi.oxide.pdf.*;
import fyi.oxide.pdf.search.SearchMatch;

try (PdfDocument doc = PdfDocument.open("sensitive-report.pdf");
     DocumentEditor editor = DocumentEditor.open("sensitive-report.pdf")) {
    // Step 1: locate sensitive text and queue a destructive box for each hit.
    for (SearchMatch m : doc.search("SSN")) {
        editor.addRedaction(m.pageIndex(), m.bbox());  // bbox is x0,y0,x1,y1
    }
    // Step 2: apply destructively + scrub metadata (the default).
    editor.applyRedactionsDestructive();
    // Step 3: save.
    editor.saveTo(java.nio.file.Path.of("report-redacted.pdf"));
}

Kotlin

import fyi.oxide.pdf.*

PdfDocument.open("sensitive-report.pdf").use { doc ->
    DocumentEditor.open("sensitive-report.pdf").use { editor ->
        // Step 1: locate sensitive text and queue a box for each hit.
        for (m in doc.search("SSN")) {
            editor.addRedaction(m.pageIndex(), m.bbox())  // bbox is x0,y0,x1,y1
        }
        // Step 2: apply destructively + scrub metadata (the default).
        editor.applyRedactionsDestructive()
        // Step 3: save.
        editor.saveTo(java.nio.file.Path.of("report-redacted.pdf"))
    }
}

Scala

import fyi.oxide.pdf.{PdfDocument, DocumentEditor, searchSeq}
import scala.util.Using

Using.resource(PdfDocument.open("sensitive-report.pdf")) { doc =>
  Using.resource(DocumentEditor.open("sensitive-report.pdf")) { editor =>
    // Step 1: locate sensitive text and queue a box for each hit.
    for (m <- doc.searchSeq("SSN"))
      editor.addRedaction(m.pageIndex, m.bbox)  // bbox is x0,y0,x1,y1
    // Step 2: apply destructively + scrub metadata (the default).
    editor.applyRedactionsDestructive()
    // Step 3: save.
    editor.saveTo(java.nio.file.Path.of("report-redacted.pdf"))
  }
}

Clojure

(require '[pdf-oxide.core :as pdf])

(with-open [doc (pdf/open "sensitive-report.pdf")
            ed  (pdf/editor "sensitive-report.pdf")]
  ;; Step 1: locate sensitive text and queue a box for each hit.
  (doseq [m (pdf/search doc "SSN")]
    (pdf/add-redaction ed (.pageIndex m) (.bbox m)))  ; bbox is x0,y0,x1,y1
  ;; Step 2: apply destructively + scrub metadata (the default).
  (pdf/apply-redactions ed)
  ;; Step 3: save.
  (java.nio.file.Files/write
    (java.nio.file.Path/of "report-redacted.pdf" (into-array String []))
    (pdf/editor-save ed)))

Ruby

require 'pdf_oxide'

doc = PdfOxide::PdfDocument.open('sensitive-report.pdf')
PdfOxide::DocumentEditor.open('sensitive-report.pdf') do |ed|
  # Step 1: locate sensitive text and queue a box for each hit.
  doc.search('SSN').each do |m|
    b = m[:bbox]  # { x:, y:, width:, height: }
    ed.add_redaction(page: m[:page], rect: [b[:x], b[:y], b[:x] + b[:width], b[:y] + b[:height]])
  end
  # Step 2: apply destructively + scrub metadata (raises on fail-closed).
  ed.apply_redactions!(scrub_metadata: true)
  # Step 3: save.
  ed.save_to('report-redacted.pdf')
end

C++

#include <iostream>
#include <pdf_oxide/pdf_oxide.hpp>

auto doc    = pdf_oxide::Document::open("sensitive-report.pdf");
auto editor = pdf_oxide::DocumentEditor::open("sensitive-report.pdf");

// Step 1: locate sensitive text and queue a box for each hit.
for (const auto& m : doc.search_all("SSN", /*case_sensitive=*/false)) {
    editor.redaction_add(m.page, m.bbox.x, m.bbox.y,
                         m.bbox.x + m.bbox.width, m.bbox.y + m.bbox.height,
                         0.0, 0.0, 0.0);
}
// Step 2: apply destructively + scrub metadata.
int glyphs = editor.redaction_apply(/*scrub_metadata=*/true, 0.0, 0.0, 0.0);
std::cout << "glyphs removed: " << glyphs << "\n";
// Step 3: save.
editor.save("report-redacted.pdf");

Swift

import PdfOxide

let doc    = try Document.open("sensitive-report.pdf")
let editor = try DocumentEditor.open("sensitive-report.pdf")

// Step 1: locate sensitive text and queue a box for each hit.
for m in try doc.searchAll("SSN", false) {
    try editor.redactionAdd(m.page,
                            x1: m.bbox.x, y1: m.bbox.y,
                            x2: m.bbox.x + m.bbox.width, y2: m.bbox.y + m.bbox.height,
                            r: 0, g: 0, b: 0)
}
// Step 2: apply destructively + scrub metadata.
let glyphs = try editor.redactionApply(scrubMetadata: true, r: 0, g: 0, b: 0)
print("glyphs removed: \(glyphs)")
// Step 3: save.
try editor.save("report-redacted.pdf")

Dart

import 'package:pdf_oxide/pdf_oxide.dart';

final doc    = PdfDocument.open('sensitive-report.pdf');
final editor = DocumentEditor.open('sensitive-report.pdf');

// Step 1: locate sensitive text and queue a box for each hit.
for (final m in doc.searchAll('SSN', false)) {
  editor.redactionAdd(m.page, m.bbox.x, m.bbox.y,
      m.bbox.x + m.bbox.width, m.bbox.y + m.bbox.height);
}
// Step 2: apply destructively + scrub metadata.
final glyphs = editor.redactionApply(scrubMetadata: true);
print('glyphs removed: $glyphs');
// Step 3: save.
editor.save('report-redacted.pdf');

R

library(pdfoxide)

doc    <- pdf_open("sensitive-report.pdf")
editor <- pdf_editor_open("sensitive-report.pdf")

# Step 1: locate sensitive text and queue a box for each hit.
for (m in pdf_search_all(doc, "SSN", FALSE)) {
  b <- m$bbox  # list(x=, y=, width=, height=)
  pdf_redaction_add(editor, m$page, b$x, b$y, b$x + b$width, b$y + b$height, 0, 0, 0)
}
# Step 2: apply destructively + scrub metadata.
glyphs <- pdf_redaction_apply(editor, scrub_metadata = TRUE, 0, 0, 0)
cat("glyphs removed:", glyphs, "\n")
# Step 3: save.
pdf_editor_save(editor, "report-redacted.pdf")

Julia

using PdfOxide

doc    = open_document("sensitive-report.pdf")
editor = open_editor("sensitive-report.pdf")

# Step 1: locate sensitive text and queue a box for each hit.
for m in search_all(doc, "SSN", false)
    b = m.bbox
    redaction_add(editor, m.page, b.x, b.y, b.x + b.width, b.y + b.height, 0, 0, 0)
end
# Step 2: apply destructively + scrub metadata.
glyphs = redaction_apply(editor, true, 0, 0, 0)
println("glyphs removed: ", glyphs)
# Step 3: save.
save(editor, "report-redacted.pdf")

Zig

const std = @import("std");
const pdf_oxide = @import("pdf_oxide");
const a = std.heap.page_allocator;

var doc    = try pdf_oxide.Document.open("sensitive-report.pdf");
defer doc.deinit();
var editor = try pdf_oxide.DocumentEditor.openEditor("sensitive-report.pdf");
defer editor.deinit();

// Step 1: locate sensitive text and queue a box for each hit.
const hits = try doc.searchAll(a, "SSN", false);
for (hits) |m| {
    try editor.redactionAdd(@intCast(m.page), m.bbox.x, m.bbox.y,
        m.bbox.x + m.bbox.width, m.bbox.y + m.bbox.height, 0, 0, 0);
}
// Step 2: apply destructively + scrub metadata.
const glyphs = try editor.redactionApply(true, 0, 0, 0);
std.debug.print("glyphs removed: {d}\n", .{glyphs});
// Step 3: save.
try editor.save("report-redacted.pdf");

Objective-C

#import "POXPdfOxide.h"
NSError *err = nil;

POXDocument *doc = [POXDocument openPath:@"sensitive-report.pdf" error:&err];
POXDocumentEditor *editor = [POXDocumentEditor openEditor:@"sensitive-report.pdf" error:&err];

// Step 1: locate sensitive text and queue a box for each hit.
for (POXSearchResult *m in [doc searchAll:@"SSN" caseSensitive:NO error:&err]) {
    POXBbox b = m.bbox;
    [editor redactionAddPage:m.page x1:b.x y1:b.y x2:b.x + b.width y2:b.y + b.height
                           r:0 g:0 b:0 error:&err];
}
// Step 2: apply destructively + scrub metadata.
int32_t glyphs = [editor redactionApplyScrubMetadata:YES r:0 g:0 b:0 error:&err];
NSLog(@"glyphs removed: %d", glyphs);
// Step 3: save.
[editor saveToPath:@"report-redacted.pdf" error:&err];

Elixir

{:ok, doc}    = PdfOxide.open("sensitive-report.pdf")
{:ok, editor} = PdfOxide.open_editor("sensitive-report.pdf")

# Step 1: locate sensitive text and queue a box for each hit.
{:ok, hits} = PdfOxide.search_all(doc, "SSN", false)
Enum.each(hits, fn m ->
  b = m.bbox
  PdfOxide.redaction_add(editor, m.page, b.x, b.y, b.x + b.width, b.y + b.height, 0.0, 0.0, 0.0)
end)
# Step 2: apply destructively + scrub metadata.
{:ok, glyphs} = PdfOxide.redaction_apply(editor, true, 0.0, 0.0, 0.0)
IO.puts("glyphs removed: #{glyphs}")
# Step 3: save.
PdfOxide.editor_save(editor, "report-redacted.pdf")

Method reference

Destructive redaction

Canonical name (C ABI)	Rust (`DocumentEditor`)	Python (`PdfDocument`)	Go (`DocumentEditor`)	C# (`DocumentEditor`)	JS (WASM)
`pdf_redaction_add`	`add_redaction(page, [x0,y0,x1,y1], Option<[r,g,b]>) -> Result<()>`	`add_redaction(page, rect, fill=None)`	`AddRedaction(page, [4]float64, *[3]float64) error`	`AddRedaction(pageIndex, x1, y1, x2, y2, r=0, g=0, b=0)`	`addRedaction(page, x0, y0, x1, y1, fill?)`
`pdf_redaction_count`	`redaction_count(page) -> Result<usize>`	`redaction_count(page) -> int`	`RedactionCount(page) (int, error)`	`RedactionCount(pageIndex) -> int`	`redactionCount(page) -> number`
`pdf_redaction_apply`	`apply_redactions_destructive(RedactionOptions) -> Result<RedactionReport>`	`apply_redactions_destructive(scrub_metadata=True, remove_javascript=True, remove_embedded_files=True, fill=(0,0,0)) -> dict`	`ApplyRedactions(scrubMetadata bool) (int, error)`	`ApplyRedactions(scrubMetadata=true, r=0, g=0, b=0) -> int`	`applyRedactionsDestructive(scrubMetadata?) -> RedactionReport`
`pdf_redaction_scrub_metadata`	`sanitize_document(RedactionOptions) -> Result<RedactionReport>`	`sanitize_document(scrub_metadata=True, remove_javascript=True, remove_embedded_files=True) -> dict`	`SanitizeDocument() (int, error)`	`SanitizeDocument() -> int`	`sanitizeDocument(scrub?, removeJS?, removeEmbedded?) -> RedactionReport`

The Swift wrapper exposes the same set of methods on DocumentEditor as redactionAdd, redactionCount, redactionApply(scrubMetadata:r:g:b:), and redactionScrubMetadata().

Header / footer / artifact removal

Canonical name (C ABI)	Rust (`PdfDocument`)	Python (`PdfDocument`)	Go (`PdfDocument`)	JS (WASM)
`pdf_document_remove_headers`	`remove_headers(threshold: f32) -> Result<usize>`	`remove_headers(threshold=0.8) -> int`	`RemoveHeaders(threshold float32) (int, error)`	`removeHeaders(threshold) -> number`
`pdf_document_remove_footers`	`remove_footers(threshold: f32) -> Result<usize>`	`remove_footers(threshold=0.8) -> int`	`RemoveFooters(threshold float32) (int, error)`	`removeFooters(threshold) -> number`
`pdf_document_remove_artifacts`	`remove_artifacts(threshold: f32) -> Result<usize>`	`remove_artifacts(threshold=0.8) -> int`	`RemoveArtifacts(threshold float32) (int, error)`	`removeArtifacts(threshold) -> number`
`pdf_document_erase_header`	`erase_header(page: usize) -> Result<()>`	`erase_header(page) -> None`	—	`eraseHeader(page)`
`pdf_document_erase_footer`	`erase_footer(page: usize) -> Result<()>`	`erase_footer(page) -> None`	—	`eraseFooter(page)`
`pdf_document_erase_artifacts`	`erase_artifacts(page: usize) -> Result<()>`	`erase_artifacts(page) -> None`	—	`eraseArtifacts(page)`

Swift exposes removeHeaders(threshold:), removeFooters(threshold:), removeArtifacts(threshold:), eraseHeader(_:), eraseFooter(_:), and eraseArtifacts(_:) on Document. Go's PdfDocument exposes the Remove* family but not the per-page Erase* family. C# does not currently expose any of the header/footer/artifact removal family.

RedactionReport

redaction_apply and redaction_scrub_metadata return a report so that you can assert the operation did real work:

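Field	Meaning
`regions`	Number of regions applied.
`glyphs_removed`	Number of glyphs physically removed from the content streams.
`images_modified` / `images_removed`	Number of images whose covered pixels were overwritten / number of images removed entirely.
`paths_pruned`	Number of path subpaths removed or geometrically clipped.
`annotations_removed`	Number of top-level items removed (redaction annotations, or the sanitized roots for `sanitize_document`).
`fonts_scrubbed`	Number of fonts whose `/Widths` / `/ToUnicode` entries were scrubbed.
`bytes_removed`	Best-effort estimate of the number of bytes removed.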
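
One way to use the report is to check it before trusting the output file. The sketch below assumes the report arrives as a plain dict keyed by the field names in the table (as the Python `-> dict` signatures suggest). `check_report` is a hypothetical helper written for illustration, not part of PDF Oxide, and the sample dicts are hand-written rather than real output.

```python
def check_report(report, expected_regions):
    """Return a list of problems found in a redaction report dict.

    Assumes the dict uses the field names from the RedactionReport table;
    missing keys are treated as zero.
    """
    problems = []
    regions = report.get("regions", 0)
    if regions != expected_regions:
        problems.append(
            f"expected {expected_regions} regions, report says {regions}"
        )
    # Sum every counter that indicates content was actually touched.
    removed = (
        report.get("glyphs_removed", 0)
        + report.get("images_modified", 0)
        + report.get("images_removed", 0)
        + report.get("paths_pruned", 0)
    )
    if regions > 0 and removed == 0:
        problems.append("regions were applied but no content was removed")
    return problems


# Hand-written reports with illustrative numbers.
good = {"regions": 2, "glyphs_removed": 18, "paths_pruned": 1}
suspicious = {"regions": 2, "glyphs_removed": 0}
print(check_report(good, 2))
print(check_report(suspicious, 2))
```

A region drawn over blank space can legitimately remove nothing, so the second check is better treated as a prompt for review than as a hard failure.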

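Important notes

Destructive, not cosmetic. redaction_apply physically removes the covered content from the rewritten file, following ISO 32000-1:2008 §12.5.6.23: the sensitive text is gone from the file, not merely hidden under an overlay. The original /Contents objects are hard-deleted and do not linger as orphaned objects missed by garbage collection.
Fail-closed. If a redacted page shows text with composite/Type0 fonts or other fonts that cannot be rewritten, redaction_apply returns an error rather than risk a silent under-redaction. Handle the error; do not assume the operation succeeded.
Safe defaults. RedactionOptions::default() scrubs metadata, removes document JavaScript and embedded files, strips hidden optional-content layers, and paints an opaque black overlay (even when the source /Redact annotation supplies no /IC color).
Irreversible. Once the file is saved, the removal is permanent. Always work on a copy of the original document.
Performance. Redaction and sanitization run on the same parser that delivers the 0.8ms mean, 100% pass-rate extraction, so a redaction pass adds no overhead beyond the content rewrite itself.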
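
The fail-closed behavior means calling code needs an explicit error path. The following is a minimal sketch of that control flow, assuming the Python binding reports the failure by raising an exception; the exception type is not documented on this page, so the sketch catches `Exception` broadly. `redact_with_fallback` and `FakeDoc` are hypothetical names, and `FakeDoc` only stands in for a real document so the flow can run without the library.

```python
def redact_with_fallback(doc, on_failure):
    """Apply queued redactions; on failure, call on_failure and return None.

    `doc` is assumed to expose apply_redactions_destructive() as listed in
    the method reference. Catching Exception is an assumption, since the
    concrete error type is not documented here.
    """
    try:
        report = doc.apply_redactions_destructive()
    except Exception as exc:
        # Do not save or ship the file; let the caller decide what to do,
        # for example rasterizing the affected page.
        on_failure(exc)
        return None
    return report


class FakeDoc:
    """Stand-in used only to exercise the control flow."""

    def __init__(self, fail):
        self.fail = fail

    def apply_redactions_destructive(self):
        if self.fail:
            raise RuntimeError("composite font cannot be rewritten")
        return {"regions": 1, "glyphs_removed": 9}


errors = []
print(redact_with_fallback(FakeDoc(fail=False), errors.append))
print(redact_with_fallback(FakeDoc(fail=True), errors.append))
print(len(errors))
```

Returning `None` on failure keeps the caller from saving a partially redacted file by accident; a stricter variant could re-raise after logging.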

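FAQ

Is PDF Oxide redaction truly destructive, or does it just place a black box over the text? Truly destructive. redaction_apply (apply_redactions_destructive) follows ISO 32000-1:2008 §12.5.6.23: it removes the covered glyphs, images, and path geometry from the content streams and hard-deletes the original /Contents objects from the output. The black overlay is painted in addition to the removal, not instead of it.

What happens if a page uses composite fonts that cannot be redacted safely? Redaction fails closed: redaction_apply returns an error instead of redacting partially and leaving recoverable fragments behind. Catch the error and, if that content must be redacted, fall back to rasterizing the page.

Do I need redaction regions if I only want to scrub metadata? No. Call redaction_scrub_metadata (sanitize_document) for a standalone pass that removes /Info, XMP /Metadata, document JavaScript, and embedded files without touching page geometry.

How are headers and footers detected and removed? PDF Oxide prefers ISO 32000 /Artifact tags (100% accurate when present) and otherwise falls back to a heuristic: text that repeats within the top or bottom 15% of the page on at least a threshold fraction of pages (default 0.8) is marked as a header or footer.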
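
The fallback heuristic can be sketched in a few lines. This is not PDF Oxide's implementation: the input shape (each page as a list of `(text, y_fraction)` tuples, with 0.0 at the top of the page and 1.0 at the bottom) and the function name `repeated_margin_text` are assumptions made for illustration, and the sketch matches strings exactly, whereas a real detector may normalize varying parts such as page numbers.

```python
from collections import Counter


def repeated_margin_text(pages, threshold=0.8, band=0.15):
    """Find text that repeats in the top or bottom band of most pages.

    `pages` is a list of pages; each page is a list of (text, y_fraction)
    tuples, where y_fraction is the vertical position as a fraction of
    page height (0.0 = top, 1.0 = bottom).
    """
    if not pages:
        return set()
    counts = Counter()
    for page in pages:
        # Collect margin strings as a set so each counts once per page.
        in_margin = {
            text for text, y in page if y <= band or y >= 1.0 - band
        }
        counts.update(in_margin)
    needed = threshold * len(pages)
    return {text for text, n in counts.items() if n >= needed}


# Five pages: a banner on every page, a unique page number per page,
# and a "Draft" stamp in the margin of only three pages.
pages = []
for i in range(5):
    page = [
        ("ACME Confidential", 0.03),
        ("Body text", 0.50),
        (f"Page {i + 1}", 0.97),
    ]
    if i < 3:
        page.append(("Draft", 0.05))
    pages.append(page)

print(repeated_margin_text(pages))
```

With the default threshold of 0.8, only the banner qualifies here: it appears on 5 of 5 pages, while "Draft" appears on 3 of 5 and each page number on 1 of 5.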
