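Redaction and Sanitization
True redaction is destructive: the underlying glyphs, images, and paths must be physically removed from the content stream, not merely covered with a black rectangle. PDF Oxide v0.3.69 implements the ISO 32000-1:2008 §12.5.6.23 redaction specification: `redaction_apply` returns the number of glyphs physically removed, and when a page uses composite/Type0 fonts that cannot be safely rewritten, it fails closed (it refuses the operation rather than silently missing part of the redaction).
This page covers the canonical destructive redaction family (`redaction_add` / `redaction_apply` / `redaction_count` / `redaction_scrub_metadata`) and the header/footer/artifact removal family (`remove_headers` / `remove_footers` / `remove_artifacts`, plus the per-page `erase_header` / `erase_footer` / `erase_artifacts`).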
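Two redaction surfaces. v0.3.69 still retains the legacy annotation-flattening path (`apply_page_redactions` / `apply_all_redactions`), which burns `/Redact` annotations in as a visual overlay. When you need content to be truly removed, prefer the destructive family described here. The destructive `redaction_apply` also processes `/Redact` annotations already present in the source document, so the two surfaces can be combined.
Binding coverage. The destructive redaction family is exposed in the Rust, Python, Go, C#, and WASM/JavaScript builds. The header/footer/artifact `remove_*` family is exposed in Rust, Python, Go, and WASM/JavaScript; the per-page `erase_*` family is exposed in Rust, Python, and WASM/JavaScript. C# does not currently expose the `remove_*` / `erase_*` page-furniture family.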
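How do I redact text from a PDF?
The destructive workflow has three steps:
- Queue redaction rectangles with `redaction_add` (page user-space coordinates; optional DeviceRGB overlay fill color). `/Redact` annotations already present in the source document are picked up automatically.
- Apply them with `redaction_apply`, which physically removes the covered glyphs/images/paths, paints an opaque overlay, optionally scrubs document metadata, and returns the number of glyphs removed.
- Save the rewritten PDF. The original `/Contents` objects are hard-removed (G6), so the secret content does not linger as orphaned objects missed by garbage collection.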
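Queue a redaction rectangle
`redaction_add(page, rect, fill)` queues one destructive rectangle. Coordinates are in page user space, `(x0, y0, x1, y1)`; `fill` is an optional DeviceRGB `[r, g, b]` overlay color (defaults to black).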
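As a rough illustration of the `(x0, y0, x1, y1)` convention, the sketch below orders the corners and rejects empty boxes before a rectangle is queued. `normalize_rect` is a hypothetical helper written for this page, not part of PDF Oxide, and whether the library performs a similar check itself is not stated here.

```python
from typing import Sequence, Tuple

Rect = Tuple[float, float, float, float]


def normalize_rect(rect: Sequence[float]) -> Rect:
    """Return (x0, y0, x1, y1) in points with x0 <= x1 and y0 <= y1.

    Hypothetical helper for illustration only; not a PDF Oxide API.
    """
    if len(rect) != 4:
        raise ValueError("rect must have exactly four coordinates")
    xa, ya, xb, yb = (float(v) for v in rect)
    x0, x1 = min(xa, xb), max(xa, xb)
    y0, y1 = min(ya, yb), max(ya, yb)
    if x0 == x1 or y0 == y1:
        # A zero-area box cannot cover any glyph, image, or path.
        raise ValueError("rect has zero area; nothing would be redacted")
    return (x0, y0, x1, y1)
```

The result can then be passed as the `rect` argument, for example `doc.add_redaction(0, normalize_rect(box))` in the Python binding shown below.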
Rust
use pdf_oxide::editor::{DocumentEditor, EditableDocument};
use pdf_oxide::redaction::RedactionOptions;
let mut editor = DocumentEditor::open("confidential.pdf")?;
// Queue a black redaction box on page 0 (x0, y0, x1, y1 in points).
editor.add_redaction(0, [100.0, 700.0, 300.0, 714.0], None)?;
// Apply destructively, then save. Returns a RedactionReport.
let report = editor.apply_redactions_destructive(RedactionOptions::default())?;
println!("glyphs removed: {}", report.glyphs_removed);
editor.save("redacted.pdf")?;
Python
from pdf_oxide import PdfDocument
doc = PdfDocument("confidential.pdf")
# Queue a black redaction box on page 0 (x0, y0, x1, y1 in points).
doc.add_redaction(0, (100.0, 700.0, 300.0, 714.0))
# Apply destructively (returns a report dict), then save.
report = doc.apply_redactions_destructive()
print("glyphs removed:", report["glyphs_removed"])
doc.save("redacted.pdf")
Go
editor, _ := pdfoxide.OpenEditor("confidential.pdf")
defer editor.Close()
// Queue a redaction box on page 0; pass nil fill for the default black.
_ = editor.AddRedaction(0, [4]float64{100, 700, 300, 714}, nil)
// Apply destructively (scrubMetadata = true). Returns glyphs removed.
glyphs, _ := editor.ApplyRedactions(true)
fmt.Printf("glyphs removed: %d\n", glyphs)
_ = editor.Save("redacted.pdf")
C#
using PdfOxide;
using var editor = DocumentEditor.Open("confidential.pdf");
// Queue a redaction box on page 0 (x1, y1, x2, y2; r, g, b default to black).
editor.AddRedaction(0, 100, 700, 300, 714);
// Apply destructively (scrubMetadata defaults to true). Returns glyphs removed.
int glyphs = editor.ApplyRedactions(scrubMetadata: true);
Console.WriteLine($"glyphs removed: {glyphs}");
editor.Save("redacted.pdf");
JavaScript (WASM)
import { WasmPdfDocument } from "pdf-oxide-wasm";
const doc = new WasmPdfDocument(bytes);
// Queue a redaction box on page 0; pass an [r, g, b] array for a custom fill.
doc.addRedaction(0, 100, 700, 300, 714);
// Apply destructively. Returns a RedactionReport object.
const report = doc.applyRedactionsDestructive(true);
console.log("glyphs removed:", report.glyphs_removed);
const output = doc.save();
doc.free();
Java
import fyi.oxide.pdf.DocumentEditor;
import fyi.oxide.pdf.geometry.BBox;
try (DocumentEditor editor = DocumentEditor.open("confidential.pdf")) {
// Queue a black redaction box on page 0 (x0, y0, x1, y1 in points).
editor.addRedaction(0, new BBox(100.0, 700.0, 300.0, 714.0));
// Apply destructively (scrubs metadata), then save.
editor.applyRedactionsDestructive();
editor.saveTo(java.nio.file.Path.of("redacted.pdf"));
}
Kotlin
import fyi.oxide.pdf.DocumentEditor
import fyi.oxide.pdf.geometry.BBox
DocumentEditor.open("confidential.pdf").use { editor ->
// Queue a black redaction box on page 0 (x0, y0, x1, y1 in points).
editor.addRedaction(0, BBox(100.0, 700.0, 300.0, 714.0))
// Apply destructively (scrubs metadata), then save.
editor.applyRedactionsDestructive()
editor.saveTo(java.nio.file.Path.of("redacted.pdf"))
}
Scala
import fyi.oxide.pdf.DocumentEditor
import fyi.oxide.pdf.geometry.BBox
import scala.util.Using
Using.resource(DocumentEditor.open("confidential.pdf")) { editor =>
// Queue a black redaction box on page 0 (x0, y0, x1, y1 in points).
editor.addRedaction(0, BBox(100.0, 700.0, 300.0, 714.0))
// Apply destructively (scrubs metadata), then save.
editor.applyRedactionsDestructive()
editor.saveTo(java.nio.file.Path.of("redacted.pdf"))
}
Clojure
(require '[pdf-oxide.core :as pdf])
(import '[fyi.oxide.pdf.geometry BBox])
(with-open [ed (pdf/editor "confidential.pdf")]
;; Queue a black redaction box on page 0 (x0, y0, x1, y1 in points).
(pdf/add-redaction ed 0 (BBox. 100.0 700.0 300.0 714.0))
;; Apply destructively (scrubs metadata), then save.
(pdf/apply-redactions ed)
(java.nio.file.Files/write
(java.nio.file.Path/of "redacted.pdf" (into-array String []))
(pdf/editor-save ed)))
PHP
use PdfOxide\DocumentEditor;
$editor = DocumentEditor::open('confidential.pdf');
// Queue a black redaction box on page 0 (x1, y1, x2, y2 in points).
$editor->addRedaction(0, 100.0, 700.0, 300.0, 714.0);
// Apply destructively (scrubMetadata defaults to true). Returns glyphs removed.
$glyphs = $editor->applyRedactionsDestructive(true);
echo "glyphs removed: $glyphs\n";
$editor->saveTo('redacted.pdf');
Ruby
require 'pdf_oxide'
PdfOxide::DocumentEditor.open('confidential.pdf') do |ed|
# Queue a black redaction box on page 0 (x1, y1, x2, y2 in points).
ed.add_redaction(page: 0, rect: [100.0, 700.0, 300.0, 714.0])
# Apply destructively, scrubbing metadata (raises on fail-closed), then save.
ed.apply_redactions!(scrub_metadata: true)
ed.save_to('redacted.pdf')
end
C++
#include <pdf_oxide/pdf_oxide.hpp>
#include <iostream>
auto editor = pdf_oxide::DocumentEditor::open("confidential.pdf");
// Queue a black redaction box on page 0 (x1, y1, x2, y2; r, g, b in 0..1).
editor.redaction_add(0, 100.0, 700.0, 300.0, 714.0, 0.0, 0.0, 0.0);
// Apply destructively (scrub_metadata = true). Returns glyphs removed.
int glyphs = editor.redaction_apply(/*scrub_metadata=*/true, 0.0, 0.0, 0.0);
std::cout << "glyphs removed: " << glyphs << "\n";
editor.save("redacted.pdf");
Swift
import PdfOxide
let editor = try DocumentEditor.open("confidential.pdf")
// Queue a black redaction box on page 0 (x1, y1, x2, y2; r, g, b in 0..1).
try editor.redactionAdd(0, x1: 100, y1: 700, x2: 300, y2: 714, r: 0, g: 0, b: 0)
// Apply destructively (scrub metadata). Returns glyphs removed.
let glyphs = try editor.redactionApply(scrubMetadata: true, r: 0, g: 0, b: 0)
print("glyphs removed: \(glyphs)")
try editor.save("redacted.pdf")
Dart
import 'package:pdf_oxide/pdf_oxide.dart';
final editor = DocumentEditor.open('confidential.pdf');
// Queue a black redaction box on page 0 (x1, y1, x2, y2; r, g, b in 0..1).
editor.redactionAdd(0, 100, 700, 300, 714);
// Apply destructively, scrubbing metadata. Returns glyphs removed.
final glyphs = editor.redactionApply(scrubMetadata: true);
print('glyphs removed: $glyphs');
editor.save('redacted.pdf');
R
library(pdfoxide)
editor <- pdf_editor_open("confidential.pdf")
# Queue a black redaction box on page 0 (x1, y1, x2, y2; r, g, b in 0..1).
pdf_redaction_add(editor, 0, 100, 700, 300, 714, 0, 0, 0)
# Apply destructively, scrubbing metadata. Returns glyphs removed.
glyphs <- pdf_redaction_apply(editor, scrub_metadata = TRUE, 0, 0, 0)
cat("glyphs removed:", glyphs, "\n")
pdf_editor_save(editor, "redacted.pdf")
Julia
using PdfOxide
editor = open_editor("confidential.pdf")
# Queue a black redaction box on page 0 (x1, y1, x2, y2; r, g, b in 0..1).
redaction_add(editor, 0, 100, 700, 300, 714, 0, 0, 0)
# Apply destructively, scrubbing metadata. Returns glyphs removed.
glyphs = redaction_apply(editor, true, 0, 0, 0)
println("glyphs removed: ", glyphs)
save(editor, "redacted.pdf")
Zig
const std = @import("std");
const pdf_oxide = @import("pdf_oxide");
var editor = try pdf_oxide.DocumentEditor.openEditor("confidential.pdf");
defer editor.deinit();
// Queue a black redaction box on page 0 (x1, y1, x2, y2; r, g, b in 0..1).
try editor.redactionAdd(0, 100, 700, 300, 714, 0, 0, 0);
// Apply destructively, scrubbing metadata. Returns glyphs removed.
const glyphs = try editor.redactionApply(true, 0, 0, 0);
std.debug.print("glyphs removed: {d}\n", .{glyphs});
try editor.save("redacted.pdf");
Objective-C
#import "POXPdfOxide.h"
NSError *err = nil;
POXDocumentEditor *editor = [POXDocumentEditor openEditor:@"confidential.pdf" error:&err];
// Queue a black redaction box on page 0 (x1, y1, x2, y2; r, g, b in 0..1).
[editor redactionAddPage:0 x1:100 y1:700 x2:300 y2:714 r:0 g:0 b:0 error:&err];
// Apply destructively, scrubbing metadata. Returns glyphs removed.
int32_t glyphs = [editor redactionApplyScrubMetadata:YES r:0 g:0 b:0 error:&err];
NSLog(@"glyphs removed: %d", glyphs);
[editor saveToPath:@"redacted.pdf" error:&err];
Elixir
{:ok, editor} = PdfOxide.open_editor("confidential.pdf")
# Queue a black redaction box on page 0 (x1, y1, x2, y2; r, g, b in 0..1).
:ok = PdfOxide.redaction_add(editor, 0, 100, 700, 300, 714, 0.0, 0.0, 0.0)
# Apply destructively, scrubbing metadata. Returns {:ok, glyphs_removed}.
{:ok, glyphs} = PdfOxide.redaction_apply(editor, true, 0.0, 0.0, 0.0)
IO.puts("glyphs removed: #{glyphs}")
PdfOxide.editor_save(editor, "redacted.pdf")
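Count queued regions
`redaction_count(page)` returns the number of regions queued for a page: the `/Redact` annotations in the source document plus the rectangles added programmatically via `redaction_add`. Use it before applying to assert that there is actually something to redact.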
Rust
let mut editor = DocumentEditor::open("marked.pdf")?;
editor.add_redaction(0, [100.0, 700.0, 300.0, 714.0], None)?;
assert_eq!(editor.redaction_count(0)?, 1);
Python
doc = PdfDocument("marked.pdf")
doc.add_redaction(0, (100.0, 700.0, 300.0, 714.0))
assert doc.redaction_count(0) == 1
Go
editor, _ := pdfoxide.OpenEditor("marked.pdf")
defer editor.Close()
_ = editor.AddRedaction(0, [4]float64{100, 700, 300, 714}, nil)
n, _ := editor.RedactionCount(0) // 1
C#
using var editor = DocumentEditor.Open("marked.pdf");
editor.AddRedaction(0, 100, 700, 300, 714);
int n = editor.RedactionCount(0); // 1
JavaScript (WASM)
const doc = new WasmPdfDocument(bytes);
doc.addRedaction(0, 100, 700, 300, 714);
const n = doc.redactionCount(0); // 1
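What is the difference between redaction and sanitization?
Geometric redaction removes the content beneath a rectangle. Sanitization scrubs document-level secrets that a redaction box can never cover: the `/Info` dictionary, the catalog XMP `/Metadata` stream, document JavaScript (`/OpenAction`, `/AA`, `/Names/JavaScript`), and `/Names/EmbeddedFiles`. The removed object subtrees are hard-excluded from the output (G6).
When the `scrub_metadata` flag is set (the default in every binding), `redaction_apply` runs sanitization automatically. `redaction_scrub_metadata` runs the same sanitization pass on its own, with no geometric redaction, for cases where you only need to clean the document. It returns the number of top-level constructs removed.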
Rust
use pdf_oxide::editor::{DocumentEditor, EditableDocument};
use pdf_oxide::redaction::RedactionOptions;
let mut editor = DocumentEditor::open("input.pdf")?;
let report = editor.sanitize_document(RedactionOptions::default())?;
println!("constructs removed: {}", report.annotations_removed);
editor.save("sanitized.pdf")?;
Python
doc = PdfDocument("input.pdf")
report = doc.sanitize_document()
print("constructs removed:", report["annotations_removed"])
doc.save("sanitized.pdf")
Go
editor, _ := pdfoxide.OpenEditor("input.pdf")
defer editor.Close()
removed, _ := editor.SanitizeDocument() // top-level constructs removed
_ = editor.Save("sanitized.pdf")
C#
using var editor = DocumentEditor.Open("input.pdf");
int removed = editor.SanitizeDocument(); // top-level constructs removed
editor.Save("sanitized.pdf");
JavaScript (WASM)
const doc = new WasmPdfDocument(bytes);
const report = doc.sanitizeDocument(true, true, true); // scrub, removeJS, removeEmbedded
const output = doc.save();
doc.free();
Java
import fyi.oxide.pdf.DocumentEditor;
try (DocumentEditor editor = DocumentEditor.open("input.pdf")) {
editor.scrubMetadata(); // strip /Info, XMP, JS, embedded files — no geometry
editor.saveTo(java.nio.file.Path.of("sanitized.pdf"));
}
Kotlin
import fyi.oxide.pdf.DocumentEditor
DocumentEditor.open("input.pdf").use { editor ->
editor.scrubMetadata() // strip /Info, XMP, JS, embedded files — no geometry
editor.saveTo(java.nio.file.Path.of("sanitized.pdf"))
}
Scala
import fyi.oxide.pdf.DocumentEditor
import scala.util.Using
Using.resource(DocumentEditor.open("input.pdf")) { editor =>
editor.scrubMetadata() // strip /Info, XMP, JS, embedded files — no geometry
editor.saveTo(java.nio.file.Path.of("sanitized.pdf"))
}
Clojure
(require '[pdf-oxide.core :as pdf])
(with-open [ed (pdf/editor "input.pdf")]
(pdf/scrub-metadata ed) ; strip /Info, XMP, JS, embedded files — no geometry
(java.nio.file.Files/write
(java.nio.file.Path/of "sanitized.pdf" (into-array String []))
(pdf/editor-save ed)))
PHP
use PdfOxide\DocumentEditor;
$editor = DocumentEditor::open('input.pdf');
$editor->scrubMetadata(); // strip /Info, XMP, JS, embedded files — no geometry
$editor->saveTo('sanitized.pdf');
Ruby
require 'pdf_oxide'
PdfOxide::DocumentEditor.open('input.pdf') do |ed|
removed = ed.scrub_metadata # top-level constructs removed (no geometry)
puts "constructs removed: #{removed}"
ed.save_to('sanitized.pdf')
end
C++
#include <pdf_oxide/pdf_oxide.hpp>
#include <iostream>
auto editor = pdf_oxide::DocumentEditor::open("input.pdf");
int removed = editor.redaction_scrub_metadata(); // constructs removed (no geometry)
std::cout << "constructs removed: " << removed << "\n";
editor.save("sanitized.pdf");
Swift
import PdfOxide
let editor = try DocumentEditor.open("input.pdf")
let removed = try editor.redactionScrubMetadata() // constructs removed (no geometry)
print("constructs removed: \(removed)")
try editor.save("sanitized.pdf")
Dart
import 'package:pdf_oxide/pdf_oxide.dart';
final editor = DocumentEditor.open('input.pdf');
final removed = editor.redactionScrubMetadata(); // constructs removed (no geometry)
print('constructs removed: $removed');
editor.save('sanitized.pdf');
R
library(pdfoxide)
editor <- pdf_editor_open("input.pdf")
removed <- pdf_redaction_scrub_metadata(editor) # constructs removed (no geometry)
cat("constructs removed:", removed, "\n")
pdf_editor_save(editor, "sanitized.pdf")
Julia
using PdfOxide
editor = open_editor("input.pdf")
removed = redaction_scrub_metadata(editor) # constructs removed (no geometry)
println("constructs removed: ", removed)
save(editor, "sanitized.pdf")
Zig
const std = @import("std");
const pdf_oxide = @import("pdf_oxide");
var editor = try pdf_oxide.DocumentEditor.openEditor("input.pdf");
defer editor.deinit();
const removed = try editor.redactionScrubMetadata(); // constructs removed (no geometry)
std.debug.print("constructs removed: {d}\n", .{removed});
try editor.save("sanitized.pdf");
Objective-C
#import "POXPdfOxide.h"
NSError *err = nil;
POXDocumentEditor *editor = [POXDocumentEditor openEditor:@"input.pdf" error:&err];
int32_t removed = [editor redactionScrubMetadataWithError:&err]; // constructs removed
NSLog(@"constructs removed: %d", removed);
[editor saveToPath:@"sanitized.pdf" error:&err];
Elixir
{:ok, editor} = PdfOxide.open_editor("input.pdf")
{:ok, removed} = PdfOxide.redaction_scrub_metadata(editor) # constructs removed
IO.puts("constructs removed: #{removed}")
PdfOxide.editor_save(editor, "sanitized.pdf")
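How do I remove repeated headers and footers?
Headers, footers, and page-furniture artifacts repeat across pages and usually need to be removed before extraction or republishing. PDF Oxide detects them in two ways: it prefers the ISO 32000 spec-compliant `/Artifact` tags (100% accurate when present) and, when they are absent, falls back to a heuristic that flags text repeating in the top or bottom 15% of the page as furniture.
`remove_headers(threshold)` / `remove_footers(threshold)` / `remove_artifacts(threshold)` operate across the whole document and return the number of items removed. `threshold` is the fraction of pages (0.0 to 1.0) on which text must repeat, in heuristic mode, to be treated as furniture (default 0.8). `remove_artifacts` is a convenience method that removes headers and footers together.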
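To make the `threshold` parameter concrete, here is a minimal pure-Python sketch of a repetition heuristic of the kind described above. It is not PDF Oxide's implementation: the input shape (a list of page dicts with `height` and `lines`), the function name `repeated_in_band`, and the exact-match comparison of text are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical input shape for this sketch (not a PDF Oxide type):
# each page is {"height": float, "lines": [(text, y), ...]}, where y is the
# line position in points measured from the bottom of the page.


def repeated_in_band(pages, threshold=0.8, band=0.15, top=True):
    """Return the texts that repeat in the top (or bottom) band of the page
    on at least `threshold` of all pages."""
    counts = Counter()
    for page in pages:
        height = page["height"]
        seen = set()
        for text, y in page["lines"]:
            if top:
                in_band = y >= height * (1.0 - band)
            else:
                in_band = y <= height * band
            if in_band:
                seen.add(text.strip())
        counts.update(seen)  # count each text at most once per page
    if not pages:
        return set()
    return {text for text, n in counts.items() if n / len(pages) >= threshold}
```

Because this sketch compares text exactly, a footer such as "Page 3" would not count as repeating; a real detector would need some normalization for page numbers, which is outside the scope of this example.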
Rust
use pdf_oxide::PdfDocument;
let doc = PdfDocument::open("report.pdf")?;
let headers = doc.remove_headers(0.8)?; // count removed
let footers = doc.remove_footers(0.8)?;
// Or both at once:
let total = doc.remove_artifacts(0.8)?; // headers + footers
println!("removed {} furniture items", total);
Python
from pdf_oxide import PdfDocument
doc = PdfDocument("report.pdf")
doc.remove_headers(0.8) # threshold defaults to 0.8
doc.remove_footers(0.8)
total = doc.remove_artifacts(0.8) # headers + footers, returns count
print("removed", total, "furniture items")
doc.save("clean.pdf")
Go
doc, _ := pdfoxide.Open("report.pdf")
defer doc.Close()
headers, _ := doc.RemoveHeaders(0.8) // count removed
footers, _ := doc.RemoveFooters(0.8)
total, _ := doc.RemoveArtifacts(0.8) // headers + footers
fmt.Printf("removed %d furniture items\n", total)
JavaScript (WASM)
import { WasmPdfDocument } from "pdf-oxide-wasm";
const doc = new WasmPdfDocument(bytes);
doc.removeHeaders(0.8); // returns count removed
doc.removeFooters(0.8);
const total = doc.removeArtifacts(0.8); // headers + footers
const output = doc.save();
doc.free();
C++
#include <pdf_oxide/pdf_oxide.hpp>
#include <iostream>
auto doc = pdf_oxide::Document::open("report.pdf");
int headers = doc.remove_headers(0.8f); // count removed
int footers = doc.remove_footers(0.8f);
int total = doc.remove_artifacts(0.8f); // headers + footers
std::cout << "removed " << total << " furniture items\n";
Swift
import PdfOxide
let doc = try Document.open("report.pdf")
let headers = try doc.removeHeaders(threshold: 0.8) // count removed
let footers = try doc.removeFooters(threshold: 0.8)
let total = try doc.removeArtifacts(threshold: 0.8) // headers + footers
print("removed \(total) furniture items")
Dart
import 'package:pdf_oxide/pdf_oxide.dart';
final doc = PdfDocument.open('report.pdf');
doc.removeHeaders(0.8); // returns count removed
doc.removeFooters(0.8);
final total = doc.removeArtifacts(0.8); // headers + footers
print('removed $total furniture items');
R
library(pdfoxide)
doc <- pdf_open("report.pdf")
headers <- pdf_remove_headers(doc, 0.8) # count removed
footers <- pdf_remove_footers(doc, 0.8)
total <- pdf_remove_artifacts(doc, 0.8) # headers + footers
cat("removed", total, "furniture items\n")
Julia
using PdfOxide
doc = open_document("report.pdf")
headers = remove_headers(doc, 0.8) # count removed
footers = remove_footers(doc, 0.8)
total = remove_artifacts(doc, 0.8) # headers + footers
println("removed ", total, " furniture items")
Zig
const std = @import("std");
const pdf_oxide = @import("pdf_oxide");
var doc = try pdf_oxide.Document.open("report.pdf");
defer doc.deinit();
const headers = try doc.removeHeaders(0.8); // count removed
const footers = try doc.removeFooters(0.8);
const total = try doc.removeArtifacts(0.8); // headers + footers
std.debug.print("removed {d} furniture items\n", .{total});
Objective-C
#import "POXPdfOxide.h"
NSError *err = nil;
POXDocument *doc = [POXDocument openPath:@"report.pdf" error:&err];
int32_t headers = [doc removeHeaders:0.8 error:&err]; // count removed
int32_t footers = [doc removeFooters:0.8 error:&err];
int32_t total = [doc removeArtifacts:0.8 error:&err]; // headers + footers
NSLog(@"removed %d furniture items", total);
Elixir
{:ok, doc} = PdfOxide.open("report.pdf")
{:ok, headers} = PdfOxide.remove_headers(doc, 0.8) # count removed
{:ok, footers} = PdfOxide.remove_footers(doc, 0.8)
{:ok, total} = PdfOxide.remove_artifacts(doc, 0.8) # headers + footers
IO.puts("removed #{total} furniture items")
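Erase furniture on a single page
When you know exactly which page the furniture appears on, use the per-page `erase_*` family: it skips the cross-page repetition analysis and directly marks the header band (top 15%), the footer band (bottom 15%), or both for erasure. These methods take a zero-based page index.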
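For a sense of how large those bands are, the following sketch computes the two rectangles for a given page size. It assumes an unrotated page whose user-space origin is at the bottom-left corner; `furniture_bands` is a hypothetical helper, and the library's own geometry handling (crop boxes, rotation) is not described here.

```python
import math


def furniture_bands(width, height, band=0.15):
    """Return (header_rect, footer_rect), each as (x0, y0, x1, y1) in points.

    Assumes the origin is the bottom-left corner of an unrotated page.
    """
    band_height = height * band
    header = (0.0, height - band_height, width, height)
    footer = (0.0, 0.0, width, band_height)
    return header, footer


# US Letter (612 x 792 pt): each band is roughly 118.8 pt tall.
header, footer = furniture_bands(612.0, 792.0)
assert math.isclose(header[1], 673.2)
assert math.isclose(footer[3], 118.8)
```

On a US Letter page this means anything above roughly y = 673 pt or below roughly y = 119 pt falls inside a band, which is worth checking against body text that starts unusually high or ends unusually low.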
Rust
use pdf_oxide::PdfDocument;
let doc = PdfDocument::open("report.pdf")?;
doc.erase_header(0)?; // erase the top 15% of page 0
doc.erase_footer(0)?; // erase the bottom 15% of page 0
doc.erase_artifacts(0)?; // erase both header and footer of page 0
Python
doc = PdfDocument("report.pdf")
doc.erase_header(0) # erase the top 15% of page 0
doc.erase_footer(0) # erase the bottom 15% of page 0
doc.erase_artifacts(0) # erase both header and footer of page 0
doc.save("clean.pdf")
JavaScript (WASM)
const doc = new WasmPdfDocument(bytes);
doc.eraseHeader(0); // erase the top 15% of page 0
doc.eraseFooter(0); // erase the bottom 15% of page 0
doc.eraseArtifacts(0); // erase both header and footer of page 0
const output = doc.save();
doc.free();
C++
#include <pdf_oxide/pdf_oxide.hpp>
auto doc = pdf_oxide::Document::open("report.pdf");
doc.erase_header(0); // erase the top 15% of page 0
doc.erase_footer(0); // erase the bottom 15% of page 0
doc.erase_artifacts(0); // erase both header and footer of page 0
Swift
import PdfOxide
let doc = try Document.open("report.pdf")
try doc.eraseHeader(0) // erase the top 15% of page 0
try doc.eraseFooter(0) // erase the bottom 15% of page 0
try doc.eraseArtifacts(0) // erase both header and footer of page 0
Dart
import 'package:pdf_oxide/pdf_oxide.dart';
final doc = PdfDocument.open('report.pdf');
doc.eraseHeader(0); // erase the top 15% of page 0
doc.eraseFooter(0); // erase the bottom 15% of page 0
doc.eraseArtifacts(0); // erase both header and footer of page 0
R
library(pdfoxide)
doc <- pdf_open("report.pdf")
pdf_erase_header(doc, 0) # erase the top 15% of page 0
pdf_erase_footer(doc, 0) # erase the bottom 15% of page 0
pdf_erase_artifacts(doc, 0) # erase both header and footer of page 0
Julia
using PdfOxide
doc = open_document("report.pdf")
erase_header(doc, 0) # erase the top 15% of page 0
erase_footer(doc, 0) # erase the bottom 15% of page 0
erase_artifacts(doc, 0) # erase both header and footer of page 0
Zig
const pdf_oxide = @import("pdf_oxide");
var doc = try pdf_oxide.Document.open("report.pdf");
defer doc.deinit();
_ = try doc.eraseHeader(0); // erase the top 15% of page 0
_ = try doc.eraseFooter(0); // erase the bottom 15% of page 0
_ = try doc.eraseArtifacts(0); // erase both header and footer of page 0
Objective-C
#import "POXPdfOxide.h"
NSError *err = nil;
POXDocument *doc = [POXDocument openPath:@"report.pdf" error:&err];
[doc eraseHeader:0 error:&err]; // erase the top 15% of page 0
[doc eraseFooter:0 error:&err]; // erase the bottom 15% of page 0
[doc eraseArtifacts:0 error:&err]; // erase both header and footer of page 0
Elixir
{:ok, doc} = PdfOxide.open("report.pdf")
{:ok, _} = PdfOxide.erase_header(doc, 0) # erase the top 15% of page 0
{:ok, _} = PdfOxide.erase_footer(doc, 0) # erase the bottom 15% of page 0
{:ok, _} = PdfOxide.erase_artifacts(doc, 0) # erase both header and footer of page 0
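Complete redaction workflow
This example locates sensitive regions, queues destructive redactions, applies them while scrubbing metadata, and finally writes out the clean file.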
Python
from pdf_oxide import PdfDocument
doc = PdfDocument("sensitive-report.pdf")
# Step 1: locate sensitive text and queue a destructive box for each match.
for i in range(doc.page_count()):
page = doc.page(i)
for t in page.find_text_containing("SSN"):
x, y, w, h = t.bbox # (x, y, width, height)
doc.add_redaction(i, (x, y, x + w, y + h))
# Step 2: apply destructively + scrub metadata (the default).
report = doc.apply_redactions_destructive(scrub_metadata=True)
print("regions:", report["regions"], "glyphs removed:", report["glyphs_removed"])
# Step 3: save the rewritten document.
doc.save("report-redacted.pdf")
Rust
use pdf_oxide::editor::{DocumentEditor, EditableDocument};
use pdf_oxide::redaction::RedactionOptions;
let mut editor = DocumentEditor::open("sensitive-report.pdf")?;
// Step 1: queue destructive boxes (x0, y0, x1, y1 in points).
editor.add_redaction(0, [100.0, 700.0, 300.0, 714.0], None)?;
editor.add_redaction(0, [100.0, 680.0, 300.0, 694.0], None)?;
// Step 2: apply destructively with safe defaults (scrubs metadata).
let report = editor.apply_redactions_destructive(RedactionOptions::default())?;
println!(
"regions: {}, glyphs removed: {}",
report.regions, report.glyphs_removed
);
// Step 3: save.
editor.save("report-redacted.pdf")?;
Java
import fyi.oxide.pdf.*;
import fyi.oxide.pdf.search.SearchMatch;
try (PdfDocument doc = PdfDocument.open("sensitive-report.pdf");
DocumentEditor editor = DocumentEditor.open("sensitive-report.pdf")) {
// Step 1: locate sensitive text and queue a destructive box for each hit.
for (SearchMatch m : doc.search("SSN")) {
editor.addRedaction(m.pageIndex(), m.bbox()); // bbox is x0,y0,x1,y1
}
// Step 2: apply destructively + scrub metadata (the default).
editor.applyRedactionsDestructive();
// Step 3: save.
editor.saveTo(java.nio.file.Path.of("report-redacted.pdf"));
}
Kotlin
import fyi.oxide.pdf.*
PdfDocument.open("sensitive-report.pdf").use { doc ->
DocumentEditor.open("sensitive-report.pdf").use { editor ->
// Step 1: locate sensitive text and queue a box for each hit.
for (m in doc.search("SSN")) {
editor.addRedaction(m.pageIndex(), m.bbox()) // bbox is x0,y0,x1,y1
}
// Step 2: apply destructively + scrub metadata (the default).
editor.applyRedactionsDestructive()
// Step 3: save.
editor.saveTo(java.nio.file.Path.of("report-redacted.pdf"))
}
}
Scala
import fyi.oxide.pdf.{PdfDocument, DocumentEditor, searchSeq}
import scala.util.Using
Using.resource(PdfDocument.open("sensitive-report.pdf")) { doc =>
Using.resource(DocumentEditor.open("sensitive-report.pdf")) { editor =>
// Step 1: locate sensitive text and queue a box for each hit.
for (m <- doc.searchSeq("SSN"))
editor.addRedaction(m.pageIndex, m.bbox) // bbox is x0,y0,x1,y1
// Step 2: apply destructively + scrub metadata (the default).
editor.applyRedactionsDestructive()
// Step 3: save.
editor.saveTo(java.nio.file.Path.of("report-redacted.pdf"))
}
}
Clojure
(require '[pdf-oxide.core :as pdf])
(with-open [doc (pdf/open "sensitive-report.pdf")
ed (pdf/editor "sensitive-report.pdf")]
;; Step 1: locate sensitive text and queue a box for each hit.
(doseq [m (pdf/search doc "SSN")]
(pdf/add-redaction ed (.pageIndex m) (.bbox m))) ; bbox is x0,y0,x1,y1
;; Step 2: apply destructively + scrub metadata (the default).
(pdf/apply-redactions ed)
;; Step 3: save.
(java.nio.file.Files/write
(java.nio.file.Path/of "report-redacted.pdf" (into-array String []))
(pdf/editor-save ed)))
Ruby
require 'pdf_oxide'
doc = PdfOxide::PdfDocument.open('sensitive-report.pdf')
PdfOxide::DocumentEditor.open('sensitive-report.pdf') do |ed|
# Step 1: locate sensitive text and queue a box for each hit.
doc.search('SSN').each do |m|
b = m[:bbox] # { x:, y:, width:, height: }
ed.add_redaction(page: m[:page], rect: [b[:x], b[:y], b[:x] + b[:width], b[:y] + b[:height]])
end
# Step 2: apply destructively + scrub metadata (raises on fail-closed).
ed.apply_redactions!(scrub_metadata: true)
# Step 3: save.
ed.save_to('report-redacted.pdf')
end
C++
#include <pdf_oxide/pdf_oxide.hpp>
#include <iostream>
auto doc = pdf_oxide::Document::open("sensitive-report.pdf");
auto editor = pdf_oxide::DocumentEditor::open("sensitive-report.pdf");
// Step 1: locate sensitive text and queue a box for each hit.
for (const auto& m : doc.search_all("SSN", /*case_sensitive=*/false)) {
editor.redaction_add(m.page, m.bbox.x, m.bbox.y,
m.bbox.x + m.bbox.width, m.bbox.y + m.bbox.height,
0.0, 0.0, 0.0);
}
// Step 2: apply destructively + scrub metadata.
int glyphs = editor.redaction_apply(/*scrub_metadata=*/true, 0.0, 0.0, 0.0);
std::cout << "glyphs removed: " << glyphs << "\n";
// Step 3: save.
editor.save("report-redacted.pdf");
Swift
import PdfOxide
let doc = try Document.open("sensitive-report.pdf")
let editor = try DocumentEditor.open("sensitive-report.pdf")
// Step 1: locate sensitive text and queue a box for each hit.
for m in try doc.searchAll("SSN", false) {
try editor.redactionAdd(m.page,
x1: m.bbox.x, y1: m.bbox.y,
x2: m.bbox.x + m.bbox.width, y2: m.bbox.y + m.bbox.height,
r: 0, g: 0, b: 0)
}
// Step 2: apply destructively + scrub metadata.
let glyphs = try editor.redactionApply(scrubMetadata: true, r: 0, g: 0, b: 0)
print("glyphs removed: \(glyphs)")
// Step 3: save.
try editor.save("report-redacted.pdf")
Dart
import 'package:pdf_oxide/pdf_oxide.dart';
final doc = PdfDocument.open('sensitive-report.pdf');
final editor = DocumentEditor.open('sensitive-report.pdf');
// Step 1: locate sensitive text and queue a box for each hit.
for (final m in doc.searchAll('SSN', false)) {
editor.redactionAdd(m.page, m.bbox.x, m.bbox.y,
m.bbox.x + m.bbox.width, m.bbox.y + m.bbox.height);
}
// Step 2: apply destructively + scrub metadata.
final glyphs = editor.redactionApply(scrubMetadata: true);
print('glyphs removed: $glyphs');
// Step 3: save.
editor.save('report-redacted.pdf');
R
library(pdfoxide)
doc <- pdf_open("sensitive-report.pdf")
editor <- pdf_editor_open("sensitive-report.pdf")
# Step 1: locate sensitive text and queue a box for each hit.
for (m in pdf_search_all(doc, "SSN", FALSE)) {
b <- m$bbox # list(x=, y=, width=, height=)
pdf_redaction_add(editor, m$page, b$x, b$y, b$x + b$width, b$y + b$height, 0, 0, 0)
}
# Step 2: apply destructively + scrub metadata.
glyphs <- pdf_redaction_apply(editor, scrub_metadata = TRUE, 0, 0, 0)
cat("glyphs removed:", glyphs, "\n")
# Step 3: save.
pdf_editor_save(editor, "report-redacted.pdf")
Julia
using PdfOxide
doc = open_document("sensitive-report.pdf")
editor = open_editor("sensitive-report.pdf")
# Step 1: locate sensitive text and queue a box for each hit.
for m in search_all(doc, "SSN", false)
b = m.bbox
redaction_add(editor, m.page, b.x, b.y, b.x + b.width, b.y + b.height, 0, 0, 0)
end
# Step 2: apply destructively + scrub metadata.
glyphs = redaction_apply(editor, true, 0, 0, 0)
println("glyphs removed: ", glyphs)
# Step 3: save.
save(editor, "report-redacted.pdf")
Zig
const std = @import("std");
const pdf_oxide = @import("pdf_oxide");
const a = std.heap.page_allocator;
var doc = try pdf_oxide.Document.open("sensitive-report.pdf");
defer doc.deinit();
var editor = try pdf_oxide.DocumentEditor.openEditor("sensitive-report.pdf");
defer editor.deinit();
// Step 1: locate sensitive text and queue a box for each hit.
const hits = try doc.searchAll(a, "SSN", false);
for (hits) |m| {
try editor.redactionAdd(@intCast(m.page), m.bbox.x, m.bbox.y,
m.bbox.x + m.bbox.width, m.bbox.y + m.bbox.height, 0, 0, 0);
}
// Step 2: apply destructively + scrub metadata.
const glyphs = try editor.redactionApply(true, 0, 0, 0);
std.debug.print("glyphs removed: {d}\n", .{glyphs});
// Step 3: save.
try editor.save("report-redacted.pdf");
Objective-C
#import "POXPdfOxide.h"
NSError *err = nil;
POXDocument *doc = [POXDocument openPath:@"sensitive-report.pdf" error:&err];
POXDocumentEditor *editor = [POXDocumentEditor openEditor:@"sensitive-report.pdf" error:&err];
// Step 1: locate sensitive text and queue a box for each hit.
for (POXSearchResult *m in [doc searchAll:@"SSN" caseSensitive:NO error:&err]) {
POXBbox b = m.bbox;
[editor redactionAddPage:m.page x1:b.x y1:b.y x2:b.x + b.width y2:b.y + b.height
r:0 g:0 b:0 error:&err];
}
// Step 2: apply destructively + scrub metadata.
int32_t glyphs = [editor redactionApplyScrubMetadata:YES r:0 g:0 b:0 error:&err];
NSLog(@"glyphs removed: %d", glyphs);
// Step 3: save.
[editor saveToPath:@"report-redacted.pdf" error:&err];
Elixir
{:ok, doc} = PdfOxide.open("sensitive-report.pdf")
{:ok, editor} = PdfOxide.open_editor("sensitive-report.pdf")
# Step 1: locate sensitive text and queue a box for each hit.
{:ok, hits} = PdfOxide.search_all(doc, "SSN", false)
Enum.each(hits, fn m ->
b = m.bbox
PdfOxide.redaction_add(editor, m.page, b.x, b.y, b.x + b.width, b.y + b.height, 0.0, 0.0, 0.0)
end)
# Step 2: apply destructively + scrub metadata.
{:ok, glyphs} = PdfOxide.redaction_apply(editor, true, 0.0, 0.0, 0.0)
IO.puts("glyphs removed: #{glyphs}")
# Step 3: save.
PdfOxide.editor_save(editor, "report-redacted.pdf")
Method reference
Destructive redaction
| Canonical name (C ABI) | Rust (`DocumentEditor`) | Python (`PdfDocument`) | Go (`DocumentEditor`) | C# (`DocumentEditor`) | JS (WASM) |
|---|---|---|---|---|---|
| `pdf_redaction_add` | `add_redaction(page, [x0,y0,x1,y1], Option<[r,g,b]>) -> Result<()>` | `add_redaction(page, rect, fill=None)` | `AddRedaction(page, [4]float64, *[3]float64) error` | `AddRedaction(pageIndex, x1, y1, x2, y2, r=0, g=0, b=0)` | `addRedaction(page, x0, y0, x1, y1, fill?)` |
| `pdf_redaction_count` | `redaction_count(page) -> Result<usize>` | `redaction_count(page) -> int` | `RedactionCount(page) (int, error)` | `RedactionCount(pageIndex) -> int` | `redactionCount(page) -> number` |
| `pdf_redaction_apply` | `apply_redactions_destructive(RedactionOptions) -> Result<RedactionReport>` | `apply_redactions_destructive(scrub_metadata=True, remove_javascript=True, remove_embedded_files=True, fill=(0,0,0)) -> dict` | `ApplyRedactions(scrubMetadata bool) (int, error)` | `ApplyRedactions(scrubMetadata=true, r=0, g=0, b=0) -> int` | `applyRedactionsDestructive(scrubMetadata?) -> RedactionReport` |
| `pdf_redaction_scrub_metadata` | `sanitize_document(RedactionOptions) -> Result<RedactionReport>` | `sanitize_document(scrub_metadata=True, remove_javascript=True, remove_embedded_files=True) -> dict` | `SanitizeDocument() (int, error)` | `SanitizeDocument() -> int` | `sanitizeDocument(scrub?, removeJS?, removeEmbedded?) -> RedactionReport` |

The Swift wrapper exposes the same set of methods on `DocumentEditor` as `redactionAdd`, `redactionCount`, `redactionApply(scrubMetadata:r:g:b:)`, and `redactionScrubMetadata()`.
Header/footer/artifact removal
| Canonical name (C ABI) | Rust (`PdfDocument`) | Python (`PdfDocument`) | Go (`PdfDocument`) | JS (WASM) |
|---|---|---|---|---|
| `pdf_document_remove_headers` | `remove_headers(threshold: f32) -> Result<usize>` | `remove_headers(threshold=0.8) -> int` | `RemoveHeaders(threshold float32) (int, error)` | `removeHeaders(threshold) -> number` |
| `pdf_document_remove_footers` | `remove_footers(threshold: f32) -> Result<usize>` | `remove_footers(threshold=0.8) -> int` | `RemoveFooters(threshold float32) (int, error)` | `removeFooters(threshold) -> number` |
| `pdf_document_remove_artifacts` | `remove_artifacts(threshold: f32) -> Result<usize>` | `remove_artifacts(threshold=0.8) -> int` | `RemoveArtifacts(threshold float32) (int, error)` | `removeArtifacts(threshold) -> number` |
| `pdf_document_erase_header` | `erase_header(page: usize) -> Result<()>` | `erase_header(page) -> None` | n/a | `eraseHeader(page)` |
| `pdf_document_erase_footer` | `erase_footer(page: usize) -> Result<()>` | `erase_footer(page) -> None` | n/a | `eraseFooter(page)` |
| `pdf_document_erase_artifacts` | `erase_artifacts(page: usize) -> Result<()>` | `erase_artifacts(page) -> None` | n/a | `eraseArtifacts(page)` |

Swift exposes `removeHeaders(threshold:)`, `removeFooters(threshold:)`, `removeArtifacts(threshold:)`, `eraseHeader(_:)`, `eraseFooter(_:)`, and `eraseArtifacts(_:)` on `Document`. Go's `PdfDocument` exposes the `Remove*` family but not the per-page `Erase*` family. C# does not currently expose any of the furniture family.
RedactionReport
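`redaction_apply` and `redaction_scrub_metadata` return a report so that you can assert the operation did real work:
| Field | Meaning |
|---|---|
| `regions` | Number of regions applied. |
| `glyphs_removed` | Number of glyphs physically removed from the content stream. |
| `images_modified` / `images_removed` | Number of images whose covered pixels were overwritten / number of images removed entirely. |
| `paths_pruned` | Number of path subpaths removed or geometrically clipped. |
| `annotations_removed` | Number of top-level items removed (redaction annotations; or, for `sanitize_document`, the sanitization roots). |
| `fonts_scrubbed` | Number of fonts whose `/Widths` / `/ToUnicode` were scrubbed. |
| `bytes_removed` | Best-effort estimate of the number of bytes removed. |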
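One way to act on the report is a small post-apply check such as the sketch below. It assumes the Python binding returns a plain dict keyed by the field names in the table; only `regions` and `glyphs_removed` appear as dict keys in the examples on this page, so the other keys are an assumption, and `check_report` is a hypothetical helper rather than a library function.

```python
# Fields that indicate page content was actually removed or altered.
# Assumption: the Python report dict uses the field names from the table above.
MEASURED_KEYS = ("glyphs_removed", "images_modified", "images_removed", "paths_pruned")


def check_report(report, expect_text=True):
    """Raise RuntimeError if the report suggests nothing was actually removed."""
    if report.get("regions", 0) == 0:
        raise RuntimeError("no redaction regions were applied")
    if sum(report.get(key, 0) for key in MEASURED_KEYS) == 0:
        raise RuntimeError("regions were applied but no content was removed")
    if expect_text and report.get("glyphs_removed", 0) == 0:
        raise RuntimeError("expected text to be removed, but glyphs_removed is 0")
```

Calling `check_report(report)` right after `apply_redactions_destructive()` and before `save()` keeps a misplaced rectangle from producing a file that looks redacted but still contains the text.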
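Important notes
- Destructive, not cosmetic. `redaction_apply` physically removes the covered content from the rewritten file per ISO 32000-1:2008 §12.5.6.23: the secret text is gone from the file, not merely painted over. The original `/Contents` objects are hard-removed and do not linger as orphaned objects missed by garbage collection.
- Fail-closed. If a redacted page shows text with composite/Type0 or other fonts that cannot be rewritten, `redaction_apply` returns an error rather than risking a silent miss. Handle the error; do not assume the operation succeeded.
- Safe defaults. `RedactionOptions::default()` scrubs metadata, removes document JavaScript and embedded files, strips hidden optional-content layers, and paints an opaque black overlay (even when the source `/Redact` annotation supplies no `/IC` color).
- Irreversible. Once saved, the removal is permanent. Always work on a copy of the original document.
- Performance. Redaction and sanitization run on the same parser that delivers the 0.8 ms average, 100% pass-rate extraction performance, so the redaction pass adds no overhead beyond the content rewrite itself.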
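FAQ
Is PDF Oxide's redaction truly destructive, or does it just place a black box over the text?
It is truly destructive. `redaction_apply` (`apply_redactions_destructive`) follows ISO 32000-1:2008 §12.5.6.23: it removes the covered glyphs, images, and path geometry from the content stream and hard-removes the original `/Contents` objects from the output. The black overlay is painted in addition to the removal, not instead of it.
What happens if a page uses composite fonts that cannot be safely redacted?
Redaction fails closed: `redaction_apply` returns an error rather than redacting partially and leaving recoverable fragments. Catch the error and, if such content must be redacted, fall back to rasterizing that page.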
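The catch-and-fall-back pattern might look like the sketch below. The two callables are placeholders supplied by the caller: `apply` would wrap something like `doc.apply_redactions_destructive()`, and `rasterize_fallback` stands for whatever rasterization route you use. The exception class raised by the Python binding is not documented on this page, so the sketch catches `Exception` as a stated assumption; narrow it once the real type is known.

```python
def redact_or_fallback(apply, rasterize_fallback):
    """Run a destructive redaction; on failure, hand the error to a fallback.

    `apply` and `rasterize_fallback` are caller-supplied callables
    (hypothetical names, not PDF Oxide APIs). Returns the result of
    `apply()` on success, or None when the fallback was used.
    """
    try:
        return apply()
    except Exception as err:  # assumption: exact exception class is unknown here
        # Fail-closed: nothing was partially redacted, so the fallback starts
        # from the untouched original.
        rasterize_fallback(err)
        return None
```

Returning `None` on the fallback path makes it easy for the caller to tell the two outcomes apart and to log which documents needed rasterization.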
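Do I need redaction regions if I only want to scrub metadata?
No. Call `redaction_scrub_metadata` (`sanitize_document`) for a standalone pass that scrubs `/Info`, XMP `/Metadata`, document JavaScript, and embedded files without touching page geometry.
How are headers and footers detected and removed?
PDF Oxide prefers the ISO 32000 `/Artifact` tags (100% accurate when present) and otherwise falls back to a heuristic: text that repeats within the top or bottom 15% of the page on at least the `threshold` fraction of pages (default 0.8) is flagged as furniture.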
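Related pages
- Annotation editing: working with annotations
- Page operations: rectangle-based content erasure as an alternative
- Text editing: finding the text to redact
- Encryption and security: restricting access after redaction
- PDF/A and compliance: archival output after sanitization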