What is the fastest Python PDF library?

PDF Oxide is the fastest Python PDF library, with 0.8ms mean text extraction time — 5.8× faster than PyMuPDF (4.6ms) and 15× faster than pypdf (12.1ms). Benchmarked on 3,830 real-world PDFs with 100% pass rate.

Is PDF Oxide free for commercial use?

Yes. PDF Oxide is MIT licensed — free for all uses including commercial products, SaaS, and proprietary software. No license fees, no sales calls, no AGPL restrictions.

Can PDF Oxide handle scanned PDFs with OCR?

Yes. PDF Oxide includes built-in OCR via PaddleOCR and ONNX Runtime. No Tesseract installation needed — just pip install pdf_oxide and use extract_text_ocr(). Supports PP-OCRv3, v4, and v5 models.

Does PDF Oxide support XFA forms?

Yes. PDF Oxide is the only Python PDF library that can detect, analyze, and extract data from XFA forms (XML Forms Architecture). PyMuPDF, pypdf, pdfplumber, and pdfminer cannot read XFA form data.

How does PDF Oxide compare to PyMuPDF?

PDF Oxide is 5.8× faster than PyMuPDF (0.8ms vs 4.6ms mean), has a 100% pass rate vs 99.3%, and is MIT licensed vs PyMuPDF's AGPL-3.0. PDF Oxide also has built-in Markdown/HTML output and XFA form support that PyMuPDF lacks.

Can PDF Oxide convert PDF to Markdown?

Yes. PDF Oxide has built-in PDF to Markdown conversion with heading detection, table preservation, and list formatting — ideal for LLM and RAG pipelines. No separate package needed, unlike PyMuPDF which requires pymupdf4llm (69× slower).

Redaction and Sanitization

True redaction is destructive: the underlying glyphs, images, and paths must be physically removed from the content stream, not merely covered with a black rectangle. PDF Oxide v0.3.69 implements redaction according to ISO 32000-1:2008 §12.5.6.23. redaction_apply returns the number of glyphs physically deleted and fails closed (it refuses rather than silently redacting incompletely) when a page uses a composite/Type0 font that cannot be rewritten safely.

This page covers the canonical destructive redaction family (redaction_add / redaction_apply / redaction_count / redaction_scrub_metadata) and the family for removing headers, footers, and artifacts (remove_headers / remove_footers / remove_artifacts, plus the per-page functions erase_header / erase_footer / erase_artifacts).

Two redaction interfaces. v0.3.69 still ships the legacy annotation-based redaction path (apply_page_redactions / apply_all_redactions), which burns /Redact annotations into a cosmetic overlay. Prefer the destructive family documented here when content really has to disappear. The destructive redaction_apply also processes /Redact annotations already present in the source document, so the two approaches can be combined.

Binding coverage. The destructive redaction family is available in Rust, Python, Go, C#, and the WASM/JavaScript build. The remove_* family for headers, footers, and artifacts is available in Rust, Python, Go, and WASM/JavaScript; the per-page erase_* family is available in Rust, Python, and WASM/JavaScript. C# does not yet expose the remove_* / erase_* page-furniture family.

How do I redact text from a PDF?

The destructive workflow has three steps:

1. Queue redaction rectangles with redaction_add (page user-space coordinates; optional DeviceRGB overlay fill). /Redact annotations already present in the source document are picked up automatically.
2. Apply with redaction_apply. This physically removes the covered glyphs, images, and paths, draws an opaque overlay, optionally sanitizes the document metadata, and returns the number of glyphs removed.
3. Save the rewritten PDF. The original /Contents objects are hard-deleted (G6), so secret content cannot survive as orphans missed by garbage collection.

Queuing a redaction rectangle

redaction_add(page, rect, fill) queues a destructive rectangle. Coordinates are in page user space (x0, y0, x1, y1); fill is an optional DeviceRGB overlay color [r, g, b] (default: black).
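
Because the rectangle is given as two corners, it can be worth normalizing the input before queuing it. The sketch below is an illustration under stated assumptions: normalize_rect and check_fill are hypothetical helpers that are not part of PDF Oxide, and the 0..1 range for fill components is taken from the "r, g, b in 0..1" comments in the examples on this page.

```python
def normalize_rect(x0, y0, x1, y1):
    """Return (x0, y0, x1, y1) with x0 < x1 and y0 < y1, as floats."""
    left, right = sorted((float(x0), float(x1)))
    bottom, top = sorted((float(y0), float(y1)))
    if left == right or bottom == top:
        raise ValueError("redaction rectangle has zero area")
    return (left, bottom, right, top)


def check_fill(fill=None):
    """Return an (r, g, b) tuple; None means the default black overlay."""
    if fill is None:
        return (0.0, 0.0, 0.0)
    r, g, b = (float(c) for c in fill)
    if not all(0.0 <= c <= 1.0 for c in (r, g, b)):
        raise ValueError("fill components must lie in 0..1")
    return (r, g, b)
```

Whether the library itself accepts corners in either order is not stated here; normalizing up front simply removes the question.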

Rust

use pdf_oxide::editor::{DocumentEditor, EditableDocument};
use pdf_oxide::redaction::RedactionOptions;

let mut editor = DocumentEditor::open("confidential.pdf")?;

// Queue a black redaction box on page 0 (x0, y0, x1, y1 in points).
editor.add_redaction(0, [100.0, 700.0, 300.0, 714.0], None)?;

// Apply destructively, then save. Returns a RedactionReport.
let report = editor.apply_redactions_destructive(RedactionOptions::default())?;
println!("glyphs removed: {}", report.glyphs_removed);

editor.save("redacted.pdf")?;

Python

from pdf_oxide import PdfDocument

doc = PdfDocument("confidential.pdf")

# Queue a black redaction box on page 0 (x0, y0, x1, y1 in points).
doc.add_redaction(0, (100.0, 700.0, 300.0, 714.0))

# Apply destructively (returns a report dict), then save.
report = doc.apply_redactions_destructive()
print("glyphs removed:", report["glyphs_removed"])

doc.save("redacted.pdf")

Go

editor, _ := pdfoxide.OpenEditor("confidential.pdf")
defer editor.Close()

// Queue a redaction box on page 0; pass nil fill for the default black.
_ = editor.AddRedaction(0, [4]float64{100, 700, 300, 714}, nil)

// Apply destructively (scrubMetadata = true). Returns glyphs removed.
glyphs, _ := editor.ApplyRedactions(true)
fmt.Printf("glyphs removed: %d\n", glyphs)

_ = editor.Save("redacted.pdf")

C#

using PdfOxide;

using var editor = DocumentEditor.Open("confidential.pdf");

// Queue a redaction box on page 0 (x1, y1, x2, y2; r, g, b default to black).
editor.AddRedaction(0, 100, 700, 300, 714);

// Apply destructively (scrubMetadata defaults to true). Returns glyphs removed.
int glyphs = editor.ApplyRedactions(scrubMetadata: true);
Console.WriteLine($"glyphs removed: {glyphs}");

editor.Save("redacted.pdf");

JavaScript (WASM)

import { WasmPdfDocument } from "pdf-oxide-wasm";

const doc = new WasmPdfDocument(bytes);

// Queue a redaction box on page 0; pass an [r, g, b] array for a custom fill.
doc.addRedaction(0, 100, 700, 300, 714);

// Apply destructively. Returns a RedactionReport object.
const report = doc.applyRedactionsDestructive(true);
console.log("glyphs removed:", report.glyphs_removed);

const output = doc.save();
doc.free();

Java

import fyi.oxide.pdf.DocumentEditor;
import fyi.oxide.pdf.geometry.BBox;

try (DocumentEditor editor = DocumentEditor.open("confidential.pdf")) {
    // Queue a black redaction box on page 0 (x0, y0, x1, y1 in points).
    editor.addRedaction(0, new BBox(100.0, 700.0, 300.0, 714.0));

    // Apply destructively (scrubs metadata), then save.
    editor.applyRedactionsDestructive();
    editor.saveTo(java.nio.file.Path.of("redacted.pdf"));
}

Kotlin

import fyi.oxide.pdf.DocumentEditor
import fyi.oxide.pdf.geometry.BBox

DocumentEditor.open("confidential.pdf").use { editor ->
    // Queue a black redaction box on page 0 (x0, y0, x1, y1 in points).
    editor.addRedaction(0, BBox(100.0, 700.0, 300.0, 714.0))

    // Apply destructively (scrubs metadata), then save.
    editor.applyRedactionsDestructive()
    editor.saveTo(java.nio.file.Path.of("redacted.pdf"))
}

Scala

import fyi.oxide.pdf.DocumentEditor
import fyi.oxide.pdf.geometry.BBox
import scala.util.Using

Using.resource(DocumentEditor.open("confidential.pdf")) { editor =>
  // Queue a black redaction box on page 0 (x0, y0, x1, y1 in points).
  editor.addRedaction(0, BBox(100.0, 700.0, 300.0, 714.0))

  // Apply destructively (scrubs metadata), then save.
  editor.applyRedactionsDestructive()
  editor.saveTo(java.nio.file.Path.of("redacted.pdf"))
}

Clojure

(require '[pdf-oxide.core :as pdf])
(import '[fyi.oxide.pdf.geometry BBox])

(with-open [ed (pdf/editor "confidential.pdf")]
  ;; Queue a black redaction box on page 0 (x0, y0, x1, y1 in points).
  (pdf/add-redaction ed 0 (BBox. 100.0 700.0 300.0 714.0))
  ;; Apply destructively (scrubs metadata), then save.
  (pdf/apply-redactions ed)
  (java.nio.file.Files/write
    (java.nio.file.Path/of "redacted.pdf" (into-array String []))
    (pdf/editor-save ed)))

PHP

use PdfOxide\DocumentEditor;

$editor = DocumentEditor::open('confidential.pdf');

// Queue a black redaction box on page 0 (x1, y1, x2, y2 in points).
$editor->addRedaction(0, 100.0, 700.0, 300.0, 714.0);

// Apply destructively (scrubMetadata defaults to true). Returns glyphs removed.
$glyphs = $editor->applyRedactionsDestructive(true);
echo "glyphs removed: $glyphs\n";

$editor->saveTo('redacted.pdf');

Ruby

require 'pdf_oxide'

PdfOxide::DocumentEditor.open('confidential.pdf') do |ed|
  # Queue a black redaction box on page 0 (x1, y1, x2, y2 in points).
  ed.add_redaction(page: 0, rect: [100.0, 700.0, 300.0, 714.0])

  # Apply destructively, scrubbing metadata (raises on fail-closed), then save.
  ed.apply_redactions!(scrub_metadata: true)
  ed.save_to('redacted.pdf')
end

C++

#include <pdf_oxide/pdf_oxide.hpp>

auto editor = pdf_oxide::DocumentEditor::open("confidential.pdf");

// Queue a black redaction box on page 0 (x1, y1, x2, y2; r, g, b in 0..1).
editor.redaction_add(0, 100.0, 700.0, 300.0, 714.0, 0.0, 0.0, 0.0);

// Apply destructively (scrub_metadata = true). Returns glyphs removed.
int glyphs = editor.redaction_apply(/*scrub_metadata=*/true, 0.0, 0.0, 0.0);
std::cout << "glyphs removed: " << glyphs << "\n";

editor.save("redacted.pdf");

Swift

import PdfOxide

let editor = try DocumentEditor.open("confidential.pdf")

// Queue a black redaction box on page 0 (x1, y1, x2, y2; r, g, b in 0..1).
try editor.redactionAdd(0, x1: 100, y1: 700, x2: 300, y2: 714, r: 0, g: 0, b: 0)

// Apply destructively (scrub metadata). Returns glyphs removed.
let glyphs = try editor.redactionApply(scrubMetadata: true, r: 0, g: 0, b: 0)
print("glyphs removed: \(glyphs)")

try editor.save("redacted.pdf")

Dart

import 'package:pdf_oxide/pdf_oxide.dart';

final editor = DocumentEditor.open('confidential.pdf');

// Queue a black redaction box on page 0 (x1, y1, x2, y2; r, g, b in 0..1).
editor.redactionAdd(0, 100, 700, 300, 714);

// Apply destructively, scrubbing metadata. Returns glyphs removed.
final glyphs = editor.redactionApply(scrubMetadata: true);
print('glyphs removed: $glyphs');

editor.save('redacted.pdf');

R

library(pdfoxide)

editor <- pdf_editor_open("confidential.pdf")

# Queue a black redaction box on page 0 (x1, y1, x2, y2; r, g, b in 0..1).
pdf_redaction_add(editor, 0, 100, 700, 300, 714, 0, 0, 0)

# Apply destructively, scrubbing metadata. Returns glyphs removed.
glyphs <- pdf_redaction_apply(editor, scrub_metadata = TRUE, 0, 0, 0)
cat("glyphs removed:", glyphs, "\n")

pdf_editor_save(editor, "redacted.pdf")

Julia

using PdfOxide

editor = open_editor("confidential.pdf")

# Queue a black redaction box on page 0 (x1, y1, x2, y2; r, g, b in 0..1).
redaction_add(editor, 0, 100, 700, 300, 714, 0, 0, 0)

# Apply destructively, scrubbing metadata. Returns glyphs removed.
glyphs = redaction_apply(editor, true, 0, 0, 0)
println("glyphs removed: ", glyphs)

save(editor, "redacted.pdf")

Zig

const pdf_oxide = @import("pdf_oxide");

var editor = try pdf_oxide.DocumentEditor.openEditor("confidential.pdf");
defer editor.deinit();

// Queue a black redaction box on page 0 (x1, y1, x2, y2; r, g, b in 0..1).
try editor.redactionAdd(0, 100, 700, 300, 714, 0, 0, 0);

// Apply destructively, scrubbing metadata. Returns glyphs removed.
const glyphs = try editor.redactionApply(true, 0, 0, 0);
std.debug.print("glyphs removed: {d}\n", .{glyphs});

try editor.save("redacted.pdf");

Objective-C

#import "POXPdfOxide.h"
NSError *err = nil;

POXDocumentEditor *editor = [POXDocumentEditor openEditor:@"confidential.pdf" error:&err];

// Queue a black redaction box on page 0 (x1, y1, x2, y2; r, g, b in 0..1).
[editor redactionAddPage:0 x1:100 y1:700 x2:300 y2:714 r:0 g:0 b:0 error:&err];

// Apply destructively, scrubbing metadata. Returns glyphs removed.
int32_t glyphs = [editor redactionApplyScrubMetadata:YES r:0 g:0 b:0 error:&err];
NSLog(@"glyphs removed: %d", glyphs);

[editor saveToPath:@"redacted.pdf" error:&err];

Elixir

{:ok, editor} = PdfOxide.open_editor("confidential.pdf")

# Queue a black redaction box on page 0 (x1, y1, x2, y2; r, g, b in 0..1).
:ok = PdfOxide.redaction_add(editor, 0, 100, 700, 300, 714, 0.0, 0.0, 0.0)

# Apply destructively, scrubbing metadata. Returns {:ok, glyphs_removed}.
{:ok, glyphs} = PdfOxide.redaction_apply(editor, true, 0.0, 0.0, 0.0)
IO.puts("glyphs removed: #{glyphs}")

PdfOxide.editor_save(editor, "redacted.pdf")

Counting queued regions

redaction_count(page) returns the number of regions queued for a page: /Redact annotations in the source document plus rectangles added programmatically through redaction_add. Use it to check, before applying, whether there is anything to redact at all.

Rust

let mut editor = DocumentEditor::open("marked.pdf")?;
editor.add_redaction(0, [100.0, 700.0, 300.0, 714.0], None)?;
assert_eq!(editor.redaction_count(0)?, 1);

Python

doc = PdfDocument("marked.pdf")
doc.add_redaction(0, (100.0, 700.0, 300.0, 714.0))
assert doc.redaction_count(0) == 1

Go

editor, _ := pdfoxide.OpenEditor("marked.pdf")
defer editor.Close()

_ = editor.AddRedaction(0, [4]float64{100, 700, 300, 714}, nil)
n, _ := editor.RedactionCount(0)  // 1

C#

using var editor = DocumentEditor.Open("marked.pdf");
editor.AddRedaction(0, 100, 700, 300, 714);
int n = editor.RedactionCount(0);  // 1

JavaScript (WASM)

const doc = new WasmPdfDocument(bytes);
doc.addRedaction(0, 100, 700, 300, 714);
const n = doc.redactionCount(0);  // 1

What is the difference between redaction and sanitization?

Geometric redaction removes content under a rectangle. Sanitization removes document-level secrets that redaction rectangles never cover: the /Info dictionary, the catalog XMP /Metadata stream, document JavaScript (/OpenAction, /AA, /Names/JavaScript), and /Names/EmbeddedFiles. The removed object subtrees are hard-excluded from the output (G6).

redaction_apply runs sanitization automatically when the scrub_metadata flag is set (the default in every binding). redaction_scrub_metadata runs the same sanitization pass on its own, without geometric redaction; use it when you only need to sanitize a document without redacting any regions. It returns the number of top-level constructs removed.
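
To make the list of targets concrete, the toy model below treats a PDF trailer as nested Python dictionaries and deletes the entries named above, counting each one removed. It is a conceptual sketch only, not PDF Oxide's implementation: SCRUB_PATHS and scrub are hypothetical names, and a real sanitizer also has to drop the referenced objects from the output file, which a dictionary model cannot show.

```python
# Paths from the trailer to each document-level construct named above.
# /Info hangs off the trailer; the rest live under the catalog (/Root).
SCRUB_PATHS = [
    ("Info",),
    ("Root", "Metadata"),
    ("Root", "OpenAction"),
    ("Root", "AA"),
    ("Root", "Names", "JavaScript"),
    ("Root", "Names", "EmbeddedFiles"),
]


def scrub(trailer):
    """Delete each listed entry in place; return how many were present."""
    removed = 0
    for path in SCRUB_PATHS:
        node = trailer
        for key in path[:-1]:
            node = node.get(key)
            if not isinstance(node, dict):
                break  # parent is missing, so there is nothing to delete
        else:
            if path[-1] in node:
                del node[path[-1]]
                removed += 1
    return removed
```

Running scrub a second time on the same dictionary returns 0, which mirrors the idea that the count reflects constructs actually found and removed.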

Rust

use pdf_oxide::editor::{DocumentEditor, EditableDocument};
use pdf_oxide::redaction::RedactionOptions;

let mut editor = DocumentEditor::open("input.pdf")?;
let report = editor.sanitize_document(RedactionOptions::default())?;
println!("constructs removed: {}", report.annotations_removed);
editor.save("sanitized.pdf")?;

Python

doc = PdfDocument("input.pdf")
report = doc.sanitize_document()
print("constructs removed:", report["annotations_removed"])
doc.save("sanitized.pdf")

Go

editor, _ := pdfoxide.OpenEditor("input.pdf")
defer editor.Close()

removed, _ := editor.SanitizeDocument()  // top-level constructs removed
_ = editor.Save("sanitized.pdf")

C#

using var editor = DocumentEditor.Open("input.pdf");
int removed = editor.SanitizeDocument();  // top-level constructs removed
editor.Save("sanitized.pdf");

JavaScript (WASM)

const doc = new WasmPdfDocument(bytes);
const report = doc.sanitizeDocument(true, true, true);  // scrub, removeJS, removeEmbedded
const output = doc.save();
doc.free();

Java

import fyi.oxide.pdf.DocumentEditor;

try (DocumentEditor editor = DocumentEditor.open("input.pdf")) {
    editor.scrubMetadata();  // strip /Info, XMP, JS, embedded files — no geometry
    editor.saveTo(java.nio.file.Path.of("sanitized.pdf"));
}

Kotlin

import fyi.oxide.pdf.DocumentEditor

DocumentEditor.open("input.pdf").use { editor ->
    editor.scrubMetadata()  // strip /Info, XMP, JS, embedded files — no geometry
    editor.saveTo(java.nio.file.Path.of("sanitized.pdf"))
}

Scala

import fyi.oxide.pdf.DocumentEditor
import scala.util.Using

Using.resource(DocumentEditor.open("input.pdf")) { editor =>
  editor.scrubMetadata()  // strip /Info, XMP, JS, embedded files — no geometry
  editor.saveTo(java.nio.file.Path.of("sanitized.pdf"))
}

Clojure

(require '[pdf-oxide.core :as pdf])

(with-open [ed (pdf/editor "input.pdf")]
  (pdf/scrub-metadata ed)  ; strip /Info, XMP, JS, embedded files — no geometry
  (java.nio.file.Files/write
    (java.nio.file.Path/of "sanitized.pdf" (into-array String []))
    (pdf/editor-save ed)))

PHP

use PdfOxide\DocumentEditor;

$editor = DocumentEditor::open('input.pdf');
$editor->scrubMetadata();  // strip /Info, XMP, JS, embedded files — no geometry
$editor->saveTo('sanitized.pdf');

Ruby

require 'pdf_oxide'

PdfOxide::DocumentEditor.open('input.pdf') do |ed|
  removed = ed.scrub_metadata  # top-level constructs removed (no geometry)
  puts "constructs removed: #{removed}"
  ed.save_to('sanitized.pdf')
end

C++

#include <pdf_oxide/pdf_oxide.hpp>

auto editor = pdf_oxide::DocumentEditor::open("input.pdf");
int removed = editor.redaction_scrub_metadata();  // constructs removed (no geometry)
std::cout << "constructs removed: " << removed << "\n";
editor.save("sanitized.pdf");

Swift

import PdfOxide

let editor = try DocumentEditor.open("input.pdf")
let removed = try editor.redactionScrubMetadata()  // constructs removed (no geometry)
print("constructs removed: \(removed)")
try editor.save("sanitized.pdf")

Dart

import 'package:pdf_oxide/pdf_oxide.dart';

final editor = DocumentEditor.open('input.pdf');
final removed = editor.redactionScrubMetadata();  // constructs removed (no geometry)
print('constructs removed: $removed');
editor.save('sanitized.pdf');

R

library(pdfoxide)

editor <- pdf_editor_open("input.pdf")
removed <- pdf_redaction_scrub_metadata(editor)  # constructs removed (no geometry)
cat("constructs removed:", removed, "\n")
pdf_editor_save(editor, "sanitized.pdf")

Julia

using PdfOxide

editor = open_editor("input.pdf")
removed = redaction_scrub_metadata(editor)  # constructs removed (no geometry)
println("constructs removed: ", removed)
save(editor, "sanitized.pdf")

Zig

const pdf_oxide = @import("pdf_oxide");

var editor = try pdf_oxide.DocumentEditor.openEditor("input.pdf");
defer editor.deinit();

const removed = try editor.redactionScrubMetadata();  // constructs removed (no geometry)
std.debug.print("constructs removed: {d}\n", .{removed});

try editor.save("sanitized.pdf");

Objective-C

#import "POXPdfOxide.h"
NSError *err = nil;

POXDocumentEditor *editor = [POXDocumentEditor openEditor:@"input.pdf" error:&err];
int32_t removed = [editor redactionScrubMetadataWithError:&err];  // constructs removed
NSLog(@"constructs removed: %d", removed);
[editor saveToPath:@"sanitized.pdf" error:&err];

Elixir

{:ok, editor} = PdfOxide.open_editor("input.pdf")
{:ok, removed} = PdfOxide.redaction_scrub_metadata(editor)  # constructs removed
IO.puts("constructs removed: #{removed}")
PdfOxide.editor_save(editor, "sanitized.pdf")

How do I remove repeating headers and footers?

Headers, footers, and page-furniture artifacts repeat across pages and are often boilerplate that you want to remove before extraction or republication. PDF Oxide detects them in two ways: it prioritizes ISO 32000-compliant /Artifact tags (100% accuracy when present) and falls back to a heuristic that flags text recurring in the top or bottom 15% of pages.

remove_headers(threshold) / remove_footers(threshold) / remove_artifacts(threshold) operate document-wide and return the number of elements removed. threshold is the fraction of pages (0.0 to 1.0) on which text must recur in heuristic mode to count as furniture (default 0.8). remove_artifacts is the convenience function that removes headers and footers in one step.
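
The heuristic fallback can be pictured with a few lines of plain Python. This is a sketch of the idea as described above, not the library's code: repeated_headers is a hypothetical function, the input format (page height plus text lines with a y position measured from the bottom of the page) is an assumption, and exact string matching stands in for whatever comparison the library really uses.

```python
from collections import Counter


def repeated_headers(pages, threshold=0.8, band=0.15):
    """Return the set of strings that recur in the top band of enough pages.

    `pages` is a list of (page_height, lines), where lines is a list of
    (text, y) pairs and y is measured upward from the bottom of the page.
    """
    seen = Counter()
    for height, lines in pages:
        cutoff = height * (1.0 - band)  # lower edge of the top 15% band
        in_band = {text.strip() for text, y in lines if y >= cutoff}
        seen.update(in_band)  # a set, so each page counts a string once
    needed = threshold * len(pages)
    return {text for text, n in seen.items() if n >= needed}
```

With the default threshold of 0.8, a string must sit in the top band of at least four out of five pages to be flagged; body text that happens to repeat lower on the page is never considered.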

Rust

use pdf_oxide::PdfDocument;

let doc = PdfDocument::open("report.pdf")?;

let headers = doc.remove_headers(0.8)?;   // count removed
let footers = doc.remove_footers(0.8)?;
// Or both at once:
let total = doc.remove_artifacts(0.8)?;   // headers + footers
println!("removed {} furniture items", total);

Python

from pdf_oxide import PdfDocument

doc = PdfDocument("report.pdf")

doc.remove_headers(0.8)        # threshold defaults to 0.8
doc.remove_footers(0.8)
total = doc.remove_artifacts(0.8)   # headers + footers, returns count
print("removed", total, "furniture items")

doc.save("clean.pdf")

Go

doc, _ := pdfoxide.Open("report.pdf")
defer doc.Close()

headers, _ := doc.RemoveHeaders(0.8)   // count removed
footers, _ := doc.RemoveFooters(0.8)
total, _ := doc.RemoveArtifacts(0.8)   // headers + footers
fmt.Printf("removed %d furniture items\n", total)

JavaScript (WASM)

import { WasmPdfDocument } from "pdf-oxide-wasm";

const doc = new WasmPdfDocument(bytes);

doc.removeHeaders(0.8);   // returns count removed
doc.removeFooters(0.8);
const total = doc.removeArtifacts(0.8);  // headers + footers

const output = doc.save();
doc.free();

C++

#include <pdf_oxide/pdf_oxide.hpp>

auto doc = pdf_oxide::Document::open("report.pdf");

int headers = doc.remove_headers(0.8f);   // count removed
int footers = doc.remove_footers(0.8f);
int total   = doc.remove_artifacts(0.8f); // headers + footers
std::cout << "removed " << total << " furniture items\n";

Swift

import PdfOxide

let doc = try Document.open("report.pdf")

let headers = try doc.removeHeaders(threshold: 0.8)   // count removed
let footers = try doc.removeFooters(threshold: 0.8)
let total   = try doc.removeArtifacts(threshold: 0.8) // headers + footers
print("removed \(total) furniture items")

Dart

import 'package:pdf_oxide/pdf_oxide.dart';

final doc = PdfDocument.open('report.pdf');

doc.removeHeaders(0.8);   // returns count removed
doc.removeFooters(0.8);
final total = doc.removeArtifacts(0.8);  // headers + footers
print('removed $total furniture items');

R

library(pdfoxide)

doc <- pdf_open("report.pdf")

headers <- pdf_remove_headers(doc, 0.8)   # count removed
footers <- pdf_remove_footers(doc, 0.8)
total   <- pdf_remove_artifacts(doc, 0.8) # headers + footers
cat("removed", total, "furniture items\n")

Julia

using PdfOxide

doc = open_document("report.pdf")

headers = remove_headers(doc, 0.8)   # count removed
footers = remove_footers(doc, 0.8)
total   = remove_artifacts(doc, 0.8) # headers + footers
println("removed ", total, " furniture items")

Zig

const pdf_oxide = @import("pdf_oxide");

var doc = try pdf_oxide.Document.open("report.pdf");
defer doc.deinit();

const headers = try doc.removeHeaders(0.8);   // count removed
const footers = try doc.removeFooters(0.8);
const total   = try doc.removeArtifacts(0.8); // headers + footers
std.debug.print("removed {d} furniture items\n", .{total});

Objective-C

#import "POXPdfOxide.h"
NSError *err = nil;

POXDocument *doc = [POXDocument openPath:@"report.pdf" error:&err];

int32_t headers = [doc removeHeaders:0.8 error:&err];   // count removed
int32_t footers = [doc removeFooters:0.8 error:&err];
int32_t total   = [doc removeArtifacts:0.8 error:&err]; // headers + footers
NSLog(@"removed %d furniture items", total);

Elixir

{:ok, doc} = PdfOxide.open("report.pdf")

{:ok, headers} = PdfOxide.remove_headers(doc, 0.8)   # count removed
{:ok, footers} = PdfOxide.remove_footers(doc, 0.8)
{:ok, total}   = PdfOxide.remove_artifacts(doc, 0.8) # headers + footers
IO.puts("removed #{total} furniture items")

Erasing furniture on a single page

When you know exactly which page carries the furniture, the per-page erase_* family marks the header region (top 15%), the footer region (bottom 15%), or both for erasure, without any cross-page repetition analysis. These functions take a zero-based page index.
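
For orientation, the two bands can be worked out from the page size alone. The sketch below is illustrative arithmetic, not a PDF Oxide call: header_footer_bands is a hypothetical helper, and it assumes the usual PDF convention of an origin at the bottom-left corner with y increasing upward, measured on an unrotated page.

```python
def header_footer_bands(width, height, band=0.15):
    """Return (header_rect, footer_rect), each as (x0, y0, x1, y1).

    The header is the top `band` fraction of the page and the footer is the
    bottom `band` fraction, with the origin at the bottom-left corner.
    """
    depth = height * band
    header = (0.0, height - depth, float(width), float(height))
    footer = (0.0, 0.0, float(width), depth)
    return header, footer
```

On a page 800 points tall, each band is 120 points deep, so body content that starts inside the top or bottom 120 points would be affected by the corresponding erase call.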

Rust

use pdf_oxide::PdfDocument;

let doc = PdfDocument::open("report.pdf")?;

doc.erase_header(0)?;     // erase the top 15% of page 0
doc.erase_footer(0)?;     // erase the bottom 15% of page 0
doc.erase_artifacts(0)?;  // erase both header and footer of page 0

Python

doc = PdfDocument("report.pdf")

doc.erase_header(0)      # erase the top 15% of page 0
doc.erase_footer(0)      # erase the bottom 15% of page 0
doc.erase_artifacts(0)   # erase both header and footer of page 0

doc.save("clean.pdf")

JavaScript (WASM)

const doc = new WasmPdfDocument(bytes);

doc.eraseHeader(0);     // erase the top 15% of page 0
doc.eraseFooter(0);     // erase the bottom 15% of page 0
doc.eraseArtifacts(0);  // erase both header and footer of page 0

const output = doc.save();
doc.free();

C++

#include <pdf_oxide/pdf_oxide.hpp>

auto doc = pdf_oxide::Document::open("report.pdf");

doc.erase_header(0);     // erase the top 15% of page 0
doc.erase_footer(0);     // erase the bottom 15% of page 0
doc.erase_artifacts(0);  // erase both header and footer of page 0

Swift

import PdfOxide

let doc = try Document.open("report.pdf")

try doc.eraseHeader(0)     // erase the top 15% of page 0
try doc.eraseFooter(0)     // erase the bottom 15% of page 0
try doc.eraseArtifacts(0)  // erase both header and footer of page 0

Dart

import 'package:pdf_oxide/pdf_oxide.dart';

final doc = PdfDocument.open('report.pdf');

doc.eraseHeader(0);     // erase the top 15% of page 0
doc.eraseFooter(0);     // erase the bottom 15% of page 0
doc.eraseArtifacts(0);  // erase both header and footer of page 0

R

library(pdfoxide)

doc <- pdf_open("report.pdf")

pdf_erase_header(doc, 0)     # erase the top 15% of page 0
pdf_erase_footer(doc, 0)     # erase the bottom 15% of page 0
pdf_erase_artifacts(doc, 0)  # erase both header and footer of page 0

Julia

using PdfOxide

doc = open_document("report.pdf")

erase_header(doc, 0)     # erase the top 15% of page 0
erase_footer(doc, 0)     # erase the bottom 15% of page 0
erase_artifacts(doc, 0)  # erase both header and footer of page 0

Zig

const pdf_oxide = @import("pdf_oxide");

var doc = try pdf_oxide.Document.open("report.pdf");
defer doc.deinit();

_ = try doc.eraseHeader(0);     // erase the top 15% of page 0
_ = try doc.eraseFooter(0);     // erase the bottom 15% of page 0
_ = try doc.eraseArtifacts(0);  // erase both header and footer of page 0

Objective-C

#import "POXPdfOxide.h"
NSError *err = nil;

POXDocument *doc = [POXDocument openPath:@"report.pdf" error:&err];

[doc eraseHeader:0 error:&err];     // erase the top 15% of page 0
[doc eraseFooter:0 error:&err];     // erase the bottom 15% of page 0
[doc eraseArtifacts:0 error:&err];  // erase both header and footer of page 0

Elixir

{:ok, doc} = PdfOxide.open("report.pdf")

{:ok, _} = PdfOxide.erase_header(doc, 0)     # erase the top 15% of page 0
{:ok, _} = PdfOxide.erase_footer(doc, 0)     # erase the bottom 15% of page 0
{:ok, _} = PdfOxide.erase_artifacts(doc, 0)  # erase both header and footer of page 0

Complete redaction workflow

This example finds sensitive regions, queues destructive redactions, applies them with metadata sanitization, and writes a clean file.

Python

from pdf_oxide import PdfDocument

doc = PdfDocument("sensitive-report.pdf")

# Step 1: locate sensitive text and queue a destructive box for each match.
for i in range(doc.page_count()):
    page = doc.page(i)
    for t in page.find_text_containing("SSN"):
        x, y, w, h = t.bbox            # (x, y, width, height)
        doc.add_redaction(i, (x, y, x + w, y + h))

# Step 2: apply destructively + scrub metadata (the default).
report = doc.apply_redactions_destructive(scrub_metadata=True)
print("regions:", report["regions"], "glyphs removed:", report["glyphs_removed"])

# Step 3: save the rewritten document.
doc.save("report-redacted.pdf")
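
The conversion in Step 1 from an (x, y, width, height) box to the (x0, y0, x1, y1) corners that add_redaction takes can be factored into a small helper. The following is a sketch with assumptions stated: bbox_to_rect is a hypothetical name, and the optional padding is a general precaution against boxes that hug the glyph outlines, not something this page says PDF Oxide requires.

```python
def bbox_to_rect(bbox, pad=0.0):
    """Convert (x, y, width, height) to (x0, y0, x1, y1), grown by `pad`.

    `pad` is in the same units as the box (points) and is applied on all
    four sides; the default of 0.0 reproduces the conversion in Step 1.
    """
    x, y, w, h = bbox
    return (x - pad, y - pad, x + w + pad, y + h + pad)
```

In the loop above, the call would then read doc.add_redaction(i, bbox_to_rect(t.bbox)), with a pad value chosen only if testing shows edge glyphs surviving.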

Rust

use pdf_oxide::editor::{DocumentEditor, EditableDocument};
use pdf_oxide::redaction::RedactionOptions;

let mut editor = DocumentEditor::open("sensitive-report.pdf")?;

// Step 1: queue destructive boxes (x0, y0, x1, y1 in points).
editor.add_redaction(0, [100.0, 700.0, 300.0, 714.0], None)?;
editor.add_redaction(0, [100.0, 680.0, 300.0, 694.0], None)?;

// Step 2: apply destructively with safe defaults (scrubs metadata).
let report = editor.apply_redactions_destructive(RedactionOptions::default())?;
println!(
    "regions: {}, glyphs removed: {}",
    report.regions, report.glyphs_removed
);

// Step 3: save.
editor.save("report-redacted.pdf")?;

Java

import fyi.oxide.pdf.*;
import fyi.oxide.pdf.search.SearchMatch;

try (PdfDocument doc = PdfDocument.open("sensitive-report.pdf");
     DocumentEditor editor = DocumentEditor.open("sensitive-report.pdf")) {
    // Step 1: locate sensitive text and queue a destructive box for each hit.
    for (SearchMatch m : doc.search("SSN")) {
        editor.addRedaction(m.pageIndex(), m.bbox());  // bbox is x0,y0,x1,y1
    }
    // Step 2: apply destructively + scrub metadata (the default).
    editor.applyRedactionsDestructive();
    // Step 3: save.
    editor.saveTo(java.nio.file.Path.of("report-redacted.pdf"));
}

Kotlin

import fyi.oxide.pdf.*

PdfDocument.open("sensitive-report.pdf").use { doc ->
    DocumentEditor.open("sensitive-report.pdf").use { editor ->
        // Step 1: locate sensitive text and queue a box for each hit.
        for (m in doc.search("SSN")) {
            editor.addRedaction(m.pageIndex(), m.bbox())  // bbox is x0,y0,x1,y1
        }
        // Step 2: apply destructively + scrub metadata (the default).
        editor.applyRedactionsDestructive()
        // Step 3: save.
        editor.saveTo(java.nio.file.Path.of("report-redacted.pdf"))
    }
}

Scala

import fyi.oxide.pdf.{PdfDocument, DocumentEditor, searchSeq}
import scala.util.Using

Using.resource(PdfDocument.open("sensitive-report.pdf")) { doc =>
  Using.resource(DocumentEditor.open("sensitive-report.pdf")) { editor =>
    // Step 1: locate sensitive text and queue a box for each hit.
    for (m <- doc.searchSeq("SSN"))
      editor.addRedaction(m.pageIndex, m.bbox)  // bbox is x0,y0,x1,y1
    // Step 2: apply destructively + scrub metadata (the default).
    editor.applyRedactionsDestructive()
    // Step 3: save.
    editor.saveTo(java.nio.file.Path.of("report-redacted.pdf"))
  }
}

Clojure

(require '[pdf-oxide.core :as pdf])

(with-open [doc (pdf/open "sensitive-report.pdf")
            ed  (pdf/editor "sensitive-report.pdf")]
  ;; Step 1: locate sensitive text and queue a box for each hit.
  (doseq [m (pdf/search doc "SSN")]
    (pdf/add-redaction ed (.pageIndex m) (.bbox m)))  ; bbox is x0,y0,x1,y1
  ;; Step 2: apply destructively + scrub metadata (the default).
  (pdf/apply-redactions ed)
  ;; Step 3: save.
  (java.nio.file.Files/write
    (java.nio.file.Path/of "report-redacted.pdf" (into-array String []))
    (pdf/editor-save ed)))

Ruby

require 'pdf_oxide'

doc = PdfOxide::PdfDocument.open('sensitive-report.pdf')
PdfOxide::DocumentEditor.open('sensitive-report.pdf') do |ed|
  # Step 1: locate sensitive text and queue a box for each hit.
  doc.search('SSN').each do |m|
    b = m[:bbox]  # { x:, y:, width:, height: }
    ed.add_redaction(page: m[:page], rect: [b[:x], b[:y], b[:x] + b[:width], b[:y] + b[:height]])
  end
  # Step 2: apply destructively + scrub metadata (raises on fail-closed).
  ed.apply_redactions!(scrub_metadata: true)
  # Step 3: save.
  ed.save_to('report-redacted.pdf')
end

C++

#include <pdf_oxide/pdf_oxide.hpp>

auto doc    = pdf_oxide::Document::open("sensitive-report.pdf");
auto editor = pdf_oxide::DocumentEditor::open("sensitive-report.pdf");

// Step 1: locate sensitive text and queue a box for each hit.
for (const auto& m : doc.search_all("SSN", /*case_sensitive=*/false)) {
    editor.redaction_add(m.page, m.bbox.x, m.bbox.y,
                         m.bbox.x + m.bbox.width, m.bbox.y + m.bbox.height,
                         0.0, 0.0, 0.0);
}
// Step 2: apply destructively + scrub metadata.
int glyphs = editor.redaction_apply(/*scrub_metadata=*/true, 0.0, 0.0, 0.0);
std::cout << "glyphs removed: " << glyphs << "\n";
// Step 3: save.
editor.save("report-redacted.pdf");

Swift

import PdfOxide

let doc    = try Document.open("sensitive-report.pdf")
let editor = try DocumentEditor.open("sensitive-report.pdf")

// Step 1: locate sensitive text and queue a box for each hit.
for m in try doc.searchAll("SSN", false) {
    try editor.redactionAdd(m.page,
                            x1: m.bbox.x, y1: m.bbox.y,
                            x2: m.bbox.x + m.bbox.width, y2: m.bbox.y + m.bbox.height,
                            r: 0, g: 0, b: 0)
}
// Step 2: apply destructively + scrub metadata.
let glyphs = try editor.redactionApply(scrubMetadata: true, r: 0, g: 0, b: 0)
print("glyphs removed: \(glyphs)")
// Step 3: save.
try editor.save("report-redacted.pdf")

Dart

import 'package:pdf_oxide/pdf_oxide.dart';

final doc    = PdfDocument.open('sensitive-report.pdf');
final editor = DocumentEditor.open('sensitive-report.pdf');

// Step 1: locate sensitive text and queue a box for each hit.
for (final m in doc.searchAll('SSN', false)) {
  editor.redactionAdd(m.page, m.bbox.x, m.bbox.y,
      m.bbox.x + m.bbox.width, m.bbox.y + m.bbox.height);
}
// Step 2: apply destructively + scrub metadata.
final glyphs = editor.redactionApply(scrubMetadata: true);
print('glyphs removed: $glyphs');
// Step 3: save.
editor.save('report-redacted.pdf');

R

library(pdfoxide)

doc    <- pdf_open("sensitive-report.pdf")
editor <- pdf_editor_open("sensitive-report.pdf")

# Step 1: locate sensitive text and queue a box for each hit.
for (m in pdf_search_all(doc, "SSN", FALSE)) {
  b <- m$bbox  # list(x=, y=, width=, height=)
  pdf_redaction_add(editor, m$page, b$x, b$y, b$x + b$width, b$y + b$height, 0, 0, 0)
}
# Step 2: apply destructively + scrub metadata.
glyphs <- pdf_redaction_apply(editor, scrub_metadata = TRUE, 0, 0, 0)
cat("glyphs removed:", glyphs, "\n")
# Step 3: save.
pdf_editor_save(editor, "report-redacted.pdf")

Julia

using PdfOxide

doc    = open_document("sensitive-report.pdf")
editor = open_editor("sensitive-report.pdf")

# Step 1: locate sensitive text and queue a box for each hit.
for m in search_all(doc, "SSN", false)
    b = m.bbox
    redaction_add(editor, m.page, b.x, b.y, b.x + b.width, b.y + b.height, 0, 0, 0)
end
# Step 2: apply destructively + scrub metadata.
glyphs = redaction_apply(editor, true, 0, 0, 0)
println("glyphs removed: ", glyphs)
# Step 3: save.
save(editor, "report-redacted.pdf")

Zig

const std = @import("std");
const pdf_oxide = @import("pdf_oxide");
const a = std.heap.page_allocator;

var doc    = try pdf_oxide.Document.open("sensitive-report.pdf");
defer doc.deinit();
var editor = try pdf_oxide.DocumentEditor.openEditor("sensitive-report.pdf");
defer editor.deinit();

// Step 1: locate sensitive text and queue a box for each hit.
const hits = try doc.searchAll(a, "SSN", false);
for (hits) |m| {
    try editor.redactionAdd(@intCast(m.page), m.bbox.x, m.bbox.y,
        m.bbox.x + m.bbox.width, m.bbox.y + m.bbox.height, 0, 0, 0);
}
// Step 2: apply destructively + scrub metadata.
const glyphs = try editor.redactionApply(true, 0, 0, 0);
std.debug.print("glyphs removed: {d}\n", .{glyphs});
// Step 3: save.
try editor.save("report-redacted.pdf");

Objective-C

#import "POXPdfOxide.h"
NSError *err = nil;

POXDocument *doc = [POXDocument openPath:@"sensitive-report.pdf" error:&err];
POXDocumentEditor *editor = [POXDocumentEditor openEditor:@"sensitive-report.pdf" error:&err];

// Step 1: locate sensitive text and queue a box for each hit.
for (POXSearchResult *m in [doc searchAll:@"SSN" caseSensitive:NO error:&err]) {
    POXBbox b = m.bbox;
    [editor redactionAddPage:m.page x1:b.x y1:b.y x2:b.x + b.width y2:b.y + b.height
                           r:0 g:0 b:0 error:&err];
}
// Step 2: apply destructively + scrub metadata.
int32_t glyphs = [editor redactionApplyScrubMetadata:YES r:0 g:0 b:0 error:&err];
NSLog(@"glyphs removed: %d", glyphs);
// Step 3: save.
[editor saveToPath:@"report-redacted.pdf" error:&err];

Elixir

{:ok, doc}    = PdfOxide.open("sensitive-report.pdf")
{:ok, editor} = PdfOxide.open_editor("sensitive-report.pdf")

# Step 1: locate sensitive text and queue a box for each hit.
{:ok, hits} = PdfOxide.search_all(doc, "SSN", false)
Enum.each(hits, fn m ->
  b = m.bbox
  PdfOxide.redaction_add(editor, m.page, b.x, b.y, b.x + b.width, b.y + b.height, 0.0, 0.0, 0.0)
end)
# Step 2: apply destructively + scrub metadata.
{:ok, glyphs} = PdfOxide.redaction_apply(editor, true, 0.0, 0.0, 0.0)
IO.puts("glyphs removed: #{glyphs}")
# Step 3: save.
PdfOxide.editor_save(editor, "report-redacted.pdf")

Method reference

Destructive redaction

Canonical name (C ABI)	Rust (`DocumentEditor`)	Python (`PdfDocument`)	Go (`DocumentEditor`)	C# (`DocumentEditor`)	JS (WASM)
`pdf_redaction_add`	`add_redaction(page, [x0,y0,x1,y1], Option<[r,g,b]>) -> Result<()>`	`add_redaction(page, rect, fill=None)`	`AddRedaction(page, [4]float64, *[3]float64) error`	`AddRedaction(pageIndex, x1, y1, x2, y2, r=0, g=0, b=0)`	`addRedaction(page, x0, y0, x1, y1, fill?)`
`pdf_redaction_count`	`redaction_count(page) -> Result<usize>`	`redaction_count(page) -> int`	`RedactionCount(page) (int, error)`	`RedactionCount(pageIndex) -> int`	`redactionCount(page) -> number`
`pdf_redaction_apply`	`apply_redactions_destructive(RedactionOptions) -> Result<RedactionReport>`	`apply_redactions_destructive(scrub_metadata=True, remove_javascript=True, remove_embedded_files=True, fill=(0,0,0)) -> dict`	`ApplyRedactions(scrubMetadata bool) (int, error)`	`ApplyRedactions(scrubMetadata=true, r=0, g=0, b=0) -> int`	`applyRedactionsDestructive(scrubMetadata?) -> RedactionReport`
`pdf_redaction_scrub_metadata`	`sanitize_document(RedactionOptions) -> Result<RedactionReport>`	`sanitize_document(scrub_metadata=True, remove_javascript=True, remove_embedded_files=True) -> dict`	`SanitizeDocument() (int, error)`	`SanitizeDocument() -> int`	`sanitizeDocument(scrub?, removeJS?, removeEmbedded?) -> RedactionReport`

The Swift wrapper exposes the same family on DocumentEditor as redactionAdd, redactionCount, redactionApply(scrubMetadata:r:g:b:), and redactionScrubMetadata().

Header, footer, and artifact removal

Canonical name (C ABI)	Rust (`PdfDocument`)	Python (`PdfDocument`)	Go (`PdfDocument`)	JS (WASM)
`pdf_document_remove_headers`	`remove_headers(threshold: f32) -> Result<usize>`	`remove_headers(threshold=0.8) -> int`	`RemoveHeaders(threshold float32) (int, error)`	`removeHeaders(threshold) -> number`
`pdf_document_remove_footers`	`remove_footers(threshold: f32) -> Result<usize>`	`remove_footers(threshold=0.8) -> int`	`RemoveFooters(threshold float32) (int, error)`	`removeFooters(threshold) -> number`
`pdf_document_remove_artifacts`	`remove_artifacts(threshold: f32) -> Result<usize>`	`remove_artifacts(threshold=0.8) -> int`	`RemoveArtifacts(threshold float32) (int, error)`	`removeArtifacts(threshold) -> number`
`pdf_document_erase_header`	`erase_header(page: usize) -> Result<()>`	`erase_header(page) -> None`	—	`eraseHeader(page)`
`pdf_document_erase_footer`	`erase_footer(page: usize) -> Result<()>`	`erase_footer(page) -> None`	—	`eraseFooter(page)`
`pdf_document_erase_artifacts`	`erase_artifacts(page: usize) -> Result<()>`	`erase_artifacts(page) -> None`	—	`eraseArtifacts(page)`

Swift exposes removeHeaders(threshold:), removeFooters(threshold:), removeArtifacts(threshold:), eraseHeader(_:), eraseFooter(_:), and eraseArtifacts(_:) on Document. Go's PdfDocument exposes the Remove* family but not the per-page Erase* family. C# does not currently expose either decoration family.

The RedactionReport

redaction_apply and redaction_scrub_metadata return a report so you can verify that the pass actually did something:

Field	Meaning
`regions`	Number of regions applied.
`glyphs_removed`	Glyphs physically removed from content streams.
`images_modified` / `images_removed`	Images whose covered pixels were overwritten / images deleted entirely.
`paths_pruned`	Subpaths that were deleted or geometrically clipped.
`annotations_removed`	Top-level items removed (annotations for redaction; cleaned roots for `sanitize_document`).
`fonts_scrubbed`	Fonts whose `/Widths` / `/ToUnicode` were scrubbed.
`bytes_removed`	Best-effort estimate of the total bytes removed.
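
As a minimal sketch of how such a report might be checked, the helper below takes a plain mapping keyed by the field names in the table and reports whether the pass removed anything. The function name and the choice of which counters count as "content removed" are assumptions made for illustration; it does not call PDF Oxide itself.

```python
from typing import Mapping

# Field names are taken from the RedactionReport table; the helper is hypothetical.
CONTENT_COUNTERS = (
    "glyphs_removed",
    "images_modified",
    "images_removed",
    "paths_pruned",
)


def redaction_had_effect(report: Mapping[str, int]) -> bool:
    """Return True if at least one region was applied and some content was removed."""
    if report.get("regions", 0) == 0:
        # No regions were applied, so nothing can have been redacted.
        return False
    # Missing counters are treated as zero.
    return any(report.get(name, 0) > 0 for name in CONTENT_COUNTERS)
```

A caller could pass the dict returned by the Python binding and refuse to publish the output when the helper returns False, for example when a queued box covered only whitespace.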

Important notes

Destructive, not cosmetic. redaction_apply physically removes covered content from the rewritten file per ISO 32000-1:2008 §12.5.6.23; the secret text is gone from the file, not merely covered. The original /Contents objects are hard-deleted so they cannot survive as orphans missed by garbage collection.
Fails closed. If a redacted page contains text in a composite/Type0 or otherwise non-rewritable font, redaction_apply returns an error rather than risk a silently incomplete redaction. Handle the error; do not assume success.
Safe defaults. RedactionOptions::default() scrubs metadata, removes document JavaScript and embedded files, removes hidden optional content layers, and draws an opaque black overlay even when the source /Redact annotation specified no /IC color.
Irreversible. Once saved, the removal is permanent. Always work on a copy of the original document.
Performance. Redaction and sanitization run on the same parser that delivers 0.8 ms mean extraction at a 100% pass rate, so a redaction pass adds little overhead beyond the content rewrite itself.
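
The fail-closed and irreversibility notes suggest a calling pattern in which nothing is written unless the apply step succeeded. The sketch below shows that pattern with two caller-supplied callables; the helper name is hypothetical, and because the exception type raised by the Python binding is not stated here, it catches Exception broadly as an assumption.

```python
import logging
from typing import Callable, Optional


def apply_then_save(apply: Callable[[], dict],
                    save: Callable[[], None]) -> Optional[dict]:
    """Run `apply`; call `save` only if `apply` did not raise.

    Returns the report from `apply`, or None if redaction was refused.
    """
    try:
        report = apply()
    except Exception as exc:  # assumption: the concrete exception type is not documented here
        logging.warning("redaction refused, nothing was saved: %s", exc)
        return None
    # Reached only when the apply step completed without error.
    save()
    return report
```

With the Python binding, `apply` could wrap `apply_redactions_destructive()` from the method reference, and `save` could wrap whatever call writes the output to a new path rather than over the original.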

Frequently asked questions

Is PDF Oxide redaction truly destructive, or is it just a black box over the text?

Truly destructive. redaction_apply (apply_redactions_destructive) removes the covered glyphs, images, and path geometry from the content stream and hard-deletes the original /Contents objects from the output, per ISO 32000-1:2008 §12.5.6.23. The black overlay is drawn in addition to the content removal, not instead of it.

What happens if a page uses a composite font that cannot be safely redacted?

Redaction fails closed: redaction_apply returns an error rather than redacting partially and leaving a recoverable fragment. Catch the error and rasterize the page if you need to redact such content.

Do I need to redact regions just to remove metadata?

No. Call redaction_scrub_metadata (sanitize_document) for a standalone pass that removes /Info, XMP /Metadata, document JavaScript, and embedded files without touching page geometry.

How are headers and footers detected for removal?

PDF Oxide prioritizes ISO 32000 /Artifact tags (100% accurate when present) and falls back to a heuristic that flags text repeated in the top or bottom 15% of the page on at least threshold of the pages (default 0.8).
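
To make the fallback heuristic concrete, here is a rough, self-contained sketch of the idea described above. It is not PDF Oxide's implementation: the function name, the (text, y_top) input shape, and the exact-match comparison are all assumptions, and a real detector would also need to cope with varying page numbers and per-page heights.

```python
from collections import Counter
from typing import List, Set, Tuple

# One text line: (text, distance of the line from the top of the page).
Line = Tuple[str, float]


def repeated_margin_text(pages: List[List[Line]], page_height: float,
                         threshold: float = 0.8, band: float = 0.15) -> Set[str]:
    """Return texts found in the top/bottom band on at least `threshold` of the pages."""
    counts: Counter = Counter()
    for lines in pages:
        seen = set()  # count each text at most once per page
        for text, y_top in lines:
            in_top = y_top <= band * page_height
            in_bottom = y_top >= (1.0 - band) * page_height
            if in_top or in_bottom:
                seen.add(text.strip())
        counts.update(seen)
    needed = threshold * len(pages)
    return {text for text, n in counts.items() if n >= needed}
```

Text that repeats in the body of the page is ignored because it falls outside both bands, and margin text that appears on only a few pages stays below the threshold.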
