PDF Compression
`DocumentEditor::save_with_options()` can recompress streams, garbage-collect orphaned objects, and linearise the file for Fast Web View, all in a single pass. The easiest way to apply all three is the `pdf-oxide compress` CLI command, which wraps them behind one call.
Binding coverage. Full `SaveOptions` with `compress` / `garbage_collect` / `linearize` is exposed in Rust (`editor.save_with_options(path, opts)`) and Python (via `doc.save(..., compress=True, ...)` where bound). From Node, WASM, Go, and C#, run the CLI (`pdf-oxide compress input.pdf -o out.pdf`) as a subprocess or through a build step. The CLI is the stable interface these bindings share until `SaveOptions` is wired through each FFI surface.
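For Node, Go, or C#, the subprocess route boils down to building the argument list and checking the exit status. The sketch below shows that shape in Rust (the page's own language) using only `std::process::Command`; Node's `child_process` or Go's `os/exec` would pass the same argument list. It assumes `pdf-oxide` is on `PATH`, and `compress_command` / `run_compress` are hypothetical helper names, not part of pdf_oxide. Rust callers would normally use `save_with_options` directly instead.

```rust
use std::io;
use std::process::Command;

/// Hypothetical helper: builds `pdf-oxide compress <input> [-o <output>]`.
/// Assumes the `pdf-oxide` binary is on PATH.
fn compress_command(input: &str, output: Option<&str>) -> Command {
    let mut cmd = Command::new("pdf-oxide");
    cmd.arg("compress").arg(input);
    if let Some(out) = output {
        // Without -o the CLI picks the name itself (input.pdf -> input_compressed.pdf).
        cmd.arg("-o").arg(out);
    }
    cmd
}

/// Hypothetical helper: runs the command and turns a non-zero exit into an error.
fn run_compress(input: &str, output: Option<&str>) -> io::Result<()> {
    let status = compress_command(input, output).status()?;
    if status.success() {
        Ok(())
    } else {
        Err(io::Error::new(
            io::ErrorKind::Other,
            format!("pdf-oxide compress exited with {status}"),
        ))
    }
}
```

Keeping command construction separate from execution makes the argument list easy to inspect or log before anything is spawned.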
CLI (all platforms)
```bash
# Basic: produces input_compressed.pdf
pdf-oxide compress input.pdf
# Explicit output
pdf-oxide compress input.pdf -o smaller.pdf
# Output includes a before/after size comparison
pdf-oxide compress report.pdf
# → Compressed report.pdf -> report_compressed.pdf (3412901 -> 1847200 bytes)
```
The CLI enables `compress: true`, `garbage_collect: true`, and `linearize: true` by default. This is the combination that produces the smallest file while staying valid per ISO 32000.
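When a non-Rust binding shells out to the CLI, the summary line is the simplest place to pick up the size change. A minimal parser sketch, assuming the line always has the shape shown in the example above (taken from a single sample; it is not documented as a stable output format). `parse_sizes` and `saving_percent` are hypothetical helpers:

```rust
/// Hypothetical helper: extracts (before, after) byte counts from a line like
/// "Compressed a.pdf -> a_compressed.pdf (3412901 -> 1847200 bytes)".
/// Assumption: the format matches the single documented sample.
fn parse_sizes(line: &str) -> Option<(u64, u64)> {
    let line = line.trim_end();
    // The last '(' opens the size group, even if a file name contains '('.
    let start = line.rfind('(')? + 1;
    let inner = line[start..].strip_suffix(" bytes)")?;
    let (before, after) = inner.split_once(" -> ")?;
    Some((
        before.trim().parse::<u64>().ok()?,
        after.trim().parse::<u64>().ok()?,
    ))
}

/// Percentage of the original size that was removed.
fn saving_percent(before: u64, after: u64) -> f64 {
    if before == 0 {
        return 0.0;
    }
    (1.0 - after as f64 / before as f64) * 100.0
}
```

For the example line, `parse_sizes` returns `(3412901, 1847200)` and `saving_percent` reports roughly 45.9 %.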
Rust API
```rust
use pdf_oxide::editor::{DocumentEditor, EditableDocument, SaveOptions};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let mut editor = DocumentEditor::open("input.pdf")?;
    editor.save_with_options("output.pdf", SaveOptions {
        compress: true,        // FlateDecode all uncompressed streams
        garbage_collect: true, // Drop orphaned objects
        linearize: true,       // Linearise for Fast Web View
        ..Default::default()   // All other fields (e.g. encryption) keep their defaults
    })?;
    Ok(())
}
```
SaveOptions fields
| Field | Default | Effect |
|---|---|---|
| `compress` | `false` (`true` in CLI) | Apply FlateDecode to streams that were stored uncompressed |
| `garbage_collect` | `true` in CLI | Drop indirect objects no longer referenced from the trailer |
| `linearize` | `true` in CLI | Emit Linearized (“Fast Web View”) layout so viewers can render page 1 before the full file downloads |
| `encryption` | `None` | Attach an `EncryptionConfig` to save as encrypted; see Encryption & Security |
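The `..Default::default()` in the Rust example is ordinary struct-update syntax: every field you don't name comes from the type's `Default` impl, so `encryption` stays `None`. The stand-in struct below mirrors the four fields in the table purely to illustrate that. It is not pdf_oxide's `SaveOptions`, and its all-`false` defaults for `garbage_collect` and `linearize` are an assumption (the table only documents their CLI values).

```rust
/// Hypothetical stand-in mirroring the documented SaveOptions fields.
/// NOT pdf_oxide::editor::SaveOptions; the derived defaults
/// (all false / None) are an assumption for illustration.
#[derive(Debug, Default, Clone, PartialEq)]
struct SaveOptionsSketch {
    compress: bool,
    garbage_collect: bool,
    linearize: bool,
    encryption: Option<String>, // placeholder for an EncryptionConfig
}

impl SaveOptionsSketch {
    /// The combination the CLI turns on: all three options, no encryption.
    fn cli_preset() -> Self {
        Self {
            compress: true,
            garbage_collect: true,
            linearize: true,
            // Struct-update syntax: fields not named above come from Default.
            ..Default::default()
        }
    }
}
```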
What each option actually does
- `compress`: walks every content stream, Form XObject, ToUnicode CMap, and metadata stream; if a stream has no filter or an unrecognised filter, it re-wraps it with `FlateDecode`. Already-compressed streams (JBIG2 images, CCITT fax images, existing Flate) are left alone.
- `garbage_collect`: performs a reachability scan from `/Root` and `/Info`, marks every live indirect object, and writes only those to the output xref. Useful after heavy editing, where `remove_page`, `flatten_forms`, or `erase_region` leave orphans.
- `linearize`: re-orders objects so the first page's content stream and its resource dependencies appear at the front of the file, and adds a linearisation dictionary plus hint stream. PDF readers download the file in order and can render page 1 before the full file lands, which is the main user-visible benefit for PDFs served from a CDN.
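The `garbage_collect` mark phase and the `linearize` reordering both come down to walking the object graph. Below is a simplified sketch over a toy graph (object number mapped to the object numbers it references), not pdf_oxide's implementation: `reachable` marks everything reachable from the roots (for GC, the `/Root` and `/Info` objects), and `linear_order` puts the first page's dependencies first. It assumes the graph omits each page's `/Parent` back-link (otherwise page 1 would reach the whole page tree), and it leaves out the linearisation dictionary, hint stream, and first-page xref that a real linearised file needs.

```rust
use std::collections::{BTreeSet, HashMap};

/// Toy object graph: object number -> object numbers it references.
/// Assumption: page /Parent back-links are not included.
type ObjGraph = HashMap<u32, Vec<u32>>;

/// Mark phase: every object reachable from `roots` (e.g. /Root and /Info).
fn reachable(graph: &ObjGraph, roots: &[u32]) -> BTreeSet<u32> {
    let mut live = BTreeSet::new();
    let mut stack: Vec<u32> = roots.to_vec();
    while let Some(id) = stack.pop() {
        // `insert` returns false for already-marked objects, which also breaks cycles.
        if live.insert(id) {
            if let Some(children) = graph.get(&id) {
                stack.extend(children.iter().copied());
            }
        }
    }
    live
}

/// Write order for a linearised file: first-page objects first, then the rest.
/// Only live objects are emitted, so GC and reordering compose.
fn linear_order(graph: &ObjGraph, live: &BTreeSet<u32>, first_page: u32) -> Vec<u32> {
    let front = reachable(graph, &[first_page]);
    let mut order: Vec<u32> = front.iter().copied().filter(|id| live.contains(id)).collect();
    order.extend(live.iter().copied().filter(|id| !front.contains(id)));
    order
}
```

With a catalog `1 → 2` (Pages), `2 → 3, 4` (two pages), page objects `3 → 5` and `4 → 6` (content streams), an Info dictionary `9`, and an orphaned pair `7 → 8` left behind by an edit, `reachable(&g, &[1, 9])` drops 7 and 8, and `linear_order(&g, &live, 3)` yields `[3, 5, 1, 2, 4, 6, 9]`.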
When to compress
- After bulk edits: if you ran `flatten_forms`, `flatten_all_annotations`, `apply_all_redactions`, or deleted a batch of pages, GC plus recompression typically drops 30–60 % of the file size.
- Before distributing or emailing: end users care about file size, and linearisation makes the first page feel instant over slow connections.
- As the last step in ETL pipelines: after extraction and re-generation, compress once at the tail of the pipeline; don't decompress and recompress on every transform.
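The ETL advice is about where compression sits in the chain, not how it is done. A minimal sketch under that framing: each transform is a hypothetical `Step` closure that turns one file path into the next, and the compress step (for example a `save_with_options` call or a `pdf-oxide compress` subprocess) runs exactly once, after all of them. `run_pipeline` and `Step` are illustrative names, not pdf_oxide API.

```rust
use std::io;

/// Hypothetical transform step: takes an input path, writes a new file,
/// and returns the new path.
type Step = Box<dyn Fn(&str) -> io::Result<String>>;

/// Runs every transform in order, then compresses once at the tail.
fn run_pipeline(
    input: &str,
    steps: &[Step],
    compress: impl FnOnce(&str) -> io::Result<String>,
) -> io::Result<String> {
    let mut current = input.to_string();
    for step in steps {
        // Intermediate files stay uncompressed: no decompress/recompress round-trips.
        current = step(&current)?;
    }
    compress(&current)
}
```

Taking `compress` as `FnOnce` makes "at most once per run" something the compiler enforces rather than a convention.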
When NOT to compress
- During frequent incremental edits: a full rewrite re-serialises the whole file and loses the xref streams and object streams that keep incremental appends small. For interactive editing loops, use `SaveOptions::incremental()` and leave compression for a final pass.
- On PDFs that are already aggressively compressed: if the streams already carry `FlateDecode` filters with a good predictor, you may save only a couple of percent. Run `pdf-oxide compress` once and measure before wiring it into your pipeline.
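Measuring before wiring compression into a pipeline only needs the two file sizes. A sketch using `std::fs::metadata`; `measured_saving` and `worth_compressing` are hypothetical helpers, and the threshold is a policy choice for your own corpus, not a pdf-oxide setting.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical helper: percentage of bytes saved between two files on disk.
fn measured_saving(original: &Path, compressed: &Path) -> io::Result<f64> {
    let before = fs::metadata(original)?.len();
    let after = fs::metadata(compressed)?.len();
    if before == 0 {
        return Ok(0.0);
    }
    Ok((1.0 - after as f64 / before as f64) * 100.0)
}

/// Hypothetical policy: keep the compress step only if it saves at least
/// `min_percent` of the original size.
fn worth_compressing(saving_percent: f64, min_percent: f64) -> bool {
    saving_percent >= min_percent
}
```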
Size Reduction Expectations
Typical savings on real-world PDFs (measured across the 3,830-file test corpus):
| Source | Before | After | Saving |
|---|---|---|---|
| Scanned invoice (uncompressed streams) | 2.4 MB | 0.8 MB | ~66 % |
| LaTeX research paper | 1.1 MB | 0.95 MB | ~14 % |
| Government form (after flatten) | 890 KB | 240 KB | ~73 % |
| Already-optimised marketing PDF | 1.8 MB | 1.75 MB | ~3 % |
Bulk-edited files and forms-after-flattening are the largest wins. Already-optimised files see modest savings from linearisation alone.
Decrypt / authenticate first
`DocumentEditor` does not currently provide a password-authentication call, so encrypted PDFs must be decrypted first: open the file with `open_with_password()`, save an unencrypted copy, then open that copy in the editor:
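```rust
use pdf_oxide::api::Pdf;
use pdf_oxide::editor::{DocumentEditor, EditableDocument, SaveOptions};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Decrypt: open with the password and write an unencrypted copy.
    let doc = Pdf::open_with_password("protected.pdf", "pw")?;
    doc.save("temp-unencrypted.pdf")?;

    // Compress the decrypted copy with the editor.
    let mut editor = DocumentEditor::open("temp-unencrypted.pdf")?;
    editor.save_with_options("compressed.pdf", SaveOptions {
        compress: true,
        garbage_collect: true,
        linearize: true,
        ..Default::default()
    })?;
    // temp-unencrypted.pdf holds plaintext content; delete it once you're done.
    Ok(())
}
```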
The same pattern works from Python: open the file with `PdfDocument(path, password="pw")`, call `save()` to write an unencrypted copy, then compress that copy with the CLI.
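The decrypted `temp-unencrypted.pdf` is a plaintext copy on disk, so it is worth deleting as soon as the compressed file is written, including when a step fails. One way to guarantee that in Rust is a drop guard. In the sketch below, `TempCopy` and `compress_encrypted` are hypothetical names; the two closures stand in for the `open_with_password` + `save` step and the `DocumentEditor` + `save_with_options` step from the example above, whose error types you would need to unify (for instance into `Box<dyn std::error::Error>`).

```rust
use std::fs;
use std::path::{Path, PathBuf};

/// Hypothetical guard: deletes the decrypted temporary copy when dropped,
/// including on an early `?` return.
struct TempCopy(PathBuf);

impl TempCopy {
    fn path(&self) -> &Path {
        &self.0
    }
}

impl Drop for TempCopy {
    fn drop(&mut self) {
        // Best effort: ignore errors such as the file never having been created.
        let _ = fs::remove_file(&self.0);
    }
}

/// Hypothetical wrapper: `decrypt_to` writes the plaintext copy,
/// `compress_from` reads it and writes the compressed output.
fn compress_encrypted<E>(
    temp: &Path,
    decrypt_to: impl FnOnce(&Path) -> Result<(), E>,
    compress_from: impl FnOnce(&Path) -> Result<(), E>,
) -> Result<(), E> {
    let guard = TempCopy(temp.to_path_buf());
    decrypt_to(guard.path())?;
    let result = compress_from(guard.path());
    drop(guard); // remove the plaintext copy before returning
    result
}
```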
Related Pages
- Editing Overview: `SaveOptions` and the full editor surface
- Encryption & Security: combine `compress` with `save_with_encryption`
- CLI Getting Started: all 22 CLI commands