PDF Compression

DocumentEditor::save_with_options() can recompress streams, garbage-collect orphaned objects, and linearise the file for fast web view — in a single pass. The easiest way to apply all three is the pdf-oxide compress CLI, which wraps them behind one command.

Binding coverage. Full SaveOptions with compress / garbage_collect / linearize is exposed in Rust (editor.save_with_options(path, opts)) and Python (via doc.save(..., compress=True, ...) where bound). From Node, WASM, Go, and C#, run the CLI (pdf-oxide compress input.pdf -o out.pdf) as a subprocess or through a build step — the CLI is the stable interface these bindings share until SaveOptions is wired through each FFI surface.
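
For callers that shell out, the subprocess call is small. The sketch below uses Rust's std::process::Command only to keep this page in one language; the same shape applies from Node, Go, or C#. It assumes pdf-oxide is on PATH and signals failure through a non-zero exit status. compress_command and compress_via_cli are illustrative names, not part of the library.

```rust
use std::io;
use std::path::Path;
use std::process::Command;

/// Build (but do not run) `pdf-oxide compress <input> -o <output>`.
/// Hypothetical helper: the name is ours, only the CLI arguments come from the docs.
pub fn compress_command(input: &Path, output: &Path) -> Command {
    let mut cmd = Command::new("pdf-oxide");
    cmd.arg("compress").arg(input).arg("-o").arg(output);
    cmd
}

/// Run the CLI and turn a non-zero exit status into an error.
/// Assumption: the CLI reports failure through its exit code.
pub fn compress_via_cli(input: &Path, output: &Path) -> io::Result<()> {
    let status = compress_command(input, output).status()?;
    if status.success() {
        Ok(())
    } else {
        Err(io::Error::new(
            io::ErrorKind::Other,
            format!("pdf-oxide compress exited with {status}"),
        ))
    }
}
```

Keeping command construction separate from execution makes the argument list easy to inspect in a unit test without the binary installed.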

CLI (all platforms)

# Basic: produces input_compressed.pdf
pdf-oxide compress input.pdf

# Explicit output
pdf-oxide compress input.pdf -o smaller.pdf

# Print size comparison
pdf-oxide compress report.pdf
# → Compressed report.pdf -> report_compressed.pdf (3412901 -> 1847200 bytes)

The CLI enables compress: true, garbage_collect: true, linearize: true by default — this is the combination that produces the smallest file while staying valid per ISO 32000.
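
If a script needs the saving as a number, the summary line can be parsed. This is a minimal sketch that assumes the output keeps the `(before -> after bytes)` shape shown above; that format is not documented as stable, and parse_sizes and saving_percent are illustrative names.

```rust
/// Extract `(before, after)` byte counts from a summary line such as
/// `Compressed report.pdf -> report_compressed.pdf (3412901 -> 1847200 bytes)`.
/// Returns `None` if the line does not match that shape.
pub fn parse_sizes(line: &str) -> Option<(u64, u64)> {
    // Use the last pair of parentheses so file names containing "(" still work.
    let open = line.rfind('(')?;
    let close = line.rfind(')')?;
    let inner = line.get(open + 1..close)?;
    let inner = inner.trim().strip_suffix("bytes")?;
    let (before, after) = inner.split_once("->")?;
    Some((before.trim().parse().ok()?, after.trim().parse().ok()?))
}

/// Percentage of the original size that was removed (negative if the file grew).
pub fn saving_percent(before: u64, after: u64) -> f64 {
    if before == 0 {
        return 0.0;
    }
    (1.0 - after as f64 / before as f64) * 100.0
}
```

For the sample line above, the two sizes work out to a saving of roughly 46 %.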

Rust API

use pdf_oxide::editor::{DocumentEditor, EditableDocument, SaveOptions};

let mut editor = DocumentEditor::open("input.pdf")?;

editor.save_with_options("output.pdf", SaveOptions {
    compress: true,         // FlateDecode all uncompressed streams
    garbage_collect: true,  // Drop orphaned objects
    linearize: true,        // Linearise for fast web view
    ..Default::default()
})?;

SaveOptions fields

| Field | Default | Effect |
| --- | --- | --- |
| compress | true in CLI / false default | Apply FlateDecode to streams that were stored uncompressed |
| garbage_collect | true in CLI | Drop indirect objects no longer referenced from the trailer |
| linearize | true in CLI | Emit Linearized (“Fast Web View”) layout so viewers can render page 1 before the full file downloads |
| encryption | None | Attach an EncryptionConfig to save as encrypted; see Encryption & Security |

What each option actually does

  • compress — walks every content stream, Form XObject, ToUnicode CMap, and metadata stream; if the stream has no filter or an unrecognised filter, re-wraps it with FlateDecode. Already-compressed streams (JBIG2 images, CCITT tables, existing Flate) are left alone.
  • garbage_collect — performs a reachability scan from /Root and /Info, marks every live indirect object, and writes only those in the output xref. Useful after heavy editing where remove_page, flatten_forms, or erase_region leave orphans.
  • linearize — re-orders objects so the first page’s content stream and its resource dependencies appear at the front of the file, with a linearisation dictionary + hint stream. PDF readers download the file in order and render page 1 before the full file lands, which is the main user-visible benefit for PDFs served from a CDN.
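
The reachability scan behind garbage_collect can be sketched as a mark phase over an object graph. This is a toy model under stated assumptions, not pdf-oxide's internals: object numbers stand in for indirect objects, each maps to the object numbers it references, and live_objects and sample_graph are hypothetical names.

```rust
use std::collections::{BTreeSet, HashMap};

/// Toy object graph: object number -> object numbers it references.
pub type ObjectGraph = HashMap<u32, Vec<u32>>;

/// Return every object reachable from `roots` (for example /Root and /Info).
pub fn live_objects(graph: &ObjectGraph, roots: &[u32]) -> BTreeSet<u32> {
    let mut live = BTreeSet::new();
    let mut stack: Vec<u32> = roots.to_vec();
    while let Some(id) = stack.pop() {
        // Skip dangling references to objects that do not exist.
        let Some(refs) = graph.get(&id) else { continue };
        // `insert` returns false when `id` is already marked, which breaks cycles.
        if live.insert(id) {
            stack.extend(refs.iter().copied());
        }
    }
    live
}

/// A small document: catalog 1 -> pages 2 -> page 3 -> content 4, info dict 5,
/// plus a removed page 6 and its content 7 that nothing references any more.
pub fn sample_graph() -> ObjectGraph {
    HashMap::from([
        (1, vec![2]),
        (2, vec![3]),
        (3, vec![2, 4]), // the /Parent back-reference forms a cycle
        (4, vec![]),
        (5, vec![]),
        (6, vec![7]),
        (7, vec![]),
    ])
}
```

Calling live_objects(&sample_graph(), &[1, 5]) marks objects 1 to 5 as live; objects 6 and 7 are never visited, so a writer that emits only live objects would drop them.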

When to compress

  • After bulk edits: if you ran flatten_forms, flatten_all_annotations, or apply_all_redactions, or deleted many pages, garbage collection plus recompression typically cuts file size by 30–60 %.
  • Before distributing or emailing: end users care about file size, and linearisation makes the first page feel instant over slow connections.
  • As the last step in ETL pipelines: after extraction and re-generation, compress once at the tail of the pipeline; don’t decompress and recompress on every transform.

When NOT to compress

  • During frequent incremental edits — full rewrites lose xref streams and object streams that keep incremental appends small. For interactive editing loops, use SaveOptions::incremental() and leave compression for a final pass.
  • On PDFs that have already been aggressively compressed — if the streams already have FlateDecode filters with a good predictor, you may save only a couple of percent. Run pdf-oxide compress once and measure before wiring it into your pipeline.
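
The "measure first" advice can be made concrete with a small check that compares the two files on disk. This is a sketch only: measured_saving and worth_wiring_in are hypothetical helpers, and the threshold is whatever minimum saving justifies the extra pipeline step for you.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Fraction of the original size saved (0.03 means 3 %), read from disk.
pub fn measured_saving(original: &Path, compressed: &Path) -> io::Result<f64> {
    let before = fs::metadata(original)?.len();
    let after = fs::metadata(compressed)?.len();
    if before == 0 {
        return Ok(0.0);
    }
    Ok(1.0 - after as f64 / before as f64)
}

/// Decide whether the saving clears a caller-chosen minimum.
pub fn worth_wiring_in(saving: f64, min_saving: f64) -> bool {
    saving >= min_saving
}
```

Running this over a representative sample of your own inputs, rather than a single file, gives a better picture before the step goes into a pipeline.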

Size Reduction Expectations

Typical savings on real-world PDFs (measured across the 3,830-file test corpus):

| Source | Before | After | Saving |
| --- | --- | --- | --- |
| Scanned invoice (uncompressed streams) | 2.4 MB | 0.8 MB | ~66 % |
| LaTeX research paper | 1.1 MB | 0.95 MB | ~14 % |
| Government form (after flatten) | 890 KB | 240 KB | ~73 % |
| Already-optimised marketing PDF | 1.8 MB | 1.75 MB | ~3 % |

Bulk-edited files and forms-after-flattening are the largest wins. Already-optimised files see modest savings from linearisation alone.

Decrypt / authenticate first

DocumentEditor does not currently carry a password-authenticate call, so encrypted PDFs need to be decrypted first. Open the file with its password using PdfDocument.open_with_password() (Pdf::open_with_password() in the Rust example below), save an unencrypted copy, then open that copy in the editor:

use pdf_oxide::api::Pdf;
use pdf_oxide::editor::{DocumentEditor, EditableDocument, SaveOptions};

let doc = Pdf::open_with_password("protected.pdf", "pw")?;
doc.save("temp-unencrypted.pdf")?;

let mut editor = DocumentEditor::open("temp-unencrypted.pdf")?;
editor.save_with_options("compressed.pdf", SaveOptions {
    compress: true,
    garbage_collect: true,
    linearize: true,
    ..Default::default()
})?;

The same pattern works from Python: open the file with PdfDocument(path, password="pw"), call save() to write an unencrypted copy, then run the CLI on that copy.