
Clojure API Reference

PDF Oxide ships idiomatic Clojure bindings as a thin wrapper over the fyi.oxide:pdf-oxide Java binding, which owns the single JNI native bridge (the pdf_oxide_jni crate). The wrapper adds zero native code: it calls the Java classes directly via interop and returns Clojure-friendly values (java.util.List becomes a vector, java.util.Optional becomes a value or nil). The handle types (Pdf, PdfDocument, DocumentEditor, AutoExtractor) are AutoCloseable, so use with-open for deterministic cleanup.
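The two conversions mentioned above can be tried with plain JDK types, without the native library. This is a minimal sketch of the idea only: list->vector and optional->value are hypothetical helper names, and the wrapper's actual internals are not shown in this reference.

```clojure
(import '[java.util ArrayList Optional])

(defn list->vector
  "Copies a java.util.List into a persistent Clojure vector.
  Hypothetical helper, not part of pdf-oxide.core."
  [^java.util.List xs]
  (vec xs))

(defn optional->value
  "Unwraps a java.util.Optional, yielding nil when it is empty.
  Hypothetical helper, not part of pdf-oxide.core."
  [^Optional opt]
  (.orElse opt nil))

(list->vector (ArrayList. ["a" "b"]))          ; => ["a" "b"]
(optional->value (Optional/of "PDF Oxide"))    ; => "PDF Oxide"
(optional->value (Optional/empty))             ; => nil
```

Returning nil for an empty Optional lets callers use or, when-let, and some-> directly on the result.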

;; deps.edn
{:deps {fyi.oxide/pdf-oxide-clojure {:mvn/version "0.3.69"}}}
;; Leiningen
[fyi.oxide/pdf-oxide-clojure "0.3.69"]

The JNI native library (libpdf_oxide_jni) is not bundled. Either place it on java.library.path so that System.loadLibrary("pdf_oxide_jni") can find it, or point the Java NativeLoader at it with -Dfyi.oxide.pdf.lib.path=<path>.
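One way to pass the JVM option is a deps.edn alias. The fragment below is a sketch: the alias names :native and :native-dir are arbitrary, and <path> and <dir> are placeholders you must replace with the real location on your machine.

```clojure
;; deps.edn (sketch; alias names are arbitrary, <path> and <dir> are placeholders)
{:deps {fyi.oxide/pdf-oxide-clojure {:mvn/version "0.3.69"}}
 :aliases
 {;; Option 1: point the Java NativeLoader at the library
  :native     {:jvm-opts ["-Dfyi.oxide.pdf.lib.path=<path>"]}
  ;; Option 2: add the library's directory to java.library.path
  :native-dir {:jvm-opts ["-Djava.library.path=<dir>"]}}}
```

Start the REPL or program with the chosen alias, for example clj -M:native.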

Every function lives in the pdf-oxide.core namespace:

(require '[pdf-oxide.core :as pdf])

For other languages, see the Java API Reference, the Python API Reference, the Rust API Reference, and Types & Enums.


Pdf — Creation

Functions that build a new in-memory Pdf from source content, plus serialization to a byte array. The returned Pdf is AutoCloseable.

Creation

(from-markdown ^Pdf [^String markdown])

Create a Pdf from a Markdown string.

(from-html ^Pdf [^String html])

Create a Pdf from an HTML string.

Saving

(save ^bytes [^Pdf pdf])

Serialize a built Pdf to a byte array (the raw PDF bytes).

(with-open [p (pdf/from-markdown "# Hello\n\nbody\n")]
  (pdf/save p))                 ; => byte[]

PdfDocument — Opening, Extraction & Rendering

The primary read handle for an existing PDF. Open from a byte array or a filesystem path, then extract text, convert to Markdown/HTML, render pages, search, and read metadata and form fields. AutoCloseable.

Opening

(open ^PdfDocument [source])
(open ^PdfDocument [source ^String password])

Open a document from a byte array or a filesystem path string. The two-arity form supplies a password for encrypted PDFs.
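When the PDF has to be passed as a byte array (as pdf-bytes is in the examples on this page), the file can be read with the JDK alone. A minimal sketch; read-bytes is a hypothetical helper, not part of pdf-oxide.core, and report.pdf is a made-up file name.

```clojure
(import '[java.nio.file Files Paths])

(defn read-bytes
  "Reads the whole file at path into a byte array.
  Hypothetical helper, not part of pdf-oxide.core."
  ^bytes [^String path]
  ;; Paths/get is variadic in Java, so pass an empty String array explicitly.
  (Files/readAllBytes (Paths/get path (make-array String 0))))

;; Assumed usage with the documented open function:
;; (with-open [d (pdf/open (read-bytes "report.pdf"))]
;;   (pdf/page-count d))
```

Passing the path string straight to open avoids the extra copy; the byte-array form is useful when the PDF comes from a network response or a database.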

(authenticate [^PdfDocument doc ^String password])

Authenticate an encrypted document after opening; returns a boolean.

Document Queries

(page-count [^PdfDocument doc])

Return the number of pages in the document.

(producer [^PdfDocument doc])

Return the /Producer metadata string, or nil if absent.

(creator [^PdfDocument doc])

Return the /Creator metadata string, or nil if absent.

Text Extraction

(extract-text [^PdfDocument doc page])

Extract plain text from a single zero-indexed page.
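Because extract-text works on one page at a time, collecting text for the whole document means iterating over the page indices. A minimal sketch: all-page-texts is a hypothetical helper, not part of pdf-oxide.core, and it takes the page count and the extraction function as arguments so it can be tried without the native library.

```clojure
(defn all-page-texts
  "Returns a vector holding (extract-fn i) for every zero-based page
  index i below n-pages. Hypothetical helper, not part of pdf-oxide.core."
  [n-pages extract-fn]
  ;; mapv is eager, so all pages are read before the document is closed.
  (mapv extract-fn (range n-pages)))

;; Assumed usage with an open PdfDocument d:
;; (all-page-texts (pdf/page-count d) #(pdf/extract-text d %))
```

The eager mapv matters inside with-open: a lazy sequence returned from the body would only be realized after the handle had been closed.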

(extract-structured [^PdfDocument doc page])

Extract structured text (spans/blocks with positioning) for a single page.

Conversion

(to-markdown [^PdfDocument doc])
(to-markdown [^PdfDocument doc page])

Convert the whole document, or a single page, to Markdown.

(to-html [^PdfDocument doc])
(to-html [^PdfDocument doc page])

Convert the whole document, or a single page, to HTML.

Rendering

(render ^bytes [^PdfDocument doc page])
(render ^bytes [^PdfDocument doc page dpi])

Render a page to PNG image bytes, optionally at a given DPI.
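The PNG bytes need to be written through a byte-oriented stream. Note that clojure.core/spit is not suitable here: it writes (str content), which for a byte array is the array's string representation rather than its contents. A minimal sketch; write-bytes! is a hypothetical helper, not part of pdf-oxide.core.

```clojure
(require '[clojure.java.io :as io])

(defn write-bytes!
  "Writes a byte array to path, replacing any existing file.
  Hypothetical helper, not part of pdf-oxide.core."
  [^String path ^bytes data]
  (with-open [^java.io.OutputStream out (io/output-stream path)]
    (.write out data)))

;; Assumed usage with an open PdfDocument d:
;; (write-bytes! "page0.png" (pdf/render d 0 150))
```

The same helper works for the byte arrays returned by save and editor-save.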

Search

(search [^PdfDocument doc ^String query])

Search the document for text; returns a vector of SearchMatch results.

Forms

(form-fields [^PdfDocument doc])

Return a vector of the document’s AcroForm form fields.

Page Access

(page ^PdfPage [^PdfDocument doc idx])

Get a PdfPage handle for the zero-indexed page.

(pages [^PdfDocument doc])

Return a vector of all PdfPage handles in the document.


PdfPage — Page Element Extraction

A page handle obtained from (pdf/page doc idx) or (pdf/pages doc). Each extraction function converts the Java List result into a Clojure vector.

Elements

(words [^PdfPage page])

Return a vector of word elements on the page.

(lines [^PdfPage page])

Return a vector of line elements on the page.

(chars [^PdfPage page])

Return a vector of per-character glyphs on the page. (This pdf/chars intentionally shadows clojure.core/chars.)

(tables [^PdfPage page])

Return a vector of detected tables on the page.

(images [^PdfPage page])

Return a vector of image elements on the page.

(annotations [^PdfPage page])

Return a vector of annotations on the page.

Page Text

(page-text [^PdfPage page])
(page-text [^PdfPage page region])

Return the page’s plain text, optionally restricted to a BBox region.

(import '[fyi.oxide.pdf.geometry BBox])

(with-open [d (pdf/open pdf-bytes)]
  (let [pg (pdf/page d 0)]
    {:words  (mapv #(.text %) (pdf/words pg))                       ; word strings, realized eagerly
     :region (pdf/page-text pg (BBox. 0.0 0.0 1000.0 1000.0))}))    ; region text

DocumentEditor — Editing & Redaction

A mutable editing handle opened independently of PdfDocument. Supports metadata scrubbing and destructive redaction, then serializes the result to bytes. AutoCloseable.

(editor ^DocumentEditor [source])

Open a DocumentEditor from a byte array or a filesystem path string.

(scrub-metadata [^DocumentEditor ed])

Remove document metadata (info dictionary / XMP) in place.

(add-redaction [^DocumentEditor ed page region])

Mark a rectangular BBox region on a zero-indexed page for redaction.

(apply-redactions [^DocumentEditor ed])

Apply all pending redactions destructively, removing the underlying content.

(editor-save ^bytes [^DocumentEditor ed])

Serialize the edited document to a byte array.

(with-open [ed (pdf/editor pdf-bytes)]
  (pdf/scrub-metadata ed)
  (pdf/add-redaction ed 0 (BBox. 10.0 10.0 50.0 20.0))
  (pdf/apply-redactions ed)
  (pdf/editor-save ed))

AutoExtractor — Auto Extraction

A convenience extractor that picks an extraction strategy automatically for a PdfDocument.

(auto-extractor ^AutoExtractor [^PdfDocument doc])

Create an AutoExtractor for the given document.

(auto-text [^AutoExtractor ax])

Extract text from the whole document using the auto-selected strategy.

(with-open [d  (pdf/open pdf-bytes)
            ax (pdf/auto-extractor d)]   ; AutoExtractor is AutoCloseable too
  (pdf/auto-text ax))

Lifecycle

The handle types are AutoCloseable; prefer with-open for deterministic cleanup. These functions are escape hatches for non-with-open usage.

(close [resource])

Close any handle (Pdf, PdfDocument, PdfPage, DocumentEditor, AutoExtractor).

(open? [resource])

Return whether the handle is still open.

(let [d (pdf/open pdf-bytes)]
  (pdf/open? d)        ; => true
  (pdf/close d)
  (pdf/open? d))       ; => false
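The reason to prefer with-open is that it closes the handle even when the body throws. The sketch below shows this with a stand-in AutoCloseable built with reify, so it runs without the native library; tracking-handle and closed-flag are hypothetical names used only for this illustration.

```clojure
(defn tracking-handle
  "Returns a stand-in AutoCloseable that sets flag to true when closed.
  Hypothetical stand-in for a PDF Oxide handle."
  [flag]
  (reify java.lang.AutoCloseable
    (close [_] (reset! flag true))))

(def closed-flag (atom false))

(try
  (with-open [h (tracking-handle closed-flag)]
    ;; Simulate a failure while the handle is in use.
    (throw (ex-info "simulated failure" {})))
  (catch clojure.lang.ExceptionInfo _ nil))

@closed-flag   ; => true, the handle was closed despite the exception
```

When using close directly instead, wrapping the work in try with close in a finally clause gives the same guarantee.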

Complete Example

(require '[pdf-oxide.core :as pdf])
(import '[fyi.oxide.pdf.geometry BBox])

;; --- Creation + Extraction ---
(with-open [p (pdf/from-markdown "# Report\n\nGenerated by PDF Oxide.\n")
            d (pdf/open (pdf/save p))]
  (println "Pages:" (pdf/page-count d))
  (println (pdf/extract-text d 0))
  (println (pdf/to-markdown d))
  (println (pdf/to-html d 0))

  ;; Page elements (List -> vector)
  (let [pg (pdf/page d 0)]
    (println "Words:" (count (pdf/words pg)))
    (doseq [w (pdf/words pg)] (print (.text w) "")))

  ;; Search
  (doseq [m (pdf/search d "Report")]
    (println "Match:" (.text m)))

  ;; Metadata (Optional -> nil)
  (println "Producer:" (or (pdf/producer d) "(none)"))

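  ;; Render (spit would write the byte array's string form, so use an output stream)
  (with-open [out (clojure.java.io/output-stream "page0.png")]
    (.write out (pdf/render d 0 150))))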

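;; --- Editing + Redaction ---
;; Build the input bytes first so the example is self-contained.
(def pdf-bytes
  (with-open [p (pdf/from-markdown "# Report\n\nGenerated by PDF Oxide.\n")]
    (pdf/save p)))

(with-open [ed  (pdf/editor pdf-bytes)
            out (clojure.java.io/output-stream "redacted.pdf")]
  (pdf/scrub-metadata ed)
  (pdf/add-redaction ed 0 (BBox. 10.0 10.0 50.0 20.0))
  (pdf/apply-redactions ed)
  (.write out (pdf/editor-save ed)))   ; write raw bytes; spit would stringify the array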

Other Language Bindings

PDF Oxide ships native bindings for every major ecosystem: Rust, Python, Node.js, WASM, C#, Golang, Java, PHP, Ruby, C++, Swift, Kotlin, Dart, R, Julia, Zig, Scala, Objective-C, and Elixir.
