Clojure API Reference
PDF Oxide ships idiomatic Clojure bindings as a thin wrapper over the
fyi.oxide:pdf-oxide Java binding, which owns the single JNI native bridge
(the pdf_oxide_jni crate). The wrapper adds zero native code: it calls the
Java classes directly via interop and returns Clojure-friendly values
(java.util.List becomes a vector, java.util.Optional becomes a value or
nil). The handle types (Pdf, PdfDocument, DocumentEditor,
AutoExtractor) are AutoCloseable, so use with-open for deterministic
cleanup.
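The two conversions mentioned above can be sketched in plain Clojure. This is only an illustration of the idea: `list->vec` and `optional->nil` are hypothetical names, not functions from pdf-oxide.core, and the wrapper's actual implementation may differ.

```clojure
(import '[java.util ArrayList Optional])

;; Hypothetical helper: copy any java.util.List into a persistent vector.
(defn list->vec [^java.util.List xs]
  (vec xs))

;; Hypothetical helper: unwrap an Optional, mapping "empty" to nil.
(defn optional->nil [^Optional opt]
  (.orElse opt nil))

(list->vec (ArrayList. ["a" "b"]))        ; => ["a" "b"]
(optional->nil (Optional/of "pdf-oxide")) ; => "pdf-oxide"
(optional->nil (Optional/empty))          ; => nil
```

Returning nil instead of an Optional lets callers use ordinary Clojure idioms such as `or`, `when-let`, and `some->` on the result.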
;; deps.edn
{:deps {fyi.oxide/pdf-oxide-clojure {:mvn/version "0.3.69"}}}
;; Leiningen
[fyi.oxide/pdf-oxide-clojure "0.3.69"]
The JNI native library (libpdf_oxide_jni) is not bundled. Either place it on
your java.library.path so that System.loadLibrary("pdf_oxide_jni") can find it,
or point the Java NativeLoader at it with -Dfyi.oxide.pdf.lib.path=<path>.
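As one possible setup, the system property can be supplied through a deps.edn alias so it does not have to be typed on every launch. The alias name `:native` and the location `/opt/pdf-oxide/lib` are hypothetical placeholders; check whether the loader expects the library file itself or its containing directory.

```clojure
;; deps.edn (alias name and library location are placeholders)
{:deps {fyi.oxide/pdf-oxide-clojure {:mvn/version "0.3.69"}}
 :aliases
 {:native {:jvm-opts ["-Dfyi.oxide.pdf.lib.path=/opt/pdf-oxide/lib"]}}}
```

With this in place, starting a REPL with `clj -A:native` passes the property to the JVM.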
Every function lives in the pdf-oxide.core namespace:
(require '[pdf-oxide.core :as pdf])
For other languages, see the Java API Reference, the Python API Reference, the Rust API Reference, and Types & Enums.
Pdf — Creation
Functions that build a new in-memory Pdf from source content, plus
serialization to a byte array. The returned Pdf is AutoCloseable.
Creation
(from-markdown ^Pdf [^String markdown])
Create a Pdf from a Markdown string.
(from-html ^Pdf [^String html])
Create a Pdf from an HTML string.
Saving
(save ^bytes [^Pdf pdf])
Serialize a built Pdf to a byte array (the raw PDF bytes).
(with-open [p (pdf/from-markdown "# Hello\n\nbody\n")]
  (pdf/save p)) ; => byte[]
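Because save returns a raw byte array, writing it to disk needs a byte-oriented call; `spit` writes `(str content)`, which for an array is not the PDF data. A minimal sketch, where `write-bytes!` is a hypothetical helper and not part of pdf-oxide.core:

```clojure
(require '[clojure.java.io :as io])

;; Hypothetical helper: write a byte array to the file at `path`.
;; io/copy supports a byte array as input and a File as output.
(defn write-bytes! [^bytes data path]
  (io/copy data (io/file path)))

;; Usage with the functions above (requires pdf-oxide on the classpath):
;; (with-open [p (pdf/from-markdown "# Hello\n\nbody\n")]
;;   (write-bytes! (pdf/save p) "hello.pdf"))
```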
PdfDocument — Opening, Extraction & Rendering
The primary read handle for an existing PDF. Open from a byte array or a
filesystem path, then extract text, convert to Markdown/HTML, render pages,
search, and read metadata and form fields. AutoCloseable.
Opening
(open ^PdfDocument [source])
(open ^PdfDocument [source ^String password])
Open a document from a byte array or a filesystem path string. The two-arity form supplies a password for encrypted PDFs.
(authenticate [^PdfDocument doc ^String password])
Authenticate an encrypted document after opening; returns a boolean.
Document Queries
(page-count [^PdfDocument doc])
Return the number of pages in the document.
(producer [^PdfDocument doc])
Return the /Producer metadata string, or nil if absent.
(creator [^PdfDocument doc])
Return the /Creator metadata string, or nil if absent.
Text Extraction
(extract-text [^PdfDocument doc page])
Extract plain text from a single zero-indexed page.
(extract-structured [^PdfDocument doc page])
Extract structured text (spans/blocks with positioning) for a single page.
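Since extract-text works one page at a time, collecting the text of a whole document means iterating over page indices. A minimal sketch using only the functions documented above; `page-texts` is a hypothetical helper and "report.pdf" is a placeholder path.

```clojure
(require '[pdf-oxide.core :as pdf]
         '[clojure.string :as str])

;; Hypothetical helper: one string per page, in page order.
;; mapv is eager, so all extraction happens while the document is open.
(defn page-texts [doc]
  (mapv #(pdf/extract-text doc %) (range (pdf/page-count doc))))

;; "report.pdf" is a placeholder path.
(with-open [doc (pdf/open "report.pdf")]
  (println (str/join "\n\n" (page-texts doc))))
```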
Conversion
(to-markdown [^PdfDocument doc])
(to-markdown [^PdfDocument doc page])
Convert the whole document, or a single page, to Markdown.
(to-html [^PdfDocument doc])
(to-html [^PdfDocument doc page])
Convert the whole document, or a single page, to HTML.
Rendering
(render ^bytes [^PdfDocument doc page])
(render ^bytes [^PdfDocument doc page dpi])
Render a page to PNG image bytes, optionally at a given DPI.
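As an illustration, the following sketch renders every page to its own PNG file. The input path, the output file names, and the choice of 150 DPI are assumptions for the example; the bytes are written with clojure.java.io/copy because render returns a byte array.

```clojure
(require '[pdf-oxide.core :as pdf]
         '[clojure.java.io :as io])

;; "report.pdf" is a placeholder path; files are named page-000.png, page-001.png, ...
(with-open [doc (pdf/open "report.pdf")]
  (doseq [i (range (pdf/page-count doc))]
    (io/copy (pdf/render doc i 150)
             (io/file (format "page-%03d.png" i)))))
```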
Search
(search [^PdfDocument doc ^String query])
Search the document for text; returns a vector of SearchMatch results.
Forms
(form-fields [^PdfDocument doc])
Return a vector of the document’s AcroForm form fields.
Page Access
(page ^PdfPage [^PdfDocument doc idx])
Get a PdfPage handle for the zero-indexed page.
(pages [^PdfDocument doc])
Return a vector of all PdfPage handles in the document.
PdfPage — Page Element Extraction
A page handle obtained from (pdf/page doc idx) or (pdf/pages doc). Each
extraction function converts the Java List result into a Clojure vector.
Elements
(words [^PdfPage page])
Return a vector of word elements on the page.
(lines [^PdfPage page])
Return a vector of line elements on the page.
(chars [^PdfPage page])
Return a vector of per-character glyphs on the page. (The name chars
intentionally shadows clojure.core/chars, so call it through the namespace
alias, as in pdf/chars, rather than referring it unqualified.)
(tables [^PdfPage page])
Return a vector of detected tables on the page.
(images [^PdfPage page])
Return a vector of image elements on the page.
(annotations [^PdfPage page])
Return a vector of annotations on the page.
Page Text
(page-text [^PdfPage page])
(page-text [^PdfPage page region])
Return the page’s plain text, optionally restricted to a BBox region.
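;; Assumes (import '[fyi.oxide.pdf.geometry BBox]) and a built Pdf bound to p
(with-open [d (pdf/open (pdf/save p))]
  (let [pg (pdf/page d 0)]
    {:words  (mapv #(.text %) (pdf/words pg))                      ; word strings, realized eagerly
     :region (pdf/page-text pg (BBox. 0.0 0.0 1000.0 1000.0))}))   ; region text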
DocumentEditor — Editing & Redaction
A mutable editing handle opened independently of PdfDocument. Supports
metadata scrubbing and destructive redaction, then serializes the result to
bytes. AutoCloseable.
(editor ^DocumentEditor [source])
Open a DocumentEditor from a byte array or a filesystem path string.
(scrub-metadata [^DocumentEditor ed])
Remove document metadata (info dictionary / XMP) in place.
(add-redaction [^DocumentEditor ed page region])
Mark a rectangular BBox region on a zero-indexed page for redaction.
(apply-redactions [^DocumentEditor ed])
Apply all pending redactions destructively, removing the underlying content.
(editor-save ^bytes [^DocumentEditor ed])
Serialize the edited document to a byte array.
(with-open [ed (pdf/editor pdf-bytes)]
  (pdf/scrub-metadata ed)
  (pdf/add-redaction ed 0 (BBox. 10.0 10.0 50.0 20.0))
  (pdf/apply-redactions ed)
  (pdf/editor-save ed))
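When several regions need redacting, the add-redaction calls can be driven from data and applied in a single pass. A sketch under stated assumptions: `redact` is a hypothetical helper, and the BBox import path is the one used in the Complete Example.

```clojure
(require '[pdf-oxide.core :as pdf])
(import '[fyi.oxide.pdf.geometry BBox])

;; Hypothetical helper: `regions` is a seq of [page-index BBox] pairs.
;; Marks every region, applies them all at once, and returns the new PDF bytes.
(defn redact [pdf-bytes regions]
  (with-open [ed (pdf/editor pdf-bytes)]
    (doseq [[page box] regions]
      (pdf/add-redaction ed page box))
    (pdf/apply-redactions ed)
    (pdf/editor-save ed)))

;; Usage, with pdf-bytes holding an existing PDF:
;; (redact pdf-bytes [[0 (BBox. 10.0 10.0 50.0 20.0)]
;;                    [0 (BBox. 10.0 40.0 50.0 50.0)]])
```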
AutoExtractor — Auto Extraction
A convenience extractor that picks an extraction strategy automatically for a
PdfDocument.
(auto-extractor ^AutoExtractor [^PdfDocument doc])
Create an AutoExtractor for the given document.
(auto-text [^AutoExtractor ax])
Extract text from the whole document using the auto-selected strategy.
(with-open [d  (pdf/open pdf-bytes)
            ax (pdf/auto-extractor d)] ; AutoCloseable too; closed before d
  (pdf/auto-text ax))
Lifecycle
The handle types are AutoCloseable; prefer with-open for deterministic
cleanup. These functions are escape hatches for non-with-open usage.
(close [resource])
Close any handle (Pdf, PdfDocument, PdfPage, DocumentEditor,
AutoExtractor).
(open? [resource])
Return whether the handle is still open.
(let [d (pdf/open pdf-bytes)]
  (pdf/open? d)  ; => true
  (pdf/close d)
  (pdf/open? d)) ; => false
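One general Clojure pitfall is worth keeping in mind with with-open: a lazy sequence created inside the body may only be realized after the handle has been closed. The sketch below demonstrates this with a stand-in handle built with reify; `fake-handle` is hypothetical and unrelated to PDF Oxide's classes.

```clojure
;; Stand-in for a native handle: deref returns true while it is open.
(defn fake-handle []
  (let [open (atom true)]
    (reify
      java.io.Closeable
      (close [_] (reset! open false))
      clojure.lang.IDeref
      (deref [_] @open))))

;; Lazy: the mapping function runs only when the seq is consumed,
;; which happens after with-open has already closed h.
(def lazy-result
  (with-open [h (fake-handle)]
    (map (fn [_] @h) [1 2 3])))

;; Eager: mapv realizes every element while h is still open.
(def eager-result
  (with-open [h (fake-handle)]
    (mapv (fn [_] @h) [1 2 3])))

(vec lazy-result) ; => [false false false]
eager-result      ; => [true true true]
```

For the same reason, results collected from a PDF handle are safer gathered with mapv, doall, or into before the with-open body returns.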
Complete Example
(require '[pdf-oxide.core :as pdf])
(import '[fyi.oxide.pdf.geometry BBox])
;; --- Creation + Extraction ---
(with-open [p (pdf/from-markdown "# Report\n\nGenerated by PDF Oxide.\n")
            d (pdf/open (pdf/save p))]
  (println "Pages:" (pdf/page-count d))
  (println (pdf/extract-text d 0))
  (println (pdf/to-markdown d))
  (println (pdf/to-html d 0))
  ;; Page elements (List -> vector)
  (let [pg (pdf/page d 0)]
    (println "Words:" (count (pdf/words pg)))
    (doseq [w (pdf/words pg)] (print (.text w) "")))
  ;; Search
  (doseq [m (pdf/search d "Report")]
    (println "Match:" (.text m)))
  ;; Metadata (Optional -> nil)
  (println "Producer:" (or (pdf/producer d) "(none)"))
  ;; Render: copy the raw PNG bytes (spit would write the array's string form)
  (clojure.java.io/copy (pdf/render d 0 150) (clojure.java.io/file "page0.png")))
;; --- Editing + Redaction ---
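;; pdf-bytes is a byte array holding an existing PDF
(with-open [ed (pdf/editor pdf-bytes)]
  (pdf/scrub-metadata ed)
  (pdf/add-redaction ed 0 (BBox. 10.0 10.0 50.0 20.0))
  (pdf/apply-redactions ed)
  ;; Copy the raw PDF bytes; spit would write the array's string form
  (clojure.java.io/copy (pdf/editor-save ed) (clojure.java.io/file "redacted.pdf")))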
Other Language Bindings
PDF Oxide ships native bindings for every major ecosystem: Rust, Python, Node.js, WASM, C#, Golang, Java, PHP, Ruby, C++, Swift, Kotlin, Dart, R, Julia, Zig, Scala, Objective-C, and Elixir.
Next Steps
- Types & Enums — all shared types and enums
- Page API Reference — consistent per-page iteration across bindings
- Getting Started with Clojure — tutorial