Getting Started with PDF Oxide (Elixir)
PDF Oxide is the fastest way to read and write PDFs from Elixir: 0.8 ms mean text extraction and a 100% pass rate on 3,830 PDFs. It is a NIF over the same Rust core used by the other PDF Oxide bindings, and it runs CPU-bound work on dirty CPU schedulers (ERL_NIF_DIRTY_JOB_CPU_BOUND) so that long-running calls do not block the regular BEAM schedulers.
Document and Pdf handles are NIF resources freed by the GC. Fallible functions return {:ok, value} or {:error, code}, and page indices are 0-based.
Installation
Add pdf_oxide to your mix.exs dependencies:
def deps do
  [
    {:pdf_oxide, "~> 0.3"}
  ]
end
Then fetch and compile — the NIF is built via elixir_make against the native cdylib:
mix deps.get
mix compile
Quick Start
Build a PDF from Markdown, serialize it to bytes, then open it and extract the text back out.
{:ok, pdf} = PdfOxide.from_markdown("# Hello pdf_oxide\n\nThis is an **Elixir** binding.\n")
{:ok, bytes} = PdfOxide.to_bytes(pdf)
{:ok, doc} = PdfOxide.open_from_bytes(bytes)
{:ok, pages} = PdfOxide.page_count(doc)
IO.puts("pages: #{pages}")
%{major: maj, minor: min} = PdfOxide.version(doc)
IO.puts("version: #{maj}.#{min}")
{:ok, text} = PdfOxide.extract_text(doc, 0)
IO.puts(text)
Opening a PDF
Open from a file path, or directly from in-memory bytes (useful when streaming from S3, HTTP, or a database):
# From a path
{:ok, doc} = PdfOxide.open("report.pdf")
# From bytes already in memory
{:ok, doc} = PdfOxide.open_from_bytes(pdf_bytes)
# Encrypted documents
{:ok, doc} = PdfOxide.open_with_password("confidential.pdf", "secret")
# Inspect
{:ok, count} = PdfOxide.page_count(doc)
encrypted? = PdfOxide.encrypted?(doc)
Close a document explicitly when you’re done (close/1 is idempotent), or let the GC reclaim it:
:ok = PdfOxide.close(doc)
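If you prefer deterministic cleanup over waiting for the GC, one option is a small wrapper that always calls close/1 once a callback has finished. The following is a minimal sketch: the PdfScope module and with_document/2 are hypothetical names, not part of pdf_oxide, and the sketch relies only on the open/1 and close/1 behaviour described above.

```elixir
defmodule PdfScope do
  # Hypothetical helper, not part of pdf_oxide.
  # Takes the result of an open call. On {:ok, doc} it runs `fun` with the
  # handle and closes the handle afterwards, even if `fun` raises.
  def with_document({:ok, doc}, fun) when is_function(fun, 1) do
    try do
      fun.(doc)
    after
      PdfOxide.close(doc)
    end
  end

  # An {:error, code} tuple from the open call is passed through untouched.
  def with_document({:error, _code} = error, _fun), do: error
end

# Usage (assumes report.pdf exists):
#
#   "report.pdf"
#   |> PdfOxide.open()
#   |> PdfScope.with_document(&PdfOxide.page_count/1)
```

Because close/1 is idempotent, the wrapper is safe to use even when the callback also closes the handle itself.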
Text Extraction
Extract plain text from a single page by its zero-based index, or pull the whole document at once:
{:ok, doc} = PdfOxide.open("book.pdf")
# A single page
{:ok, text} = PdfOxide.extract_text(doc, 0)
# Plain text, one page
{:ok, pt} = PdfOxide.to_plain_text(doc, 0)
# Every page, concatenated
{:ok, all} = PdfOxide.to_plain_text_all(doc)
IO.puts(all)
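When you need the text of each page separately rather than one concatenated string, you can combine page_count/1 with extract_text/2. The sketch below is one way to do it, assuming only the return shapes shown above; PageText and its functions are hypothetical names, not part of the library.

```elixir
defmodule PageText do
  # Hypothetical helper, not part of pdf_oxide.

  # Zero-based page indices for a document with `count` pages.
  # The explicit step (//1) keeps the range empty when count is 0;
  # a plain 0..-1 range would count downwards instead.
  def page_indices(count) when is_integer(count) and count >= 0 do
    Enum.to_list(0..(count - 1)//1)
  end

  # Returns {:ok, [text_of_page_0, text_of_page_1, ...]},
  # or the first {:error, code} encountered.
  def extract_each(doc) do
    with {:ok, count} <- PdfOxide.page_count(doc) do
      count
      |> page_indices()
      |> Enum.reduce_while({:ok, []}, fn index, {:ok, acc} ->
        case PdfOxide.extract_text(doc, index) do
          {:ok, text} -> {:cont, {:ok, [text | acc]}}
          {:error, _code} = error -> {:halt, error}
        end
      end)
      |> case do
        {:ok, texts} -> {:ok, Enum.reverse(texts)}
        error -> error
      end
    end
  end
end
```

The stepped range requires Elixir 1.12 or later.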
Markdown & HTML Conversion
Convert a page — or the entire document — to Markdown or HTML:
{:ok, doc} = PdfOxide.open("paper.pdf")
{:ok, md} = PdfOxide.to_markdown(doc, 0)
{:ok, mdall} = PdfOxide.to_markdown_all(doc)
{:ok, html} = PdfOxide.to_html(doc, 0)
{:ok, htmlall} = PdfOxide.to_html_all(doc)
Words & Lines
extract_words/2 returns structured PdfOxide.Word structs with a bounding box and a bold flag; extract_text_lines/2 groups them into lines.
{:ok, doc} = PdfOxide.open("paper.pdf")
{:ok, words} = PdfOxide.extract_words(doc, 0)
for w <- Enum.take(words, 10) do
  %PdfOxide.Bbox{x: x, y: y, width: width} = w.bbox
  IO.puts("#{w.text} at (#{x}, #{y}) w=#{width} bold=#{w.bold}")
end
{:ok, lines} = PdfOxide.extract_text_lines(doc, 0)
for line <- lines do
  IO.puts("#{line.word_count} words: #{line.text}")
end
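The bold flag lends itself to simple filtering, for example to pick out emphasized terms or candidate headings. The sketch below assumes that bold is a boolean and uses only the text and bold fields mentioned above; BoldText is a hypothetical module name, not part of the library.

```elixir
defmodule BoldText do
  # Hypothetical helper, not part of pdf_oxide.
  # Accepts any list of maps or structs exposing :text and :bold
  # (assumption: :bold is a boolean) and returns the text of the bold ones.
  def bold_words(words) do
    for %{bold: true, text: text} <- words, do: text
  end
end

# Usage with a result from extract_words/2:
#
#   {:ok, words} = PdfOxide.extract_words(doc, 0)
#   BoldText.bold_words(words)
```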
Search
Search a single page, or across the whole document. The last argument is case_sensitive (the fourth for search/4, the third for search_all/3). Each result carries text, page, and a PdfOxide.Bbox.
{:ok, doc} = PdfOxide.open("manual.pdf")
# One page (page index 0), case-insensitive
{:ok, results} = PdfOxide.search(doc, 0, "configuration", false)
for r <- results do
  %PdfOxide.Bbox{x: x, y: y} = r.bbox
  IO.puts("page #{r.page}: '#{r.text}' at (#{x}, #{y})")
end
# All pages
{:ok, all} = PdfOxide.search_all(doc, "configuration", false)
IO.puts("#{length(all)} matches")
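Since each result carries its page, document-wide results can be summarized with ordinary Enum functions. As an illustration, the sketch below counts matches per page; SearchSummary is a hypothetical module, and it relies only on the page field described above.

```elixir
defmodule SearchSummary do
  # Hypothetical helper, not part of pdf_oxide.
  # Returns a list of {page_index, match_count} tuples sorted by page.
  def matches_per_page(results) do
    results
    |> Enum.frequencies_by(& &1.page)
    |> Enum.sort()
  end
end

# Usage with a result from search_all/3:
#
#   {:ok, all} = PdfOxide.search_all(doc, "configuration", false)
#   for {page, count} <- SearchSummary.matches_per_page(all) do
#     IO.puts("page #{page}: #{count} matches")
#   end
```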
PDF Creation
The builder factory functions return a Pdf handle that you serialize with to_bytes/1 or write straight to disk with save/2:
{:ok, pdf} = PdfOxide.from_markdown("# Hello World\n\nThis is a PDF.")
:ok = PdfOxide.save(pdf, "output.pdf")
{:ok, pdf} = PdfOxide.from_html("<h1>Invoice</h1><p>Amount: $42</p>")
{:ok, bytes} = PdfOxide.to_bytes(pdf)
{:ok, pdf} = PdfOxide.from_text("Plain text content.")
:ok = PdfOxide.save(pdf, "notes.pdf")
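In practice the Markdown is often generated from application data rather than written by hand. The following sketch builds a small Markdown string and passes it to from_markdown/1 and save/2 as shown above. ReportPdf and its functions are hypothetical names, and the sketch sticks to headings, paragraphs, and bold text, the only Markdown features that appear in this guide.

```elixir
defmodule ReportPdf do
  # Hypothetical helper, not part of pdf_oxide.

  # Builds a Markdown document: a heading followed by one
  # "**label**: value" paragraph per row.
  def to_markdown(title, rows) do
    paragraphs = for {label, value} <- rows, do: "**#{label}**: #{value}"
    Enum.join(["# #{title}" | paragraphs], "\n\n") <> "\n"
  end

  # Renders the Markdown to a PDF and writes it to `path`.
  # Returns :ok, or {:error, code} if PDF creation fails.
  def write(title, rows, path) do
    with {:ok, pdf} <- PdfOxide.from_markdown(to_markdown(title, rows)) do
      PdfOxide.save(pdf, path)
    end
  end
end

# Usage:
#
#   ReportPdf.write("Invoice", [{"Amount", "$42"}], "invoice.pdf")
```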
Rendering Pages to Images
With the rendering feature enabled, rasterize a page to a PdfOxide.RenderedImage and save it as a PNG:
{:ok, doc} = PdfOxide.open("paper.pdf")
{:ok, img} = PdfOxide.render_page(doc, 0)
IO.puts("#{img.width}x#{img.height}, #{byte_size(img.data)} bytes")
:ok = PdfOxide.save(img, "page0.png")
# Zoom factor, or a fixed-size thumbnail
{:ok, zoomed} = PdfOxide.render_page_zoom(doc, 0, 2.0)
{:ok, thumb} = PdfOxide.render_page_thumbnail(doc, 0, 128)
Error Handling
Fallible functions return a tagged tuple — pattern-match for clean control flow:
case PdfOxide.open("/nonexistent/nope.pdf") do
  {:ok, doc} ->
    {:ok, text} = PdfOxide.extract_text(doc, 0)
    IO.puts(text)

  {:error, code} ->
    IO.puts("could not open PDF: #{inspect(code)}")
end
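When several fallible calls are chained, Elixir's with keeps the happy path flat and passes the first {:error, code} back to the caller unchanged. This is a minimal sketch using the same two functions as the example above; FirstPage is a hypothetical module name.

```elixir
defmodule FirstPage do
  # Hypothetical helper, not part of pdf_oxide.
  # Returns {:ok, text} for page 0, or the first {:error, code}
  # produced by either step.
  def read(path) do
    with {:ok, doc} <- PdfOxide.open(path),
         {:ok, text} <- PdfOxide.extract_text(doc, 0) do
      {:ok, text}
    end
  end
end

# Usage:
#
#   case FirstPage.read("report.pdf") do
#     {:ok, text} -> IO.puts(text)
#     {:error, code} -> IO.puts("failed: #{inspect(code)}")
#   end
```

Unlike the pattern match inside the case branch above, a failing extract_text/2 here yields an error tuple instead of raising a MatchError.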
Next Steps
- Rust Getting Started — using PDF Oxide from Rust
- Python Getting Started — using PDF Oxide from Python
- Text Extraction — detailed extraction options and recipes
- PDF Creation — advanced creation with metadata and encryption
- Editing — modifying existing PDFs, annotations, and form fields