Getting Started with PDF Oxide (C# / .NET)
PDF Oxide is the fastest .NET PDF library: 0.8 ms mean text extraction, 5× faster than PyMuPDF, 15× faster than pypdf, and a 100% pass rate on 3,830 PDFs. It is one package for extracting, creating, and editing PDFs. It is NativeAOT-ready and trim-safe, with idiomatic using disposal, async Task<T> methods, CancellationToken support, and LINQ-friendly collections. Licensed under MIT / Apache-2.0.
Installation
dotnet add package PdfOxide
Target frameworks: net8.0 and net10.0. IsAotCompatible=true and IsTrimmable=true are enabled.
Pre-built native libraries ship in the NuGet package for Windows, macOS (Intel + Apple Silicon), and Linux (x64 + ARM64). No system dependencies, no Rust toolchain required.
Opening a PDF
using PdfOxide.Core;
using var doc = PdfDocument.Open("research-paper.pdf");
Console.WriteLine($"Pages: {doc.PageCount}");
Console.WriteLine($"PDF version: {doc.Version.Major}.{doc.Version.Minor}");
From a stream:
using var stream = File.OpenRead("report.pdf");
using var doc = PdfDocument.Open(stream);
With a password:
using var doc = PdfDocument.OpenWithPassword("secure.pdf", "user-password");
AES-256 (V=5, R=6) PDFs are fully supported.
Page API
Since v0.3.34 PdfDocument exposes Pages (an IReadOnlyList<PdfPage>) and an int indexer, so you can iterate with foreach and use LINQ.
using PdfOxide.Core;
using var doc = PdfDocument.Open("paper.pdf");
foreach (var page in doc.Pages)
{
    Console.WriteLine($"--- Page {page.Index + 1} ---");
    Console.WriteLine(page.ExtractText());
}
// Or index directly
PdfPage first = doc[0];
string md = await first.ToMarkdownAsync();
Each PdfPage has a full sync + async surface: ExtractText() / ExtractTextAsync(), ToMarkdown(), ToHtml(), ToPlainText(), ExtractWords(), ExtractTextLines(), ExtractTables(), ExtractChars(), ExtractImages(), Search().
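Because Pages is an ordinary read-only list, page-level queries can be written as plain LINQ. The sketch below is illustrative only: PagesMentioning is a hypothetical helper (not part of PDF Oxide), and it takes a text-getter delegate so that it runs here on stand-in strings without the library installed.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Stand-in page texts; with PDF Oxide these would come from doc.Pages and page.ExtractText().
var samplePages = new List<string>
{
    "Introduction and scope",
    "Quarterly revenue grew in Q1",
    "Appendix: revenue tables",
};

// Prints "Keyword found on page 2" and then "Keyword found on page 3".
foreach (int number in PagesMentioning(samplePages, text => text, "revenue"))
{
    Console.WriteLine($"Keyword found on page {number}");
}

// Hypothetical helper: returns the 1-based numbers of the pages whose text
// contains the keyword, compared case-insensitively.
static IReadOnlyList<int> PagesMentioning<TPage>(
    IEnumerable<TPage> pages, Func<TPage, string> getText, string keyword)
{
    return pages
        .Select((page, index) => (Number: index + 1, Text: getText(page)))
        .Where(p => p.Text.Contains(keyword, StringComparison.OrdinalIgnoreCase))
        .Select(p => p.Number)
        .ToList();
}
```

With a real document the call would presumably look like PagesMentioning(doc.Pages, p => p.ExtractText(), "revenue"), using only the members shown above.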
Text Extraction
Single Page
using var doc = PdfDocument.Open("report.pdf");
string text = doc.ExtractText(0);
Console.WriteLine(text);
All Pages
string allText = doc.ExtractAllText();
Walk Pages Manually
for (int i = 0; i < doc.PageCount; i++)
{
    Console.WriteLine($"--- Page {i + 1} ---");
    Console.WriteLine(doc.ExtractText(i));
}
Async Extraction
Every extraction method has an *Async counterpart returning Task<T> and accepting an optional CancellationToken.
using PdfOxide.Core;
using var doc = PdfDocument.Open("large.pdf");
string text = await doc.ExtractTextAsync(0);
// Fan-out with cancellation
using var cts = new CancellationTokenSource(TimeSpan.FromSeconds(30));
var tasks = Enumerable.Range(0, doc.PageCount)
    .Select(i => doc.ExtractTextAsync(i, cts.Token));
string[] pages = await Task.WhenAll(tasks);
See the async guide for complete patterns.
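The fan-out above starts every page at once. For very large documents it may be preferable to cap the number of extractions in flight. The following is a minimal sketch of that idea using SemaphoreSlim; ExtractAllAsync is a hypothetical helper rather than a PDF Oxide API, and the extractor is passed in as a delegate so the example runs on a stand-in.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Stand-in extractor; with PDF Oxide this would be (i, ct) => doc.ExtractTextAsync(i, ct).
Func<int, CancellationToken, Task<string>> fakeExtract = async (i, ct) =>
{
    await Task.Delay(10, ct);
    return $"text of page {i + 1}";
};

using var cts = new CancellationTokenSource(TimeSpan.FromSeconds(30));
string[] pageTexts = await ExtractAllAsync(
    5, fakeExtract, maxParallel: 2, cancellationToken: cts.Token);
Console.WriteLine(string.Join(" | ", pageTexts));

// Hypothetical helper: calls extract(i) for every page index with at most
// maxParallel calls running at a time, and returns the results in page order.
static async Task<string[]> ExtractAllAsync(
    int pageCount,
    Func<int, CancellationToken, Task<string>> extract,
    int maxParallel,
    CancellationToken cancellationToken = default)
{
    using var gate = new SemaphoreSlim(maxParallel);
    var tasks = Enumerable.Range(0, pageCount).Select(async i =>
    {
        await gate.WaitAsync(cancellationToken);
        try
        {
            return await extract(i, cancellationToken);
        }
        finally
        {
            gate.Release();
        }
    });
    // Task.WhenAll preserves the order of the input sequence, so index i maps to page i.
    return await Task.WhenAll(tasks);
}
```

The semaphore only limits how many extractions overlap. Cancellation still flows through the same token, so a timeout cancels both the waiting and the running calls.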
Structured Extraction
var words = doc.ExtractWords(0);
foreach (var (text, x, y, w, h) in words)
{
    Console.WriteLine($"\"{text}\" at ({x:F1}, {y:F1})");
}
// Region-based
string regionText = doc.ExtractTextInRect(0, x: 50, y: 700, width: 200, height: 50);
var tables = doc.ExtractTables(0);
foreach (var (rows, cols) in tables)
{
    Console.WriteLine($"{rows}x{cols} table");
}
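Word positions make simple label/value lookups possible, for example reading whatever sits to the right of "Total:" on the same line. The sketch below assumes each word can be represented as a (Text, X, Y, W, H) tuple of a string and four doubles, mirroring the deconstruction shown above; the actual element type returned by ExtractWords may differ. TextRightOf and its yTolerance parameter are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Stand-in words; with PDF Oxide these would be built from doc.ExtractWords(0).
var sampleWords = new List<(string Text, double X, double Y, double W, double H)>
{
    ("Invoice", 50.0, 700.0, 40.0, 10.0),
    ("Total:", 50.0, 650.0, 30.0, 10.0),
    ("$42.00", 90.0, 650.5, 35.0, 10.0),
    ("USD", 130.0, 650.0, 20.0, 10.0),
    ("Thanks", 50.0, 600.0, 38.0, 10.0),
};

Console.WriteLine(TextRightOf(sampleWords, "Total:")); // prints: $42.00 USD

// Hypothetical helper: finds the first word equal to label, then joins every
// word on roughly the same baseline (within yTolerance) that starts to its right.
// Returns an empty string when the label is missing or nothing follows it.
static string TextRightOf(
    IEnumerable<(string Text, double X, double Y, double W, double H)> words,
    string label,
    double yTolerance = 2.0)
{
    var list = words.ToList();
    int anchorIndex = list.FindIndex(w => w.Text == label);
    if (anchorIndex < 0)
    {
        return string.Empty;
    }

    var anchor = list[anchorIndex];
    var sameLine = list
        .Where(w => Math.Abs(w.Y - anchor.Y) <= yTolerance && w.X > anchor.X)
        .OrderBy(w => w.X)
        .Select(w => w.Text);
    return string.Join(" ", sameLine);
}
```

The tolerance absorbs small baseline differences between words on one line; a suitable value depends on the font size and is a guess here.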
Markdown Conversion
string markdown = doc.ToMarkdown(0);
string allMarkdown = doc.ToMarkdownAll();
HTML Conversion
string html = doc.ToHtml(0);
string allHtml = doc.ToHtmlAll();
Image Extraction
using PdfOxide.Core;
using var doc = PdfDocument.Open("brochure.pdf");
var images = doc.ExtractImages(0);
int index = 0;
foreach (var img in images)
{
    Console.WriteLine($"{img.Width}x{img.Height} {img.Format} ({img.Colorspace}, {img.BitsPerComponent} bpc, {img.Data.Length} bytes)");
    // Number the files with a running counter instead of searching the collection on every iteration.
    File.WriteAllBytes($"image_{index++}.{img.Format}", img.Data);
}
Indexed-color images are automatically expanded to RGB (1/2/4/8 bpc with RGB, Grayscale, or CMYK base color spaces).
Search
var results = doc.SearchAll("quarterly revenue");
foreach (var (page, text, x, y, w, h) in results)
{
    Console.WriteLine($"Page {page}: \"{text}\" at ({x}, {y})");
}
// Case-sensitive single-page
var pageResults = doc.SearchPage(0, "exact phrase", caseSensitive: true);
LINQ integrates naturally:
var hitsByPage = doc.SearchAll("keyword")
    .GroupBy(r => r.Page)
    .OrderBy(g => g.Key);
foreach (var group in hitsByPage)
{
    Console.WriteLine($"Page {group.Key}: {group.Count()} hits");
}
PDF Creation
using PdfOxide.Core;
// From Markdown
using (var pdf = Pdf.FromMarkdown("# Invoice\n\nTotal: **$42.00**"))
{
    pdf.Save("invoice.pdf");
}
// From HTML
using (var pdf = Pdf.FromHtml("<h1>Report</h1><p>Generated 2026-04-09</p>"))
{
    pdf.Save("report.pdf");
}
// From plain text
using (var pdf = Pdf.FromText("Plain text document.\n\nSecond paragraph."))
{
    pdf.Save("notes.pdf");
}
// From image
using (var pdf = Pdf.FromImage("scan.jpg"))
{
    pdf.Save("scan.pdf");
}
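In practice the Markdown passed to Pdf.FromMarkdown is usually generated from data rather than typed by hand. The sketch below builds such a string with StringBuilder. BuildInvoiceMarkdown and the line-item tuple are hypothetical, and the output deliberately sticks to the constructs used in the example above (a heading, paragraphs, and bold text), since support for other Markdown features is not stated here.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text;

var sampleItems = new List<(string Description, int Quantity, decimal UnitPrice)>
{
    ("Widget", 2, 9.50m),
    ("Gadget", 1, 23.00m),
};

string markdown = BuildInvoiceMarkdown("Invoice", sampleItems);
Console.WriteLine(markdown); // the last paragraph reads: Total: **$42.00**

// Hypothetical helper: renders a heading, one paragraph per line item,
// and a bold total, with prices formatted culture-invariantly.
static string BuildInvoiceMarkdown(
    string title,
    IEnumerable<(string Description, int Quantity, decimal UnitPrice)> items)
{
    var invariant = CultureInfo.InvariantCulture;
    var sb = new StringBuilder();
    sb.Append("# ").Append(title).Append("\n\n");

    decimal total = 0m;
    foreach (var (description, quantity, unitPrice) in items)
    {
        decimal lineTotal = quantity * unitPrice;
        string lineText = lineTotal.ToString("F2", invariant);
        sb.Append($"{description} x {quantity}: ${lineText}\n\n");
        total += lineTotal;
    }

    string totalText = total.ToString("F2", invariant);
    sb.Append($"Total: **${totalText}**\n");
    return sb.ToString();
}
```

The resulting string can then be handed to Pdf.FromMarkdown(markdown) and saved exactly as in the snippet above.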
Editing — Metadata and Forms
using PdfOxide.Core;
using var editor = DocumentEditor.Open("form.pdf");
// Read metadata
Console.WriteLine($"Title: {editor.Title}");
Console.WriteLine($"Pages: {editor.PageCount}");
// Update metadata
editor.Title = "Quarterly Report";
editor.Author = "Finance Team";
editor.Subject = "Q1 2026 Results";
// Fill and flatten form fields
editor.SetFormFieldValue("employee.name", "Jane Doe");
editor.SetFormFieldValue("employee.email", "jane@example.com");
editor.FlattenForms();
editor.Save("edited.pdf");
// or: await editor.SaveAsync("edited.pdf");
Reading form fields without editing:
using var doc = PdfDocument.Open("form.pdf");
foreach (var f in doc.GetFormFields())
{
    Console.WriteLine($"{f.Name} ({f.FieldType}) = \"{f.Value}\"");
}
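Reading the field names first makes it possible to catch misspelled keys before filling a form. The following sketch compares a set of values against the names a form exposes. UnknownFieldNames is a hypothetical helper, and it works on plain strings so that it runs without the library.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Stand-in names; with PDF Oxide these would be doc.GetFormFields().Select(f => f.Name).
var sampleFieldNames = new[] { "employee.name", "employee.email" };
var sampleValues = new Dictionary<string, string>
{
    ["employee.name"] = "Jane Doe",
    ["employee.phone"] = "555-0100",
};

foreach (string name in UnknownFieldNames(sampleFieldNames, sampleValues))
{
    Console.WriteLine($"No such form field: {name}"); // prints: No such form field: employee.phone
}

// Hypothetical helper: returns the keys in values that do not match any
// form field name, sorted for stable output.
static IReadOnlyList<string> UnknownFieldNames(
    IEnumerable<string> formFieldNames,
    IReadOnlyDictionary<string, string> values)
{
    var known = new HashSet<string>(formFieldNames, StringComparer.Ordinal);
    return values.Keys
        .Where(key => !known.Contains(key))
        .OrderBy(key => key, StringComparer.Ordinal)
        .ToList();
}
```

If the list comes back empty, each pair can be passed to editor.SetFormFieldValue(name, value) as shown earlier.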
Note: The .NET binding currently exposes document open / read / convert / create, image extraction, form field read / fill / flatten, and metadata editing. Page operations, annotations, rendering, and signatures are available through the Rust core and other bindings; an equivalent .NET surface will be added in a future release.
NativeAOT Publishing
PDF Oxide’s .NET binding is NativeAOT-publish-ready:
dotnet publish -c Release -r linux-x64 --self-contained -p:PublishAot=true
All 881 P/Invoke declarations use LibraryImport (source-generated P/Invoke), and the package sets IsAotCompatible=true and IsTrimmable=true. Your AOT-compiled binary links only the code it uses, and the native Rust core is statically linked into the included platform-specific library.
Plugins and Extensions
The PdfOxide.Plugins package (shipped alongside PdfOxide) exposes extension points for processors that transform extracted content, such as classifiers, post-processors, and validators. See the plugin guide for extension authoring.
Error Handling
All methods throw typed exceptions on failure:
using PdfOxide.Core;
try
{
    using var doc = PdfDocument.Open("document.pdf");
    string text = doc.ExtractText(0);
}
catch (PdfOxideException ex)
{
    Console.Error.WriteLine($"PDF Oxide error: {ex.Message}");
}
catch (FileNotFoundException)
{
    Console.Error.WriteLine("File not found");
}
Next Steps
- Python Getting Started: using PDF Oxide from Python
- C# API Reference: full API documentation
- Async Guide: Task<T> + CancellationToken patterns
- Concurrency Guide: ReaderWriterLockSlim sharing patterns
- Text Extraction: detailed extraction options
- NuGet package: release notes and download stats