Concurrency — Thread-Safe PDF Reads

PdfDocument has been Send + Sync on the Rust side since v0.3.22. A single document can be shared across OS threads, goroutines, worker threads, or asyncio tasks for parallel page extraction. Write operations still need serialisation — that’s what DocumentEditor is for.

What changed in v0.3.22

All 16 RefCell<T> wrappers inside PdfDocument were replaced with Mutex<T>, and Cell<usize> became AtomicUsize. The Python bindings dropped the unsendable marker from the PdfDocument, PdfPage, and FormField classes; with the marker in place, those classes raised RuntimeError the moment they crossed a thread boundary.

Net effect: thread pools, async runtimes, and free-threaded Python all now just work.

Rust

Rust

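use pdf_oxide::PdfDocument;
use std::sync::Arc;
use std::thread;

// Assumes the library's error type implements std::error::Error.
fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Arc gives every thread shared ownership of the same document.
    let doc = Arc::new(PdfDocument::open("report.pdf")?);
    let page_count = doc.page_count();

    // One thread per page; each clones the Arc, not the document.
    let handles: Vec<_> = (0..page_count)
        .map(|i| {
            let doc = Arc::clone(&doc);
            thread::spawn(move || doc.extract_text(i))
        })
        .collect();

    // Joining in spawn order prints the pages in order.
    for h in handles {
        let text = h.join().unwrap()?;
        println!("{}", text);
    }
    Ok(())
}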

With tokio:

Rust

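use std::sync::Arc;
use tokio::task;

// Assumes the library's error type implements std::error::Error.
#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let doc = Arc::new(pdf_oxide::PdfDocument::open("report.pdf")?);

    // spawn_blocking keeps CPU-bound extraction off the async worker threads.
    let tasks: Vec<_> = (0..doc.page_count())
        .map(|i| {
            let doc = Arc::clone(&doc);
            task::spawn_blocking(move || doc.extract_text(i))
        })
        .collect();

    for t in tasks {
        // The first ? handles the JoinError, the second the extraction error.
        let text = t.await??;
        println!("{}", text);
    }
    Ok(())
}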

Python

Python

from concurrent.futures import ThreadPoolExecutor
from pdf_oxide import PdfDocument

doc = PdfDocument("report.pdf")

with ThreadPoolExecutor(max_workers=8) as pool:
    pages = list(pool.map(doc.extract_text, range(doc.page_count())))

Under stock CPython the GIL still serialises Python-level work, but extract_text releases the GIL while the Rust code runs, so the extraction itself proceeds in parallel. Under cp314t (the free-threaded build of Python 3.14+), the GIL is disabled by default and the bindings declare gil_used = false, so there is no implicit serialisation at all.
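
To check which of these two cases applies to a given interpreter, a small helper along the following lines may be useful. It is a sketch rather than part of the library: gil_status is a hypothetical name, sys._is_gil_enabled() only exists on CPython 3.13 and later, and the fallback for older versions simply assumes the GIL is always on.

```python
import sys
import sysconfig


def gil_status() -> dict:
    """Report whether this is a free-threaded build and whether the GIL is active."""
    # Py_GIL_DISABLED is 1 on free-threaded builds (e.g. cp314t); 0 or None otherwise.
    free_threaded_build = bool(sysconfig.get_config_var("Py_GIL_DISABLED"))
    # sys._is_gil_enabled() was added in CPython 3.13. On older versions the
    # attribute is missing, so fall back to "GIL is on" (an assumption that
    # holds for every pre-3.13 CPython release).
    is_enabled = getattr(sys, "_is_gil_enabled", lambda: True)
    return {
        "free_threaded_build": free_threaded_build,
        "gil_enabled": bool(is_enabled()),
    }


if __name__ == "__main__":
    print(gil_status())
```

On a free-threaded build the GIL can still be re-enabled at runtime, for example when an extension module that does not declare free-threading support is imported, so the two fields are reported separately.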

With asyncio:

Python

import asyncio
from pdf_oxide import PdfDocument

doc = PdfDocument("report.pdf")

async def main():
    # to_thread runs each blocking call on a worker thread.
    pages = await asyncio.gather(
        *[asyncio.to_thread(doc.extract_text, i) for i in range(doc.page_count())]
    )
    return pages

pages = asyncio.run(main())

Or use the ready-made AsyncPdfDocument from the async guide.

Go

Reads on *PdfDocument are protected by an internal sync.RWMutex — goroutine-safe by construction.

Go

package main

import (
    "log"
    "sync"

    pdfoxide "github.com/yfedoseev/pdf_oxide/go"
)

func main() {
    doc, err := pdfoxide.Open("report.pdf")
    if err != nil {
        log.Fatal(err)
    }
    defer doc.Close()

    count, err := doc.PageCount()
    if err != nil {
        log.Fatal(err)
    }
    results := make([]string, count)

    var wg sync.WaitGroup
    for i := 0; i < count; i++ {
        wg.Add(1)
        go func(page int) {
            defer wg.Done()
            text, err := doc.ExtractText(page)
            if err != nil {
                log.Printf("page %d: %v", page, err)
                return
            }
            results[page] = text // each goroutine writes only its own slot
        }(i)
    }
    wg.Wait()
}

*DocumentEditor serialises writes internally, but do not pipeline independent edits from multiple goroutines — collect mutations on one goroutine.

C#

C#

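using System.Linq;
using System.Threading.Tasks;
using PdfOxide.Core;

using var doc = PdfDocument.Open("report.pdf");
// One task per page; Task.WhenAll returns the results in page order.
var tasks = Enumerable.Range(0, doc.PageCount)
    .Select(i => Task.Run(() => doc.ExtractText(i)));
string[] pages = await Task.WhenAll(tasks);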

If you need fine-grained reader/writer semantics around a DocumentEditor:

C#

var locker = new ReaderWriterLockSlim();

locker.EnterReadLock();
try
{
    string text = doc.ExtractText(0);
}
finally
{
    locker.ExitReadLock();
}

Node.js

A PdfDocument can be passed to worker threads by transferring the backing handle. The simpler pattern is to let the *Async methods do the dispatching:

Node.js

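const { PdfDocument } = require("pdf-oxide");

// CommonJS modules have no top-level await, so wrap the work in an async function.
async function main() {
  const doc = new PdfDocument("report.pdf");
  try {
    const pageCount = doc.getPageCount();
    // Promise.all resolves to the page texts in page order.
    const pages = await Promise.all(
      Array.from({ length: pageCount }, (_, i) => doc.extractTextAsync(i))
    );
    console.log(`extracted ${pages.length} pages`);
  } finally {
    doc.close();
  }
}

main().catch(console.error);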

Each *Async call runs on the libuv thread pool.

Writer serialisation

Writes (DocumentEditor, Pdf, PdfCreator) are not lock-free. If multiple threads need to modify the same document, funnel mutations through one writer goroutine / task and fan out the reads.

A common pattern:

  • 1 reader PdfDocument shared across N reader threads.
  • 1 writer DocumentEditor owned by a single coordinator task that collects edits from a channel or queue.
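
The pattern above can be sketched in Python with a queue and a single writer thread. This is an illustration only: StubDocument and StubEditor are hypothetical stand-ins so the example runs without a PDF file, and apply and the ("flag_page", index) edit tuple are invented for the sketch rather than taken from the DocumentEditor API.

```python
import queue
import threading
from concurrent.futures import ThreadPoolExecutor


class StubDocument:
    """Hypothetical stand-in for a shared, read-only PdfDocument."""

    def __init__(self, pages):
        self._pages = list(pages)

    def page_count(self):
        return len(self._pages)

    def extract_text(self, index):
        return self._pages[index]


class StubEditor:
    """Hypothetical stand-in for a DocumentEditor; only the writer thread touches it."""

    def __init__(self):
        self.applied = []

    def apply(self, edit):
        self.applied.append(edit)


_STOP = object()  # sentinel that tells the writer thread to exit


def writer_loop(editor, edits):
    """Drain the queue on one thread so edits are applied strictly one at a time."""
    while True:
        edit = edits.get()
        if edit is _STOP:
            break
        editor.apply(edit)


def flag_long_pages(doc, editor, min_chars, workers=4):
    """Fan reads out across a pool and funnel the resulting edits to one writer."""
    edits = queue.Queue()
    writer = threading.Thread(target=writer_loop, args=(editor, edits))
    writer.start()

    def scan(index):
        text = doc.extract_text(index)  # parallel read on the shared document
        if len(text) >= min_chars:
            edits.put(("flag_page", index))  # enqueue the edit; never mutate here

    try:
        with ThreadPoolExecutor(max_workers=workers) as pool:
            list(pool.map(scan, range(doc.page_count())))
    finally:
        edits.put(_STOP)  # always release the writer, even if a read failed
        writer.join()
    return sorted(editor.applied)
```

Readers never hold a reference to the editor, so the only synchronisation point is the queue. Edits arrive in completion order rather than page order, which is why the sketch sorts them before returning.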