What is the fastest Python PDF library?

PDF Oxide is the fastest Python PDF library, with 0.8ms mean text extraction time — 5.8× faster than PyMuPDF (4.6ms) and 15× faster than pypdf (12.1ms). Benchmarked on 3,830 real-world PDFs with 100% pass rate.

Is PDF Oxide free for commercial use?

Yes. PDF Oxide is MIT licensed — free for all uses including commercial products, SaaS, and proprietary software. No license fees, no sales calls, no AGPL restrictions.

Can PDF Oxide handle scanned PDFs with OCR?

Yes. PDF Oxide includes built-in OCR via PaddleOCR and ONNX Runtime. No Tesseract installation needed — just pip install pdf_oxide and use extract_text_ocr(). Supports PP-OCRv3, v4, and v5 models.

Does PDF Oxide support XFA forms?

Yes. PDF Oxide is the only Python PDF library that can detect, analyze, and extract data from XFA forms (XML Forms Architecture). PyMuPDF, pypdf, pdfplumber, and pdfminer cannot read XFA form data.

How does PDF Oxide compare to PyMuPDF?

PDF Oxide is 5.8× faster than PyMuPDF (0.8ms vs 4.6ms mean), has a 100% pass rate vs 99.3%, and is MIT licensed vs PyMuPDF's AGPL-3.0. PDF Oxide also has built-in Markdown/HTML output and XFA form support that PyMuPDF lacks.

Can PDF Oxide convert PDF to Markdown?

Yes. PDF Oxide has built-in PDF to Markdown conversion with heading detection, table preservation, and list formatting — ideal for LLM and RAG pipelines. No separate package needed, unlike PyMuPDF which requires pymupdf4llm (69× slower).

Image Extraction

PDF Oxide extracts images from PDF pages by parsing the content stream, resolving XObject references via Do operators, recursing into nested Form XObjects, and decoding inline images. Use extract_images() to get image objects in memory, or extract_images_to_files() to save them directly to disk as PNG or JPEG files.

Since v0.3.5, image extraction processes the full page content stream rather than only scanning the XObject dictionary. This correctly handles images placed via Do operators, nested Form XObjects with cycle detection, and inline images embedded with BI/ID/EI sequences.
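The recursion into nested Form XObjects with cycle detection can be sketched as follows. This is a minimal illustration, not PDF Oxide's internal code: the `XObject` enum, the numeric object ids, and `collect_images` are hypothetical names introduced here, and the content stream is reduced to a list of ids invoked via `Do`.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical, simplified XObject model (not pdf_oxide's internal types).
#[derive(Debug, Clone)]
enum XObject {
    /// An image XObject with its pixel dimensions.
    Image { width: u32, height: u32 },
    /// A Form XObject that invokes other XObjects (by object id) via `Do`.
    Form { children: Vec<u32> },
}

/// Collect the dimensions of every image reachable from `roots`,
/// recursing into Form XObjects and skipping any form that is already
/// being expanded on the current recursion path (a cycle).
fn collect_images(objects: &HashMap<u32, XObject>, roots: &[u32]) -> Vec<(u32, u32)> {
    fn walk(
        objects: &HashMap<u32, XObject>,
        id: u32,
        active: &mut HashSet<u32>,
        out: &mut Vec<(u32, u32)>,
    ) {
        match objects.get(&id) {
            Some(XObject::Image { width, height }) => out.push((*width, *height)),
            Some(XObject::Form { children }) => {
                // `insert` returns false when the id is already on the path.
                if !active.insert(id) {
                    return; // cycle: this form is already being expanded
                }
                for &child in children {
                    walk(objects, child, active, out);
                }
                active.remove(&id);
            }
            None => {} // dangling reference: ignore it
        }
    }

    let mut out = Vec::new();
    let mut active = HashSet::new();
    for &id in roots {
        walk(objects, id, &mut active, &mut out);
    }
    out
}
```

Tracking only the forms on the current path, rather than every form ever visited, lets the same form be painted twice on a page while still stopping a form that refers back to itself.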

Color-space support

Extracted images are decoded without lossy round-tripping. Device color spaces are returned unchanged; palette-based and calibrated color spaces are converted to RGB during decode:

DeviceRGB / DeviceGray / DeviceCMYK: returned as-is.
Indexed (1, 2, 4, 8 bits per component): palette resolved via resolve_indexed_palette and expanded through expand_indexed_to_rgb. Supports Indexed palettes built on RGB, Grayscale, and CMYK base color spaces. Earlier versions emitted Invalid RGB image dimensions errors for these images on many real-world PDFs.
CalRGB / CalGray / ICCBased: converted to RGB during decode.

Palette expansion is hardened against malicious inputs with a checked_mul overflow guard and a 256 MiB allocation cap; truncated streams are rejected cleanly instead of producing garbage pixels.
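A rough sketch of what such a hardened expansion might look like is shown below. It is not the library's `expand_indexed_to_rgb`: the function name `expand_indexed_sketch`, its signature, and the string error type are assumptions made for illustration. It assumes an RGB base palette, samples packed most-significant bits first, and rows padded to a whole byte, as the PDF image model specifies.

```rust
/// Upper bound on the expanded buffer, mirroring the 256 MiB cap described above.
const MAX_EXPANDED_BYTES: usize = 256 * 1024 * 1024;

/// Expand packed palette indices into 8-bit RGB triples.
/// Hypothetical helper for illustration only.
fn expand_indexed_sketch(
    indices: &[u8],
    palette: &[u8],
    width: usize,
    height: usize,
    bits: u8,
) -> Result<Vec<u8>, String> {
    if !matches!(bits, 1 | 2 | 4 | 8) {
        return Err(format!("unsupported bits per component: {bits}"));
    }
    if width == 0 || height == 0 {
        return Err("zero image dimension".to_string());
    }
    // Overflow-checked output size: width * height * 3 bytes.
    let out_len = width
        .checked_mul(height)
        .and_then(|n| n.checked_mul(3))
        .ok_or_else(|| "image dimensions overflow".to_string())?;
    if out_len > MAX_EXPANDED_BYTES {
        return Err("expanded image exceeds allocation cap".to_string());
    }
    // Safe arithmetic from here on: width and height are bounded by the cap.
    let row_bytes = (width * bits as usize + 7) / 8;
    if indices.len() < row_bytes * height {
        return Err("truncated index stream".to_string());
    }

    let mask = ((1u16 << bits) - 1) as u8;
    let per_byte = 8 / bits as usize;
    let mut out = Vec::with_capacity(out_len);
    for row in indices.chunks(row_bytes).take(height) {
        for x in 0..width {
            let byte = row[x / per_byte];
            // Samples are packed most-significant bits first.
            let shift = 8 - bits as usize * (x % per_byte + 1);
            let index = ((byte >> shift) & mask) as usize;
            let entry = palette
                .get(index * 3..index * 3 + 3)
                .ok_or_else(|| format!("palette index {index} out of range"))?;
            out.extend_from_slice(entry);
        }
    }
    Ok(out)
}
```

Checking the size before allocating means a hostile width or height fails fast, and the length check turns a truncated stream into an error rather than out-of-bounds reads or garbage pixels.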

Malformed-image tolerance

Images with missing /ColorSpace entries, zero dimensions, or invalid streams are skipped with a warning — they no longer panic the page render. The same tolerance applies to malformed images nested inside Form XObjects.
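The skip-with-warning behavior can be pictured with a small sketch. The `RawImage` struct, `skip_reason`, and `filter_usable` are hypothetical names, and "invalid stream" is simplified here to "empty stream"; the real checks are likely more involved.

```rust
/// Hypothetical description of an image before decoding.
struct RawImage {
    width: u32,
    height: u32,
    color_space: Option<String>,
    data: Vec<u8>,
}

/// Return why an image should be skipped, or `None` if it looks usable.
fn skip_reason(img: &RawImage) -> Option<&'static str> {
    if img.width == 0 || img.height == 0 {
        return Some("zero dimensions");
    }
    if img.color_space.is_none() {
        return Some("missing /ColorSpace");
    }
    if img.data.is_empty() {
        return Some("empty stream");
    }
    None
}

/// Keep the usable images and collect one warning per skipped image,
/// so a single bad image never aborts the whole page.
fn filter_usable(images: Vec<RawImage>) -> (Vec<RawImage>, Vec<String>) {
    let mut kept = Vec::new();
    let mut warnings = Vec::new();
    for (i, img) in images.into_iter().enumerate() {
        match skip_reason(&img) {
            Some(reason) => warnings.push(format!("skipping image {i}: {reason}")),
            None => kept.push(img),
        }
    }
    (kept, warnings)
}
```

Returning the warnings alongside the kept images, instead of an error, is what lets the rest of the page continue to extract.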

Quick Example

Python

from pdf_oxide import PdfDocument

doc = PdfDocument("report.pdf")
images = doc.extract_image_bytes(0)
for img in images:
    print(f"{img['width']}x{img['height']}")

Node.js

const { PdfDocument } = require("pdf-oxide");

const doc = new PdfDocument("report.pdf");
const images = doc.getEmbeddedImages(0);
for (const img of images) {
    console.log(`${img.width}x${img.height}`);
}

Go

import pdfoxide "github.com/yfedoseev/pdf_oxide/go"

doc, _ := pdfoxide.Open("report.pdf")
defer doc.Close()
images, _ := doc.Images(0)
for _, img := range images {
    fmt.Printf("%dx%d\n", img.Width, img.Height)
}

C#

using PdfOxide.Core;

using var doc = PdfDocument.Open("report.pdf");
var images = doc.ExtractImages(0);
foreach (var img in images)
{
    Console.WriteLine($"{img.Width}x{img.Height}");
}

WASM

const doc = new WasmPdfDocument(bytes);
const images = doc.extractImages(0);
for (const img of images) {
    console.log(`${img.width}x${img.height}`);
}

Rust

use pdf_oxide::PdfDocument;

let mut doc = PdfDocument::open("report.pdf")?;
let images = doc.extract_images(0)?;
for img in &images {
    println!("{}x{} {:?}", img.width(), img.height(), img.color_space());
}

Java

import fyi.oxide.pdf.PdfDocument;
import fyi.oxide.pdf.image.ExtractedImage;
import java.nio.file.Path;
import java.util.List;

try (PdfDocument doc = PdfDocument.open(Path.of("report.pdf"))) {
    List<ExtractedImage> images = doc.page(0).images();
    for (ExtractedImage img : images) {
        System.out.println(img.width() + "x" + img.height());
    }
}

Kotlin

import fyi.oxide.pdf.PdfDocument

PdfDocument.open(java.nio.file.Path.of("report.pdf")).use { doc ->
    for (img in doc.page(0).images()) {
        println("${img.width()}x${img.height()}")
    }
}

Scala

import fyi.oxide.pdf.{PdfDocument, imagesSeq}
import scala.util.Using

Using.resource(PdfDocument.open("report.pdf")) { doc =>
  for (img <- doc.page(0).imagesSeq) {
    println(s"${img.width}x${img.height}")
  }
}

Clojure

(require '[pdf-oxide.core :as pdf])

(with-open [doc (pdf/open "report.pdf")]
  (doseq [img (pdf/images (pdf/page doc 0))]
    (println (str (.width img) "x" (.height img)))))

C++

#include <pdf_oxide/pdf_oxide.hpp>

auto doc = pdf_oxide::Document::open("report.pdf");
for (const auto& img : doc.embedded_images(0)) {
    std::printf("%dx%d\n", img.width, img.height);
}

Swift

import PdfOxide

let doc = try Document.open("report.pdf")
for img in try doc.embeddedImages(0) {
    print("\(img.width)x\(img.height)")
}

Dart

import 'package:pdf_oxide/pdf_oxide.dart';

final doc = PdfDocument.open('report.pdf');
for (final img in doc.embeddedImages(0)) {
    print('${img.width}x${img.height}');
}

R

library(pdfoxide)

doc <- pdf_open("report.pdf")
for (img in pdf_embedded_images(doc, 0)) {
    cat(sprintf("%dx%d\n", img$width, img$height))
}

Julia

using PdfOxide

doc = open_document("report.pdf")
for img in embedded_images(doc, 0)
    println("$(img.width)x$(img.height)")
end

Zig

const pdf_oxide = @import("pdf_oxide");
const a = std.heap.page_allocator;

var doc = try pdf_oxide.Document.open("report.pdf");
const images = try doc.embeddedImages(a, 0);
for (images) |img| {
    std.debug.print("{d}x{d}\n", .{ img.width, img.height });
}

Objective-C

#import "POXPdfOxide.h"
NSError *err = nil;

POXDocument *doc = [POXDocument openPath:@"report.pdf" error:&err];
for (POXImage *img in [doc embeddedImages:0 error:&err]) {
    NSLog(@"%ldx%ld", (long)img.width, (long)img.height);
}

Elixir

{:ok, doc} = PdfOxide.open("report.pdf")
{:ok, images} = PdfOxide.embedded_images(doc, 0)
for img <- images do
  IO.puts("#{img.width}x#{img.height}")
end

API Reference

`extract_images(page_index) -> Vec<PdfImage>`

Extract all images from a page. Parses the page content stream to find:

XObject images referenced via Do operators
Form XObjects containing nested images (recursive, with cycle detection)
Inline images embedded with BI/ID/EI sequences

CTM (Current Transformation Matrix) tracking provides bounding boxes for each image.
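To see how a CTM yields a bounding box: PDF paints every image into the unit square, so the box is the extent of that square's four corners after transformation. The sketch below illustrates the arithmetic only; the `Matrix` and `BBox` types and `image_bbox` are hypothetical and are not the library's `Rect` or its internal tracker.

```rust
/// A PDF transformation matrix [a b c d e f].
#[derive(Debug, Clone, Copy, PartialEq)]
struct Matrix {
    a: f64,
    b: f64,
    c: f64,
    d: f64,
    e: f64,
    f: f64,
}

impl Matrix {
    /// Map a point to user space: x' = a*x + c*y + e, y' = b*x + d*y + f.
    fn apply(&self, x: f64, y: f64) -> (f64, f64) {
        (
            self.a * x + self.c * y + self.e,
            self.b * x + self.d * y + self.f,
        )
    }
}

/// Axis-aligned box; (x, y) is the lower-left corner in user space.
#[derive(Debug, Clone, Copy, PartialEq)]
struct BBox {
    x: f64,
    y: f64,
    width: f64,
    height: f64,
}

/// Bounding box of the unit square (0,0)-(1,1) after applying `ctm`.
fn image_bbox(ctm: &Matrix) -> BBox {
    let corners = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)];
    let mut min_x = f64::INFINITY;
    let mut min_y = f64::INFINITY;
    let mut max_x = f64::NEG_INFINITY;
    let mut max_y = f64::NEG_INFINITY;
    for (x, y) in corners {
        let (ux, uy) = ctm.apply(x, y);
        min_x = min_x.min(ux);
        min_y = min_y.min(uy);
        max_x = max_x.max(ux);
        max_y = max_y.max(uy);
    }
    BBox { x: min_x, y: min_y, width: max_x - min_x, height: max_y - min_y }
}
```

Using all four corners rather than just the origin and the diagonal keeps the box correct for rotated or skewed placements.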

Parameter	Type	Description
`page_index`	`int` / `usize`	Zero-based page index

Returns: A vector of PdfImage objects.

PdfImage Fields and Methods

Method / Field	Type	Description
`width()`	`u32`	Image width in pixels
`height()`	`u32`	Image height in pixels
`color_space()`	`&ColorSpace`	Color space (DeviceRGB, DeviceGray, DeviceCMYK, etc.)
`bits_per_component()`	`u8`	Bits per color component (typically 8)
`data()`	`&ImageData`	Raw image data (JPEG bytes or raw pixels)
`bbox()`	`Option<&Rect>`	Bounding box in PDF user space (if CTM was tracked)
`save_as_png(path)`	`Result<()>`	Save image as PNG file
`save_as_jpeg(path)`	`Result<()>`	Save image as JPEG file
`to_png_bytes()`	`Result<Vec<u8>>`	Encode as PNG bytes in memory
`to_jpeg_bytes()`	`Result<Vec<u8>>`	Encode as JPEG bytes in memory

ColorSpace Variants

Variant	Description
`DeviceRGB`	3-channel RGB
`DeviceGray`	Single-channel grayscale
`DeviceCMYK`	4-channel CMYK
`Indexed`	Palette-based color
`ICCBased`	ICC profile-based color
`CalGray`	Calibrated grayscale
`CalRGB`	Calibrated RGB
`Lab`	CIE L*a*b* color

ImageData Variants

Variant	Description
`Jpeg(Vec<u8>)`	JPEG-compressed data (DCT pass-through)
`Raw { pixels, format }`	Decoded pixel data with `PixelFormat` (RGB, Gray, CMYK, RGBA)
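When consuming `Raw` pixel data, it can help to sanity-check the buffer length against the image dimensions. The sketch below assumes 8 bits per channel and tightly packed rows, which the table above does not state; the local `PixelFormat` enum, `expected_len`, and `is_complete` are illustrative stand-ins rather than the library's types.

```rust
/// Local stand-in for the pixel formats listed above (hypothetical).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum PixelFormat {
    Rgb,
    Gray,
    Cmyk,
    Rgba,
}

impl PixelFormat {
    /// Number of channels per pixel.
    fn channels(self) -> usize {
        match self {
            PixelFormat::Gray => 1,
            PixelFormat::Rgb => 3,
            PixelFormat::Cmyk | PixelFormat::Rgba => 4,
        }
    }
}

/// Expected byte length of a tightly packed 8-bit buffer,
/// or `None` if the multiplication would overflow.
fn expected_len(width: u32, height: u32, format: PixelFormat) -> Option<usize> {
    (width as usize)
        .checked_mul(height as usize)?
        .checked_mul(format.channels())
}

/// True when `pixels` holds exactly one full image in `format`.
fn is_complete(pixels: &[u8], width: u32, height: u32, format: PixelFormat) -> bool {
    expected_len(width, height, format) == Some(pixels.len())
}
```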

Rust

let mut doc = PdfDocument::open("report.pdf")?;
let images = doc.extract_images(0)?;

for (i, image) in images.iter().enumerate() {
    println!(
        "Image {}: {}x{} {:?} {}bpc",
        i, image.width(), image.height(),
        image.color_space(), image.bits_per_component(),
    );

    if let Some(bbox) = image.bbox() {
        println!("  Position: ({:.1}, {:.1})", bbox.x, bbox.y);
    }

    image.save_as_png(&format!("output/image_{}.png", i))?;
}

`extract_images_to_files(page_index, output_dir, prefix, start_index) -> Vec<ExtractedImageRef>`

Extract images from a page and save them directly to files. JPEG images are saved in their original format (zero re-encoding loss); other images are saved as PNG.

Parameter	Type	Default	Description
`page_index`	`usize`	–	Zero-based page index
`output_dir`	`impl AsRef<Path>`	–	Directory to save images (created if absent)
`prefix`	`Option<&str>`	`"img"`	Filename prefix
`start_index`	`Option<usize>`	`1`	Starting index for filenames

Returns: A vector of ExtractedImageRef describing saved files.

ExtractedImageRef Fields

Field	Type	Description
`filename`	`String`	Saved filename (e.g., `"img_001.png"`)
`format`	`ImageFormat`	`Png` or `Jpeg`
`width`	`u32`	Image width in pixels
`height`	`u32`	Image height in pixels
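The way `prefix` and `start_index` combine into filenames can be sketched as below. The three-digit zero padding is inferred from the `"img_001.png"` example, and the `.jpg` extension for JPEG output is an assumption; `output_filename`, `output_filenames`, and the local `ImageFormat` enum are hypothetical helpers, not library functions.

```rust
/// Local stand-in for the saved-file format (hypothetical).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ImageFormat {
    Png,
    Jpeg,
}

/// Build one filename such as "img_001.png".
/// Padding width and the "jpg" extension are assumptions.
fn output_filename(prefix: Option<&str>, index: usize, format: ImageFormat) -> String {
    let prefix = prefix.unwrap_or("img");
    let ext = match format {
        ImageFormat::Png => "png",
        ImageFormat::Jpeg => "jpg",
    };
    format!("{prefix}_{index:03}.{ext}")
}

/// Name a run of images, numbering from `start_index` (default 1).
fn output_filenames(
    prefix: Option<&str>,
    start_index: Option<usize>,
    formats: &[ImageFormat],
) -> Vec<String> {
    let start = start_index.unwrap_or(1);
    formats
        .iter()
        .enumerate()
        .map(|(i, &format)| output_filename(prefix, start + i, format))
        .collect()
}
```

Passing a page-specific prefix, or a `start_index` that continues from the previous page's count, keeps files from different pages from overwriting each other in a shared directory.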

Rust

let mut doc = PdfDocument::open("report.pdf")?;
let refs = doc.extract_images_to_files(0, "output/images", Some("fig"), Some(1))?;

for img_ref in &refs {
    println!("Saved: {} ({}x{}, {:?})", img_ref.filename, img_ref.width, img_ref.height, img_ref.format);
}

Advanced Examples

Extract all images from all pages

use pdf_oxide::PdfDocument;

let mut doc = PdfDocument::open("book.pdf")?;
let page_count = doc.page_count()?;
let mut total = 0;

for page in 0..page_count {
    let refs = doc.extract_images_to_files(
        page,
        "output/images",
        Some(&format!("page{}", page + 1)),
        Some(1),
    )?;
    total += refs.len();
    println!("Page {}: {} images", page + 1, refs.len());
}
println!("Total: {} images extracted", total);

Get image bytes in memory (no disk I/O)

let mut doc = PdfDocument::open("report.pdf")?;
let images = doc.extract_images(0)?;

for image in &images {
    let png_bytes = image.to_png_bytes()?;
    println!("PNG size: {} bytes", png_bytes.len());

    // Use png_bytes with an HTTP response, database, etc.
}

Filter images by size

let mut doc = PdfDocument::open("report.pdf")?;
let images = doc.extract_images(0)?;

// Only keep images larger than 100x100 pixels
let large_images: Vec<_> = images.iter()
    .filter(|img| img.width() > 100 && img.height() > 100)
    .collect();

println!("{} large images on page 1", large_images.len());
for img in &large_images {
    println!("  {}x{} {:?}", img.width(), img.height(), img.color_space());
}

Distinguish JPEG pass-through from re-encoded images

use pdf_oxide::extractors::ImageData;

let mut doc = PdfDocument::open("report.pdf")?;
let images = doc.extract_images(0)?;

for (i, image) in images.iter().enumerate() {
    match image.data() {
        ImageData::Jpeg(bytes) => {
            // Original JPEG data -- save directly for zero quality loss
            std::fs::write(format!("image_{}.jpg", i), bytes)?;
            println!("Image {}: JPEG pass-through ({} bytes)", i, bytes.len());
        }
        ImageData::Raw { format, .. } => {
            // Raw pixels -- must encode to a file format
            image.save_as_png(&format!("image_{}.png", i))?;
            println!("Image {}: raw {:?} ({}x{})", i, format, image.width(), image.height());
        }
    }
}

The embedded-images accessor (`embedded_images`)

extract_images() is the rich, in-memory Rust API. The cross-language bindings expose a leaner embedded-images accessor built on the same content-stream walk, returning each image’s pixel dimensions, format, color space, bits-per-component, and raw decoded bytes. It is backed by the C ABI function pdf_document_get_embedded_images plus the pdf_oxide_image_* accessor family.

How do I list embedded images with the bindings?

Go

import (
    "fmt"
    pdfoxide "github.com/yfedoseev/pdf_oxide/go"
)

doc, _ := pdfoxide.Open("report.pdf")
defer doc.Close()

images, _ := doc.Images(0) // []pdfoxide.Image
for _, img := range images {
    fmt.Printf("%dx%d %s/%s %dbpc, %d bytes\n",
        img.Width, img.Height, img.Format, img.Colorspace,
        img.BitsPerComponent, len(img.Data))
}

Swift

import PdfOxide

let doc = try Document.open("report.pdf")
let images = try doc.embeddedImages(0) // [Image]
for img in images {
    print("\(img.width)x\(img.height) \(img.format)/\(img.colorspace) "
        + "\(img.bitsPerComponent)bpc, \(img.data.count) bytes")
}

C ABI

#include "pdf_oxide.h"

int32_t err = 0;
FfiImageList *images = pdf_document_get_embedded_images(doc, /*page=*/0, &err);
int32_t n = pdf_oxide_image_count(images);
for (int32_t i = 0; i < n; i++) {
    int32_t w = pdf_oxide_image_get_width(images, i, &err);
    int32_t h = pdf_oxide_image_get_height(images, i, &err);
    char *fmt = pdf_oxide_image_get_format(images, i, &err);
    char *cs  = pdf_oxide_image_get_colorspace(images, i, &err);
    printf("%dx%d %s/%s\n", w, h, fmt, cs);
    free_string(fmt);
    free_string(cs);
}
pdf_oxide_image_list_free(images);

Java

import fyi.oxide.pdf.PdfDocument;
import fyi.oxide.pdf.image.ExtractedImage;
import java.nio.file.Path;

try (PdfDocument doc = PdfDocument.open(Path.of("report.pdf"))) {
    for (ExtractedImage img : doc.page(0).images()) {
        System.out.printf("%dx%d %s, %d bytes%n",
            img.width(), img.height(), img.format(), img.bytes().length);
    }
}

Kotlin

import fyi.oxide.pdf.PdfDocument

PdfDocument.open(java.nio.file.Path.of("report.pdf")).use { doc ->
    for (img in doc.page(0).images()) {
        println("${img.width()}x${img.height()} ${img.format()}, ${img.bytes().size} bytes")
    }
}

Scala

import fyi.oxide.pdf.{PdfDocument, imagesSeq}
import scala.util.Using

Using.resource(PdfDocument.open("report.pdf")) { doc =>
  for (img <- doc.page(0).imagesSeq) {
    println(s"${img.width}x${img.height} ${img.format}, ${img.bytes.length} bytes")
  }
}

Clojure

(require '[pdf-oxide.core :as pdf])

(with-open [doc (pdf/open "report.pdf")]
  (doseq [img (pdf/images (pdf/page doc 0))]
    (println (format "%dx%d %s, %d bytes"
                     (.width img) (.height img) (.format img) (count (.bytes img))))))

C++

#include <pdf_oxide/pdf_oxide.hpp>

auto doc = pdf_oxide::Document::open("report.pdf");
for (const auto& img : doc.embedded_images(0)) {
    std::printf("%dx%d %s/%s %dbpc, %zu bytes\n",
        img.width, img.height, img.format.c_str(), img.colorspace.c_str(),
        img.bits_per_component, img.data.size());
}

Dart

import 'package:pdf_oxide/pdf_oxide.dart';

final doc = PdfDocument.open('report.pdf');
for (final img in doc.embeddedImages(0)) {
    print('${img.width}x${img.height} ${img.format}/${img.colorspace} '
        '${img.bitsPerComponent}bpc, ${img.data.length} bytes');
}

R

library(pdfoxide)

doc <- pdf_open("report.pdf")
for (img in pdf_embedded_images(doc, 0)) {
    cat(sprintf("%dx%d %s/%s %dbpc, %d bytes\n",
        img$width, img$height, img$format, img$colorspace,
        img$bits_per_component, length(img$data)))
}

Julia

using PdfOxide

doc = open_document("report.pdf")
for img in embedded_images(doc, 0)
    println("$(img.width)x$(img.height) $(img.format)/$(img.colorspace) " *
            "$(img.bitsPerComponent)bpc, $(length(img.data)) bytes")
end

Zig

const pdf_oxide = @import("pdf_oxide");
const a = std.heap.page_allocator;

var doc = try pdf_oxide.Document.open("report.pdf");
const images = try doc.embeddedImages(a, 0);
for (images) |img| {
    std.debug.print("{d}x{d} {s}/{s} {d}bpc, {d} bytes\n", .{
        img.width, img.height, img.format, img.colorspace,
        img.bits_per_component, img.data.len,
    });
}

Objective-C

#import "POXPdfOxide.h"
NSError *err = nil;

POXDocument *doc = [POXDocument openPath:@"report.pdf" error:&err];
for (POXImage *img in [doc embeddedImages:0 error:&err]) {
    NSLog(@"%ldx%ld %@/%@ %ldbpc, %lu bytes",
        (long)img.width, (long)img.height, img.format, img.colorspace,
        (long)img.bitsPerComponent, (unsigned long)img.data.length);
}

Elixir

{:ok, doc} = PdfOxide.open("report.pdf")
{:ok, images} = PdfOxide.embedded_images(doc, 0)
for img <- images do
  IO.puts("#{img.width}x#{img.height} #{img.format}/#{img.colorspace} " <>
          "#{img.bits_per_component}bpc, #{byte_size(img.data)} bytes")
end

Image accessor fields

Field (Go / Swift)	Type	Description
`Width` / `width`	`int`	Image width in pixels
`Height` / `height`	`int`	Image height in pixels
`Format` / `format`	`string`	Source format string (e.g. `"jpeg"`, `"raw"`)
`Colorspace` / `colorspace`	`string`	Color space name (e.g. `"DeviceRGB"`)
`BitsPerComponent` / `bitsPerComponent`	`int`	Bits per color component
`Data` / `data`	`[]byte` / `[UInt8]`	Raw decoded image bytes
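A consumer of these fields usually needs to decide what to do with `data` based on `format`. The sketch below shows one possible policy, assuming only the two format strings given as examples above (`"jpeg"` and `"raw"`) and treating anything else as unknown; `looks_like_jpeg`, `classify`, and the `Handling` enum are hypothetical names. The only external fact relied on is that a JPEG stream begins with the SOI marker bytes `0xFF 0xD8`.

```rust
/// JPEG streams start with the SOI marker 0xFF 0xD8.
fn looks_like_jpeg(data: &[u8]) -> bool {
    data.starts_with(&[0xFF, 0xD8])
}

/// What a caller might do with one image from the accessor (hypothetical).
#[derive(Debug, PartialEq, Eq)]
enum Handling {
    /// Bytes are already a JPEG file and can be written as-is.
    WriteDirectly,
    /// Bytes are raw pixels and must be encoded (for example to PNG) first.
    NeedsEncoding,
    /// Nothing usable: empty buffer, mismatched signature, or unknown format.
    Skip,
}

/// Decide how to handle an image from its `format` string and `data` bytes.
fn classify(format: &str, data: &[u8]) -> Handling {
    if data.is_empty() {
        return Handling::Skip; // skipped or malformed image
    }
    match format {
        "jpeg" if looks_like_jpeg(data) => Handling::WriteDirectly,
        "raw" => Handling::NeedsEncoding,
        _ => Handling::Skip,
    }
}
```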

Binding coverage. The embedded-images accessor is exposed in Go (doc.Images(page)), Swift (doc.embeddedImages(page)), and the C ABI (pdf_document_get_embedded_images); the other bindings shown above expose equivalent accessors on top of the same C ABI. In Rust, use the richer extract_images() shown above. The accessor is compiled out of the WASM target.

The page-elements accessor (`page_elements`)

page_elements returns every laid-out element (text spans, with their type, text, and bounding box) on a page as a single list. The bindings marshal the whole list in one FFI call via pdf_oxide_elements_to_json, so it is the cheapest way to walk a page’s layout without re-running text extraction per region. It is backed by the C ABI function pdf_page_get_elements and the pdf_oxide_element_* accessor family.

How do I walk a page’s layout elements?

Go

import (
    "fmt"
    pdfoxide "github.com/yfedoseev/pdf_oxide/go"
)

doc, _ := pdfoxide.Open("report.pdf")
defer doc.Close()

elements, _ := doc.PageElements(0) // []pdfoxide.Element
for _, el := range elements {
    fmt.Printf("[%s] %q at (%.1f, %.1f) %.1fx%.1f\n",
        el.Type, el.Text, el.X, el.Y, el.Width, el.Height)
}

Swift

import PdfOxide

let doc = try Document.open("report.pdf")
let elements = try doc.pageElements(0) // ElementList
for el in try elements.all() {
    print("[\(el.type)] \(el.text) at "
        + "(\(el.rect.x), \(el.rect.y)) \(el.rect.width)x\(el.rect.height)")
}

// Serialize the whole list to JSON in one call:
let json = try elements.toJson()

C ABI

#include "pdf_oxide.h"

int32_t err = 0;
FfiElementList *els = pdf_page_get_elements(doc, /*page=*/0, &err);

// One-shot JSON serialization (caller frees with free_string):
char *json = pdf_oxide_elements_to_json(els, &err);
printf("%s\n", json);
free_string(json);

pdf_oxide_elements_free(els);

Dart

import 'package:pdf_oxide/pdf_oxide.dart';

final doc = PdfDocument.open('report.pdf');
final elements = doc.pageElements(0); // ElementList
for (final el in elements.toList()) {
    print('[${el.type}] ${el.text} at '
        '(${el.rect.x}, ${el.rect.y}) ${el.rect.width}x${el.rect.height}');
}

// Serialize the whole list to JSON in one call:
final json = elements.toJson();

Objective-C

#import "POXPdfOxide.h"
NSError *err = nil;

POXDocument *doc = [POXDocument openPath:@"report.pdf" error:&err];
POXElementList *els = [doc pageElements:0 error:&err];
for (int32_t i = 0; i < [els count]; i++) {
    NSString *type = [els typeAtIndex:i error:&err];
    NSString *text = [els textAtIndex:i error:&err];
    POXBbox rect = [els rectAtIndex:i error:&err];
    NSLog(@"[%@] %@ at (%.1f, %.1f) %.1fx%.1f",
        type, text, rect.x, rect.y, rect.width, rect.height);
}

// One-shot JSON serialization:
NSString *json = [els toJsonWithError:&err];

Elixir

{:ok, doc} = PdfOxide.open("report.pdf")
{:ok, els} = PdfOxide.page_elements(doc, 0)
for i <- 0..(PdfOxide.element_count(els) - 1) do
  {:ok, type} = PdfOxide.element_type(els, i)
  {:ok, text} = PdfOxide.element_text(els, i)
  {:ok, rect} = PdfOxide.element_rect(els, i)
  IO.puts("[#{type}] #{text} at (#{rect.x}, #{rect.y}) #{rect.width}x#{rect.height}")
end

# Serialize the whole list to JSON in one call:
{:ok, json} = PdfOxide.elements_to_json(els)

Element fields

Field (Go / Swift)	Type	Description
`Type` / `type`	`string`	Element type (e.g. `"text"`)
`Text` / `text`	`string`	Element text content
`X`, `Y` / `rect.x`, `rect.y`	`float`	Bounding-box origin in PDF user space
`Width`, `Height` / `rect.width`, `rect.height`	`float`	Bounding-box size
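Once the element list is in hand, per-region queries are plain filtering over these fields, with no further extraction calls. The sketch below shows a rectangle-intersection filter; the `Element` and `Region` structs mirror the fields in the table but are hypothetical local types (with `kind` standing in for the `type` field, which is a reserved word in Rust), and `text_in_region` is not a library function.

```rust
/// Local mirror of the element fields above (hypothetical type).
#[derive(Debug, Clone, PartialEq)]
struct Element {
    kind: String,
    text: String,
    x: f64,
    y: f64,
    width: f64,
    height: f64,
}

/// A rectangular query region in PDF user space.
#[derive(Debug, Clone, Copy)]
struct Region {
    x: f64,
    y: f64,
    width: f64,
    height: f64,
}

/// True when the element's bounding box overlaps the region.
fn intersects(el: &Element, r: &Region) -> bool {
    el.x < r.x + r.width
        && r.x < el.x + el.width
        && el.y < r.y + r.height
        && r.y < el.y + el.height
}

/// Text of every "text" element that overlaps `region`, in list order.
fn text_in_region(elements: &[Element], region: &Region) -> Vec<String> {
    elements
        .iter()
        .filter(|el| el.kind == "text" && intersects(el, region))
        .map(|el| el.text.clone())
        .collect()
}
```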

Binding coverage. page_elements is exposed in Go (doc.PageElements(page)), Swift (doc.pageElements(page) → ElementList), and the C ABI (pdf_page_get_elements + pdf_oxide_elements_to_json); the Dart, Objective-C, and Elixir bindings shown above expose equivalent accessors. It is compiled out of the WASM target.

FAQ

What is the difference between extract_images() and the embedded-images accessor?

extract_images() (Rust) returns rich PdfImage objects with save_as_png, to_jpeg_bytes, CTM bounding boxes, and typed ColorSpace/ImageData enums. The embedded-images accessor (doc.Images / doc.embeddedImages / pdf_document_get_embedded_images) returns a flat list of dimensions, format, color space, and raw bytes; it is the cross-language path to the same content-stream walk.

Is image extraction fast?

Yes. PDF Oxide’s extraction core runs at roughly 0.8 ms mean / 9 ms p99 with a 100% pass rate on the benchmark corpus, decoding images in their original color space with no lossy round-tripping.

Does the embedded-images accessor re-encode JPEGs?

No. JPEG-backed images are returned with their original DCT bytes (format == "jpeg"); only raw pixel data is decoded. The richer extract_images() API exposes the same distinction via ImageData::Jpeg vs ImageData::Raw.

Why is data empty for some images?

Malformed images (missing /ColorSpace, zero dimensions, truncated streams) are skipped with a warning rather than panicking the page, so their byte buffer may come back empty.

Related Pages

Text Extraction – Extract text alongside images
HTML Conversion – Embed extracted images in HTML output
Markdown Conversion – Include images in Markdown output
Metadata & XMP – Read embedded fonts and document producer
