What is the fastest Python PDF library?

PDF Oxide is the fastest Python PDF library, with 0.8ms mean text extraction time — 5.8× faster than PyMuPDF (4.6ms) and 15× faster than pypdf (12.1ms). Benchmarked on 3,830 real-world PDFs with 100% pass rate.

Is PDF Oxide free for commercial use?

Yes. PDF Oxide is MIT licensed — free for all uses including commercial products, SaaS, and proprietary software. No license fees, no sales calls, no AGPL restrictions.

Can PDF Oxide handle scanned PDFs with OCR?

Yes. PDF Oxide includes built-in OCR via PaddleOCR and ONNX Runtime. No Tesseract installation needed — just pip install pdf_oxide and use extract_text_ocr(). Supports PP-OCRv3, v4, and v5 models.

Does PDF Oxide support XFA forms?

Yes. PDF Oxide is the only Python PDF library that can detect, analyze, and extract data from XFA forms (XML Forms Architecture). PyMuPDF, pypdf, pdfplumber, and pdfminer cannot read XFA form data.

How does PDF Oxide compare to PyMuPDF?

PDF Oxide is 5.8× faster than PyMuPDF (0.8ms vs 4.6ms mean), has a 100% pass rate vs 99.3%, and is MIT licensed vs PyMuPDF's AGPL-3.0. PDF Oxide also has built-in Markdown/HTML output and XFA form support that PyMuPDF lacks.

Can PDF Oxide convert PDF to Markdown?

Yes. PDF Oxide has built-in PDF to Markdown conversion with heading detection, table preservation, and list formatting — ideal for LLM and RAG pipelines. No separate package needed, unlike PyMuPDF which requires pymupdf4llm (69× slower).

R API Reference

PDF Oxide ships idiomatic R bindings as the pdfoxide package. The package wraps the pdf_oxide C ABI through R’s native .Call interface, so there is no Java, Python, or external runtime dependency – just a compiled shared library.

# install from source (requires a C toolchain)
R CMD INSTALL pdfoxide

library(pdfoxide)

For the Rust API, see the Rust API Reference. For Python, see the Python API Reference. For JavaScript, see the Node.js API Reference or WASM API Reference.

The R API is a flat functional API built around opaque handle objects rather than R6/S4 classes. The main handle types are:

| Handle | Created by | Purpose |
|---|---|---|
| `pdfoxide_pdf` | `pdf_from_markdown()`, `pdf_from_html()`, … | A freshly built PDF, ready to save or convert |
| `pdfoxide_document` | `pdf_open()`, `pdf_open_from_bytes()` | A loaded PDF for read-only extraction and rendering |
| `pdfoxide_editor` | `pdf_editor_open()` | A mutable PDF for editing, merging, and saving |
| `pdfoxide_builder` | `pdf_builder_create()` | A `DocumentBuilder` for programmatic page construction |
| `pdfoxide_page` (builder) | `pdf_builder_page()`, `pdf_builder_a4_page()`, … | A fluent page being laid out |
| `pdfoxide_page` (lazy) | `pdf_page()` | A lazy read handle over a single document page |
| `pdfoxide_renderer`, `pdfoxide_rendered_image` | rendering functions | Reusable renderer and rendered raster output |
| `pdfoxide_certificate`, `pdfoxide_signature`, `pdfoxide_timestamp`, `pdfoxide_tsa_client`, `pdfoxide_dss` | signing / verification functions | Digital-signature primitives |

All page indices are 0-based. Functions that mutate a builder/page/editor return the handle invisibly so calls can be chained with the pipe (|>). Handles are closed automatically on garbage collection, but pdf_close() / *_close() free them eagerly.
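
The conventions above (0-based indices, eager closing) can be sketched as two small helpers. Both `page_indices()` and `with_pdf_document()` are hypothetical names introduced for illustration, not part of pdfoxide; only `pdf_open()` and `pdf_close()` come from this reference.

```r
# Hypothetical helper (not part of pdfoxide): the 0-based page indices for a
# given page count. seq_len() keeps the result empty when the count is 0,
# which 0:(n - 1) would not.
page_indices <- function(n_pages) {
  seq_len(n_pages) - 1L
}

# Hypothetical wrapper: open a document, run `fn` on it, and close the handle
# eagerly even if `fn` raises an error. Calling it requires pdfoxide to be
# installed; defining it does not.
with_pdf_document <- function(path, fn) {
  doc <- pdfoxide::pdf_open(path)
  on.exit(pdfoxide::pdf_close(doc), add = TRUE)
  fn(doc)
}

page_indices(3L)   # 0 1 2
```

A call such as `with_pdf_document("report.pdf", pdfoxide::pdf_page_count)` would then return the page count without leaving the handle for the garbage collector.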

Creating PDFs

Quick one-shot creation from a source format. Each returns a pdfoxide_pdf handle.

pdf_from_markdown(markdown)                              # build a PDF from a Markdown string
pdf_from_html(html)                                      # build a PDF from an HTML string
pdf_from_text(text)                                      # build a PDF from plain text
pdf_from_image(path)                                     # build a single-page PDF from an image file
pdf_from_image_bytes(bytes)                              # build a single-page PDF from raw image bytes
pdf_from_html_css(html, css, font_bytes = NULL)          # build a PDF from HTML + CSS (optional embedded font)
pdf_from_html_css_with_fonts(html, css, families, font_bytes)  # HTML + CSS with multiple named font families
pdf_merge(paths)                                         # merge several PDF files into one new PDF

Saving / serializing a built PDF

pdf_save(pdf, path)           # write the PDF to a file path
pdf_to_bytes(pdf)             # serialize the PDF to a raw vector
pdf_get_page_count(pdf)       # number of pages in a built pdfoxide_pdf

Opening documents

Open an existing PDF for extraction and rendering. Returns a pdfoxide_document.

pdf_open(path)                          # open a PDF file from disk
pdf_open_with_password(path, password)  # open an encrypted PDF with a password
pdf_open_from_bytes(bytes)              # open a PDF from an in-memory raw vector
pdf_close(x)                            # close any pdfoxide handle and free it

Opening from Office formats

Convert and open a Word/PowerPoint/Excel document directly as a pdfoxide_document.

pdf_open_from_docx_bytes(bytes)   # convert DOCX bytes and open as a document
pdf_open_from_pptx_bytes(bytes)   # convert PPTX bytes and open as a document
pdf_open_from_xlsx_bytes(bytes)   # convert XLSX bytes and open as a document
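
The `*_from_bytes` openers take a raw vector, which base R can produce with `readBin()`. The sketch below shows one way to load a file into a raw vector; `read_raw_file()` and `open_docx()` are hypothetical helper names, and the four-byte temporary file is only a stand-in for a real document.

```r
# Hypothetical helper: read a whole file into a raw vector using base R.
read_raw_file <- function(path) {
  readBin(path, what = "raw", n = file.size(path))
}

# Stand-in file holding the four bytes "%PDF", just to exercise the helper.
tmp <- tempfile(fileext = ".bin")
writeBin(as.raw(c(0x25, 0x50, 0x44, 0x46)), tmp)
bytes <- read_raw_file(tmp)
rawToChar(bytes)   # "%PDF"

# Hypothetical wrapper: feed a DOCX file on disk to the bytes-based opener.
# Calling it requires pdfoxide to be installed.
open_docx <- function(path) {
  pdfoxide::pdf_open_from_docx_bytes(read_raw_file(path))
}
```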

Document inspection

pdf_page_count(doc)            # number of pages
pdf_version(doc)               # PDF version as a list(major, minor)
pdf_is_encrypted(doc)          # TRUE if the document is encrypted
pdf_has_structure_tree(doc)    # TRUE if the document is a Tagged PDF
pdf_authenticate(doc, password)  # authenticate an encrypted document after opening
pdf_has_xfa(doc)               # TRUE if the document contains XFA forms
pdf_has_timestamp(doc)         # TRUE if the document carries a document timestamp

Text & content extraction

Single-page extraction (page index is 0-based).

pdf_extract_text(doc, page)              # reading-order plain text for one page
pdf_to_plain_text(doc, page)             # layout-aware plain text for one page
pdf_to_markdown(doc, page)               # Markdown for one page
pdf_to_html(doc, page)                   # HTML for one page
pdf_extract_structured_json(doc, page)   # structured layout JSON for one page

Whole-document extraction.

pdf_to_markdown_all(doc)      # Markdown for the entire document
pdf_to_html_all(doc)          # HTML for the entire document
pdf_to_plain_text_all(doc)    # plain text for the entire document
pdf_extract_all_text(doc)     # concatenated reading-order text for all pages

Structured / per-element extraction. These return data frames or lists of records.

pdf_extract_chars(doc, page)        # per-character records (glyph, bbox, font, size, color)
pdf_extract_words(doc, page)        # word records with bounding boxes
pdf_extract_text_lines(doc, page)   # text-line records with bounding boxes
pdf_extract_tables(doc, page)       # detected tables with rows and cells
pdf_extract_paths(doc, page)        # vector path (line/curve/shape) records
pdf_embedded_fonts(doc, page)       # embedded font records used on a page
pdf_embedded_images(doc, page)      # embedded image records on a page
pdf_page_annotations(doc, page)     # annotation records on a page

Auto-detecting extraction (chooses native vs. OCR-style heuristics).

pdf_extract_text_auto(doc, page)                  # best-effort text for one page
pdf_extract_page_auto(doc, page, options_json = NULL)  # best-effort structured page extraction

Region (clip-rectangle) extraction

Restrict extraction to a rectangle in PDF points (origin lower-left).

pdf_extract_text_in_rect(doc, page, x, y, width, height)    # text inside a rectangle
pdf_extract_words_in_rect(doc, page, x, y, width, height)   # words inside a rectangle
pdf_extract_lines_in_rect(doc, page, x, y, width, height)   # lines inside a rectangle
pdf_extract_tables_in_rect(doc, page, x, y, width, height)  # tables inside a rectangle
pdf_extract_images_in_rect(doc, page, x, y, width, height)  # images inside a rectangle
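
Because the origin is the lower-left corner, rectangles measured from the top of the page (as in most image viewers) have to be flipped first. The sketch below is one way to do that. It assumes that `x` and `y` name the rectangle's lower-left corner, which this reference does not state explicitly; `top_left_to_pdf_rect()` and `extract_top_box()` are hypothetical helpers.

```r
# Hypothetical helper: convert a rectangle measured from the top-left corner
# of the page (y grows downward) into lower-left-origin PDF points.
# Assumption: the *_in_rect functions expect the rectangle's lower-left corner.
top_left_to_pdf_rect <- function(page_height, x, top, width, height) {
  list(x = x, y = page_height - top - height, width = width, height = height)
}

# A 200 x 50 pt box whose top edge sits 72 pt below the top of a page
# that is 792 pt tall (US Letter).
rect <- top_left_to_pdf_rect(792, x = 72, top = 72, width = 200, height = 50)
rect$y   # 670

# Hypothetical wrapper: extract text from that box on a given page.
# Calling it requires pdfoxide to be installed.
extract_top_box <- function(doc, page = 0L) {
  h <- pdfoxide::pdf_page_get_height(doc, page)
  r <- top_left_to_pdf_rect(h, x = 72, top = 72, width = 200, height = 50)
  pdfoxide::pdf_extract_text_in_rect(doc, page, r$x, r$y, r$width, r$height)
}
```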

Lazy page handles

pdf_page() returns a lightweight pdfoxide_page bound to a single page; the text getters extract on demand.

pdf_page(doc, index)        # lazy handle for one page
pdf_page_text(page)         # plain text of the page
pdf_page_markdown(page)     # Markdown of the page
pdf_page_html(page)         # HTML of the page
pdf_page_plain_text(page)   # layout-aware plain text of the page

Page geometry & raw elements

pdf_page_get_width(doc, page)      # page width in PDF points
pdf_page_get_height(doc, page)     # page height in PDF points
pdf_page_get_rotation(doc, page)   # page rotation in degrees (0/90/180/270)
pdf_page_get_elements(doc, page)   # raw element records for the page

Search

pdf_search(doc, page, term, case_sensitive = FALSE)        # search one page
pdf_search_all(doc, term, case_sensitive = FALSE)          # search the whole document
pdf_search_results_to_json(doc, page, term, case_sensitive = FALSE)  # page search results as JSON

Page classification & cleanup

Detect and strip running headers, footers, and artifacts.

pdf_classify_page(doc, page)              # classify the layout/content of one page
pdf_classify_document(doc)                # classify the whole document
pdf_remove_headers(doc, threshold = 0.5)  # detect and remove repeating headers
pdf_remove_footers(doc, threshold = 0.5)  # detect and remove repeating footers
pdf_remove_artifacts(doc, threshold = 0.5)  # detect and remove page artifacts
pdf_erase_header(doc, page)               # erase the header region on a page
pdf_erase_footer(doc, page)               # erase the footer region on a page
pdf_erase_artifacts(doc, page)            # erase artifact regions on a page

Office conversion (export)

Convert a loaded PDF back out to an Office format. Returns a raw vector.

pdf_to_docx(doc)   # convert the document to DOCX bytes
pdf_to_pptx(doc)   # convert the document to PPTX bytes
pdf_to_xlsx(doc)   # convert the document to XLSX bytes
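
Since these functions return a raw vector, the result can be written to disk with base R's `writeBin()`. A minimal sketch, with `export_docx()` as a hypothetical wrapper name:

```r
# Hypothetical wrapper: convert a PDF on disk to a DOCX file.
# Uses pdf_open(), pdf_to_docx(), and pdf_close() from this reference;
# calling it requires pdfoxide to be installed.
export_docx <- function(pdf_path, docx_path) {
  doc <- pdfoxide::pdf_open(pdf_path)
  on.exit(pdfoxide::pdf_close(doc), add = TRUE)
  writeBin(pdfoxide::pdf_to_docx(doc), docx_path)   # raw vector -> file
  invisible(docx_path)
}
```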

Forms

pdf_get_form_fields(doc)                          # list of form-field records
pdf_export_form_data_to_bytes(doc, format_type = 0L)  # export form data (0 = FDF, 1 = XFDF) to bytes
pdf_import_form_data(doc, data_path)              # import form data from a file path
pdf_form_import_from_file(doc, filename)          # import form data from a named file

Editor-side form helpers are listed under Editing PDFs.

Document structure & metadata

pdf_get_outline(doc)        # document outline / bookmarks tree
pdf_get_page_labels(doc)    # page-label ranges
pdf_get_xmp_metadata(doc)   # XMP metadata as a list
pdf_get_source_bytes(doc)   # the original source bytes of the document
pdf_plan_split_by_bookmarks(doc, options_json = NULL)  # plan a split of the document by top-level bookmarks

Annotation details

Inspect individual annotations by page and index.

pdf_annotation_get_color(doc, page, index)              # annotation RGB color
pdf_annotation_get_creation_date(doc, page, index)      # creation date string
pdf_annotation_get_modification_date(doc, page, index)  # modification date string
pdf_annotation_is_hidden(doc, page, index)              # TRUE if the annotation is hidden
pdf_annotation_is_marked_deleted(doc, page, index)      # TRUE if marked deleted
pdf_annotation_is_printable(doc, page, index)           # TRUE if the annotation prints
pdf_annotation_is_read_only(doc, page, index)           # TRUE if read-only
pdf_link_annotation_get_uri(doc, page, index)           # URI of a link annotation
pdf_text_annotation_get_icon_name(doc, page, index)     # icon name of a text annotation
pdf_highlight_annotation_quad_points_count(doc, page, index)        # number of highlight quad points
pdf_highlight_annotation_quad_point(doc, page, index, quad_index)   # one highlight quad point
pdf_annotations_to_json(doc, page)                      # all annotations on a page as JSON

Font & element JSON helpers

pdf_font_get_size(doc, page, index)   # size of a font record on a page
pdf_fonts_to_json(doc, page)          # page fonts as JSON
pdf_elements_to_json(doc, page)       # page elements as JSON

Rendering

Render pages to raster images. format: 0 = PNG, 1 = JPEG. Coordinates and DPI are documented per function.

pdf_render_page(doc, page, format = 0L)                 # render a page at default DPI
pdf_render_page_zoom(doc, page, zoom, format = 0L)      # render a page at a zoom factor
pdf_render_page_thumbnail(doc, page, size, format = 0L) # render a fitted thumbnail
pdf_render_page_fit(doc, page, w, h, format = 0L)       # render fitted into w x h pixels
pdf_render_page_raw(doc, page, dpi = 150L)              # render to a raw RGBA buffer
pdf_render_page_region(doc, page, crop_x, crop_y, crop_width, crop_height, format = 0L)  # render a sub-region
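
When choosing a DPI it helps to know the pixel size to expect. PDF points are 1/72 inch, so a length in points maps to roughly `points / 72 * dpi` pixels. The helper below is a hypothetical illustration of that arithmetic; the renderer's own rounding may differ by a pixel.

```r
# Hypothetical helper: approximate pixel length for a length in PDF points.
points_to_pixels <- function(points, dpi = 150) {
  as.integer(round(points / 72 * dpi))
}

# US Letter is 612 x 792 pt.
points_to_pixels(c(612, 792), dpi = 150)   # 1275 1650

# Hypothetical wrapper: expected pixel size of one page at a given DPI.
# Calling it requires pdfoxide to be installed.
expected_pixel_size <- function(doc, page = 0L, dpi = 150) {
  c(width  = points_to_pixels(pdfoxide::pdf_page_get_width(doc, page), dpi),
    height = points_to_pixels(pdfoxide::pdf_page_get_height(doc, page), dpi))
}
```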

Full RenderOptions surface (background RGBA, transparency, annotation toggling, JPEG quality, layer exclusion).

pdf_render_page_with_options(doc, page, dpi = 150L, format = 0L,
                             bg_r = 1, bg_g = 1, bg_b = 1, bg_a = 1,
                             transparent_background = FALSE,
                             render_annotations = TRUE, jpeg_quality = 85L)

pdf_render_page_with_options_ex(doc, page, dpi = 150L, format = 0L,
                                bg_r = 1, bg_g = 1, bg_b = 1, bg_a = 1,
                                transparent_background = FALSE,
                                render_annotations = TRUE, jpeg_quality = 85L,
                                excluded_layers = NULL)

Reusable renderer and timing estimate.

pdf_create_renderer(dpi = 150L, format = 0L, quality = 85L, anti_alias = TRUE)  # build a reusable renderer
pdf_renderer_close(renderer)            # free a renderer
pdf_estimate_render_time(doc, page)     # estimate render time for a page

Rendered-image handle helpers.

pdf_rendered_image_save(image, path)    # write a rendered image to a file
pdf_rendered_image_close(image)         # free a rendered image

Editing PDFs

Open a PDF for mutation. Returns a pdfoxide_editor.

pdf_editor_open(path)               # open a PDF for editing
pdf_editor_open_from_bytes(bytes)   # open an editor from a raw vector
pdf_editor_close(editor)            # close the editor and free it

Editor inspection & metadata

pdf_editor_page_count(editor)               # page count
pdf_editor_version(editor)                  # PDF version as list(major, minor)
pdf_editor_is_modified(editor)              # TRUE if the editor has unsaved changes
pdf_editor_source_path(editor)              # original source path, if any
pdf_editor_get_producer(editor)             # Producer metadata string
pdf_editor_set_producer(editor, value)      # set the Producer metadata
pdf_editor_get_creation_date(editor)        # CreationDate string
pdf_editor_set_creation_date(editor, value) # set the CreationDate

Page operations

pdf_editor_delete_page(editor, page)               # delete a page
pdf_editor_move_page(editor, from, to)             # move a page to a new index
pdf_editor_rotate_page_by(editor, page, degrees)   # rotate a page by a relative angle
pdf_editor_rotate_all_pages(editor, degrees)       # rotate every page
pdf_editor_get_page_rotation(editor, page)         # current page rotation
pdf_editor_set_page_rotation(editor, page, degrees)  # set absolute page rotation
pdf_editor_crop_margins(editor, left, right, top, bottom)  # crop margins on all pages
pdf_editor_get_page_crop_box(editor, page)         # get CropBox as c(x, y, w, h)
pdf_editor_set_page_crop_box(editor, page, x, y, w, h)  # set CropBox
pdf_editor_get_page_media_box(editor, page)        # get MediaBox as c(x, y, w, h)
pdf_editor_set_page_media_box(editor, page, x, y, w, h) # set MediaBox

Redaction (editor)

pdf_editor_apply_all_redactions(editor)                  # apply all pending redactions
pdf_editor_apply_page_redactions(editor, page)           # apply redactions on one page
pdf_editor_is_page_marked_for_redaction(editor, page)    # TRUE if page has pending redactions
pdf_editor_unmark_page_for_redaction(editor, page)       # clear pending redactions on a page
pdf_editor_erase_region(editor, page, x, y, w, h)        # erase a rectangle on a page
pdf_editor_erase_regions(editor, page, rects)            # erase several rectangles on a page
pdf_editor_clear_erase_regions(editor, page)             # clear pending erase regions

Standalone redaction workflow

pdf_redaction_add(editor, page, x1, y1, x2, y2, r = 0, g = 0, b = 0)  # add a redaction box with a fill color
pdf_redaction_count(editor, page)                                    # pending redaction count on a page
pdf_redaction_apply(editor, scrub_metadata = FALSE, r = 0, g = 0, b = 0)  # burn in all redactions
pdf_redaction_scrub_metadata(editor)                                 # scrub metadata for redaction hygiene
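
Put together, the standalone workflow might look like the sketch below: add a box, check that it registered, burn it in, and save. `redact_box()` is a hypothetical wrapper; the coordinates are passed through unchanged, and the count is assumed to be an integer.

```r
# Hypothetical wrapper around the standalone redaction functions listed above.
# Calling it requires pdfoxide to be installed.
redact_box <- function(in_path, out_path, page, x1, y1, x2, y2) {
  ed <- pdfoxide::pdf_editor_open(in_path)
  on.exit(pdfoxide::pdf_editor_close(ed), add = TRUE)

  pdfoxide::pdf_redaction_add(ed, page, x1, y1, x2, y2)   # default black fill
  n <- pdfoxide::pdf_redaction_count(ed, page)            # pending boxes on the page

  # Burn in every pending redaction and scrub metadata in the same pass.
  pdfoxide::pdf_redaction_apply(ed, scrub_metadata = TRUE)
  pdfoxide::pdf_editor_save(ed, out_path)
  invisible(n)
}
```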

Forms & annotations (editor)

pdf_editor_flatten_forms(editor)                       # flatten all form fields into content
pdf_editor_flatten_forms_on_page(editor, page)         # flatten forms on one page
pdf_editor_set_form_field_value(editor, name, value)   # set a form-field value by name
pdf_editor_flatten_annotations(editor, page)           # flatten annotations on a page
pdf_editor_flatten_all_annotations(editor)             # flatten all annotations
pdf_editor_flatten_warnings_count(editor)              # number of flatten warnings
pdf_editor_flatten_warning(editor, index)              # one flatten warning message
pdf_editor_is_page_marked_for_flatten(editor, page)    # TRUE if page is marked for flatten
pdf_editor_unmark_page_for_flatten(editor, page)       # clear flatten mark on a page
pdf_editor_import_fdf_bytes(editor, bytes)             # import FDF form data
pdf_editor_import_xfdf_bytes(editor, bytes)            # import XFDF form data

Document operations (editor)

pdf_editor_merge_from(editor, source_path)             # append pages from another PDF file
pdf_editor_merge_from_bytes(editor, bytes)             # append pages from PDF bytes
pdf_editor_convert_to_pdf_a(editor, level)             # convert in place to PDF/A
pdf_editor_embed_file(editor, name, bytes)             # attach an embedded file
pdf_editor_extract_pages_to_bytes(editor, pages)       # extract selected pages to a new PDF (bytes)

Saving (editor)

pdf_editor_save(editor, path)                          # save to a file
pdf_editor_save_to_bytes(editor)                       # save to a raw vector
pdf_editor_save_to_bytes_with_options(editor, compress = TRUE,
                                      garbage_collect = TRUE, linearize = FALSE)  # save with options
pdf_editor_save_encrypted(editor, path, user_password, owner_password)            # save AES-encrypted to a file
pdf_editor_save_encrypted_to_bytes(editor, user_password, owner_password)         # save AES-encrypted to bytes

DocumentBuilder (programmatic creation)

Construct a PDF page-by-page. pdf_builder_create() returns a pdfoxide_builder; page constructors return a fluent pdfoxide_page.

pdf_builder_create()         # start a new DocumentBuilder
pdf_builder_close(builder)   # free a builder

Builder document metadata

pdf_builder_set_title(builder, value)      # set document title
pdf_builder_set_author(builder, value)     # set document author
pdf_builder_set_subject(builder, value)    # set document subject
pdf_builder_set_keywords(builder, value)   # set document keywords
pdf_builder_set_creator(builder, value)    # set document creator
pdf_builder_on_open(builder, script)       # set a document-open JavaScript action
pdf_builder_language(builder, lang)        # set the document language (e.g. "en-US")
pdf_builder_tagged_pdf_ua1(builder)        # enable Tagged PDF / PDF-UA-1 output
pdf_builder_role_map(builder, custom, standard)        # map a custom structure tag to a standard role
pdf_builder_register_embedded_font(builder, name, font)  # register an embedded font for use on pages

Builder pages & output

pdf_builder_page(builder, width, height)   # start a custom-size page
pdf_builder_a4_page(builder)               # start an A4 page
pdf_builder_letter_page(builder)           # start a US Letter page
pdf_builder_build(builder)                 # finish and return the PDF as bytes
pdf_builder_save(builder, path)            # finish and write to a file
pdf_builder_save_encrypted(builder, path, user_password, owner_password)     # finish and write AES-encrypted
pdf_builder_to_bytes_encrypted(builder, user_password, owner_password)       # finish and return encrypted bytes

Embedded fonts

pdf_embedded_font_from_file(path)                 # load an embedded font from a TTF/OTF file
pdf_embedded_font_from_bytes(bytes, name = NULL)  # load an embedded font from bytes
pdf_embedded_font_close(font)                     # free an embedded font handle

Page builder (fluent layout)

All of the following operate on a pdfoxide_page returned by a page constructor (pdf_builder_page(), pdf_builder_a4_page(), or pdf_builder_letter_page()) and return the page invisibly for chaining. Finish a page with pdf_page_done().

Text flow & typography

pdf_page_font(page, name, size)        # set the active font and size
pdf_page_at(page, x, y)                # move the text cursor to a coordinate
pdf_page_builder_text(page, text)      # draw text at the cursor
pdf_page_heading(page, level, text)    # add a heading (level 1-6)
pdf_page_paragraph(page, text)         # add a wrapped paragraph
pdf_page_space(page, points)           # add vertical space
pdf_page_horizontal_rule(page)         # draw a horizontal rule
pdf_page_newline(page)                 # advance to the next line
pdf_page_footnote(page, ref_mark, note_text)        # add a footnote with a reference mark
pdf_page_columns(page, column_count, gap_pt, text)  # flow text into multiple columns
pdf_page_text_in_rect(page, x, y, w, h, text, align = 0L)  # flow text inside a rectangle
pdf_page_new_page_same_size(page)      # start a new page of the same size
pdf_page_done(page)                    # finish the page and return to the builder
pdf_page_close(page)                   # free a page handle

Inline styled runs

pdf_page_inline(page, text)               # append an inline text run
pdf_page_inline_bold(page, text)          # append a bold inline run
pdf_page_inline_italic(page, text)        # append an italic inline run
pdf_page_inline_color(page, r, g, b, text)  # append a colored inline run

Links & JavaScript actions

pdf_page_link_url(page, url)              # add a URL link
pdf_page_link_page(page, index)           # add an internal page link
pdf_page_link_named(page, destination)    # add a named-destination link
pdf_page_link_javascript(page, script)    # add a JavaScript-action link
pdf_page_on_open(page, script)            # page-open JavaScript action
pdf_page_on_close(page, script)           # page-close JavaScript action
pdf_page_field_keystroke(page, script)    # field keystroke JavaScript action
pdf_page_field_format(page, script)       # field format JavaScript action
pdf_page_field_validate(page, script)     # field validate JavaScript action
pdf_page_field_calculate(page, script)    # field calculate JavaScript action

Annotations & markup

pdf_page_highlight(page, r, g, b)         # highlight markup at the current run
pdf_page_underline(page, r, g, b)         # underline markup
pdf_page_strikeout(page, r, g, b)         # strikeout markup
pdf_page_squiggly(page, r, g, b)          # squiggly underline markup
pdf_page_sticky_note(page, text)          # sticky note at the cursor
pdf_page_sticky_note_at(page, x, y, text) # sticky note at a coordinate
pdf_page_watermark(page, text)            # add a text watermark
pdf_page_watermark_confidential(page)     # add a CONFIDENTIAL watermark
pdf_page_watermark_draft(page)            # add a DRAFT watermark
pdf_page_stamp(page, type_name)           # add a rubber stamp (e.g. "Approved")
pdf_page_freetext(page, x, y, w, h, text) # add a free-text annotation

AcroForm widgets

pdf_page_text_field(page, name, x, y, w, h, default_value = NULL)        # text field
pdf_page_checkbox(page, name, x, y, w, h, checked = FALSE)               # checkbox
pdf_page_combo_box(page, name, x, y, w, h, options, selected = NULL)     # combo box
pdf_page_radio_group(page, name, values, xs, ys, ws, hs, selected = NULL)  # radio-button group
pdf_page_push_button(page, name, x, y, w, h, caption)                    # push button
pdf_page_signature_field(page, name, x, y, w, h)                         # signature field

Barcodes (page builder)

pdf_page_barcode_1d(page, barcode_type, data, x, y, w, h)  # draw a 1D barcode
pdf_page_barcode_qr(page, data, x, y, size)                # draw a QR code

Images

pdf_page_image(page, bytes, x, y, w, h)                  # place an image
pdf_page_image_with_alt(page, bytes, x, y, w, h, alt_text)  # place an image with alt text
pdf_page_image_artifact(page, bytes, x, y, w, h)         # place an image tagged as an artifact

Vector graphics

pdf_page_rect(page, x, y, w, h)                          # draw a rectangle outline
pdf_page_filled_rect(page, x, y, w, h, r, g, b)          # draw a filled rectangle
pdf_page_line(page, x1, y1, x2, y2)                      # draw a line
pdf_page_stroke_rect(page, x, y, w, h, width, r, g, b)   # stroke a rectangle with width and color
pdf_page_stroke_line(page, x1, y1, x2, y2, width, r, g, b)  # stroke a line with width and color
pdf_page_stroke_rect_dashed(page, x, y, w, h, width, r, g, b, dash = numeric(0), phase = 0)    # dashed rectangle
pdf_page_stroke_line_dashed(page, x1, y1, x2, y2, width, r, g, b, dash = numeric(0), phase = 0)  # dashed line

Tables

pdf_page_table(page, widths, aligns, cells, has_header = FALSE,
               n_columns = length(widths), n_rows = NULL)  # render a static table

Streaming tables for large/incremental data.

pdf_page_streaming_table_begin(page, headers, widths, aligns,
                               repeat_header = FALSE, n_columns = length(headers))  # begin a streaming table
pdf_page_streaming_table_begin_v2(page, headers, widths, aligns,
                                  repeat_header = FALSE, mode = 0L, sample_rows = 0L,
                                  min_col_width_pt = 0, max_col_width_pt = 0,
                                  max_rowspan = 0L, n_columns = length(headers))  # streaming table with autosize/rowspan
pdf_page_streaming_table_set_batch_size(page, batch_size)      # set the flush batch size
pdf_page_streaming_table_pending_row_count(page)               # rows buffered but not yet flushed
pdf_page_streaming_table_batch_count(page)                     # number of flushed batches
pdf_page_streaming_table_flush(page)                           # flush buffered rows
pdf_page_streaming_table_push_row(page, cells)                 # push one row of cells
pdf_page_streaming_table_push_row_v2(page, cells, rowspans = NULL)  # push a row with per-cell rowspans
pdf_page_streaming_table_finish(page)                          # finish and lay out the streaming table
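
A typical use is streaming a data frame row by row. The sketch below assumes that `cells` is a character vector with one entry per column and that `headers` accepts the column names; neither is confirmed by this reference. `row_as_cells()` and `stream_data_frame()` are hypothetical helpers, and `widths` and `aligns` are passed through unchanged.

```r
# Hypothetical helper: one data-frame row as a plain character vector.
row_as_cells <- function(df, i) {
  unname(vapply(df, function(col) as.character(col[[i]]), character(1)))
}

# Hypothetical wrapper: begin a streaming table, push every row, then finish.
# Calling it requires pdfoxide to be installed.
stream_data_frame <- function(page, df, widths, aligns) {
  pdfoxide::pdf_page_streaming_table_begin(page, names(df), widths, aligns,
                                           repeat_header = TRUE)
  for (i in seq_len(nrow(df))) {
    pdfoxide::pdf_page_streaming_table_push_row(page, row_as_cells(df, i))
  }
  pdfoxide::pdf_page_streaming_table_finish(page)
}

df <- data.frame(item = c("Widget", "Gadget"), qty = c(2L, 10L))
row_as_cells(df, 2L)   # "Gadget" "10"
```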

Digital signatures

Certificates

pdf_certificate_load_from_bytes(bytes, password = NULL)  # load a PKCS#12 / DER certificate from bytes
pdf_certificate_load_from_pem(cert_pem, key_pem)         # load a certificate + key from PEM
pdf_certificate_subject(cert)    # certificate subject DN
pdf_certificate_issuer(cert)     # certificate issuer DN
pdf_certificate_serial(cert)     # certificate serial number
pdf_certificate_validity(cert)   # validity window
pdf_certificate_is_valid(cert)   # TRUE if currently within the validity window
pdf_certificate_close(cert)      # free a certificate handle

Signing

pdf_sign_bytes(pdf, cert, reason = NULL, location = NULL)            # sign PDF bytes (basic CMS signature)
pdf_sign_bytes_pades(pdf, cert, level = 0L, tsa_url = NULL, ...)     # sign PDF bytes with a PAdES profile
pdf_sign_bytes_pades_opts(pdf, cert, level = 0L, tsa_url = NULL, ...)  # PAdES signing with extended options
pdf_sign(doc, certificate, reason = NULL, location = NULL)          # sign a loaded document
pdf_add_timestamp(pdf_data, sig_index, tsa_url)                     # add a TSA timestamp to a signature in bytes

Signature inspection & verification

pdf_signature_count(doc)                  # number of signatures
pdf_get_signature(doc, index)             # signature handle by index
pdf_signature_signer_name(sig)            # signer common name
pdf_signature_signing_reason(sig)         # signing reason
pdf_signature_signing_location(sig)       # signing location
pdf_signature_signing_time(sig)           # signing time
pdf_signature_certificate(sig)            # signer certificate handle
pdf_signature_pades_level(sig)            # PAdES level of the signature
pdf_signature_has_timestamp(sig)          # TRUE if the signature is timestamped
pdf_signature_timestamp(sig)              # embedded timestamp handle
pdf_signature_add_timestamp(sig, timestamp)  # attach a timestamp to a signature
pdf_signature_verify(sig)                 # verify the signature, returns a status
pdf_signature_verify_detached(sig, pdf)   # verify with a detached message digest check
pdf_signature_close(sig)                  # free a signature handle
pdf_verify_all_signatures(doc)            # verify every signature in the document
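
The inspection functions can be combined into a per-signature summary, as sketched below. This assumes signature indices are 0-based like page indices, which the reference does not state; `signature_summary()` is a hypothetical helper, and the shape of the value returned by `pdf_signature_verify()` is left as-is.

```r
# Hypothetical helper: collect signer details and verification status for
# every signature in a loaded document.
# Assumption: signature indices are 0-based.
# Calling it requires pdfoxide to be installed.
signature_summary <- function(doc) {
  n <- pdfoxide::pdf_signature_count(doc)
  lapply(seq_len(n) - 1L, function(i) {
    sig <- pdfoxide::pdf_get_signature(doc, i)
    on.exit(pdfoxide::pdf_signature_close(sig), add = TRUE)
    list(
      index  = i,
      signer = pdfoxide::pdf_signature_signer_name(sig),
      reason = pdfoxide::pdf_signature_signing_reason(sig),
      time   = pdfoxide::pdf_signature_signing_time(sig),
      status = pdfoxide::pdf_signature_verify(sig)
    )
  })
}
```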

Timestamps

pdf_timestamp_parse(bytes)               # parse a timestamp token (TST)
pdf_timestamp_token(timestamp)           # raw timestamp token bytes
pdf_timestamp_message_imprint(timestamp) # message imprint of the timestamp
pdf_timestamp_time(timestamp)            # timestamp time
pdf_timestamp_serial(timestamp)          # timestamp serial number
pdf_timestamp_tsa_name(timestamp)        # TSA name
pdf_timestamp_policy_oid(timestamp)      # timestamp policy OID
pdf_timestamp_hash_algorithm(timestamp)  # hash algorithm used
pdf_timestamp_verify(timestamp)          # verify the timestamp token
pdf_timestamp_close(timestamp)           # free a timestamp handle

TSA client

pdf_tsa_client_create(url, username = NULL, password = NULL, timeout = 30L,
                      hash_algo = 0L, use_nonce = TRUE, cert_req = TRUE)  # create a TSA client
pdf_tsa_request_timestamp(client, data)              # request a timestamp over data
pdf_tsa_request_timestamp_hash(client, hash, hash_algo = 0L)  # request a timestamp over a precomputed hash
pdf_tsa_client_close(client)                         # free a TSA client

Document Security Store (DSS)

pdf_get_dss(doc)              # get the document's DSS handle
pdf_dss_cert_count(dss)       # number of certificates in the DSS
pdf_dss_crl_count(dss)        # number of CRLs
pdf_dss_ocsp_count(dss)       # number of OCSP responses
pdf_dss_vri_count(dss)        # number of VRI entries
pdf_dss_get_cert(dss, index)  # one DSS certificate
pdf_dss_get_crl(dss, index)   # one DSS CRL
pdf_dss_get_ocsp(dss, index)  # one DSS OCSP response
pdf_dss_close(dss)            # free a DSS handle

Compliance validation

PDF/A

pdf_validate_pdf_a(doc, level = 0L)   # validate against a PDF/A level, returns a results handle
pdf_a_is_compliant(results)           # TRUE if compliant
pdf_a_errors(results)                 # list of validation errors
pdf_a_warning_count(results)          # number of warnings
pdf_a_results_close(results)          # free the results handle
pdf_convert_to_pdf_a(doc, level = 2L) # convert a document to PDF/A bytes
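
A validation run returns a results handle that should be closed once read. The sketch below wraps that lifecycle; `check_pdf_a()` is a hypothetical helper, and the meaning of each integer `level` code is not covered here, so the default is passed through.

```r
# Hypothetical wrapper: validate a file against PDF/A and return the findings
# as a plain list, closing both handles on exit.
# Calling it requires pdfoxide to be installed.
check_pdf_a <- function(path, level = 0L) {
  doc <- pdfoxide::pdf_open(path)
  on.exit(pdfoxide::pdf_close(doc), add = TRUE)

  res <- pdfoxide::pdf_validate_pdf_a(doc, level = level)
  # after = FALSE runs this handler first, so the results handle is closed
  # before the document it came from.
  on.exit(pdfoxide::pdf_a_results_close(res), add = TRUE, after = FALSE)

  list(
    compliant = pdfoxide::pdf_a_is_compliant(res),
    errors    = pdfoxide::pdf_a_errors(res),
    warnings  = pdfoxide::pdf_a_warning_count(res)
  )
}
```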

PDF/UA (accessibility)

pdf_validate_pdf_ua(doc, level = 0L)  # validate against PDF/UA, returns a results handle
pdf_ua_is_accessible(results)         # TRUE if accessible
pdf_ua_errors(results)                # list of accessibility errors
pdf_ua_warnings(results)              # list of accessibility warnings
pdf_ua_stats(results)                 # accessibility statistics
pdf_ua_results_close(results)         # free the results handle

PDF/X (print)

pdf_validate_pdf_x(doc, level = 0L)   # validate against PDF/X, returns a results handle
pdf_x_is_compliant(results)           # TRUE if compliant
pdf_x_errors(results)                 # list of validation errors
pdf_x_results_close(results)          # free the results handle

Barcodes

Standalone barcode generation and decoding.

pdf_generate_qr_code(data, error_correction = 1L, size_px = 256L)  # generate a QR code, returns a barcode handle
pdf_generate_barcode(data, format = 0L, size_px = 256L)            # generate a barcode in a given format
pdf_barcode_get_data(barcode)             # decoded data string
pdf_barcode_get_format(barcode)           # barcode format
pdf_barcode_get_confidence(barcode)       # decode confidence
pdf_barcode_get_image_png(barcode, size_px = 256L)  # rendered PNG bytes
pdf_barcode_get_svg(barcode, size_px = 256L)        # rendered SVG string
pdf_barcode_close(barcode)                # free a barcode handle
pdf_editor_add_barcode_to_page(editor, page, barcode, x, y, width, ...)  # stamp a barcode onto an editor page

OCR

Requires the ocr feature in the underlying build.

pdf_ocr_engine_create(det_model_path, rec_model_path, dict_path)  # build an OCR engine from model paths
pdf_ocr_engine_close(engine)             # free an OCR engine
pdf_ocr_page_needs_ocr(doc, page)        # TRUE if a page has no extractable text layer
pdf_ocr_extract_text(doc, page, engine = NULL)  # OCR a page (uses the default engine when NULL)
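
One way to combine these is to OCR only the pages that lack a text layer and use native extraction elsewhere. The sketch below shows that pattern with a hypothetical `page_text_with_ocr_fallback()` helper; `pdf_extract_text_auto()` listed earlier may already cover the same ground.

```r
# Hypothetical helper: native extraction when a text layer exists, OCR otherwise.
# Calling it requires pdfoxide built with the ocr feature.
page_text_with_ocr_fallback <- function(doc, page, engine = NULL) {
  if (isTRUE(pdfoxide::pdf_ocr_page_needs_ocr(doc, page))) {
    pdfoxide::pdf_ocr_extract_text(doc, page, engine = engine)
  } else {
    pdfoxide::pdf_extract_text(doc, page)
  }
}
```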

OCR models & runtime configuration

pdf_model_manifest()                       # available OCR model manifest
pdf_prefetch_available()                   # TRUE if model prefetching is available
pdf_prefetch_models(languages_csv = NULL)  # prefetch OCR models for given languages
pdf_set_max_ops_per_stream(limit)          # cap content-stream operations (DoS guard)
pdf_set_preserve_unmapped_glyphs(preserve) # keep glyphs with no Unicode mapping

Cryptographic provider / FIPS

pdf_crypto_active_provider()   # name of the active crypto provider
pdf_crypto_fips_available()    # TRUE if a FIPS provider is available
pdf_crypto_use_fips()          # switch to the FIPS provider
pdf_crypto_set_policy(spec)    # set the crypto policy from a spec string
pdf_crypto_policy()            # current crypto policy
pdf_crypto_inventory()         # cryptographic algorithm inventory
pdf_crypto_cbom()              # Cryptographic Bill of Materials (CBOM)

Logging

pdf_set_log_level(level)   # set the library log level
pdf_get_log_level()        # get the current log level

Complete example

library(pdfoxide)

# --- Create ---
pdf <- pdf_from_markdown("# Report\n\nGenerated by **PDF Oxide**.\n")
pdf_save(pdf, "report.pdf")
pdf_close(pdf)

# --- Extract ---
doc <- pdf_open("report.pdf")
cat("Pages:", pdf_page_count(doc), "\n")

for (i in seq_len(pdf_page_count(doc)) - 1L) {     # 0-based indices
  txt <- pdf_extract_text(doc, i)
  cat(sprintf("Page %d: %d characters\n", i + 1L, nchar(txt)))
}

chars <- pdf_extract_chars(doc, 0)                 # per-character data frame
results <- pdf_search_all(doc, "PDF Oxide", case_sensitive = FALSE)
pdf_close(doc)

# --- Edit ---
ed <- pdf_editor_open("report.pdf")
pdf_editor_set_producer(ed, "PDF Oxide")
pdf_editor_rotate_all_pages(ed, 90L)
pdf_editor_save(ed, "rotated.pdf")
pdf_editor_close(ed)

# --- Build programmatically ---
b <- pdf_builder_create()
pdf_builder_set_title(b, "Invoice")
page <- pdf_builder_letter_page(b)
pdf_page_font(page, "Helvetica", 24)
pdf_page_at(page, 72, 720)
pdf_page_builder_text(page, "Invoice #1001")
pdf_page_done(page)
pdf_builder_save(b, "invoice.pdf")
pdf_builder_close(b)

Other Language Bindings

PDF Oxide ships native bindings for every major ecosystem: Rust, Python, Node.js, WASM, C#, Golang, Java, PHP, Ruby, C++, Swift, Kotlin, Dart, Julia, Zig, Scala, Clojure, Objective-C, and Elixir.

Next Steps

Types & Enums — all shared types and enums
Page API Reference — consistent per-page iteration across bindings
Getting Started with R — tutorial