What is the fastest Python PDF library?

PDF Oxide is the fastest Python PDF library, with 0.8ms mean text extraction time — 5.8× faster than PyMuPDF (4.6ms) and 15× faster than pypdf (12.1ms). Benchmarked on 3,830 real-world PDFs with 100% pass rate.

Is PDF Oxide free for commercial use?

Yes. PDF Oxide is MIT licensed — free for all uses including commercial products, SaaS, and proprietary software. No license fees, no sales calls, no AGPL restrictions.

Can PDF Oxide handle scanned PDFs with OCR?

Yes. PDF Oxide includes built-in OCR via PaddleOCR and ONNX Runtime. No Tesseract installation needed — just pip install pdf_oxide and use extract_text_ocr(). Supports PP-OCRv3, v4, and v5 models.

Does PDF Oxide support XFA forms?

Yes. PDF Oxide is the only Python PDF library that can detect, analyze, and extract data from XFA forms (XML Forms Architecture). PyMuPDF, pypdf, pdfplumber, and pdfminer cannot read XFA form data.

How does PDF Oxide compare to PyMuPDF?

PDF Oxide is 5.8× faster than PyMuPDF (0.8ms vs 4.6ms mean), has a 100% pass rate vs 99.3%, and is MIT licensed vs PyMuPDF's AGPL-3.0. PDF Oxide also has built-in Markdown/HTML output and XFA form support that PyMuPDF lacks.

Can PDF Oxide convert PDF to Markdown?

Yes. PDF Oxide has built-in PDF to Markdown conversion with heading detection, table preservation, and list formatting — ideal for LLM and RAG pipelines. No separate package needed, unlike PyMuPDF which requires pymupdf4llm (69× slower).

R API Reference

PDF Oxide provides idiomatic R bindings as the pdfoxide package. The package wraps the pdf_oxide C ABI through R's native .Call interface, so there is no dependency on Java, Python, or an external runtime, only a compiled shared library.

# install from source (requires a C toolchain)
R CMD INSTALL pdfoxide

library(pdfoxide)

For the Rust API, see the Rust API Reference. For Python, see the Python API Reference. For JavaScript, see the JavaScript API Reference (Node.js) or the JavaScript API Reference (WASM).

The R API is a flat functional API built around opaque handle objects rather than R6/S4 classes. The main handle types are:

Handle	Created by	Purpose
`pdfoxide_pdf`	`pdf_from_markdown()`, `pdf_from_html()`, …	A newly built PDF, ready to save or convert
`pdfoxide_document`	`pdf_open()`, `pdf_open_from_bytes()`	A loaded PDF for read-only extraction and rendering
`pdfoxide_editor`	`pdf_editor_open()`	A mutable PDF for editing, merging, and saving
`pdfoxide_builder`	`pdf_builder_create()`	A `DocumentBuilder` for programmatic page construction
`pdfoxide_page` (builder)	`pdf_builder_page()`, `pdf_builder_a4_page()`, …	A fluent page being laid out
`pdfoxide_page` (lazy)	`pdf_page()`	A lazy read handle over a single page of the document
`pdfoxide_renderer`, `pdfoxide_rendered_image`	rendering functions	Reusable renderer and rendered raster output
`pdfoxide_certificate`, `pdfoxide_signature`, `pdfoxide_timestamp`, `pdfoxide_tsa_client`, `pdfoxide_dss`	signing/verification functions	Digital signature primitives

All page indices are 0-based. Functions that modify a builder, page, or editor return the handle invisibly, so calls can be chained with the pipe (|>). Handles are closed automatically on garbage collection, but pdf_close() / *_close() release them early.
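As a minimal sketch of the 0-based indexing and early-close conventions, the helper below collects the text of every page. The name read_all_pages is hypothetical, and the code assumes pdf_page_count() returns a non-negative integer and pdf_extract_text() returns a single character string per page; neither return type is spelled out in this reference.

```r
# Sketch only: read_all_pages() is a hypothetical helper, not part of pdfoxide.
read_all_pages <- function(path) {
  doc <- pdfoxide::pdf_open(path)
  # Release the handle as soon as the function exits instead of
  # waiting for garbage collection.
  on.exit(pdfoxide::pdf_close(doc), add = TRUE)

  n <- pdfoxide::pdf_page_count(doc)
  # Pages run from 0 to n - 1; seq_len(n) - 1L is empty when n is 0.
  # Assumption: pdf_extract_text() yields one string per page.
  vapply(seq_len(n) - 1L,
         function(i) pdfoxide::pdf_extract_text(doc, i),
         character(1))
}
```

Calling read_all_pages("report.pdf") would then return a character vector with one element per page, under the assumptions above.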

Creating PDFs

Quick, direct creation from a source format. Each function returns a pdfoxide_pdf handle.

pdf_from_markdown(markdown)                              # build a PDF from a Markdown string
pdf_from_html(html)                                      # build a PDF from an HTML string
pdf_from_text(text)                                      # build a PDF from plain text
pdf_from_image(path)                                     # build a single-page PDF from an image file
pdf_from_image_bytes(bytes)                              # build a single-page PDF from raw image bytes
pdf_from_html_css(html, css, font_bytes = NULL)          # build a PDF from HTML + CSS (optional embedded font)
pdf_from_html_css_with_fonts(html, css, families, font_bytes)  # HTML + CSS with multiple named font families
pdf_merge(paths)                                         # merge several PDF files into one new PDF

Saving / serializing a created PDF

pdf_save(pdf, path)            # write the PDF to a file path
pdf_to_bytes(pdf)             # serialize the PDF to a raw vector
pdf_get_page_count(pdf)       # number of pages in a built pdfoxide_pdf
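The creation and save functions can be combined as in the sketch below. The helper name markdown_to_pdf is hypothetical; the code relies only on pdf_from_markdown(), pdf_save(), and pdf_close() as listed in this reference, and does not assume anything about the return value of pdf_save().

```r
# Sketch only: markdown_to_pdf() is a hypothetical helper, not part of pdfoxide.
markdown_to_pdf <- function(markdown, path) {
  pdf <- pdfoxide::pdf_from_markdown(markdown)
  # pdf_close() is documented as closing any pdfoxide handle.
  on.exit(pdfoxide::pdf_close(pdf), add = TRUE)

  pdfoxide::pdf_save(pdf, path)
  invisible(path)
}
```

For in-memory use, pdf_to_bytes(pdf) could replace the pdf_save() call to obtain a raw vector instead of a file.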

Opening documents

Opens an existing PDF for extraction and rendering. Returns a pdfoxide_document.

pdf_open(path)                          # open a PDF file from disk
pdf_open_with_password(path, password)  # open an encrypted PDF with a password
pdf_open_from_bytes(bytes)              # open a PDF from an in-memory raw vector
pdf_close(x)                            # close any pdfoxide handle and free it

Opening from Office formats

Converts and opens a Word/PowerPoint/Excel document directly as a pdfoxide_document.

pdf_open_from_docx_bytes(bytes)   # convert DOCX bytes and open as a document
pdf_open_from_pptx_bytes(bytes)   # convert PPTX bytes and open as a document
pdf_open_from_xlsx_bytes(bytes)   # convert XLSX bytes and open as a document

Document inspection

pdf_page_count(doc)            # number of pages
pdf_version(doc)               # PDF version as a list(major, minor)
pdf_is_encrypted(doc)          # TRUE if the document is encrypted
pdf_has_structure_tree(doc)    # TRUE if the document is a Tagged PDF
pdf_authenticate(doc, password)  # authenticate an encrypted document after opening
pdf_has_xfa(doc)               # TRUE if the document contains XFA forms
pdf_has_timestamp(doc)         # TRUE if the document carries a document timestamp

Text and content extraction

Single-page extraction (the page index is 0-based).

pdf_extract_text(doc, page)              # reading-order plain text for one page
pdf_to_plain_text(doc, page)             # layout-aware plain text for one page
pdf_to_markdown(doc, page)               # Markdown for one page
pdf_to_html(doc, page)                   # HTML for one page
pdf_extract_structured_json(doc, page)   # structured layout JSON for one page

Whole-document extraction.

pdf_to_markdown_all(doc)      # Markdown for the entire document
pdf_to_html_all(doc)          # HTML for the entire document
pdf_to_plain_text_all(doc)    # plain text for the entire document
pdf_extract_all_text(doc)     # concatenated reading-order text for all pages

Structured / per-element extraction. These functions return data frames or lists of records.

pdf_extract_chars(doc, page)        # per-character records (glyph, bbox, font, size, color)
pdf_extract_words(doc, page)        # word records with bounding boxes
pdf_extract_text_lines(doc, page)   # text-line records with bounding boxes
pdf_extract_tables(doc, page)       # detected tables with rows and cells
pdf_extract_paths(doc, page)        # vector path (line/curve/shape) records
pdf_embedded_fonts(doc, page)       # embedded font records used on a page
pdf_embedded_images(doc, page)      # embedded image records on a page
pdf_page_annotations(doc, page)     # annotation records on a page

Auto-detecting extraction (chooses between native and OCR-style heuristics).

pdf_extract_text_auto(doc, page)                  # best-effort text for one page
pdf_extract_page_auto(doc, page, options_json = NULL)  # best-effort structured page extraction

Region extraction (clipping rectangle)

Restricts extraction to a rectangle in PDF points (origin at the bottom-left corner).

pdf_extract_text_in_rect(doc, page, x, y, width, height)    # text inside a rectangle
pdf_extract_words_in_rect(doc, page, x, y, width, height)   # words inside a rectangle
pdf_extract_lines_in_rect(doc, page, x, y, width, height)   # lines inside a rectangle
pdf_extract_tables_in_rect(doc, page, x, y, width, height)  # tables inside a rectangle
pdf_extract_images_in_rect(doc, page, x, y, width, height)  # images inside a rectangle
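Because the origin is at the bottom-left, a rectangle measured from the top-left (as in most image viewers) has to be flipped vertically first. The sketch below shows that arithmetic. Both helper names are hypothetical, and the code assumes that (x, y) in the *_in_rect() functions denotes the rectangle's lower-left corner, which this reference does not state explicitly. It also ignores page rotation and any CropBox offset.

```r
# Sketch only: both helpers are hypothetical, not part of pdfoxide.

# Convert a top-left-origin rectangle to a bottom-left-origin rectangle.
# Assumption: (x, y) is the lower-left corner of the rectangle.
top_left_to_pdf_rect <- function(left, top, width, height, page_height) {
  c(x = left,
    y = page_height - top - height,
    width = width,
    height = height)
}

# Extract text from a rectangle given in top-left coordinates.
extract_text_top_left <- function(doc, page, left, top, width, height) {
  page_height <- pdfoxide::pdf_page_get_height(doc, page)
  r <- top_left_to_pdf_rect(left, top, width, height, page_height)
  pdfoxide::pdf_extract_text_in_rect(doc, page,
                                     r[["x"]], r[["y"]],
                                     r[["width"]], r[["height"]])
}

# On a 792 pt tall page, a 200 x 50 box placed 100 pt below the top edge
# has its lower edge at 792 - 100 - 50 = 642 pt.
rect <- top_left_to_pdf_rect(left = 72, top = 100, width = 200, height = 50,
                             page_height = 792)
```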

Lazy page handles

pdf_page() returns a lightweight pdfoxide_page bound to a single page; the text getters extract on demand.

pdf_page(doc, index)        # lazy handle for one page
pdf_page_text(page)         # plain text of the page
pdf_page_markdown(page)     # Markdown of the page
pdf_page_html(page)         # HTML of the page
pdf_page_plain_text(page)   # layout-aware plain text of the page

Page geometry and raw elements

pdf_page_get_width(doc, page)      # page width in PDF points
pdf_page_get_height(doc, page)     # page height in PDF points
pdf_page_get_rotation(doc, page)   # page rotation in degrees (0/90/180/270)
pdf_page_get_elements(doc, page)   # raw element records for the page

Search

pdf_search(doc, page, term, case_sensitive = FALSE)        # search one page
pdf_search_all(doc, term, case_sensitive = FALSE)          # search the whole document
pdf_search_results_to_json(doc, page, term, case_sensitive = FALSE)  # page search results as JSON

Page classification and cleanup

Detects and removes headers, footers, and repeated artifacts.

pdf_classify_page(doc, page)              # classify the layout/content of one page
pdf_classify_document(doc)                # classify the whole document
pdf_remove_headers(doc, threshold = 0.5)  # detect and remove repeating headers
pdf_remove_footers(doc, threshold = 0.5)  # detect and remove repeating footers
pdf_remove_artifacts(doc, threshold = 0.5)  # detect and remove page artifacts
pdf_erase_header(doc, page)               # erase the header region on a page
pdf_erase_footer(doc, page)               # erase the footer region on a page
pdf_erase_artifacts(doc, page)            # erase artifact regions on a page

Office conversion (export)

Converts a loaded PDF back to an Office format. Returns a raw vector.

pdf_to_docx(doc)   # convert the document to DOCX bytes
pdf_to_pptx(doc)   # convert the document to PPTX bytes
pdf_to_xlsx(doc)   # convert the document to XLSX bytes

Forms

pdf_get_form_fields(doc)                          # list of form-field records
pdf_export_form_data_to_bytes(doc, format_type = 0L)  # export form data (0 = FDF, 1 = XFDF) to bytes
pdf_import_form_data(doc, data_path)              # import form data from a file path
pdf_form_import_from_file(doc, filename)          # import form data from a named file

The editor-side form helpers are listed under Editing PDFs.

Document structure and metadata

pdf_get_outline(doc)        # document outline / bookmarks tree
pdf_get_page_labels(doc)    # page-label ranges
pdf_get_xmp_metadata(doc)   # XMP metadata as a list
pdf_get_source_bytes(doc)   # the original source bytes of the document
pdf_plan_split_by_bookmarks(doc, options_json = NULL)  # plan a split of the document by top-level bookmarks

Annotation details

Inspects individual annotations by page and index.

pdf_annotation_get_color(doc, page, index)              # annotation RGB color
pdf_annotation_get_creation_date(doc, page, index)      # creation date string
pdf_annotation_get_modification_date(doc, page, index)  # modification date string
pdf_annotation_is_hidden(doc, page, index)              # TRUE if the annotation is hidden
pdf_annotation_is_marked_deleted(doc, page, index)      # TRUE if marked deleted
pdf_annotation_is_printable(doc, page, index)           # TRUE if the annotation prints
pdf_annotation_is_read_only(doc, page, index)           # TRUE if read-only
pdf_link_annotation_get_uri(doc, page, index)           # URI of a link annotation
pdf_text_annotation_get_icon_name(doc, page, index)     # icon name of a text annotation
pdf_highlight_annotation_quad_points_count(doc, page, index)        # number of highlight quad points
pdf_highlight_annotation_quad_point(doc, page, index, quad_index)   # one highlight quad point
pdf_annotations_to_json(doc, page)                      # all annotations on a page as JSON

JSON helpers for fonts and elements

pdf_font_get_size(doc, page, index)   # size of a font record on a page
pdf_fonts_to_json(doc, page)          # page fonts as JSON
pdf_elements_to_json(doc, page)       # page elements as JSON

Rendering

Renders pages to raster images. format: 0 = PNG, 1 = JPEG. Coordinates and DPI are documented per function.

pdf_render_page(doc, page, format = 0L)                 # render a page at default DPI
pdf_render_page_zoom(doc, page, zoom, format = 0L)      # render a page at a zoom factor
pdf_render_page_thumbnail(doc, page, size, format = 0L) # render a fitted thumbnail
pdf_render_page_fit(doc, page, w, h, format = 0L)       # render fitted into w x h pixels
pdf_render_page_raw(doc, page, dpi = 150L)              # render to a raw RGBA buffer
pdf_render_page_region(doc, page, crop_x, crop_y, crop_width, crop_height, format = 0L)  # render a sub-region

Full RenderOptions surface (background RGBA, transparency, annotation toggle, JPEG quality, layer exclusion).

pdf_render_page_with_options(doc, page, dpi = 150L, format = 0L,
                             bg_r = 1, bg_g = 1, bg_b = 1, bg_a = 1,
                             transparent_background = FALSE,
                             render_annotations = TRUE, jpeg_quality = 85L)

pdf_render_page_with_options_ex(doc, page, dpi = 150L, format = 0L,
                                bg_r = 1, bg_g = 1, bg_b = 1, bg_a = 1,
                                transparent_background = FALSE,
                                render_annotations = TRUE, jpeg_quality = 85L,
                                excluded_layers = NULL)

Reusable renderer and render-time estimation.

pdf_create_renderer(dpi = 150L, format = 0L, quality = 85L, anti_alias = TRUE)  # build a reusable renderer
pdf_renderer_close(renderer)            # free a renderer
pdf_estimate_render_time(doc, page)     # estimate render time for a page

Rendered image handle helpers.

pdf_rendered_image_save(image, path)    # write a rendered image to a file
pdf_rendered_image_close(image)         # free a rendered image
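A rough sketch of rendering one page to a PNG file follows. The helper name render_page_png is hypothetical, and the code assumes pdf_render_page() returns a pdfoxide_rendered_image handle that pdf_rendered_image_save() accepts, which matches the handle table in this reference but is not stated per function.

```r
# Sketch only: render_page_png() is a hypothetical helper, not part of pdfoxide.
render_page_png <- function(path, page, out_path) {
  doc <- pdfoxide::pdf_open(path)
  on.exit(pdfoxide::pdf_close(doc), add = TRUE)

  # format = 0L selects PNG according to this reference.
  # Assumption: the result is a pdfoxide_rendered_image handle.
  image <- pdfoxide::pdf_render_page(doc, page, format = 0L)
  # after = FALSE runs this handler first, so the image is freed
  # before the document it came from.
  on.exit(pdfoxide::pdf_rendered_image_close(image), add = TRUE, after = FALSE)

  pdfoxide::pdf_rendered_image_save(image, out_path)
  invisible(out_path)
}
```

When many pages are rendered with identical settings, the reusable renderer from pdf_create_renderer() may be the better fit; how it is passed to a render call is not shown in this reference, so it is left out of the sketch.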

Editing PDFs

Opens a PDF for modification. Returns a pdfoxide_editor.

pdf_editor_open(path)               # open a PDF for editing
pdf_editor_open_from_bytes(bytes)   # open an editor from a raw vector
pdf_editor_close(editor)            # close the editor and free it

Editor inspection and metadata

pdf_editor_page_count(editor)               # page count
pdf_editor_version(editor)                  # PDF version as list(major, minor)
pdf_editor_is_modified(editor)              # TRUE if the editor has unsaved changes
pdf_editor_source_path(editor)              # original source path, if any
pdf_editor_get_producer(editor)             # Producer metadata string
pdf_editor_set_producer(editor, value)      # set the Producer metadata
pdf_editor_get_creation_date(editor)        # CreationDate string
pdf_editor_set_creation_date(editor, value) # set the CreationDate

Page operations

pdf_editor_delete_page(editor, page)               # delete a page
pdf_editor_move_page(editor, from, to)             # move a page to a new index
pdf_editor_rotate_page_by(editor, page, degrees)   # rotate a page by a relative angle
pdf_editor_rotate_all_pages(editor, degrees)       # rotate every page
pdf_editor_get_page_rotation(editor, page)         # current page rotation
pdf_editor_set_page_rotation(editor, page, degrees)  # set absolute page rotation
pdf_editor_crop_margins(editor, left, right, top, bottom)  # crop margins on all pages
pdf_editor_get_page_crop_box(editor, page)         # get CropBox as c(x, y, w, h)
pdf_editor_set_page_crop_box(editor, page, x, y, w, h)  # set CropBox
pdf_editor_get_page_media_box(editor, page)        # get MediaBox as c(x, y, w, h)
pdf_editor_set_page_media_box(editor, page, x, y, w, h) # set MediaBox

Redaction (editor)

pdf_editor_apply_all_redactions(editor)                  # apply all pending redactions
pdf_editor_apply_page_redactions(editor, page)           # apply redactions on one page
pdf_editor_is_page_marked_for_redaction(editor, page)    # TRUE if page has pending redactions
pdf_editor_unmark_page_for_redaction(editor, page)       # clear pending redactions on a page
pdf_editor_erase_region(editor, page, x, y, w, h)        # erase a rectangle on a page
pdf_editor_erase_regions(editor, page, rects)            # erase several rectangles on a page
pdf_editor_clear_erase_regions(editor, page)             # clear pending erase regions

Standalone redaction workflow

pdf_redaction_add(editor, page, x1, y1, x2, y2, r = 0, g = 0, b = 0)  # add a redaction box with a fill color
pdf_redaction_count(editor, page)                                    # pending redaction count on a page
pdf_redaction_apply(editor, scrub_metadata = FALSE, r = 0, g = 0, b = 0)  # burn in all redactions
pdf_redaction_scrub_metadata(editor)                                 # scrub metadata for redaction hygiene

Forms and annotations (editor)

pdf_editor_flatten_forms(editor)                       # flatten all form fields into content
pdf_editor_flatten_forms_on_page(editor, page)         # flatten forms on one page
pdf_editor_set_form_field_value(editor, name, value)   # set a form-field value by name
pdf_editor_flatten_annotations(editor, page)           # flatten annotations on a page
pdf_editor_flatten_all_annotations(editor)             # flatten all annotations
pdf_editor_flatten_warnings_count(editor)              # number of flatten warnings
pdf_editor_flatten_warning(editor, index)              # one flatten warning message
pdf_editor_is_page_marked_for_flatten(editor, page)    # TRUE if page is marked for flatten
pdf_editor_unmark_page_for_flatten(editor, page)       # clear flatten mark on a page
pdf_editor_import_fdf_bytes(editor, bytes)             # import FDF form data
pdf_editor_import_xfdf_bytes(editor, bytes)            # import XFDF form data

Document operations (editor)

pdf_editor_merge_from(editor, source_path)             # append pages from another PDF file
pdf_editor_merge_from_bytes(editor, bytes)             # append pages from PDF bytes
pdf_editor_convert_to_pdf_a(editor, level)             # convert in place to PDF/A
pdf_editor_embed_file(editor, name, bytes)             # attach an embedded file
pdf_editor_extract_pages_to_bytes(editor, pages)       # extract selected pages to a new PDF (bytes)

Saving (editor)

pdf_editor_save(editor, path)                          # save to a file
pdf_editor_save_to_bytes(editor)                       # save to a raw vector
pdf_editor_save_to_bytes_with_options(editor, compress = TRUE,
                                      garbage_collect = TRUE, linearize = FALSE)  # save with options
pdf_editor_save_encrypted(editor, path, user_password, owner_password)            # save AES-encrypted to a file
pdf_editor_save_encrypted_to_bytes(editor, user_password, owner_password)         # save AES-encrypted to bytes
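Since the editor functions return the handle invisibly, an edit session can be written as one pipe chain. The sketch below rotates the first page and drops the last one. The helper name rotate_and_trim is hypothetical, and the direction in which a positive angle rotates is an assumption to check against the package documentation.

```r
# Sketch only: rotate_and_trim() is a hypothetical helper, not part of pdfoxide.
rotate_and_trim <- function(in_path, out_path) {
  editor <- pdfoxide::pdf_editor_open(in_path)
  on.exit(pdfoxide::pdf_editor_close(editor), add = TRUE)

  n <- pdfoxide::pdf_editor_page_count(editor)
  # Deleting the only page would leave an empty document.
  stopifnot(n >= 2L)
  last <- n - 1L  # page indices are 0-based

  editor |>
    pdfoxide::pdf_editor_rotate_page_by(0L, 90L) |>  # first page, by 90 degrees
    pdfoxide::pdf_editor_delete_page(last) |>        # drop the last page
    pdfoxide::pdf_editor_save(out_path)

  invisible(out_path)
}
```

The native pipe requires R 4.1 or later; on older versions the same calls can be written as separate statements on the editor variable.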

DocumentBuilder (programmatic creation)

Builds a PDF page by page. pdf_builder_create() returns a pdfoxide_builder; the page constructors return a fluent pdfoxide_page.

pdf_builder_create()         # start a new DocumentBuilder
pdf_builder_close(builder)   # free a builder

Builder document metadata

pdf_builder_set_title(builder, value)      # set document title
pdf_builder_set_author(builder, value)     # set document author
pdf_builder_set_subject(builder, value)    # set document subject
pdf_builder_set_keywords(builder, value)   # set document keywords
pdf_builder_set_creator(builder, value)    # set document creator
pdf_builder_on_open(builder, script)       # set a document-open JavaScript action
pdf_builder_language(builder, lang)        # set the document language (e.g. "en-US")
pdf_builder_tagged_pdf_ua1(builder)        # enable Tagged PDF / PDF-UA-1 output
pdf_builder_role_map(builder, custom, standard)        # map a custom structure tag to a standard role
pdf_builder_register_embedded_font(builder, name, font)  # register an embedded font for use on pages

Builder pages and output

pdf_builder_page(builder, width, height)   # start a custom-size page
pdf_builder_a4_page(builder)               # start an A4 page
pdf_builder_letter_page(builder)           # start a US Letter page
pdf_builder_build(builder)                 # finish and return the PDF as bytes
pdf_builder_save(builder, path)            # finish and write to a file
pdf_builder_save_encrypted(builder, path, user_password, owner_password)     # finish and write AES-encrypted
pdf_builder_to_bytes_encrypted(builder, user_password, owner_password)       # finish and return encrypted bytes

Embedded fonts

pdf_embedded_font_from_file(path)                 # load an embedded font from a TTF/OTF file
pdf_embedded_font_from_bytes(bytes, name = NULL)  # load an embedded font from bytes
pdf_embedded_font_close(font)                     # free an embedded font handle

Page builder (fluent layout)

All of the following functions operate on a pdfoxide_page returned by pdf_builder_page() and return the page invisibly for chaining. Finish a page with pdf_page_done().

Text flow and typography

pdf_page_font(page, name, size)        # set the active font and size
pdf_page_at(page, x, y)                # move the text cursor to a coordinate
pdf_page_builder_text(page, text)      # draw text at the cursor
pdf_page_heading(page, level, text)    # add a heading (level 1-6)
pdf_page_paragraph(page, text)         # add a wrapped paragraph
pdf_page_space(page, points)           # add vertical space
pdf_page_horizontal_rule(page)         # draw a horizontal rule
pdf_page_newline(page)                 # advance to the next line
pdf_page_footnote(page, ref_mark, note_text)        # add a footnote with a reference mark
pdf_page_columns(page, column_count, gap_pt, text)  # flow text into multiple columns
pdf_page_text_in_rect(page, x, y, w, h, text, align = 0L)  # flow text inside a rectangle
pdf_page_new_page_same_size(page)      # start a new page of the same size
pdf_page_done(page)                    # finish the page and return to the builder
pdf_page_close(page)                   # free a page handle
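The fluent style can be sketched as follows for a one-page document. The helper name build_note is hypothetical. The font name "Helvetica" is an assumption (this reference does not list which names pdf_page_font() accepts without a registered embedded font), and the builder is left to garbage collection because the reference does not say whether pdf_builder_save() consumes it.

```r
# Sketch only: build_note() is a hypothetical helper, not part of pdfoxide.
build_note <- function(title, body, out_path) {
  builder <- pdfoxide::pdf_builder_create()
  pdfoxide::pdf_builder_set_title(builder, title)

  # Each page function returns the page invisibly, so the calls chain.
  pdfoxide::pdf_builder_a4_page(builder) |>
    pdfoxide::pdf_page_font("Helvetica", 12) |>  # assumed font name
    pdfoxide::pdf_page_heading(1L, title) |>
    pdfoxide::pdf_page_paragraph(body) |>
    pdfoxide::pdf_page_done()                    # finish the page

  # The builder variable is kept separately, so the chain above does not
  # depend on what pdf_page_done() returns.
  pdfoxide::pdf_builder_save(builder, out_path)
  invisible(out_path)
}
```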

Inline styled runs

pdf_page_inline(page, text)               # append an inline text run
pdf_page_inline_bold(page, text)          # append a bold inline run
pdf_page_inline_italic(page, text)        # append an italic inline run
pdf_page_inline_color(page, r, g, b, text)  # append a colored inline run

Links and JavaScript actions

pdf_page_link_url(page, url)              # add a URL link
pdf_page_link_page(page, index)           # add an internal page link
pdf_page_link_named(page, destination)    # add a named-destination link
pdf_page_link_javascript(page, script)    # add a JavaScript-action link
pdf_page_on_open(page, script)            # page-open JavaScript action
pdf_page_on_close(page, script)           # page-close JavaScript action
pdf_page_field_keystroke(page, script)    # field keystroke JavaScript action
pdf_page_field_format(page, script)       # field format JavaScript action
pdf_page_field_validate(page, script)     # field validate JavaScript action
pdf_page_field_calculate(page, script)    # field calculate JavaScript action

Annotations and markup

pdf_page_highlight(page, r, g, b)         # highlight markup at the current run
pdf_page_underline(page, r, g, b)         # underline markup
pdf_page_strikeout(page, r, g, b)         # strikeout markup
pdf_page_squiggly(page, r, g, b)          # squiggly underline markup
pdf_page_sticky_note(page, text)          # sticky note at the cursor
pdf_page_sticky_note_at(page, x, y, text) # sticky note at a coordinate
pdf_page_watermark(page, text)            # add a text watermark
pdf_page_watermark_confidential(page)     # add a CONFIDENTIAL watermark
pdf_page_watermark_draft(page)            # add a DRAFT watermark
pdf_page_stamp(page, type_name)           # add a rubber stamp (e.g. "Approved")
pdf_page_freetext(page, x, y, w, h, text) # add a free-text annotation

AcroForm widgets

pdf_page_text_field(page, name, x, y, w, h, default_value = NULL)        # text field
pdf_page_checkbox(page, name, x, y, w, h, checked = FALSE)               # checkbox
pdf_page_combo_box(page, name, x, y, w, h, options, selected = NULL)     # combo box
pdf_page_radio_group(page, name, values, xs, ys, ws, hs, selected = NULL)  # radio-button group
pdf_page_push_button(page, name, x, y, w, h, caption)                    # push button
pdf_page_signature_field(page, name, x, y, w, h)                         # signature field

Barcodes (page builder)

pdf_page_barcode_1d(page, barcode_type, data, x, y, w, h)  # draw a 1D barcode
pdf_page_barcode_qr(page, data, x, y, size)                # draw a QR code

Images

pdf_page_image(page, bytes, x, y, w, h)                  # place an image
pdf_page_image_with_alt(page, bytes, x, y, w, h, alt_text)  # place an image with alt text
pdf_page_image_artifact(page, bytes, x, y, w, h)         # place an image tagged as an artifact

Vector graphics

pdf_page_rect(page, x, y, w, h)                          # draw a rectangle outline
pdf_page_filled_rect(page, x, y, w, h, r, g, b)          # draw a filled rectangle
pdf_page_line(page, x1, y1, x2, y2)                      # draw a line
pdf_page_stroke_rect(page, x, y, w, h, width, r, g, b)   # stroke a rectangle with width and color
pdf_page_stroke_line(page, x1, y1, x2, y2, width, r, g, b)  # stroke a line with width and color
pdf_page_stroke_rect_dashed(page, x, y, w, h, width, r, g, b, dash = numeric(0), phase = 0)    # dashed rectangle
pdf_page_stroke_line_dashed(page, x1, y1, x2, y2, width, r, g, b, dash = numeric(0), phase = 0)  # dashed line

Tables

pdf_page_table(page, widths, aligns, cells, has_header = FALSE,
               n_columns = length(widths), n_rows = NULL)  # render a static table

Streaming tables for large/incremental data.

pdf_page_streaming_table_begin(page, headers, widths, aligns,
                               repeat_header = FALSE, n_columns = length(headers))  # begin a streaming table
pdf_page_streaming_table_begin_v2(page, headers, widths, aligns,
                                  repeat_header = FALSE, mode = 0L, sample_rows = 0L,
                                  min_col_width_pt = 0, max_col_width_pt = 0,
                                  max_rowspan = 0L, n_columns = length(headers))  # streaming table with autosize/rowspan
pdf_page_streaming_table_set_batch_size(page, batch_size)      # set the flush batch size
pdf_page_streaming_table_pending_row_count(page)               # rows buffered but not yet flushed
pdf_page_streaming_table_batch_count(page)                     # number of flushed batches
pdf_page_streaming_table_flush(page)                           # flush buffered rows
pdf_page_streaming_table_push_row(page, cells)                 # push one row of cells
pdf_page_streaming_table_push_row_v2(page, cells, rowspans = NULL)  # push a row with per-cell rowspans
pdf_page_streaming_table_finish(page)                          # finish and lay out the streaming table
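One way to feed a data frame into a streaming table row by row is sketched below. Both helper names are hypothetical. The code assumes that cells is a character vector with one entry per column, that widths are given in points, and that an align code of 0L is a valid value; none of these details are specified in this reference.

```r
# Sketch only: both helpers are hypothetical, not part of pdfoxide.

# Format row i of a data frame as a plain character vector.
row_cells <- function(df, i) {
  vapply(df[i, , drop = FALSE], as.character, character(1), USE.NAMES = FALSE)
}

# Stream every row of df into a table on an open builder page.
stream_data_frame <- function(page, df, col_width = 100) {
  headers <- names(df)
  widths <- rep(col_width, length(headers))  # assumed to be points
  aligns <- rep(0L, length(headers))         # assumed align code

  pdfoxide::pdf_page_streaming_table_begin(page, headers, widths, aligns,
                                           repeat_header = TRUE)
  for (i in seq_len(nrow(df))) {
    pdfoxide::pdf_page_streaming_table_push_row(page, row_cells(df, i))
  }
  pdfoxide::pdf_page_streaming_table_finish(page)
}
```

Pushing rows one at a time keeps only the current row in R memory; pdf_page_streaming_table_set_batch_size() could be called after the begin step to tune how often buffered rows are flushed.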

Digital signatures

Certificates

pdf_certificate_load_from_bytes(bytes, password = NULL)  # load a PKCS#12 / DER certificate from bytes
pdf_certificate_load_from_pem(cert_pem, key_pem)         # load a certificate + key from PEM
pdf_certificate_subject(cert)    # certificate subject DN
pdf_certificate_issuer(cert)     # certificate issuer DN
pdf_certificate_serial(cert)     # certificate serial number
pdf_certificate_validity(cert)   # validity window
pdf_certificate_is_valid(cert)   # TRUE if currently within the validity window
pdf_certificate_close(cert)      # free a certificate handle

Signing

pdf_sign_bytes(pdf, cert, reason = NULL, location = NULL)            # sign PDF bytes (basic CMS signature)
pdf_sign_bytes_pades(pdf, cert, level = 0L, tsa_url = NULL, ...)     # sign PDF bytes with a PAdES profile
pdf_sign_bytes_pades_opts(pdf, cert, level = 0L, tsa_url = NULL, ...)  # PAdES signing with extended options
pdf_sign(doc, certificate, reason = NULL, location = NULL)          # sign a loaded document
pdf_add_timestamp(pdf_data, sig_index, tsa_url)                     # add a TSA timestamp to a signature in bytes

Signature inspection and verification

pdf_signature_count(doc)                  # number of signatures
pdf_get_signature(doc, index)             # signature handle by index
pdf_signature_signer_name(sig)            # signer common name
pdf_signature_signing_reason(sig)         # signing reason
pdf_signature_signing_location(sig)       # signing location
pdf_signature_signing_time(sig)           # signing time
pdf_signature_certificate(sig)            # signer certificate handle
pdf_signature_pades_level(sig)            # PAdES level of the signature
pdf_signature_has_timestamp(sig)          # TRUE if the signature is timestamped
pdf_signature_timestamp(sig)              # embedded timestamp handle
pdf_signature_add_timestamp(sig, timestamp)  # attach a timestamp to a signature
pdf_signature_verify(sig)                 # verify the signature, returns a status
pdf_signature_verify_detached(sig, pdf)   # verify with a detached message digest check
pdf_signature_close(sig)                  # free a signature handle
pdf_verify_all_signatures(doc)            # verify every signature in the document

Timestamps

pdf_timestamp_parse(bytes)               # parse a timestamp token (TST)
pdf_timestamp_token(timestamp)           # raw timestamp token bytes
pdf_timestamp_message_imprint(timestamp) # message imprint of the timestamp
pdf_timestamp_time(timestamp)            # timestamp time
pdf_timestamp_serial(timestamp)          # timestamp serial number
pdf_timestamp_tsa_name(timestamp)        # TSA name
pdf_timestamp_policy_oid(timestamp)      # timestamp policy OID
pdf_timestamp_hash_algorithm(timestamp)  # hash algorithm used
pdf_timestamp_verify(timestamp)          # verify the timestamp token
pdf_timestamp_close(timestamp)           # free a timestamp handle

TSA client

pdf_tsa_client_create(url, username = NULL, password = NULL, timeout = 30L,
                      hash_algo = 0L, use_nonce = TRUE, cert_req = TRUE)  # create a TSA client
pdf_tsa_request_timestamp(client, data)              # request a timestamp over data
pdf_tsa_request_timestamp_hash(client, hash, hash_algo = 0L)  # request a timestamp over a precomputed hash
pdf_tsa_client_close(client)                         # free a TSA client

Document Security Store (DSS)

pdf_get_dss(doc)              # get the document's DSS handle
pdf_dss_cert_count(dss)       # number of certificates in the DSS
pdf_dss_crl_count(dss)        # number of CRLs
pdf_dss_ocsp_count(dss)       # number of OCSP responses
pdf_dss_vri_count(dss)        # number of VRI entries
pdf_dss_get_cert(dss, index)  # one DSS certificate
pdf_dss_get_crl(dss, index)   # one DSS CRL
pdf_dss_get_ocsp(dss, index)  # one DSS OCSP response
pdf_dss_close(dss)            # free a DSS handle

Compliance validation

PDF/A

pdf_validate_pdf_a(doc, level = 0L)   # validate against a PDF/A level, returns a results handle
pdf_a_is_compliant(results)           # TRUE if compliant
pdf_a_errors(results)                 # list of validation errors
pdf_a_warning_count(results)          # number of warnings
pdf_a_results_close(results)          # free the results handle
pdf_convert_to_pdf_a(doc, level = 2L) # convert a document to PDF/A bytes
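The validation functions follow a handle-then-query pattern, which the sketch below wraps in a single call. The helper name check_pdf_a is hypothetical, and the meaning of each level code (including the default 0L) is not given in this reference, so the value is passed through unchanged.

```r
# Sketch only: check_pdf_a() is a hypothetical helper, not part of pdfoxide.
check_pdf_a <- function(path, level = 0L) {
  doc <- pdfoxide::pdf_open(path)
  on.exit(pdfoxide::pdf_close(doc), add = TRUE)

  results <- pdfoxide::pdf_validate_pdf_a(doc, level = level)
  # Free the results handle before the document handle.
  on.exit(pdfoxide::pdf_a_results_close(results), add = TRUE, after = FALSE)

  # The list is built before the on.exit handlers run, so the handles
  # are still open while they are queried.
  list(
    compliant = pdfoxide::pdf_a_is_compliant(results),
    errors = pdfoxide::pdf_a_errors(results),
    warning_count = pdfoxide::pdf_a_warning_count(results)
  )
}
```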

PDF/UA (accessibility)

pdf_validate_pdf_ua(doc, level = 0L)  # validate against PDF/UA, returns a results handle
pdf_ua_is_accessible(results)         # TRUE if accessible
pdf_ua_errors(results)                # list of accessibility errors
pdf_ua_warnings(results)              # list of accessibility warnings
pdf_ua_stats(results)                 # accessibility statistics
pdf_ua_results_close(results)         # free the results handle

PDF/X (print)

pdf_validate_pdf_x(doc, level = 0L)   # validate against PDF/X, returns a results handle
pdf_x_is_compliant(results)           # TRUE if compliant
pdf_x_errors(results)                 # list of validation errors
pdf_x_results_close(results)          # free the results handle

Barcodes

Standalone barcode generation and decoding.

pdf_generate_qr_code(data, error_correction = 1L, size_px = 256L)  # generate a QR code, returns a barcode handle
pdf_generate_barcode(data, format = 0L, size_px = 256L)            # generate a barcode in a given format
pdf_barcode_get_data(barcode)             # decoded data string
pdf_barcode_get_format(barcode)           # barcode format
pdf_barcode_get_confidence(barcode)       # decode confidence
pdf_barcode_get_image_png(barcode, size_px = 256L)  # rendered PNG bytes
pdf_barcode_get_svg(barcode, size_px = 256L)        # rendered SVG string
pdf_barcode_close(barcode)                # free a barcode handle
pdf_editor_add_barcode_to_page(editor, page, barcode, x, y, width, ...)  # stamp a barcode onto an editor page
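
As an illustration, a QR code could be generated and written out as SVG along these lines. The sketch assumes `pdf_barcode_get_svg()` returns a character string, as its comment above suggests. The helper name `write_qr_svg` is hypothetical.

```r
# Hypothetical helper (not part of pdfoxide): render `data` as a QR code
# and save it to `file` as SVG.
# Assumption: pdf_barcode_get_svg() returns a character string.
write_qr_svg <- function(data, file, size_px = 256L) {
  bc <- pdfoxide::pdf_generate_qr_code(data, size_px = size_px)
  on.exit(pdfoxide::pdf_barcode_close(bc), add = TRUE)

  svg <- pdfoxide::pdf_barcode_get_svg(bc, size_px = size_px)
  writeLines(svg, file)
  invisible(file)
}
```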

OCR

Requires the ocr feature in the underlying build.

pdf_ocr_engine_create(det_model_path, rec_model_path, dict_path)  # build an OCR engine from model paths
pdf_ocr_engine_close(engine)             # free an OCR engine
pdf_ocr_page_needs_ocr(doc, page)        # TRUE if a page has no extractable text layer
pdf_ocr_extract_text(doc, page, engine = NULL)  # OCR a page (uses the default engine when NULL)
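
One way these calls might be combined is to run OCR only on pages that lack a text layer and use regular extraction otherwise. The sketch below assumes an already opened `doc` and 0-based page indices, as used elsewhere in this reference. The helper name `page_text` is hypothetical.

```r
# Hypothetical helper (not part of pdfoxide): return page text, using OCR
# only when the page has no extractable text layer.
page_text <- function(doc, page, engine = NULL) {
  if (pdfoxide::pdf_ocr_page_needs_ocr(doc, page)) {
    # engine = NULL lets the library use its default OCR engine.
    pdfoxide::pdf_ocr_extract_text(doc, page, engine = engine)
  } else {
    pdfoxide::pdf_extract_text(doc, page)
  }
}
```

Checking `pdf_ocr_page_needs_ocr()` first avoids the cost of OCR on pages that already carry text.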

OCR models and runtime configuration

pdf_model_manifest()                       # available OCR model manifest
pdf_prefetch_available()                   # TRUE if model prefetching is available
pdf_prefetch_models(languages_csv = NULL)  # prefetch OCR models for given languages
pdf_set_max_ops_per_stream(limit)          # cap content-stream operations (DoS guard)
pdf_set_preserve_unmapped_glyphs(preserve) # keep glyphs with no Unicode mapping

Cryptographic provider / FIPS

pdf_crypto_active_provider()   # name of the active crypto provider
pdf_crypto_fips_available()    # TRUE if a FIPS provider is available
pdf_crypto_use_fips()          # switch to the FIPS provider
pdf_crypto_set_policy(spec)    # set the crypto policy from a spec string
pdf_crypto_policy()            # current crypto policy
pdf_crypto_inventory()         # cryptographic algorithm inventory
pdf_crypto_cbom()              # Cryptographic Bill of Materials (CBOM)

Logging

pdf_set_log_level(level)   # set the library log level
pdf_get_log_level()        # get the current log level

Complete example

library(pdfoxide)

# --- Create ---
pdf <- pdf_from_markdown("# Report\n\nGenerated by **PDF Oxide**.\n")
pdf_save(pdf, "report.pdf")
pdf_close(pdf)

# --- Extract ---
doc <- pdf_open("report.pdf")
cat("Pages:", pdf_page_count(doc), "\n")

for (i in seq_len(pdf_page_count(doc)) - 1L) {     # 0-based indices
  txt <- pdf_extract_text(doc, i)
  cat(sprintf("Page %d: %d characters\n", i + 1L, nchar(txt)))
}

chars <- pdf_extract_chars(doc, 0L)                # per-character data frame for the first page
results <- pdf_search_all(doc, "PDF Oxide", case_sensitive = FALSE)
pdf_close(doc)

# --- Edit ---
ed <- pdf_editor_open("report.pdf")
pdf_editor_set_producer(ed, "PDF Oxide")
pdf_editor_rotate_all_pages(ed, 90L)
pdf_editor_save(ed, "rotated.pdf")
pdf_editor_close(ed)

# --- Build programmatically ---
b <- pdf_builder_create()
pdf_builder_set_title(b, "Invoice")
page <- pdf_builder_letter_page(b)
pdf_page_font(page, "Helvetica", 24)
pdf_page_at(page, 72, 720)
pdf_page_builder_text(page, "Invoice #1001")
pdf_page_done(page)
pdf_builder_save(b, "invoice.pdf")
pdf_builder_close(b)

Other Language Bindings

PDF Oxide provides native bindings for all major ecosystems: Rust, Python, Node.js, WASM, C#, Golang, Java, PHP, Ruby, C++, Swift, Kotlin, Dart, Julia, Zig, Scala, Clojure, Objective-C, and Elixir.

Next steps

Types and Enums: all shared types and enums
Page API Reference: consistent per-page iteration across bindings
Getting Started with R: tutorial