R API Reference
PDF Oxide provides idiomatic R bindings in the pdfoxide package. The package wraps the pdf_oxide C ABI through R's native .Call interface, so there is no dependency on Java, Python, or any external runtime; all it needs is a compiled shared library.
# from a shell, install the package from its source directory (requires a C toolchain)
R CMD INSTALL pdfoxide
# then load it in an R session
library(pdfoxide)
For the Rust API, see the Rust API Reference. For Python, see the Python API Reference. For JavaScript, see the JavaScript API Reference (Node.js) or the JavaScript API Reference (WASM).
The R API is a flat, functional API built around opaque handle objects rather than R6/S4 classes. The main handle types are:
| Handle | Created by | Purpose |
|---|---|---|
| pdfoxide_pdf | pdf_from_markdown(), pdf_from_html(), … | A newly built PDF, ready to save or convert |
| pdfoxide_document | pdf_open(), pdf_open_from_bytes() | A loaded PDF for read-only extraction and rendering |
| pdfoxide_editor | pdf_editor_open() | A mutable PDF for editing, merging, and saving |
| pdfoxide_builder | pdf_builder_create() | A DocumentBuilder for programmatic page construction |
| pdfoxide_page (builder) | pdf_builder_page(), pdf_builder_a4_page(), … | A fluent page being laid out |
| pdfoxide_page (lazy) | pdf_page() | A lazy read handle over a single document page |
| pdfoxide_renderer, pdfoxide_rendered_image | rendering functions | A reusable renderer and rendered raster output |
| pdfoxide_certificate, pdfoxide_signature, pdfoxide_timestamp, pdfoxide_tsa_client, pdfoxide_dss | signing/verification functions | Digital-signature primitives |
All page indices are 0-based. Functions that modify a builder, page, or editor return the handle invisibly, so calls can be chained with R's native pipe (|>, available since R 4.1). Handles are closed automatically when they are garbage-collected, but pdf_close() and the type-specific *_close() functions release them early.
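As a sketch of the chaining style described above, the helper below builds a one-page A4 document with the native pipe. It assumes library(pdfoxide) is attached and that each pdf_page_* call returns the page invisibly, as stated above; write_title_page() is a hypothetical helper, and the coordinates assume a bottom-left origin.

```r
# Hypothetical helper (not part of pdfoxide): write a one-page A4 document
# whose only content is a title line. Assumes library(pdfoxide) is attached
# and R >= 4.1 for the native pipe.
write_title_page <- function(path, title, font = "Helvetica", size = 18) {
  b <- pdf_builder_create()
  on.exit(pdf_builder_close(b), add = TRUE)  # free the builder even on error
  pdf_builder_set_title(b, title)
  pdf_builder_a4_page(b) |>
    pdf_page_font(font, size) |>
    pdf_page_at(72, 770) |>   # assumed bottom-left origin: about 1 inch below the top of an 842 pt A4 page
    pdf_page_builder_text(title) |>
    pdf_page_done()
  pdf_builder_save(b, path)
  invisible(path)
}
```

For example, write_title_page("title.pdf", "Quarterly report"). Using on.exit() keeps the builder from leaking if any call in the chain fails, instead of waiting for garbage collection.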
Creating PDFs
Quick, one-shot creation from a source format. Each function returns a pdfoxide_pdf handle.
pdf_from_markdown(markdown) # build a PDF from a Markdown string
pdf_from_html(html) # build a PDF from an HTML string
pdf_from_text(text) # build a PDF from plain text
pdf_from_image(path) # build a single-page PDF from an image file
pdf_from_image_bytes(bytes) # build a single-page PDF from raw image bytes
pdf_from_html_css(html, css, font_bytes = NULL) # build a PDF from HTML + CSS (optional embedded font)
pdf_from_html_css_with_fonts(html, css, families, font_bytes) # HTML + CSS with multiple named font families
pdf_merge(paths) # merge several PDF files into one new PDF
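A common use of pdf_from_markdown() is converting a Markdown file on disk. The sketch below reads the file, builds the PDF, saves it next to the source, and closes the handle deterministically; markdown_file_to_pdf() is a hypothetical helper and assumes library(pdfoxide) is attached.

```r
# Hypothetical helper: convert a Markdown file into a PDF file.
# Assumes library(pdfoxide) is attached.
markdown_file_to_pdf <- function(md_path,
                                 pdf_path = paste0(tools::file_path_sans_ext(md_path), ".pdf")) {
  md <- paste(readLines(md_path, encoding = "UTF-8", warn = FALSE), collapse = "\n")
  pdf <- pdf_from_markdown(md)
  on.exit(pdf_close(pdf), add = TRUE)  # release the pdfoxide_pdf handle early
  pdf_save(pdf, pdf_path)
  invisible(pdf_path)
}
```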
Saving / serializing a built PDF
pdf_save(pdf, path) # write the PDF to a file path
pdf_to_bytes(pdf) # serialize the PDF to a raw vector
pdf_get_page_count(pdf) # number of pages in a built pdfoxide_pdf
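When working with pdf_to_bytes(), a cheap sanity check is to confirm the raw vector starts with the "%PDF-" header that a well-formed PDF file begins with. The helper below is a plain-R sketch and does not depend on pdfoxide itself.

```r
# Return TRUE if `bytes` is a raw vector beginning with the "%PDF-" header.
is_pdf_bytes <- function(bytes) {
  magic <- charToRaw("%PDF-")
  is.raw(bytes) && length(bytes) >= length(magic) &&
    identical(bytes[seq_along(magic)], magic)
}
```

Typical use: bytes <- pdf_to_bytes(pdf); stopifnot(is_pdf_bytes(bytes)); writeBin(bytes, "out.pdf").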
Opening documents
Open an existing PDF for extraction and rendering. The opening functions return a pdfoxide_document.
pdf_open(path) # open a PDF file from disk
pdf_open_with_password(path, password) # open an encrypted PDF with a password
pdf_open_from_bytes(bytes) # open a PDF from an in-memory raw vector
pdf_close(x) # close any pdfoxide handle and free it
Opening from Office formats
Convert a Word, PowerPoint, or Excel document and open it directly as a pdfoxide_document.
pdf_open_from_docx_bytes(bytes) # convert DOCX bytes and open as a document
pdf_open_from_pptx_bytes(bytes) # convert PPTX bytes and open as a document
pdf_open_from_xlsx_bytes(bytes) # convert XLSX bytes and open as a document
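Because the Office openers take raw bytes rather than a path, a small dispatcher can read the file with readBin() and pick the right function from the extension. open_office_document() is a hypothetical helper; it assumes library(pdfoxide) is attached.

```r
# Hypothetical helper: open a .docx/.pptx/.xlsx file as a pdfoxide_document.
open_office_document <- function(path) {
  bytes <- readBin(path, what = "raw", n = file.size(path))  # whole file as raw
  opener <- switch(tolower(tools::file_ext(path)),
    docx = pdf_open_from_docx_bytes,
    pptx = pdf_open_from_pptx_bytes,
    xlsx = pdf_open_from_xlsx_bytes,
    stop("unsupported Office format: ", path, call. = FALSE)
  )
  opener(bytes)
}
```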
Document inspection
pdf_page_count(doc) # number of pages
pdf_version(doc) # PDF version as a list(major, minor)
pdf_is_encrypted(doc) # TRUE if the document is encrypted
pdf_has_structure_tree(doc) # TRUE if the document is a Tagged PDF
pdf_authenticate(doc, password) # authenticate an encrypted document after opening
pdf_has_xfa(doc) # TRUE if the document contains XFA forms
pdf_has_timestamp(doc) # TRUE if the document carries a document timestamp
Text and content extraction
Single-page extraction (the page index is 0-based).
pdf_extract_text(doc, page) # reading-order plain text for one page
pdf_to_plain_text(doc, page) # layout-aware plain text for one page
pdf_to_markdown(doc, page) # Markdown for one page
pdf_to_html(doc, page) # HTML for one page
pdf_extract_structured_json(doc, page) # structured layout JSON for one page
Whole-document extraction.
pdf_to_markdown_all(doc) # Markdown for the entire document
pdf_to_html_all(doc) # HTML for the entire document
pdf_to_plain_text_all(doc) # plain text for the entire document
pdf_extract_all_text(doc) # concatenated reading-order text for all pages
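pdf_extract_all_text() concatenates every page; when page boundaries matter, a per-page loop over the 0-based indices keeps them. The sketch assumes library(pdfoxide) is attached and that pdf_extract_text() returns one string per page, as in the complete example at the end of this page.

```r
# Sketch: one character element per page, using 0-based page indices.
page_texts <- function(doc) {
  idx <- seq_len(pdf_page_count(doc)) - 1L   # 0, 1, ..., n - 1 (empty if n = 0)
  vapply(idx, function(i) pdf_extract_text(doc, i), character(1))
}
```

For example, nchar(page_texts(doc)) gives a per-page character count.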
Structured / per-element extraction. These functions return data frames or lists of records.
pdf_extract_chars(doc, page) # per-character records (glyph, bbox, font, size, color)
pdf_extract_words(doc, page) # word records with bounding boxes
pdf_extract_text_lines(doc, page) # text-line records with bounding boxes
pdf_extract_tables(doc, page) # detected tables with rows and cells
pdf_extract_paths(doc, page) # vector path (line/curve/shape) records
pdf_embedded_fonts(doc, page) # embedded font records used on a page
pdf_embedded_images(doc, page) # embedded image records on a page
pdf_page_annotations(doc, page) # annotation records on a page
Auto-detecting extraction (chooses between native heuristics and an OCR-style path).
pdf_extract_text_auto(doc, page) # best-effort text for one page
pdf_extract_page_auto(doc, page, options_json = NULL) # best-effort structured page extraction
Region extraction (crop rectangle)
Restrict extraction to a rectangle given in PDF points (origin at the bottom-left corner of the page).
pdf_extract_text_in_rect(doc, page, x, y, width, height) # text inside a rectangle
pdf_extract_words_in_rect(doc, page, x, y, width, height) # words inside a rectangle
pdf_extract_lines_in_rect(doc, page, x, y, width, height) # lines inside a rectangle
pdf_extract_tables_in_rect(doc, page, x, y, width, height) # tables inside a rectangle
pdf_extract_images_in_rect(doc, page, x, y, width, height) # images inside a rectangle
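Rectangles measured in a viewer usually start from the top-left corner, while these functions use a bottom-left origin, so the y coordinate has to be flipped against the page height. The sketch assumes that (x, y) passed to pdf_extract_*_in_rect() is the rectangle's lower-left corner; both helpers are hypothetical.

```r
# Convert a top-left-origin rectangle into bottom-left-origin PDF points.
# Assumption: (x, y) is the rectangle's lower-left corner in PDF space.
top_left_to_pdf_rect <- function(x, top, width, height, page_height) {
  c(x = x, y = page_height - top - height, width = width, height = height)
}

# Hypothetical wrapper: extract text from a rectangle measured from the top.
extract_text_from_top <- function(doc, page, x, top, width, height) {
  r <- top_left_to_pdf_rect(x, top, width, height, pdf_page_get_height(doc, page))
  pdf_extract_text_in_rect(doc, page, r[["x"]], r[["y"]], r[["width"]], r[["height"]])
}
```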
Lazy page handles
pdf_page() returns a lightweight pdfoxide_page bound to a single page; its text getters extract content on demand.
pdf_page(doc, index) # lazy handle for one page
pdf_page_text(page) # plain text of the page
pdf_page_markdown(page) # Markdown of the page
pdf_page_html(page) # HTML of the page
pdf_page_plain_text(page) # layout-aware plain text of the page
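Lazy handles suit scans where only some pages need inspecting. The sketch below returns the 0-based indices of pages whose text matches a regular expression; pages_matching() is a hypothetical helper, it assumes library(pdfoxide) is attached, and it relies on garbage collection to release each page handle.

```r
# Sketch: 0-based indices of pages whose text matches `pattern` (a regex).
pages_matching <- function(doc, pattern, ignore.case = TRUE) {
  idx <- seq_len(pdf_page_count(doc)) - 1L
  hits <- vapply(idx, function(i) {
    grepl(pattern, pdf_page_text(pdf_page(doc, i)), ignore.case = ignore.case)
  }, logical(1))
  idx[hits]
}
```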
Page geometry and raw elements
pdf_page_get_width(doc, page) # page width in PDF points
pdf_page_get_height(doc, page) # page height in PDF points
pdf_page_get_rotation(doc, page) # page rotation in degrees (0/90/180/270)
pdf_page_get_elements(doc, page) # raw element records for the page
Search
pdf_search(doc, page, term, case_sensitive = FALSE) # search one page
pdf_search_all(doc, term, case_sensitive = FALSE) # search the whole document
pdf_search_results_to_json(doc, page, term, case_sensitive = FALSE) # page search results as JSON
Page classification and cleanup
Detect and remove repeated headers, footers, and artifacts.
pdf_classify_page(doc, page) # classify the layout/content of one page
pdf_classify_document(doc) # classify the whole document
pdf_remove_headers(doc, threshold = 0.5) # detect and remove repeating headers
pdf_remove_footers(doc, threshold = 0.5) # detect and remove repeating footers
pdf_remove_artifacts(doc, threshold = 0.5) # detect and remove page artifacts
pdf_erase_header(doc, page) # erase the header region on a page
pdf_erase_footer(doc, page) # erase the footer region on a page
pdf_erase_artifacts(doc, page) # erase artifact regions on a page
Office conversion (export)
Convert a loaded PDF back to an Office format. Each function returns a raw vector.
pdf_to_docx(doc) # convert the document to DOCX bytes
pdf_to_pptx(doc) # convert the document to PPTX bytes
pdf_to_xlsx(doc) # convert the document to XLSX bytes
Forms
pdf_get_form_fields(doc) # list of form-field records
pdf_export_form_data_to_bytes(doc, format_type = 0L) # export form data (0 = FDF, 1 = XFDF) to bytes
pdf_import_form_data(doc, data_path) # import form data from a file path
pdf_form_import_from_file(doc, filename) # import form data from a named file
The editor-side form helpers are listed under Editing PDFs.
Document structure and metadata
pdf_get_outline(doc) # document outline / bookmarks tree
pdf_get_page_labels(doc) # page-label ranges
pdf_get_xmp_metadata(doc) # XMP metadata as a list
pdf_get_source_bytes(doc) # the original source bytes of the document
pdf_plan_split_by_bookmarks(doc, options_json = NULL) # plan a split of the document by top-level bookmarks
Annotation details
Inspect individual annotations by page and index.
pdf_annotation_get_color(doc, page, index) # annotation RGB color
pdf_annotation_get_creation_date(doc, page, index) # creation date string
pdf_annotation_get_modification_date(doc, page, index) # modification date string
pdf_annotation_is_hidden(doc, page, index) # TRUE if the annotation is hidden
pdf_annotation_is_marked_deleted(doc, page, index) # TRUE if marked deleted
pdf_annotation_is_printable(doc, page, index) # TRUE if the annotation prints
pdf_annotation_is_read_only(doc, page, index) # TRUE if read-only
pdf_link_annotation_get_uri(doc, page, index) # URI of a link annotation
pdf_text_annotation_get_icon_name(doc, page, index) # icon name of a text annotation
pdf_highlight_annotation_quad_points_count(doc, page, index) # number of highlight quad points
pdf_highlight_annotation_quad_point(doc, page, index, quad_index) # one highlight quad point
pdf_annotations_to_json(doc, page) # all annotations on a page as JSON
JSON helpers for fonts and elements
pdf_font_get_size(doc, page, index) # size of a font record on a page
pdf_fonts_to_json(doc, page) # page fonts as JSON
pdf_elements_to_json(doc, page) # page elements as JSON
Rendering
Render pages to raster images. format: 0 = PNG, 1 = JPEG. Coordinates and DPI are documented per function.
pdf_render_page(doc, page, format = 0L) # render a page at default DPI
pdf_render_page_zoom(doc, page, zoom, format = 0L) # render a page at a zoom factor
pdf_render_page_thumbnail(doc, page, size, format = 0L) # render a fitted thumbnail
pdf_render_page_fit(doc, page, w, h, format = 0L) # render fitted into w x h pixels
pdf_render_page_raw(doc, page, dpi = 150L) # render to a raw RGBA buffer
pdf_render_page_region(doc, page, crop_x, crop_y, crop_width, crop_height, format = 0L) # render a sub-region
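To choose a dpi value, it helps to know the approximate output size: PDF user space has 72 points per inch, so a page of w x h points rendered at dpi is about (w / 72 * dpi) x (h / 72 * dpi) pixels. The plain-R sketch below does that arithmetic; the rounding rule and the 4-bytes-per-pixel, unpadded layout assumed for pdf_render_page_raw() are assumptions, not documented behavior, and a 90/270 degree page rotation would swap the two dimensions.

```r
# Approximate pixel dimensions for a page rendered at `dpi`.
render_pixel_size <- function(width_pt, height_pt, dpi = 150) {
  c(width = round(width_pt / 72 * dpi), height = round(height_pt / 72 * dpi))
}

# Approximate RGBA buffer size, assuming 4 bytes per pixel and no row padding.
rgba_buffer_bytes <- function(width_pt, height_pt, dpi = 150) {
  px <- render_pixel_size(width_pt, height_pt, dpi)
  unname(px[["width"]] * px[["height"]] * 4)
}
```

For example, render_pixel_size(pdf_page_get_width(doc, 0L), pdf_page_get_height(doc, 0L), dpi = 300) gives a rough target before rendering page 0 at 300 dpi.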
The full RenderOptions surface (background RGBA, transparency, annotation toggle, JPEG quality, layer exclusion).
pdf_render_page_with_options(doc, page, dpi = 150L, format = 0L,
bg_r = 1, bg_g = 1, bg_b = 1, bg_a = 1,
transparent_background = FALSE,
render_annotations = TRUE, jpeg_quality = 85L)
pdf_render_page_with_options_ex(doc, page, dpi = 150L, format = 0L,
bg_r = 1, bg_g = 1, bg_b = 1, bg_a = 1,
transparent_background = FALSE,
render_annotations = TRUE, jpeg_quality = 85L,
excluded_layers = NULL)
Reusable renderer and render-time estimation.
pdf_create_renderer(dpi = 150L, format = 0L, quality = 85L, anti_alias = TRUE) # build a reusable renderer
pdf_renderer_close(renderer) # free a renderer
pdf_estimate_render_time(doc, page) # estimate render time for a page
Rendered-image handle helpers.
pdf_rendered_image_save(image, path) # write a rendered image to a file
pdf_rendered_image_close(image) # free a rendered image
Editing PDFs
Open a PDF for modification. Returns a pdfoxide_editor.
pdf_editor_open(path) # open a PDF for editing
pdf_editor_open_from_bytes(bytes) # open an editor from a raw vector
pdf_editor_close(editor) # close the editor and free it
Editor inspection and metadata
pdf_editor_page_count(editor) # page count
pdf_editor_version(editor) # PDF version as list(major, minor)
pdf_editor_is_modified(editor) # TRUE if the editor has unsaved changes
pdf_editor_source_path(editor) # original source path, if any
pdf_editor_get_producer(editor) # Producer metadata string
pdf_editor_set_producer(editor, value) # set the Producer metadata
pdf_editor_get_creation_date(editor) # CreationDate string
pdf_editor_set_creation_date(editor, value) # set the CreationDate
Page operations
pdf_editor_delete_page(editor, page) # delete a page
pdf_editor_move_page(editor, from, to) # move a page to a new index
pdf_editor_rotate_page_by(editor, page, degrees) # rotate a page by a relative angle
pdf_editor_rotate_all_pages(editor, degrees) # rotate every page
pdf_editor_get_page_rotation(editor, page) # current page rotation
pdf_editor_set_page_rotation(editor, page, degrees) # set absolute page rotation
pdf_editor_crop_margins(editor, left, right, top, bottom) # crop margins on all pages
pdf_editor_get_page_crop_box(editor, page) # get CropBox as c(x, y, w, h)
pdf_editor_set_page_crop_box(editor, page, x, y, w, h) # set CropBox
pdf_editor_get_page_media_box(editor, page) # get MediaBox as c(x, y, w, h)
pdf_editor_set_page_media_box(editor, page, x, y, w, h) # set MediaBox
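pdf_editor_crop_margins() handles the uniform case; when each page's existing CropBox should drive the result, the get/set pair can be combined in a loop. inset_crop_boxes() is a hypothetical helper; it assumes library(pdfoxide) is attached and that the getter returns c(x, y, w, h) as listed above.

```r
# Hypothetical helper: shrink every page's CropBox by `inset` points per side.
inset_crop_boxes <- function(editor, inset) {
  for (page in seq_len(pdf_editor_page_count(editor)) - 1L) {  # 0-based pages
    box <- pdf_editor_get_page_crop_box(editor, page)          # c(x, y, w, h)
    if (box[3] <= 2 * inset || box[4] <= 2 * inset) {
      stop("inset too large for page ", page, call. = FALSE)
    }
    pdf_editor_set_page_crop_box(editor, page,
      x = box[1] + inset, y = box[2] + inset,
      w = box[3] - 2 * inset, h = box[4] - 2 * inset)
  }
  invisible(editor)
}
```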
Redaction (editor)
pdf_editor_apply_all_redactions(editor) # apply all pending redactions
pdf_editor_apply_page_redactions(editor, page) # apply redactions on one page
pdf_editor_is_page_marked_for_redaction(editor, page) # TRUE if page has pending redactions
pdf_editor_unmark_page_for_redaction(editor, page) # clear pending redactions on a page
pdf_editor_erase_region(editor, page, x, y, w, h) # erase a rectangle on a page
pdf_editor_erase_regions(editor, page, rects) # erase several rectangles on a page
pdf_editor_clear_erase_regions(editor, page) # clear pending erase regions
Standalone redaction workflow
pdf_redaction_add(editor, page, x1, y1, x2, y2, r = 0, g = 0, b = 0) # add a redaction box with a fill color
pdf_redaction_count(editor, page) # pending redaction count on a page
pdf_redaction_apply(editor, scrub_metadata = FALSE, r = 0, g = 0, b = 0) # burn in all redactions
pdf_redaction_scrub_metadata(editor) # scrub metadata for redaction hygiene
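The standalone workflow is add boxes, then apply. The sketch below drives it from a data frame with one row per box (columns page, x1, y1, x2, y2, with 0-based pages) and uses the default black fill; redact_boxes() and the column layout are hypothetical, and the editor still has to be saved afterwards.

```r
# Hypothetical helper: add one redaction box per row, then burn them all in.
redact_boxes <- function(editor, boxes, scrub_metadata = TRUE) {
  stopifnot(all(c("page", "x1", "y1", "x2", "y2") %in% names(boxes)))
  for (k in seq_len(nrow(boxes))) {
    pdf_redaction_add(editor, boxes$page[k],
                      boxes$x1[k], boxes$y1[k], boxes$x2[k], boxes$y2[k])  # default fill: black
  }
  pdf_redaction_apply(editor, scrub_metadata = scrub_metadata)
  invisible(editor)
}
```

Follow with pdf_editor_save(editor, path) to write the redacted file.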
Forms and annotations (editor)
pdf_editor_flatten_forms(editor) # flatten all form fields into content
pdf_editor_flatten_forms_on_page(editor, page) # flatten forms on one page
pdf_editor_set_form_field_value(editor, name, value) # set a form-field value by name
pdf_editor_flatten_annotations(editor, page) # flatten annotations on a page
pdf_editor_flatten_all_annotations(editor) # flatten all annotations
pdf_editor_flatten_warnings_count(editor) # number of flatten warnings
pdf_editor_flatten_warning(editor, index) # one flatten warning message
pdf_editor_is_page_marked_for_flatten(editor, page) # TRUE if page is marked for flatten
pdf_editor_unmark_page_for_flatten(editor, page) # clear flatten mark on a page
pdf_editor_import_fdf_bytes(editor, bytes) # import FDF form data
pdf_editor_import_xfdf_bytes(editor, bytes) # import XFDF form data
Document operations (editor)
pdf_editor_merge_from(editor, source_path) # append pages from another PDF file
pdf_editor_merge_from_bytes(editor, bytes) # append pages from PDF bytes
pdf_editor_convert_to_pdf_a(editor, level) # convert in place to PDF/A
pdf_editor_embed_file(editor, name, bytes) # attach an embedded file
pdf_editor_extract_pages_to_bytes(editor, pages) # extract selected pages to a new PDF (bytes)
Saving (editor)
pdf_editor_save(editor, path) # save to a file
pdf_editor_save_to_bytes(editor) # save to a raw vector
pdf_editor_save_to_bytes_with_options(editor, compress = TRUE,
garbage_collect = TRUE, linearize = FALSE) # save with options
pdf_editor_save_encrypted(editor, path, user_password, owner_password) # save AES-encrypted to a file
pdf_editor_save_encrypted_to_bytes(editor, user_password, owner_password) # save AES-encrypted to bytes
DocumentBuilder (programmatic creation)
Build a PDF page by page. pdf_builder_create() returns a pdfoxide_builder; the page constructors return a fluent pdfoxide_page.
pdf_builder_create() # start a new DocumentBuilder
pdf_builder_close(builder) # free a builder
Builder document metadata
pdf_builder_set_title(builder, value) # set document title
pdf_builder_set_author(builder, value) # set document author
pdf_builder_set_subject(builder, value) # set document subject
pdf_builder_set_keywords(builder, value) # set document keywords
pdf_builder_set_creator(builder, value) # set document creator
pdf_builder_on_open(builder, script) # set a document-open JavaScript action
pdf_builder_language(builder, lang) # set the document language (e.g. "en-US")
pdf_builder_tagged_pdf_ua1(builder) # enable Tagged PDF / PDF-UA-1 output
pdf_builder_role_map(builder, custom, standard) # map a custom structure tag to a standard role
pdf_builder_register_embedded_font(builder, name, font) # register an embedded font for use on pages
Builder pages and output
pdf_builder_page(builder, width, height) # start a custom-size page
pdf_builder_a4_page(builder) # start an A4 page
pdf_builder_letter_page(builder) # start a US Letter page
pdf_builder_build(builder) # finish and return the PDF as bytes
pdf_builder_save(builder, path) # finish and write to a file
pdf_builder_save_encrypted(builder, path, user_password, owner_password) # finish and write AES-encrypted
pdf_builder_to_bytes_encrypted(builder, user_password, owner_password) # finish and return encrypted bytes
Embedded fonts
pdf_embedded_font_from_file(path) # load an embedded font from a TTF/OTF file
pdf_embedded_font_from_bytes(bytes, name = NULL) # load an embedded font from bytes
pdf_embedded_font_close(font) # free an embedded font handle
Page builder (fluent layout)
All of the following functions operate on a pdfoxide_page returned by pdf_builder_page(), pdf_builder_a4_page(), or pdf_builder_letter_page(), and return the page invisibly for chaining. Finish a page with pdf_page_done().
Text flow and typography
pdf_page_font(page, name, size) # set the active font and size
pdf_page_at(page, x, y) # move the text cursor to a coordinate
pdf_page_builder_text(page, text) # draw text at the cursor
pdf_page_heading(page, level, text) # add a heading (level 1-6)
pdf_page_paragraph(page, text) # add a wrapped paragraph
pdf_page_space(page, points) # add vertical space
pdf_page_horizontal_rule(page) # draw a horizontal rule
pdf_page_newline(page) # advance to the next line
pdf_page_footnote(page, ref_mark, note_text) # add a footnote with a reference mark
pdf_page_columns(page, column_count, gap_pt, text) # flow text into multiple columns
pdf_page_text_in_rect(page, x, y, w, h, text, align = 0L) # flow text inside a rectangle
pdf_page_new_page_same_size(page) # start a new page of the same size
pdf_page_done(page) # finish the page and return to the builder
pdf_page_close(page) # free a page handle
Inline styled runs
pdf_page_inline(page, text) # append an inline text run
pdf_page_inline_bold(page, text) # append a bold inline run
pdf_page_inline_italic(page, text) # append an italic inline run
pdf_page_inline_color(page, r, g, b, text) # append a colored inline run
Links and JavaScript actions
pdf_page_link_url(page, url) # add a URL link
pdf_page_link_page(page, index) # add an internal page link
pdf_page_link_named(page, destination) # add a named-destination link
pdf_page_link_javascript(page, script) # add a JavaScript-action link
pdf_page_on_open(page, script) # page-open JavaScript action
pdf_page_on_close(page, script) # page-close JavaScript action
pdf_page_field_keystroke(page, script) # field keystroke JavaScript action
pdf_page_field_format(page, script) # field format JavaScript action
pdf_page_field_validate(page, script) # field validate JavaScript action
pdf_page_field_calculate(page, script) # field calculate JavaScript action
Annotations and markup
pdf_page_highlight(page, r, g, b) # highlight markup at the current run
pdf_page_underline(page, r, g, b) # underline markup
pdf_page_strikeout(page, r, g, b) # strikeout markup
pdf_page_squiggly(page, r, g, b) # squiggly underline markup
pdf_page_sticky_note(page, text) # sticky note at the cursor
pdf_page_sticky_note_at(page, x, y, text) # sticky note at a coordinate
pdf_page_watermark(page, text) # add a text watermark
pdf_page_watermark_confidential(page) # add a CONFIDENTIAL watermark
pdf_page_watermark_draft(page) # add a DRAFT watermark
pdf_page_stamp(page, type_name) # add a rubber stamp (e.g. "Approved")
pdf_page_freetext(page, x, y, w, h, text) # add a free-text annotation
AcroForm widgets
pdf_page_text_field(page, name, x, y, w, h, default_value = NULL) # text field
pdf_page_checkbox(page, name, x, y, w, h, checked = FALSE) # checkbox
pdf_page_combo_box(page, name, x, y, w, h, options, selected = NULL) # combo box
pdf_page_radio_group(page, name, values, xs, ys, ws, hs, selected = NULL) # radio-button group
pdf_page_push_button(page, name, x, y, w, h, caption) # push button
pdf_page_signature_field(page, name, x, y, w, h) # signature field
Barcodes (page builder)
pdf_page_barcode_1d(page, barcode_type, data, x, y, w, h) # draw a 1D barcode
pdf_page_barcode_qr(page, data, x, y, size) # draw a QR code
Images
pdf_page_image(page, bytes, x, y, w, h) # place an image
pdf_page_image_with_alt(page, bytes, x, y, w, h, alt_text) # place an image with alt text
pdf_page_image_artifact(page, bytes, x, y, w, h) # place an image tagged as an artifact
Vector graphics
pdf_page_rect(page, x, y, w, h) # draw a rectangle outline
pdf_page_filled_rect(page, x, y, w, h, r, g, b) # draw a filled rectangle
pdf_page_line(page, x1, y1, x2, y2) # draw a line
pdf_page_stroke_rect(page, x, y, w, h, width, r, g, b) # stroke a rectangle with width and color
pdf_page_stroke_line(page, x1, y1, x2, y2, width, r, g, b) # stroke a line with width and color
pdf_page_stroke_rect_dashed(page, x, y, w, h, width, r, g, b, dash = numeric(0), phase = 0) # dashed rectangle
pdf_page_stroke_line_dashed(page, x1, y1, x2, y2, width, r, g, b, dash = numeric(0), phase = 0) # dashed line
Tables
pdf_page_table(page, widths, aligns, cells, has_header = FALSE,
n_columns = length(widths), n_rows = NULL) # render a static table
Streaming tables for large or incremental data.
pdf_page_streaming_table_begin(page, headers, widths, aligns,
repeat_header = FALSE, n_columns = length(headers)) # begin a streaming table
pdf_page_streaming_table_begin_v2(page, headers, widths, aligns,
repeat_header = FALSE, mode = 0L, sample_rows = 0L,
min_col_width_pt = 0, max_col_width_pt = 0,
max_rowspan = 0L, n_columns = length(headers)) # streaming table with autosize/rowspan
pdf_page_streaming_table_set_batch_size(page, batch_size) # set the flush batch size
pdf_page_streaming_table_pending_row_count(page) # rows buffered but not yet flushed
pdf_page_streaming_table_batch_count(page) # number of flushed batches
pdf_page_streaming_table_flush(page) # flush buffered rows
pdf_page_streaming_table_push_row(page, cells) # push one row of cells
pdf_page_streaming_table_push_row_v2(page, cells, rowspans = NULL) # push a row with per-cell rowspans
pdf_page_streaming_table_finish(page) # finish and lay out the streaming table
Digital signatures
Certificates
pdf_certificate_load_from_bytes(bytes, password = NULL) # load a PKCS#12 / DER certificate from bytes
pdf_certificate_load_from_pem(cert_pem, key_pem) # load a certificate + key from PEM
pdf_certificate_subject(cert) # certificate subject DN
pdf_certificate_issuer(cert) # certificate issuer DN
pdf_certificate_serial(cert) # certificate serial number
pdf_certificate_validity(cert) # validity window
pdf_certificate_is_valid(cert) # TRUE if currently within the validity window
pdf_certificate_close(cert) # free a certificate handle
Signing
pdf_sign_bytes(pdf, cert, reason = NULL, location = NULL) # sign PDF bytes (basic CMS signature)
pdf_sign_bytes_pades(pdf, cert, level = 0L, tsa_url = NULL, ...) # sign PDF bytes with a PAdES profile
pdf_sign_bytes_pades_opts(pdf, cert, level = 0L, tsa_url = NULL, ...) # PAdES signing with extended options
pdf_sign(doc, certificate, reason = NULL, location = NULL) # sign a loaded document
pdf_add_timestamp(pdf_data, sig_index, tsa_url) # add a TSA timestamp to a signature in bytes
Signature inspection and verification
pdf_signature_count(doc) # number of signatures
pdf_get_signature(doc, index) # signature handle by index
pdf_signature_signer_name(sig) # signer common name
pdf_signature_signing_reason(sig) # signing reason
pdf_signature_signing_location(sig) # signing location
pdf_signature_signing_time(sig) # signing time
pdf_signature_certificate(sig) # signer certificate handle
pdf_signature_pades_level(sig) # PAdES level of the signature
pdf_signature_has_timestamp(sig) # TRUE if the signature is timestamped
pdf_signature_timestamp(sig) # embedded timestamp handle
pdf_signature_add_timestamp(sig, timestamp) # attach a timestamp to a signature
pdf_signature_verify(sig) # verify the signature, returns a status
pdf_signature_verify_detached(sig, pdf) # verify with a detached message digest check
pdf_signature_close(sig) # free a signature handle
pdf_verify_all_signatures(doc) # verify every signature in the document
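The inspection functions combine naturally into a loop over all signatures. The sketch below lists signer names and frees each signature handle as it goes; it assumes signature indices are 0-based like page indices and that pdf_signature_signer_name() returns a single string. signer_names() is a hypothetical helper.

```r
# Sketch: signer common names for every signature in a document.
signer_names <- function(doc) {
  idx <- seq_len(pdf_signature_count(doc)) - 1L  # assumed 0-based indices
  vapply(idx, function(i) {
    sig <- pdf_get_signature(doc, i)
    on.exit(pdf_signature_close(sig), add = TRUE)  # free each handle promptly
    pdf_signature_signer_name(sig)
  }, character(1))
}
```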
Timestamps
pdf_timestamp_parse(bytes) # parse a timestamp token (TST)
pdf_timestamp_token(timestamp) # raw timestamp token bytes
pdf_timestamp_message_imprint(timestamp) # message imprint of the timestamp
pdf_timestamp_time(timestamp) # timestamp time
pdf_timestamp_serial(timestamp) # timestamp serial number
pdf_timestamp_tsa_name(timestamp) # TSA name
pdf_timestamp_policy_oid(timestamp) # timestamp policy OID
pdf_timestamp_hash_algorithm(timestamp) # hash algorithm used
pdf_timestamp_verify(timestamp) # verify the timestamp token
pdf_timestamp_close(timestamp) # free a timestamp handle
TSA client
pdf_tsa_client_create(url, username = NULL, password = NULL, timeout = 30L,
hash_algo = 0L, use_nonce = TRUE, cert_req = TRUE) # create a TSA client
pdf_tsa_request_timestamp(client, data) # request a timestamp over data
pdf_tsa_request_timestamp_hash(client, hash, hash_algo = 0L) # request a timestamp over a precomputed hash
pdf_tsa_client_close(client) # free a TSA client
Document Security Store (DSS)
pdf_get_dss(doc) # get the document's DSS handle
pdf_dss_cert_count(dss) # number of certificates in the DSS
pdf_dss_crl_count(dss) # number of CRLs
pdf_dss_ocsp_count(dss) # number of OCSP responses
pdf_dss_vri_count(dss) # number of VRI entries
pdf_dss_get_cert(dss, index) # one DSS certificate
pdf_dss_get_crl(dss, index) # one DSS CRL
pdf_dss_get_ocsp(dss, index) # one DSS OCSP response
pdf_dss_close(dss) # free a DSS handle
Compliance validation
PDF/A
pdf_validate_pdf_a(doc, level = 0L) # validate against a PDF/A level, returns a results handle
pdf_a_is_compliant(results) # TRUE if compliant
pdf_a_errors(results) # list of validation errors
pdf_a_warning_count(results) # number of warnings
pdf_a_results_close(results) # free the results handle
pdf_convert_to_pdf_a(doc, level = 2L) # convert a document to PDF/A bytes
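Validation returns a results handle that must be queried and then freed. The sketch below turns it into a plain R list and closes the handle on exit; check_pdf_a() is a hypothetical helper, and the meaning of each level code is not documented here (0L is just the documented default).

```r
# Sketch: validate against PDF/A and return a plain R summary.
check_pdf_a <- function(doc, level = 0L) {
  res <- pdf_validate_pdf_a(doc, level = level)
  on.exit(pdf_a_results_close(res), add = TRUE)  # always free the results handle
  list(
    compliant = pdf_a_is_compliant(res),
    errors    = pdf_a_errors(res),
    warnings  = pdf_a_warning_count(res)
  )
}
```

The same pattern applies to the PDF/UA and PDF/X result handles with their respective accessors.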
PDF/UA (accessibility)
pdf_validate_pdf_ua(doc, level = 0L) # validate against PDF/UA, returns a results handle
pdf_ua_is_accessible(results) # TRUE if accessible
pdf_ua_errors(results) # list of accessibility errors
pdf_ua_warnings(results) # list of accessibility warnings
pdf_ua_stats(results) # accessibility statistics
pdf_ua_results_close(results) # free the results handle
PDF/X (print)
pdf_validate_pdf_x(doc, level = 0L) # validate against PDF/X, returns a results handle
pdf_x_is_compliant(results) # TRUE if compliant
pdf_x_errors(results) # list of validation errors
pdf_x_results_close(results) # free the results handle
Barcodes
Standalone barcode generation and decoding.
pdf_generate_qr_code(data, error_correction = 1L, size_px = 256L) # generate a QR code, returns a barcode handle
pdf_generate_barcode(data, format = 0L, size_px = 256L) # generate a barcode in a given format
pdf_barcode_get_data(barcode) # decoded data string
pdf_barcode_get_format(barcode) # barcode format
pdf_barcode_get_confidence(barcode) # decode confidence
pdf_barcode_get_image_png(barcode, size_px = 256L) # rendered PNG bytes
pdf_barcode_get_svg(barcode, size_px = 256L) # rendered SVG string
pdf_barcode_close(barcode) # free a barcode handle
pdf_editor_add_barcode_to_page(editor, page, barcode, x, y, width, ...) # stamp a barcode onto an editor page
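For standalone generation, the barcode handle can be rendered to PNG bytes and written with writeBin(). write_qr_png() is a hypothetical helper; it assumes library(pdfoxide) is attached and that pdf_barcode_get_image_png() returns a raw vector, as the listing above describes.

```r
# Hypothetical helper: encode `data` as a QR code and save it as a PNG file.
write_qr_png <- function(data, path, size_px = 256L) {
  qr <- pdf_generate_qr_code(data, size_px = size_px)
  on.exit(pdf_barcode_close(qr), add = TRUE)  # free the barcode handle
  writeBin(pdf_barcode_get_image_png(qr, size_px = size_px), path)
  invisible(path)
}
```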
OCR
Requires the ocr feature to be enabled in the underlying build.
pdf_ocr_engine_create(det_model_path, rec_model_path, dict_path) # build an OCR engine from model paths
pdf_ocr_engine_close(engine) # free an OCR engine
pdf_ocr_page_needs_ocr(doc, page) # TRUE if a page has no extractable text layer
pdf_ocr_extract_text(doc, page, engine = NULL) # OCR a page (uses the default engine when NULL)
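Since OCR is slower than reading a text layer, a typical pattern is to OCR only the pages that pdf_ocr_page_needs_ocr() flags. The sketch assumes library(pdfoxide) is attached, a build with the ocr feature, and single-string results from both extractors; text_with_ocr_fallback() is a hypothetical helper, and engine = NULL uses the default engine as documented above.

```r
# Sketch: text layer where present, OCR only for pages without one.
text_with_ocr_fallback <- function(doc, engine = NULL) {
  idx <- seq_len(pdf_page_count(doc)) - 1L
  vapply(idx, function(i) {
    if (isTRUE(pdf_ocr_page_needs_ocr(doc, i))) {
      pdf_ocr_extract_text(doc, i, engine = engine)
    } else {
      pdf_extract_text(doc, i)
    }
  }, character(1))
}
```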
OCR models and runtime configuration
pdf_model_manifest() # available OCR model manifest
pdf_prefetch_available() # TRUE if model prefetching is available
pdf_prefetch_models(languages_csv = NULL) # prefetch OCR models for given languages
pdf_set_max_ops_per_stream(limit) # cap content-stream operations (DoS guard)
pdf_set_preserve_unmapped_glyphs(preserve) # keep glyphs with no Unicode mapping
Crypto provider / FIPS
pdf_crypto_active_provider() # name of the active crypto provider
pdf_crypto_fips_available() # TRUE if a FIPS provider is available
pdf_crypto_use_fips() # switch to the FIPS provider
pdf_crypto_set_policy(spec) # set the crypto policy from a spec string
pdf_crypto_policy() # current crypto policy
pdf_crypto_inventory() # cryptographic algorithm inventory
pdf_crypto_cbom() # Cryptographic Bill of Materials (CBOM)
Logging
pdf_set_log_level(level) # set the library log level
pdf_get_log_level() # get the current log level
Complete example
library(pdfoxide)
# --- Create ---
pdf <- pdf_from_markdown("# Report\n\nGenerated by **PDF Oxide**.\n")
pdf_save(pdf, "report.pdf")
pdf_close(pdf)
# --- Extract ---
doc <- pdf_open("report.pdf")
cat("Pages:", pdf_page_count(doc), "\n")
for (i in seq_len(pdf_page_count(doc)) - 1L) { # 0-based indices
  txt <- pdf_extract_text(doc, i)
  cat(sprintf("Page %d: %d characters\n", i + 1L, nchar(txt)))
}
chars <- pdf_extract_chars(doc, 0) # per-character data frame
results <- pdf_search_all(doc, "PDF Oxide", case_sensitive = FALSE)
pdf_close(doc)
# --- Edit ---
ed <- pdf_editor_open("report.pdf")
pdf_editor_set_producer(ed, "PDF Oxide")
pdf_editor_rotate_all_pages(ed, 90L)
pdf_editor_save(ed, "rotated.pdf")
pdf_editor_close(ed)
# --- Build programmatically ---
b <- pdf_builder_create()
pdf_builder_set_title(b, "Invoice")
page <- pdf_builder_letter_page(b)
pdf_page_font(page, "Helvetica", 24)
pdf_page_at(page, 72, 720)
pdf_page_builder_text(page, "Invoice #1001")
pdf_page_done(page)
pdf_builder_save(b, "invoice.pdf")
pdf_builder_close(b)
Other Language Bindings
PDF Oxide provides native bindings for the following ecosystems: Rust, Python, Node.js, WASM, C#, Golang, Java, PHP, Ruby, C++, Swift, Kotlin, Dart, Julia, Zig, Scala, Clojure, Objective-C, and Elixir.
Next steps
- Types and Enums: all shared types and enums
- Page API Reference: consistent per-page iteration across bindings
- Getting Started with R: tutorial