R API Reference
PDF Oxide ships idiomatic R bindings distributed as the pdfoxide package. The package wraps the pdf_oxide C ABI through R's native .Call interface, so there is no dependency on Java, Python, or any other external runtime: only a compiled shared library.
# In a shell: install from source (requires a C toolchain)
R CMD INSTALL pdfoxide
# In R: load the package
library(pdfoxide)
For the Rust API, see the Rust API Reference. For Python, see the Python API Reference. For JavaScript, see the Node.js API Reference or the WASM API Reference.
The R API is a flat functional API built around opaque handle objects rather than R6/S4 classes. The main handle types are:
| Handle | Created by | Purpose |
|---|---|---|
| pdfoxide_pdf | pdf_from_markdown(), pdf_from_html(), … | A freshly built PDF, ready to save or convert |
| pdfoxide_document | pdf_open(), pdf_open_from_bytes() | A loaded PDF for read-only extraction and rendering |
| pdfoxide_editor | pdf_editor_open() | A mutable PDF for editing, merging, and saving |
| pdfoxide_builder | pdf_builder_create() | A DocumentBuilder for programmatic page construction |
| pdfoxide_page (builder) | pdf_builder_page(), pdf_builder_a4_page(), … | A fluent page being laid out |
| pdfoxide_page (lazy) | pdf_page() | A lazy read handle over a single page of the document |
| pdfoxide_renderer, pdfoxide_rendered_image | rendering functions | Reusable renderer and rendered raster output |
| pdfoxide_certificate, pdfoxide_signature, pdfoxide_timestamp, pdfoxide_tsa_client, pdfoxide_dss | signing / verification functions | Digital signature primitives |
All page indices are 0-based. Functions that modify a builder, page, or editor return the handle invisibly, so calls can be chained with the pipe (|>). Handles are released automatically when they are garbage-collected, but pdf_close() / *_close() free them early.
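The chaining and early-release conventions described here can be sketched as follows. This is a minimal illustration under stated assumptions, not the package's canonical usage: `with_handle` is a hypothetical helper introduced for this example, the file names are placeholders, and pdf_editor_save() is placed last in the chain because this reference does not say what it returns.

```r
# Hypothetical helper (not part of pdfoxide): run `fn` on a handle and
# always release it with `close_fn`, even if `fn` raises an error.
with_handle <- function(handle, fn, close_fn) {
  force(handle)                           # open the handle before registering cleanup
  on.exit(close_fn(handle), add = TRUE)   # early, deterministic release
  fn(handle)
}

# Sketch of pipe chaining on an editor handle (needs pdfoxide and R >= 4.1).
# "in.pdf" and "out.pdf" are placeholder paths.
rotate_and_save <- function(input = "in.pdf", output = "out.pdf") {
  with_handle(
    pdfoxide::pdf_editor_open(input),
    function(ed) {
      ed |>
        pdfoxide::pdf_editor_set_producer("PDF Oxide") |>
        pdfoxide::pdf_editor_rotate_all_pages(90L) |>
        pdfoxide::pdf_editor_save(output)   # assumed safe as the last step
    },
    pdfoxide::pdf_editor_close
  )
}
```

Relying on the garbage collector also works, but closing explicitly keeps file and memory usage predictable in long-running sessions.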
Creating PDFs
One-step creation from a source format. Each function returns a pdfoxide_pdf handle.
pdf_from_markdown(markdown) # build a PDF from a Markdown string
pdf_from_html(html) # build a PDF from an HTML string
pdf_from_text(text) # build a PDF from plain text
pdf_from_image(path) # build a single-page PDF from an image file
pdf_from_image_bytes(bytes) # build a single-page PDF from raw image bytes
pdf_from_html_css(html, css, font_bytes = NULL) # build a PDF from HTML + CSS (optional embedded font)
pdf_from_html_css_with_fonts(html, css, families, font_bytes) # HTML + CSS with multiple named font families
pdf_merge(paths) # merge several PDF files into one new PDF
Saving / serializing a built PDF
pdf_save(pdf, path) # write the PDF to a file path
pdf_to_bytes(pdf) # serialize the PDF to a raw vector
pdf_get_page_count(pdf) # number of pages in a built pdfoxide_pdf
Opening documents
Opens an existing PDF for extraction and rendering. Returns a pdfoxide_document.
pdf_open(path) # open a PDF file from disk
pdf_open_with_password(path, password) # open an encrypted PDF with a password
pdf_open_from_bytes(bytes) # open a PDF from an in-memory raw vector
pdf_close(x) # close any pdfoxide handle and free it
Opening from Office formats
Converts a Word/PowerPoint/Excel document and opens it directly as a pdfoxide_document.
pdf_open_from_docx_bytes(bytes) # convert DOCX bytes and open as a document
pdf_open_from_pptx_bytes(bytes) # convert PPTX bytes and open as a document
pdf_open_from_xlsx_bytes(bytes) # convert XLSX bytes and open as a document
Document inspection
pdf_page_count(doc) # number of pages
pdf_version(doc) # PDF version as a list(major, minor)
pdf_is_encrypted(doc) # TRUE if the document is encrypted
pdf_has_structure_tree(doc) # TRUE if the document is a Tagged PDF
pdf_authenticate(doc, password) # authenticate an encrypted document after opening
pdf_has_xfa(doc) # TRUE if the document contains XFA forms
pdf_has_timestamp(doc) # TRUE if the document carries a document timestamp
Text and content extraction
Single-page extraction (the page index is 0-based).
pdf_extract_text(doc, page) # reading-order plain text for one page
pdf_to_plain_text(doc, page) # layout-aware plain text for one page
pdf_to_markdown(doc, page) # Markdown for one page
pdf_to_html(doc, page) # HTML for one page
pdf_extract_structured_json(doc, page) # structured layout JSON for one page
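Because page indices are 0-based while R's seq_len() is 1-based, a per-page loop needs an explicit offset. The sketch below exports each page to its own Markdown file. `zero_based_pages` and `export_pages_markdown` are hypothetical helpers introduced for this example, and the sketch assumes pdf_to_markdown() returns a character string.

```r
# Hypothetical helper (not part of pdfoxide): 0-based indices for n pages.
zero_based_pages <- function(n) {
  seq_len(n) - 1L   # integer(0) when n is 0, so loops are skipped cleanly
}

# Sketch: write one Markdown file per page (needs pdfoxide).
export_pages_markdown <- function(path, out_dir) {
  doc <- pdfoxide::pdf_open(path)
  on.exit(pdfoxide::pdf_close(doc), add = TRUE)
  dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)
  for (i in zero_based_pages(pdfoxide::pdf_page_count(doc))) {
    md <- pdfoxide::pdf_to_markdown(doc, i)   # assumed to be a character string
    # File names use 1-based page numbers for human readers.
    writeLines(md, file.path(out_dir, sprintf("page-%03d.md", i + 1L)))
  }
  invisible(out_dir)
}
```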
Whole-document extraction.
pdf_to_markdown_all(doc) # Markdown for the entire document
pdf_to_html_all(doc) # HTML for the entire document
pdf_to_plain_text_all(doc) # plain text for the entire document
pdf_extract_all_text(doc) # concatenated reading-order text for all pages
Structured / per-element extraction. These functions return data frames or lists of records.
pdf_extract_chars(doc, page) # per-character records (glyph, bbox, font, size, color)
pdf_extract_words(doc, page) # word records with bounding boxes
pdf_extract_text_lines(doc, page) # text-line records with bounding boxes
pdf_extract_tables(doc, page) # detected tables with rows and cells
pdf_extract_paths(doc, page) # vector path (line/curve/shape) records
pdf_embedded_fonts(doc, page) # embedded font records used on a page
pdf_embedded_images(doc, page) # embedded image records on a page
pdf_page_annotations(doc, page) # annotation records on a page
Auto-detecting extraction (chooses between native and OCR-style heuristics).
pdf_extract_text_auto(doc, page) # best-effort text for one page
pdf_extract_page_auto(doc, page, options_json = NULL) # best-effort structured page extraction
Region extraction (clipping rectangle)
Restricts extraction to a rectangle in PDF points (origin at the bottom-left corner).
pdf_extract_text_in_rect(doc, page, x, y, width, height) # text inside a rectangle
pdf_extract_words_in_rect(doc, page, x, y, width, height) # words inside a rectangle
pdf_extract_lines_in_rect(doc, page, x, y, width, height) # lines inside a rectangle
pdf_extract_tables_in_rect(doc, page, x, y, width, height) # tables inside a rectangle
pdf_extract_images_in_rect(doc, page, x, y, width, height) # images inside a rectangle
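Coordinates taken from image tools or screenshots are usually measured from the top-left corner, so they need to be flipped before being passed to the *_in_rect functions. The sketch below shows that conversion. `rect_from_top_left` and `text_in_top_band` are hypothetical helpers, and the sketch assumes that (x, y) denotes the rectangle's lower-left corner, which this reference does not state explicitly.

```r
# Hypothetical helper (not part of pdfoxide): convert a rectangle measured
# from the top-left corner of the page into bottom-left-origin PDF points.
# Assumption: (x, y) in the result is the rectangle's lower-left corner.
rect_from_top_left <- function(left, top, width, height, page_height) {
  list(x = left, y = page_height - top - height, width = width, height = height)
}

# Sketch: extract text from a band at the top of page 0 (needs pdfoxide).
# The default band height of 72 pt is 1 inch.
text_in_top_band <- function(path, band_height = 72) {
  doc <- pdfoxide::pdf_open(path)
  on.exit(pdfoxide::pdf_close(doc), add = TRUE)
  page_w <- pdfoxide::pdf_page_get_width(doc, 0L)
  page_h <- pdfoxide::pdf_page_get_height(doc, 0L)
  r <- rect_from_top_left(0, 0, page_w, band_height, page_h)
  pdfoxide::pdf_extract_text_in_rect(doc, 0L, r$x, r$y, r$width, r$height)
}
```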
Lazy page handles
pdf_page() returns a lightweight pdfoxide_page bound to a single page; the text getters extract content on demand.
pdf_page(doc, index) # lazy handle for one page
pdf_page_text(page) # plain text of the page
pdf_page_markdown(page) # Markdown of the page
pdf_page_html(page) # HTML of the page
pdf_page_plain_text(page) # layout-aware plain text of the page
Page geometry and raw elements
pdf_page_get_width(doc, page) # page width in PDF points
pdf_page_get_height(doc, page) # page height in PDF points
pdf_page_get_rotation(doc, page) # page rotation in degrees (0/90/180/270)
pdf_page_get_elements(doc, page) # raw element records for the page
Search
pdf_search(doc, page, term, case_sensitive = FALSE) # search one page
pdf_search_all(doc, term, case_sensitive = FALSE) # search the whole document
pdf_search_results_to_json(doc, page, term, case_sensitive = FALSE) # page search results as JSON
Page classification and cleanup
Detects and removes headers, footers, and repeated artifacts.
pdf_classify_page(doc, page) # classify the layout/content of one page
pdf_classify_document(doc) # classify the whole document
pdf_remove_headers(doc, threshold = 0.5) # detect and remove repeating headers
pdf_remove_footers(doc, threshold = 0.5) # detect and remove repeating footers
pdf_remove_artifacts(doc, threshold = 0.5) # detect and remove page artifacts
pdf_erase_header(doc, page) # erase the header region on a page
pdf_erase_footer(doc, page) # erase the footer region on a page
pdf_erase_artifacts(doc, page) # erase artifact regions on a page
Conversion to Office (export)
Converts a loaded PDF back to an Office format. Returns a raw vector.
pdf_to_docx(doc) # convert the document to DOCX bytes
pdf_to_pptx(doc) # convert the document to PPTX bytes
pdf_to_xlsx(doc) # convert the document to XLSX bytes
Forms
pdf_get_form_fields(doc) # list of form-field records
pdf_export_form_data_to_bytes(doc, format_type = 0L) # export form data (0 = FDF, 1 = XFDF) to bytes
pdf_import_form_data(doc, data_path) # import form data from a file path
pdf_form_import_from_file(doc, filename) # import form data from a named file
The editor-side form helpers are described under Editing PDFs.
Document structure and metadata
pdf_get_outline(doc) # document outline / bookmarks tree
pdf_get_page_labels(doc) # page-label ranges
pdf_get_xmp_metadata(doc) # XMP metadata as a list
pdf_get_source_bytes(doc) # the original source bytes of the document
pdf_plan_split_by_bookmarks(doc, options_json = NULL) # plan a split of the document by top-level bookmarks
Annotation details
Inspects individual annotations by page and index.
pdf_annotation_get_color(doc, page, index) # annotation RGB color
pdf_annotation_get_creation_date(doc, page, index) # creation date string
pdf_annotation_get_modification_date(doc, page, index) # modification date string
pdf_annotation_is_hidden(doc, page, index) # TRUE if the annotation is hidden
pdf_annotation_is_marked_deleted(doc, page, index) # TRUE if marked deleted
pdf_annotation_is_printable(doc, page, index) # TRUE if the annotation prints
pdf_annotation_is_read_only(doc, page, index) # TRUE if read-only
pdf_link_annotation_get_uri(doc, page, index) # URI of a link annotation
pdf_text_annotation_get_icon_name(doc, page, index) # icon name of a text annotation
pdf_highlight_annotation_quad_points_count(doc, page, index) # number of highlight quad points
pdf_highlight_annotation_quad_point(doc, page, index, quad_index) # one highlight quad point
pdf_annotations_to_json(doc, page) # all annotations on a page as JSON
JSON helpers for fonts and elements
pdf_font_get_size(doc, page, index) # size of a font record on a page
pdf_fonts_to_json(doc, page) # page fonts as JSON
pdf_elements_to_json(doc, page) # page elements as JSON
Rendering
Renders pages to raster images. format: 0 = PNG, 1 = JPEG. Coordinates and DPI are documented per function.
pdf_render_page(doc, page, format = 0L) # render a page at default DPI
pdf_render_page_zoom(doc, page, zoom, format = 0L) # render a page at a zoom factor
pdf_render_page_thumbnail(doc, page, size, format = 0L) # render a fitted thumbnail
pdf_render_page_fit(doc, page, w, h, format = 0L) # render fitted into w x h pixels
pdf_render_page_raw(doc, page, dpi = 150L) # render to a raw RGBA buffer
pdf_render_page_region(doc, page, crop_x, crop_y, crop_width, crop_height, format = 0L) # render a sub-region
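When choosing w and h for pdf_render_page_fit(), it helps to recall that a PDF point is 1/72 inch, so a page's pixel size at a given DPI is points × DPI / 72. The sketch below applies that arithmetic. `pixels_at_dpi` and `render_fit_at_dpi` are hypothetical helpers, and the sketch assumes pdf_render_page_fit() returns a pdfoxide_rendered_image that can be passed to pdf_rendered_image_save().

```r
# Hypothetical helper (not part of pdfoxide): pixel dimensions of a page
# of the given size in PDF points (1 pt = 1/72 inch) at a target DPI.
pixels_at_dpi <- function(width_pt, height_pt, dpi) {
  c(w = ceiling(width_pt * dpi / 72), h = ceiling(height_pt * dpi / 72))
}

# Sketch: render one page into a pixel box sized for `dpi` (needs pdfoxide).
render_fit_at_dpi <- function(path, out_png, page = 0L, dpi = 150) {
  doc <- pdfoxide::pdf_open(path)
  on.exit(pdfoxide::pdf_close(doc), add = TRUE)
  px <- pixels_at_dpi(pdfoxide::pdf_page_get_width(doc, page),
                      pdfoxide::pdf_page_get_height(doc, page), dpi)
  # Assumed: the result is a pdfoxide_rendered_image handle.
  img <- pdfoxide::pdf_render_page_fit(doc, page,
                                       as.integer(px[["w"]]), as.integer(px[["h"]]),
                                       format = 0L)   # 0 = PNG
  on.exit(pdfoxide::pdf_rendered_image_close(img), add = TRUE, after = FALSE)
  pdfoxide::pdf_rendered_image_save(img, out_png)
}
```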
Full RenderOptions surface (background RGBA, transparency, annotation toggle, JPEG quality, layer exclusion).
pdf_render_page_with_options(doc, page, dpi = 150L, format = 0L,
bg_r = 1, bg_g = 1, bg_b = 1, bg_a = 1,
transparent_background = FALSE,
render_annotations = TRUE, jpeg_quality = 85L)
pdf_render_page_with_options_ex(doc, page, dpi = 150L, format = 0L,
bg_r = 1, bg_g = 1, bg_b = 1, bg_a = 1,
transparent_background = FALSE,
render_annotations = TRUE, jpeg_quality = 85L,
excluded_layers = NULL)
Reusable renderer and render-time estimation.
pdf_create_renderer(dpi = 150L, format = 0L, quality = 85L, anti_alias = TRUE) # build a reusable renderer
pdf_renderer_close(renderer) # free a renderer
pdf_estimate_render_time(doc, page) # estimate render time for a page
Rendered-image handle helpers.
pdf_rendered_image_save(image, path) # write a rendered image to a file
pdf_rendered_image_close(image) # free a rendered image
Editing PDFs
Opens a PDF for modification. Returns a pdfoxide_editor.
pdf_editor_open(path) # open a PDF for editing
pdf_editor_open_from_bytes(bytes) # open an editor from a raw vector
pdf_editor_close(editor) # close the editor and free it
Editor inspection and metadata
pdf_editor_page_count(editor) # page count
pdf_editor_version(editor) # PDF version as list(major, minor)
pdf_editor_is_modified(editor) # TRUE if the editor has unsaved changes
pdf_editor_source_path(editor) # original source path, if any
pdf_editor_get_producer(editor) # Producer metadata string
pdf_editor_set_producer(editor, value) # set the Producer metadata
pdf_editor_get_creation_date(editor) # CreationDate string
pdf_editor_set_creation_date(editor, value) # set the CreationDate
Page operations
pdf_editor_delete_page(editor, page) # delete a page
pdf_editor_move_page(editor, from, to) # move a page to a new index
pdf_editor_rotate_page_by(editor, page, degrees) # rotate a page by a relative angle
pdf_editor_rotate_all_pages(editor, degrees) # rotate every page
pdf_editor_get_page_rotation(editor, page) # current page rotation
pdf_editor_set_page_rotation(editor, page, degrees) # set absolute page rotation
pdf_editor_crop_margins(editor, left, right, top, bottom) # crop margins on all pages
pdf_editor_get_page_crop_box(editor, page) # get CropBox as c(x, y, w, h)
pdf_editor_set_page_crop_box(editor, page, x, y, w, h) # set CropBox
pdf_editor_get_page_media_box(editor, page) # get MediaBox as c(x, y, w, h)
pdf_editor_set_page_media_box(editor, page, x, y, w, h) # set MediaBox
Redaction (editor)
pdf_editor_apply_all_redactions(editor) # apply all pending redactions
pdf_editor_apply_page_redactions(editor, page) # apply redactions on one page
pdf_editor_is_page_marked_for_redaction(editor, page) # TRUE if page has pending redactions
pdf_editor_unmark_page_for_redaction(editor, page) # clear pending redactions on a page
pdf_editor_erase_region(editor, page, x, y, w, h) # erase a rectangle on a page
pdf_editor_erase_regions(editor, page, rects) # erase several rectangles on a page
pdf_editor_clear_erase_regions(editor, page) # clear pending erase regions
Standalone redaction workflow
pdf_redaction_add(editor, page, x1, y1, x2, y2, r = 0, g = 0, b = 0) # add a redaction box with a fill color
pdf_redaction_count(editor, page) # pending redaction count on a page
pdf_redaction_apply(editor, scrub_metadata = FALSE, r = 0, g = 0, b = 0) # burn in all redactions
pdf_redaction_scrub_metadata(editor) # scrub metadata for redaction hygiene
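Note that pdf_redaction_add() takes two corner points (x1, y1, x2, y2), whereas the erase-region functions take x, y, w, h. The sketch below converts between the two and walks through an add, count, apply, save sequence. `rect_to_corners` and `redact_box` are hypothetical helpers, and the ordering of the calls is an assumption based on the function descriptions in this section rather than a documented workflow.

```r
# Hypothetical helper (not part of pdfoxide): convert an x/y/width/height
# rectangle into the corner form used by pdf_redaction_add().
rect_to_corners <- function(x, y, w, h) {
  c(x1 = x, y1 = y, x2 = x + w, y2 = y + h)
}

# Sketch: redact one rectangle on one page and save (needs pdfoxide).
redact_box <- function(input, output, page, x, y, w, h) {
  ed <- pdfoxide::pdf_editor_open(input)
  on.exit(pdfoxide::pdf_editor_close(ed), add = TRUE)
  k <- rect_to_corners(x, y, w, h)
  # Default fill color r = g = b = 0 (black), as in the signature above.
  pdfoxide::pdf_redaction_add(ed, page, k[["x1"]], k[["y1"]], k[["x2"]], k[["y2"]])
  message("Pending redactions on page: ", pdfoxide::pdf_redaction_count(ed, page))
  # Burn in the redactions and scrub metadata in the same step.
  pdfoxide::pdf_redaction_apply(ed, scrub_metadata = TRUE)
  pdfoxide::pdf_editor_save(ed, output)
}
```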
Forms and annotations (editor)
pdf_editor_flatten_forms(editor) # flatten all form fields into content
pdf_editor_flatten_forms_on_page(editor, page) # flatten forms on one page
pdf_editor_set_form_field_value(editor, name, value) # set a form-field value by name
pdf_editor_flatten_annotations(editor, page) # flatten annotations on a page
pdf_editor_flatten_all_annotations(editor) # flatten all annotations
pdf_editor_flatten_warnings_count(editor) # number of flatten warnings
pdf_editor_flatten_warning(editor, index) # one flatten warning message
pdf_editor_is_page_marked_for_flatten(editor, page) # TRUE if page is marked for flatten
pdf_editor_unmark_page_for_flatten(editor, page) # clear flatten mark on a page
pdf_editor_import_fdf_bytes(editor, bytes) # import FDF form data
pdf_editor_import_xfdf_bytes(editor, bytes) # import XFDF form data
Document operations (editor)
pdf_editor_merge_from(editor, source_path) # append pages from another PDF file
pdf_editor_merge_from_bytes(editor, bytes) # append pages from PDF bytes
pdf_editor_convert_to_pdf_a(editor, level) # convert in place to PDF/A
pdf_editor_embed_file(editor, name, bytes) # attach an embedded file
pdf_editor_extract_pages_to_bytes(editor, pages) # extract selected pages to a new PDF (bytes)
Saving (editor)
pdf_editor_save(editor, path) # save to a file
pdf_editor_save_to_bytes(editor) # save to a raw vector
pdf_editor_save_to_bytes_with_options(editor, compress = TRUE,
garbage_collect = TRUE, linearize = FALSE) # save with options
pdf_editor_save_encrypted(editor, path, user_password, owner_password) # save AES-encrypted to a file
pdf_editor_save_encrypted_to_bytes(editor, user_password, owner_password) # save AES-encrypted to bytes
DocumentBuilder (programmatic creation)
Builds a PDF page by page. pdf_builder_create() returns a pdfoxide_builder; the page constructors return a fluent pdfoxide_page.
pdf_builder_create() # start a new DocumentBuilder
pdf_builder_close(builder) # free a builder
Builder document metadata
pdf_builder_set_title(builder, value) # set document title
pdf_builder_set_author(builder, value) # set document author
pdf_builder_set_subject(builder, value) # set document subject
pdf_builder_set_keywords(builder, value) # set document keywords
pdf_builder_set_creator(builder, value) # set document creator
pdf_builder_on_open(builder, script) # set a document-open JavaScript action
pdf_builder_language(builder, lang) # set the document language (e.g. "en-US")
pdf_builder_tagged_pdf_ua1(builder) # enable Tagged PDF / PDF-UA-1 output
pdf_builder_role_map(builder, custom, standard) # map a custom structure tag to a standard role
pdf_builder_register_embedded_font(builder, name, font) # register an embedded font for use on pages
Builder pages and output
pdf_builder_page(builder, width, height) # start a custom-size page
pdf_builder_a4_page(builder) # start an A4 page
pdf_builder_letter_page(builder) # start a US Letter page
pdf_builder_build(builder) # finish and return the PDF as bytes
pdf_builder_save(builder, path) # finish and write to a file
pdf_builder_save_encrypted(builder, path, user_password, owner_password) # finish and write AES-encrypted
pdf_builder_to_bytes_encrypted(builder, user_password, owner_password) # finish and return encrypted bytes
Embedded fonts
pdf_embedded_font_from_file(path) # load an embedded font from a TTF/OTF file
pdf_embedded_font_from_bytes(bytes, name = NULL) # load an embedded font from bytes
pdf_embedded_font_close(font) # free an embedded font handle
Page builder (fluent layout)
All of the following functions operate on a pdfoxide_page returned by pdf_builder_page() (or another page constructor such as pdf_builder_a4_page()) and return the page invisibly so that calls can be chained. Finish a page with pdf_page_done().
Text flow and typography
pdf_page_font(page, name, size) # set the active font and size
pdf_page_at(page, x, y) # move the text cursor to a coordinate
pdf_page_builder_text(page, text) # draw text at the cursor
pdf_page_heading(page, level, text) # add a heading (level 1-6)
pdf_page_paragraph(page, text) # add a wrapped paragraph
pdf_page_space(page, points) # add vertical space
pdf_page_horizontal_rule(page) # draw a horizontal rule
pdf_page_newline(page) # advance to the next line
pdf_page_footnote(page, ref_mark, note_text) # add a footnote with a reference mark
pdf_page_columns(page, column_count, gap_pt, text) # flow text into multiple columns
pdf_page_text_in_rect(page, x, y, w, h, text, align = 0L) # flow text inside a rectangle
pdf_page_new_page_same_size(page) # start a new page of the same size
pdf_page_done(page) # finish the page and return to the builder
pdf_page_close(page) # free a page handle
Styled inline text runs
pdf_page_inline(page, text) # append an inline text run
pdf_page_inline_bold(page, text) # append a bold inline run
pdf_page_inline_italic(page, text) # append an italic inline run
pdf_page_inline_color(page, r, g, b, text) # append a colored inline run
Links and JavaScript actions
pdf_page_link_url(page, url) # add a URL link
pdf_page_link_page(page, index) # add an internal page link
pdf_page_link_named(page, destination) # add a named-destination link
pdf_page_link_javascript(page, script) # add a JavaScript-action link
pdf_page_on_open(page, script) # page-open JavaScript action
pdf_page_on_close(page, script) # page-close JavaScript action
pdf_page_field_keystroke(page, script) # field keystroke JavaScript action
pdf_page_field_format(page, script) # field format JavaScript action
pdf_page_field_validate(page, script) # field validate JavaScript action
pdf_page_field_calculate(page, script) # field calculate JavaScript action
Annotations and markup
pdf_page_highlight(page, r, g, b) # highlight markup at the current run
pdf_page_underline(page, r, g, b) # underline markup
pdf_page_strikeout(page, r, g, b) # strikeout markup
pdf_page_squiggly(page, r, g, b) # squiggly underline markup
pdf_page_sticky_note(page, text) # sticky note at the cursor
pdf_page_sticky_note_at(page, x, y, text) # sticky note at a coordinate
pdf_page_watermark(page, text) # add a text watermark
pdf_page_watermark_confidential(page) # add a CONFIDENTIAL watermark
pdf_page_watermark_draft(page) # add a DRAFT watermark
pdf_page_stamp(page, type_name) # add a rubber stamp (e.g. "Approved")
pdf_page_freetext(page, x, y, w, h, text) # add a free-text annotation
AcroForm widgets
pdf_page_text_field(page, name, x, y, w, h, default_value = NULL) # text field
pdf_page_checkbox(page, name, x, y, w, h, checked = FALSE) # checkbox
pdf_page_combo_box(page, name, x, y, w, h, options, selected = NULL) # combo box
pdf_page_radio_group(page, name, values, xs, ys, ws, hs, selected = NULL) # radio-button group
pdf_page_push_button(page, name, x, y, w, h, caption) # push button
pdf_page_signature_field(page, name, x, y, w, h) # signature field
Barcodes (page builder)
pdf_page_barcode_1d(page, barcode_type, data, x, y, w, h) # draw a 1D barcode
pdf_page_barcode_qr(page, data, x, y, size) # draw a QR code
Images
pdf_page_image(page, bytes, x, y, w, h) # place an image
pdf_page_image_with_alt(page, bytes, x, y, w, h, alt_text) # place an image with alt text
pdf_page_image_artifact(page, bytes, x, y, w, h) # place an image tagged as an artifact
Vector graphics
pdf_page_rect(page, x, y, w, h) # draw a rectangle outline
pdf_page_filled_rect(page, x, y, w, h, r, g, b) # draw a filled rectangle
pdf_page_line(page, x1, y1, x2, y2) # draw a line
pdf_page_stroke_rect(page, x, y, w, h, width, r, g, b) # stroke a rectangle with width and color
pdf_page_stroke_line(page, x1, y1, x2, y2, width, r, g, b) # stroke a line with width and color
pdf_page_stroke_rect_dashed(page, x, y, w, h, width, r, g, b, dash = numeric(0), phase = 0) # dashed rectangle
pdf_page_stroke_line_dashed(page, x1, y1, x2, y2, width, r, g, b, dash = numeric(0), phase = 0) # dashed line
Tables
pdf_page_table(page, widths, aligns, cells, has_header = FALSE,
n_columns = length(widths), n_rows = NULL) # render a static table
Streaming tables for large or incremental data.
pdf_page_streaming_table_begin(page, headers, widths, aligns,
repeat_header = FALSE, n_columns = length(headers)) # begin a streaming table
pdf_page_streaming_table_begin_v2(page, headers, widths, aligns,
repeat_header = FALSE, mode = 0L, sample_rows = 0L,
min_col_width_pt = 0, max_col_width_pt = 0,
max_rowspan = 0L, n_columns = length(headers)) # streaming table with autosize/rowspan
pdf_page_streaming_table_set_batch_size(page, batch_size) # set the flush batch size
pdf_page_streaming_table_pending_row_count(page) # rows buffered but not yet flushed
pdf_page_streaming_table_batch_count(page) # number of flushed batches
pdf_page_streaming_table_flush(page) # flush buffered rows
pdf_page_streaming_table_push_row(page, cells) # push one row of cells
pdf_page_streaming_table_push_row_v2(page, cells, rowspans = NULL) # push a row with per-cell rowspans
pdf_page_streaming_table_finish(page) # finish and lay out the streaming table
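The begin, push, and finish calls above could be combined to stream a data frame row by row, roughly as follows. `row_cells` and `stream_data_frame` are hypothetical helpers. The sketch assumes `cells` and `headers` accept character vectors, and it leaves `widths` and `aligns` to the caller because their expected types and codes are not specified in this reference.

```r
# Hypothetical helper (not part of pdfoxide): turn row i of a data frame
# into a plain character vector, one element per column.
row_cells <- function(df, i) {
  unname(vapply(df, function(col) as.character(col[[i]]), character(1)))
}

# Sketch: stream a data frame into a table on a builder page (needs pdfoxide).
# Assumptions: `cells`/`headers` are character vectors; `widths` and `aligns`
# are supplied by the caller in whatever form the package expects.
stream_data_frame <- function(page, df, widths, aligns) {
  pdfoxide::pdf_page_streaming_table_begin(page, names(df), widths, aligns,
                                           repeat_header = TRUE)
  for (i in seq_len(nrow(df))) {
    pdfoxide::pdf_page_streaming_table_push_row(page, row_cells(df, i))
  }
  pdfoxide::pdf_page_streaming_table_finish(page)
}
```

Pushing rows one at a time avoids building the full cell matrix in memory, which is the main reason to prefer this path over pdf_page_table() for large inputs.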
Digital signatures
Certificates
pdf_certificate_load_from_bytes(bytes, password = NULL) # load a PKCS#12 / DER certificate from bytes
pdf_certificate_load_from_pem(cert_pem, key_pem) # load a certificate + key from PEM
pdf_certificate_subject(cert) # certificate subject DN
pdf_certificate_issuer(cert) # certificate issuer DN
pdf_certificate_serial(cert) # certificate serial number
pdf_certificate_validity(cert) # validity window
pdf_certificate_is_valid(cert) # TRUE if currently within the validity window
pdf_certificate_close(cert) # free a certificate handle
Signing
pdf_sign_bytes(pdf, cert, reason = NULL, location = NULL) # sign PDF bytes (basic CMS signature)
pdf_sign_bytes_pades(pdf, cert, level = 0L, tsa_url = NULL, ...) # sign PDF bytes with a PAdES profile
pdf_sign_bytes_pades_opts(pdf, cert, level = 0L, tsa_url = NULL, ...) # PAdES signing with extended options
pdf_sign(doc, certificate, reason = NULL, location = NULL) # sign a loaded document
pdf_add_timestamp(pdf_data, sig_index, tsa_url) # add a TSA timestamp to a signature in bytes
Signature inspection and verification
pdf_signature_count(doc) # number of signatures
pdf_get_signature(doc, index) # signature handle by index
pdf_signature_signer_name(sig) # signer common name
pdf_signature_signing_reason(sig) # signing reason
pdf_signature_signing_location(sig) # signing location
pdf_signature_signing_time(sig) # signing time
pdf_signature_certificate(sig) # signer certificate handle
pdf_signature_pades_level(sig) # PAdES level of the signature
pdf_signature_has_timestamp(sig) # TRUE if the signature is timestamped
pdf_signature_timestamp(sig) # embedded timestamp handle
pdf_signature_add_timestamp(sig, timestamp) # attach a timestamp to a signature
pdf_signature_verify(sig) # verify the signature, returns a status
pdf_signature_verify_detached(sig, pdf) # verify with a detached message digest check
pdf_signature_close(sig) # free a signature handle
pdf_verify_all_signatures(doc) # verify every signature in the document
Timestamps
pdf_timestamp_parse(bytes) # parse a timestamp token (TST)
pdf_timestamp_token(timestamp) # raw timestamp token bytes
pdf_timestamp_message_imprint(timestamp) # message imprint of the timestamp
pdf_timestamp_time(timestamp) # timestamp time
pdf_timestamp_serial(timestamp) # timestamp serial number
pdf_timestamp_tsa_name(timestamp) # TSA name
pdf_timestamp_policy_oid(timestamp) # timestamp policy OID
pdf_timestamp_hash_algorithm(timestamp) # hash algorithm used
pdf_timestamp_verify(timestamp) # verify the timestamp token
pdf_timestamp_close(timestamp) # free a timestamp handle
TSA client
pdf_tsa_client_create(url, username = NULL, password = NULL, timeout = 30L,
hash_algo = 0L, use_nonce = TRUE, cert_req = TRUE) # create a TSA client
pdf_tsa_request_timestamp(client, data) # request a timestamp over data
pdf_tsa_request_timestamp_hash(client, hash, hash_algo = 0L) # request a timestamp over a precomputed hash
pdf_tsa_client_close(client) # free a TSA client
Document Security Store (DSS)
pdf_get_dss(doc) # get the document's DSS handle
pdf_dss_cert_count(dss) # number of certificates in the DSS
pdf_dss_crl_count(dss) # number of CRLs
pdf_dss_ocsp_count(dss) # number of OCSP responses
pdf_dss_vri_count(dss) # number of VRI entries
pdf_dss_get_cert(dss, index) # one DSS certificate
pdf_dss_get_crl(dss, index) # one DSS CRL
pdf_dss_get_ocsp(dss, index) # one DSS OCSP response
pdf_dss_close(dss) # free a DSS handle
Compliance validation
PDF/A
pdf_validate_pdf_a(doc, level = 0L) # validate against a PDF/A level, returns a results handle
pdf_a_is_compliant(results) # TRUE if compliant
pdf_a_errors(results) # list of validation errors
pdf_a_warning_count(results) # number of warnings
pdf_a_results_close(results) # free the results handle
pdf_convert_to_pdf_a(doc, level = 2L) # convert a document to PDF/A bytes
PDF/UA (accessibility)
pdf_validate_pdf_ua(doc, level = 0L) # validate against PDF/UA, returns a results handle
pdf_ua_is_accessible(results) # TRUE if accessible
pdf_ua_errors(results) # list of accessibility errors
pdf_ua_warnings(results) # list of accessibility warnings
pdf_ua_stats(results) # accessibility statistics
pdf_ua_results_close(results) # free the results handle
PDF/X (print)
pdf_validate_pdf_x(doc, level = 0L) # validate against PDF/X, returns a results handle
pdf_x_is_compliant(results) # TRUE if compliant
pdf_x_errors(results) # list of validation errors
pdf_x_results_close(results) # free the results handle
Barcodes
Standalone barcode generation and decoding.
pdf_generate_qr_code(data, error_correction = 1L, size_px = 256L) # generate a QR code, returns a barcode handle
pdf_generate_barcode(data, format = 0L, size_px = 256L) # generate a barcode in a given format
pdf_barcode_get_data(barcode) # decoded data string
pdf_barcode_get_format(barcode) # barcode format
pdf_barcode_get_confidence(barcode) # decode confidence
pdf_barcode_get_image_png(barcode, size_px = 256L) # rendered PNG bytes
pdf_barcode_get_svg(barcode, size_px = 256L) # rendered SVG string
pdf_barcode_close(barcode) # free a barcode handle
pdf_editor_add_barcode_to_page(editor, page, barcode, x, y, width, ...) # stamp a barcode onto an editor page
OCR
Requires the ocr feature in the underlying build.
pdf_ocr_engine_create(det_model_path, rec_model_path, dict_path) # build an OCR engine from model paths
pdf_ocr_engine_close(engine) # free an OCR engine
pdf_ocr_page_needs_ocr(doc, page) # TRUE if a page has no extractable text layer
pdf_ocr_extract_text(doc, page, engine = NULL) # OCR a page (uses the default engine when NULL)
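A common pattern with these functions is to run OCR only on pages that lack a text layer and use native extraction otherwise. The sketch below shows one way to do that. `text_with_ocr_fallback` is a hypothetical helper, and the three function arguments default to the pdfoxide functions listed above but can be swapped for stubs, which keeps the logic testable without an OCR-enabled build.

```r
# Hypothetical helper (not part of pdfoxide): use OCR only when the page
# has no extractable text layer, otherwise use native extraction.
# The defaults are evaluated lazily, so pdfoxide is only needed when a
# default is actually used.
text_with_ocr_fallback <- function(doc, page,
                                   needs_ocr = pdfoxide::pdf_ocr_page_needs_ocr,
                                   ocr = pdfoxide::pdf_ocr_extract_text,
                                   native = pdfoxide::pdf_extract_text) {
  if (isTRUE(needs_ocr(doc, page))) {
    ocr(doc, page)      # engine = NULL: uses the default engine per the reference
  } else {
    native(doc, page)
  }
}
```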
OCR models and runtime configuration
pdf_model_manifest() # available OCR model manifest
pdf_prefetch_available() # TRUE if model prefetching is available
pdf_prefetch_models(languages_csv = NULL) # prefetch OCR models for given languages
pdf_set_max_ops_per_stream(limit) # cap content-stream operations (DoS guard)
pdf_set_preserve_unmapped_glyphs(preserve) # keep glyphs with no Unicode mapping
Cryptographic provider / FIPS
pdf_crypto_active_provider() # name of the active crypto provider
pdf_crypto_fips_available() # TRUE if a FIPS provider is available
pdf_crypto_use_fips() # switch to the FIPS provider
pdf_crypto_set_policy(spec) # set the crypto policy from a spec string
pdf_crypto_policy() # current crypto policy
pdf_crypto_inventory() # cryptographic algorithm inventory
pdf_crypto_cbom() # Cryptographic Bill of Materials (CBOM)
Logging
pdf_set_log_level(level) # set the library log level
pdf_get_log_level() # get the current log level
Complete example
library(pdfoxide)
# --- Create ---
pdf <- pdf_from_markdown("# Report\n\nGenerated by **PDF Oxide**.\n")
pdf_save(pdf, "report.pdf")
pdf_close(pdf)
# --- Extract ---
doc <- pdf_open("report.pdf")
cat("Pages:", pdf_page_count(doc), "\n")
for (i in seq_len(pdf_page_count(doc)) - 1L) { # 0-based indices
txt <- pdf_extract_text(doc, i)
cat(sprintf("Page %d: %d characters\n", i + 1L, nchar(txt)))
}
chars <- pdf_extract_chars(doc, 0) # per-character data frame
results <- pdf_search_all(doc, "PDF Oxide", case_sensitive = FALSE)
pdf_close(doc)
# --- Edit ---
ed <- pdf_editor_open("report.pdf")
pdf_editor_set_producer(ed, "PDF Oxide")
pdf_editor_rotate_all_pages(ed, 90L)
pdf_editor_save(ed, "rotated.pdf")
pdf_editor_close(ed)
# --- Build programmatically ---
b <- pdf_builder_create()
pdf_builder_set_title(b, "Invoice")
page <- pdf_builder_letter_page(b)
pdf_page_font(page, "Helvetica", 24)
pdf_page_at(page, 72, 720)
pdf_page_builder_text(page, "Invoice #1001")
pdf_page_done(page)
pdf_builder_save(b, "invoice.pdf")
pdf_builder_close(b)
Other Language Bindings
PDF Oxide provides native bindings for all major ecosystems: Rust, Python, Node.js, WASM, C#, Golang, Java, PHP, Ruby, C++, Swift, Kotlin, Dart, Julia, Zig, Scala, Clojure, Objective-C, and Elixir.
Next steps
- Types and enums: all shared types and enums
- Page API Reference: consistent page iteration across bindings
- Getting started with R: tutorial