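R API Reference
PDF Oxide ships idiomatic R bindings as the pdfoxide package. The package wraps the pdf_oxide C ABI through R's native .Call interface, so it needs no Java, Python, or other external runtime: only a compiled shared library is required.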
# install from source (requires a C toolchain)
R CMD INSTALL pdfoxide
library(pdfoxide)
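For the Rust API, see the Rust API Reference; for Python, the Python API Reference; for JavaScript, the Node.js API Reference or the WASM API Reference.
The R API is a flat, functional API built around opaque handle objects rather than R6/S4 classes. The main handle types are:
| Handle | Created by | Purpose |
|---|---|---|
| pdfoxide_pdf | pdf_from_markdown(), pdf_from_html(), … | A newly built PDF, ready to save or convert |
| pdfoxide_document | pdf_open(), pdf_open_from_bytes() | A loaded PDF for read-only extraction and rendering |
| pdfoxide_editor | pdf_editor_open() | A mutable PDF for editing, merging, and saving |
| pdfoxide_builder | pdf_builder_create() | A DocumentBuilder for constructing pages programmatically |
| pdfoxide_page (builder) | pdf_builder_page(), pdf_builder_a4_page(), … | A page being laid out with the flow-layout API |
| pdfoxide_page (lazy) | pdf_page() | A lazy read handle for a single document page |
| pdfoxide_renderer, pdfoxide_rendered_image | Rendering functions | A reusable renderer and the raster image a render produces |
| pdfoxide_certificate, pdfoxide_signature, pdfoxide_timestamp, pdfoxide_tsa_client, pdfoxide_dss | Signing / verification functions | Digital-signature primitives |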
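All page indices are 0-based. Functions that modify a builder, page, or editor return that handle invisibly, so calls can be chained with the native pipe (|>). Handles are closed automatically when they are garbage-collected, but pdf_close() or the type-specific *_close() functions free them immediately.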
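Because a handle is otherwise freed only when the garbage collector runs, a small scoped helper built on base R's on.exit() keeps cleanup deterministic even when extraction fails. This is a sketch, not part of pdfoxide: with_handle() is a hypothetical helper that takes any opener and closer, so the same pattern fits pdf_open() / pdf_close() as well as the editor and builder pairs.

```r
# Hypothetical helper (not part of pdfoxide): open a handle, run `fn` on it,
# and always close the handle afterwards, even if `fn` signals an error.
with_handle <- function(open, close, fn) {
  handle <- open()
  on.exit(close(handle), add = TRUE)
  fn(handle)
}

# Usage with pdfoxide; runs only when the package is installed.
if (requireNamespace("pdfoxide", quietly = TRUE)) {
  path <- tempfile(fileext = ".pdf")
  pdf <- pdfoxide::pdf_from_text("Hello from pdfoxide")
  pdfoxide::pdf_save(pdf, path)
  pdfoxide::pdf_close(pdf)

  n_pages <- with_handle(
    open  = function() pdfoxide::pdf_open(path),
    close = pdfoxide::pdf_close,
    fn    = pdfoxide::pdf_page_count
  )
  print(n_pages)
}
```

The same helper works for an editor by passing function() pdfoxide::pdf_editor_open(path) and pdfoxide::pdf_editor_close.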
Creating PDFs
Quick one-shot PDF creation from a source format. Each function returns a pdfoxide_pdf handle.
pdf_from_markdown(markdown) # build a PDF from a Markdown string
pdf_from_html(html) # build a PDF from an HTML string
pdf_from_text(text) # build a PDF from plain text
pdf_from_image(path) # build a single-page PDF from an image file
pdf_from_image_bytes(bytes) # build a single-page PDF from raw image bytes
pdf_from_html_css(html, css, font_bytes = NULL) # build a PDF from HTML + CSS (optional embedded font)
pdf_from_html_css_with_fonts(html, css, families, font_bytes) # HTML + CSS with multiple named font families
pdf_merge(paths) # merge several PDF files into one new PDF
Saving / serializing a built PDF
pdf_save(pdf, path) # write the PDF to a file path
pdf_to_bytes(pdf) # serialize the PDF to a raw vector
pdf_get_page_count(pdf) # number of pages in a built pdfoxide_pdf
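pdf_to_bytes() is convenient when the PDF goes to a web response or a database rather than to disk. A cheap sanity check before shipping the bytes is the "%PDF-" header every PDF file starts with. In this sketch is_pdf_bytes() is a hypothetical helper, and the round trip through pdf_open_from_bytes() simply confirms that the bytes parse and keep their page count.

```r
# Hypothetical helper: TRUE if a raw vector starts with the "%PDF-" file header.
is_pdf_bytes <- function(bytes) {
  magic <- charToRaw("%PDF-")
  length(bytes) >= length(magic) && identical(bytes[seq_along(magic)], magic)
}

if (requireNamespace("pdfoxide", quietly = TRUE)) {
  pdf <- pdfoxide::pdf_from_markdown("# Title\n\nBody text.")
  bytes <- pdfoxide::pdf_to_bytes(pdf)               # raw vector
  built_pages <- pdfoxide::pdf_get_page_count(pdf)
  pdfoxide::pdf_close(pdf)

  if (!is_pdf_bytes(bytes)) warning("output does not start with %PDF-")
  # Re-open the serialized bytes and compare page counts.
  doc <- pdfoxide::pdf_open_from_bytes(bytes)
  cat("built:", built_pages, "reopened:", pdfoxide::pdf_page_count(doc), "\n")
  pdfoxide::pdf_close(doc)
}
```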
Opening documents
Open an existing PDF for extraction and rendering. Returns a pdfoxide_document.
pdf_open(path) # open a PDF file from disk
pdf_open_with_password(path, password) # open an encrypted PDF with a password
pdf_open_from_bytes(bytes) # open a PDF from an in-memory raw vector
pdf_close(x) # close any pdfoxide handle and free it
Opening Office formats
Convert a Word, PowerPoint, or Excel document and open it directly as a pdfoxide_document.
pdf_open_from_docx_bytes(bytes) # convert DOCX bytes and open as a document
pdf_open_from_pptx_bytes(bytes) # convert PPTX bytes and open as a document
pdf_open_from_xlsx_bytes(bytes) # convert XLSX bytes and open as a document
Document inspection
pdf_page_count(doc) # number of pages
pdf_version(doc) # PDF version as a list(major, minor)
pdf_is_encrypted(doc) # TRUE if the document is encrypted
pdf_has_structure_tree(doc) # TRUE if the document is a Tagged PDF
pdf_authenticate(doc, password) # authenticate an encrypted document after opening
pdf_has_xfa(doc) # TRUE if the document contains XFA forms
pdf_has_timestamp(doc) # TRUE if the document carries a document timestamp
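The inspection functions combine naturally into a one-row summary. This sketch assumes, from the comment on pdf_version(), that it returns a two-element list with the major version first and the minor version second. format_pdf_version() and describe_pdf() are hypothetical helpers, and pdf_authenticate() is called only when the document reports itself as encrypted and a password was supplied.

```r
# Hypothetical helper: turn list(major, minor) into "1.7". Elements are read by
# position, so this works whether or not the list is named.
format_pdf_version <- function(v) {
  paste(v[[1]], v[[2]], sep = ".")
}

# Hypothetical helper: summarize an open pdfoxide_document as a one-row data frame.
describe_pdf <- function(doc, password = NULL) {
  encrypted <- pdfoxide::pdf_is_encrypted(doc)
  if (encrypted && !is.null(password)) {
    pdfoxide::pdf_authenticate(doc, password)
  }
  data.frame(
    pages     = pdfoxide::pdf_page_count(doc),
    version   = format_pdf_version(pdfoxide::pdf_version(doc)),
    encrypted = encrypted,
    tagged    = pdfoxide::pdf_has_structure_tree(doc),
    xfa       = pdfoxide::pdf_has_xfa(doc)
  )
}

if (requireNamespace("pdfoxide", quietly = TRUE)) {
  path <- tempfile(fileext = ".pdf")
  pdf <- pdfoxide::pdf_from_text("Inspect me")
  pdfoxide::pdf_save(pdf, path)
  pdfoxide::pdf_close(pdf)

  doc <- pdfoxide::pdf_open(path)
  print(describe_pdf(doc))
  pdfoxide::pdf_close(doc)
}
```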
Text and content extraction
Single-page extraction (page indices are 0-based).
pdf_extract_text(doc, page) # reading-order plain text for one page
pdf_to_plain_text(doc, page) # layout-aware plain text for one page
pdf_to_markdown(doc, page) # Markdown for one page
pdf_to_html(doc, page) # HTML for one page
pdf_extract_structured_json(doc, page) # structured layout JSON for one page
Whole-document extraction.
pdf_to_markdown_all(doc) # Markdown for the entire document
pdf_to_html_all(doc) # HTML for the entire document
pdf_to_plain_text_all(doc) # plain text for the entire document
pdf_extract_all_text(doc) # concatenated reading-order text for all pages
Structured / per-element extraction. These functions return data frames or lists of records.
pdf_extract_chars(doc, page) # per-character records (glyph, bbox, font, size, color)
pdf_extract_words(doc, page) # word records with bounding boxes
pdf_extract_text_lines(doc, page) # text-line records with bounding boxes
pdf_extract_tables(doc, page) # detected tables with rows and cells
pdf_extract_paths(doc, page) # vector path (line/curve/shape) records
pdf_embedded_fonts(doc, page) # embedded font records used on a page
pdf_embedded_images(doc, page) # embedded image records on a page
pdf_page_annotations(doc, page) # annotation records on a page
Auto-detecting extraction (chooses automatically between native extraction and OCR-like heuristics).
pdf_extract_text_auto(doc, page) # best-effort text for one page
pdf_extract_page_auto(doc, page, options_json = NULL) # best-effort structured page extraction
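Region (crop-rectangle) extraction
Restrict extraction to a rectangle given in PDF points (1/72 inch), with the origin at the bottom-left corner of the page.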
pdf_extract_text_in_rect(doc, page, x, y, width, height) # text inside a rectangle
pdf_extract_words_in_rect(doc, page, x, y, width, height) # words inside a rectangle
pdf_extract_lines_in_rect(doc, page, x, y, width, height) # lines inside a rectangle
pdf_extract_tables_in_rect(doc, page, x, y, width, height) # tables inside a rectangle
pdf_extract_images_in_rect(doc, page, x, y, width, height) # images inside a rectangle
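Region extraction expects PDF points with a bottom-left origin, while a box drawn on a rendered image is usually measured in pixels from the top-left. Since one point is 1/72 inch, pt = px * 72 / dpi, and the y axis flips against the page height: y = page_height - top - height. pixel_rect_to_pdf() below is a hypothetical helper sketching that conversion; it assumes an unrotated page whose page box starts at (0, 0).

```r
# Hypothetical helper: convert a top-left-origin pixel rectangle (drawn on a page
# rendered at `dpi`) into the bottom-left-origin point rectangle taken by the
# pdf_extract_*_in_rect() functions. Assumes an unrotated page box at (0, 0).
pixel_rect_to_pdf <- function(left_px, top_px, width_px, height_px,
                              page_height_pt, dpi) {
  scale <- 72 / dpi                                  # points per pixel
  w <- width_px * scale
  h <- height_px * scale
  c(x      = left_px * scale,
    y      = page_height_pt - top_px * scale - h,    # flip the y axis
    width  = w,
    height = h)
}

if (requireNamespace("pdfoxide", quietly = TRUE)) {
  path <- tempfile(fileext = ".pdf")
  pdf <- pdfoxide::pdf_from_text("Text near the top of the page")
  pdfoxide::pdf_save(pdf, path)
  pdfoxide::pdf_close(pdf)

  doc <- pdfoxide::pdf_open(path)
  page_w <- pdfoxide::pdf_page_get_width(doc, 0L)
  page_h <- pdfoxide::pdf_page_get_height(doc, 0L)
  # Full-width band, 200 px tall, at the top of a 150-DPI rendering.
  r <- pixel_rect_to_pdf(0, 0, page_w * 150 / 72, 200,
                         page_height_pt = page_h, dpi = 150)
  print(pdfoxide::pdf_extract_text_in_rect(doc, 0L, r[["x"]], r[["y"]],
                                           r[["width"]], r[["height"]]))
  pdfoxide::pdf_close(doc)
}
```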
Lazy page handles
pdf_page() returns a lightweight pdfoxide_page bound to a single page; the text accessors extract on demand.
pdf_page(doc, index) # lazy handle for one page
pdf_page_text(page) # plain text of the page
pdf_page_markdown(page) # Markdown of the page
pdf_page_html(page) # HTML of the page
pdf_page_plain_text(page) # layout-aware plain text of the page
Page geometry and raw elements
pdf_page_get_width(doc, page) # page width in PDF points
pdf_page_get_height(doc, page) # page height in PDF points
pdf_page_get_rotation(doc, page) # page rotation in degrees (0/90/180/270)
pdf_page_get_elements(doc, page) # raw element records for the page
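Page sizes come back in PDF points (1/72 inch), so converting to millimetres is mm = pt * 25.4 / 72, and a page rotated by 90 or 270 degrees displays with width and height swapped. This sketch assumes that pdf_page_get_width() and pdf_page_get_height() report the unrotated page box; if your build already applies the page rotation, drop the swap. Both helpers are hypothetical.

```r
# Hypothetical helper: PDF points (1/72 inch) to millimetres.
pt_to_mm <- function(pt) pt * 25.4 / 72

# Hypothetical helper: displayed (width, height) after applying a page rotation.
# Assumes `width` / `height` describe the unrotated page box.
displayed_size <- function(width, height, rotation) {
  if (rotation %% 180 == 90) {
    c(width = height, height = width)
  } else {
    c(width = width, height = height)
  }
}

if (requireNamespace("pdfoxide", quietly = TRUE)) {
  path <- tempfile(fileext = ".pdf")
  pdf <- pdfoxide::pdf_from_text("Geometry check")
  pdfoxide::pdf_save(pdf, path)
  pdfoxide::pdf_close(pdf)

  doc <- pdfoxide::pdf_open(path)
  size <- displayed_size(pdfoxide::pdf_page_get_width(doc, 0L),
                         pdfoxide::pdf_page_get_height(doc, 0L),
                         pdfoxide::pdf_page_get_rotation(doc, 0L))
  print(round(pt_to_mm(size), 1))   # displayed size in millimetres
  pdfoxide::pdf_close(doc)
}
```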
Search
pdf_search(doc, page, term, case_sensitive = FALSE) # search one page
pdf_search_all(doc, term, case_sensitive = FALSE) # search the whole document
pdf_search_results_to_json(doc, page, term, case_sensitive = FALSE) # page search results as JSON
Page classification and cleanup
Detect and strip repeating headers, footers, and other artifacts.
pdf_classify_page(doc, page) # classify the layout/content of one page
pdf_classify_document(doc) # classify the whole document
pdf_remove_headers(doc, threshold = 0.5) # detect and remove repeating headers
pdf_remove_footers(doc, threshold = 0.5) # detect and remove repeating footers
pdf_remove_artifacts(doc, threshold = 0.5) # detect and remove page artifacts
pdf_erase_header(doc, page) # erase the header region on a page
pdf_erase_footer(doc, page) # erase the footer region on a page
pdf_erase_artifacts(doc, page) # erase artifact regions on a page
Office conversion (export)
Convert a loaded PDF back to an Office format. Each function returns a raw vector.
pdf_to_docx(doc) # convert the document to DOCX bytes
pdf_to_pptx(doc) # convert the document to PPTX bytes
pdf_to_xlsx(doc) # convert the document to XLSX bytes
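DOCX, PPTX, and XLSX files are Office Open XML packages, which are ZIP archives, so the returned raw vector should begin with the four-byte ZIP signature 50 4B 03 04 ("PK" followed by 0x03 0x04). A minimal sketch that exports all three formats and writes them with base R's writeBin(); is_zip_container() is a hypothetical helper.

```r
# Hypothetical helper: TRUE if the raw vector starts with the ZIP local-file
# signature that Office Open XML packages (DOCX/PPTX/XLSX) carry.
is_zip_container <- function(bytes) {
  sig <- as.raw(c(0x50, 0x4b, 0x03, 0x04))
  length(bytes) >= 4 && identical(bytes[1:4], sig)
}

if (requireNamespace("pdfoxide", quietly = TRUE)) {
  path <- tempfile(fileext = ".pdf")
  pdf <- pdfoxide::pdf_from_markdown("# Quarterly summary\n\nRevenue grew.")
  pdfoxide::pdf_save(pdf, path)
  pdfoxide::pdf_close(pdf)

  doc <- pdfoxide::pdf_open(path)
  exporters <- list(docx = pdfoxide::pdf_to_docx,
                    pptx = pdfoxide::pdf_to_pptx,
                    xlsx = pdfoxide::pdf_to_xlsx)
  for (ext in names(exporters)) {
    bytes <- exporters[[ext]](doc)
    if (!is_zip_container(bytes)) warning("unexpected ", ext, " output")
    writeBin(bytes, sub("\\.pdf$", paste0(".", ext), path))
  }
  pdfoxide::pdf_close(doc)
}
```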
Forms
pdf_get_form_fields(doc) # list of form-field records
pdf_export_form_data_to_bytes(doc, format_type = 0L) # export form data (0 = FDF, 1 = XFDF) to bytes
pdf_import_form_data(doc, data_path) # import form data from a file path
pdf_form_import_from_file(doc, filename) # import form data from a named file
Editor-side form helpers are listed under Editing PDFs.
Document structure and metadata
pdf_get_outline(doc) # document outline / bookmarks tree
pdf_get_page_labels(doc) # page-label ranges
pdf_get_xmp_metadata(doc) # XMP metadata as a list
pdf_get_source_bytes(doc) # the original source bytes of the document
pdf_plan_split_by_bookmarks(doc, options_json = NULL) # plan a split of the document by top-level bookmarks
Annotation details
Inspect an individual annotation by page and index.
pdf_annotation_get_color(doc, page, index) # annotation RGB color
pdf_annotation_get_creation_date(doc, page, index) # creation date string
pdf_annotation_get_modification_date(doc, page, index) # modification date string
pdf_annotation_is_hidden(doc, page, index) # TRUE if the annotation is hidden
pdf_annotation_is_marked_deleted(doc, page, index) # TRUE if marked deleted
pdf_annotation_is_printable(doc, page, index) # TRUE if the annotation prints
pdf_annotation_is_read_only(doc, page, index) # TRUE if read-only
pdf_link_annotation_get_uri(doc, page, index) # URI of a link annotation
pdf_text_annotation_get_icon_name(doc, page, index) # icon name of a text annotation
pdf_highlight_annotation_quad_points_count(doc, page, index) # number of highlight quad points
pdf_highlight_annotation_quad_point(doc, page, index, quad_index) # one highlight quad point
pdf_annotations_to_json(doc, page) # all annotations on a page as JSON
Font and element JSON helpers
pdf_font_get_size(doc, page, index) # size of a font record on a page
pdf_fonts_to_json(doc, page) # page fonts as JSON
pdf_elements_to_json(doc, page) # page elements as JSON
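Rendering
Render pages to raster images. format: 0 = PNG, 1 = JPEG. Sizing parameters (DPI, zoom factor, pixel dimensions, crop region) differ per function, as shown in each signature.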
pdf_render_page(doc, page, format = 0L) # render a page at default DPI
pdf_render_page_zoom(doc, page, zoom, format = 0L) # render a page at a zoom factor
pdf_render_page_thumbnail(doc, page, size, format = 0L) # render a fitted thumbnail
pdf_render_page_fit(doc, page, w, h, format = 0L) # render fitted into w x h pixels
pdf_render_page_raw(doc, page, dpi = 150L) # render to a raw RGBA buffer
pdf_render_page_region(doc, page, crop_x, crop_y, crop_width, crop_height, format = 0L) # render a sub-region
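At a given DPI, a page of w x h points renders to roughly w * dpi / 72 by h * dpi / 72 pixels, which is handy for sizing plots or checking output. This sketch assumes, as the handle table suggests, that pdf_render_page() returns a pdfoxide_rendered_image that you write with pdf_rendered_image_save() and free with pdf_rendered_image_close(). The exact rounding pdfoxide applies is not documented here, so expected_pixels() (hypothetical) rounds to the nearest pixel.

```r
# Hypothetical helper: approximate pixel dimensions of a page rendered at `dpi`.
expected_pixels <- function(width_pt, height_pt, dpi) {
  round(c(width = width_pt, height = height_pt) * dpi / 72)
}

if (requireNamespace("pdfoxide", quietly = TRUE)) {
  path <- tempfile(fileext = ".pdf")
  pdf <- pdfoxide::pdf_from_text("Render me")
  pdfoxide::pdf_save(pdf, path)
  pdfoxide::pdf_close(pdf)

  doc <- pdfoxide::pdf_open(path)
  out_dir <- tempdir()
  for (i in seq_len(pdfoxide::pdf_page_count(doc)) - 1L) {   # 0-based indices
    img <- pdfoxide::pdf_render_page(doc, i, format = 0L)    # 0 = PNG
    pdfoxide::pdf_rendered_image_save(
      img, file.path(out_dir, sprintf("page-%03d.png", i + 1L)))
    pdfoxide::pdf_rendered_image_close(img)
  }
  print(expected_pixels(pdfoxide::pdf_page_get_width(doc, 0L),
                        pdfoxide::pdf_page_get_height(doc, 0L), dpi = 150))
  pdfoxide::pdf_close(doc)
}
```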
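The full RenderOptions set: background RGBA, transparency, annotation toggle, JPEG quality, and layer exclusion (pdf_render_page_with_options_ex() only).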
pdf_render_page_with_options(doc, page, dpi = 150L, format = 0L,
bg_r = 1, bg_g = 1, bg_b = 1, bg_a = 1,
transparent_background = FALSE,
render_annotations = TRUE, jpeg_quality = 85L)
pdf_render_page_with_options_ex(doc, page, dpi = 150L, format = 0L,
bg_r = 1, bg_g = 1, bg_b = 1, bg_a = 1,
transparent_background = FALSE,
render_annotations = TRUE, jpeg_quality = 85L,
excluded_layers = NULL)
Reusable renderer and render-time estimation.
pdf_create_renderer(dpi = 150L, format = 0L, quality = 85L, anti_alias = TRUE) # build a reusable renderer
pdf_renderer_close(renderer) # free a renderer
pdf_estimate_render_time(doc, page) # estimate render time for a page
Helpers for rendered-image handles.
pdf_rendered_image_save(image, path) # write a rendered image to a file
pdf_rendered_image_close(image) # free a rendered image
Editing PDFs
Open a PDF for modification. Returns a pdfoxide_editor.
pdf_editor_open(path) # open a PDF for editing
pdf_editor_open_from_bytes(bytes) # open an editor from a raw vector
pdf_editor_close(editor) # close the editor and free it
Editor inspection and metadata
pdf_editor_page_count(editor) # page count
pdf_editor_version(editor) # PDF version as list(major, minor)
pdf_editor_is_modified(editor) # TRUE if the editor has unsaved changes
pdf_editor_source_path(editor) # original source path, if any
pdf_editor_get_producer(editor) # Producer metadata string
pdf_editor_set_producer(editor, value) # set the Producer metadata
pdf_editor_get_creation_date(editor) # CreationDate string
pdf_editor_set_creation_date(editor, value) # set the CreationDate
Page operations
pdf_editor_delete_page(editor, page) # delete a page
pdf_editor_move_page(editor, from, to) # move a page to a new index
pdf_editor_rotate_page_by(editor, page, degrees) # rotate a page by a relative angle
pdf_editor_rotate_all_pages(editor, degrees) # rotate every page
pdf_editor_get_page_rotation(editor, page) # current page rotation
pdf_editor_set_page_rotation(editor, page, degrees) # set absolute page rotation
pdf_editor_crop_margins(editor, left, right, top, bottom) # crop margins on all pages
pdf_editor_get_page_crop_box(editor, page) # get CropBox as c(x, y, w, h)
pdf_editor_set_page_crop_box(editor, page, x, y, w, h) # set CropBox
pdf_editor_get_page_media_box(editor, page) # get MediaBox as c(x, y, w, h)
pdf_editor_set_page_media_box(editor, page, x, y, w, h) # set MediaBox
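pdf_editor_get_page_crop_box() returns the box as c(x, y, w, h), so shrinking it by a uniform margin moves the origin in by the margin and subtracts twice the margin from each dimension. A sketch with inset_box() and trim_pages() as hypothetical helpers; it also shows the edit-then-save pattern with on.exit() so the editor is always closed.

```r
# Hypothetical helper: shrink a c(x, y, w, h) box by `margin` points on every side.
inset_box <- function(box, margin) {
  stopifnot(length(box) == 4, 2 * margin < box[3], 2 * margin < box[4])
  c(box[1] + margin, box[2] + margin, box[3] - 2 * margin, box[4] - 2 * margin)
}

# Hypothetical wrapper: trim every page's CropBox by `margin` and save to `out`.
trim_pages <- function(path, out, margin = 36) {
  ed <- pdfoxide::pdf_editor_open(path)
  on.exit(pdfoxide::pdf_editor_close(ed), add = TRUE)
  for (i in seq_len(pdfoxide::pdf_editor_page_count(ed)) - 1L) {
    b <- inset_box(pdfoxide::pdf_editor_get_page_crop_box(ed, i), margin)
    pdfoxide::pdf_editor_set_page_crop_box(ed, i, b[1], b[2], b[3], b[4])
  }
  pdfoxide::pdf_editor_save(ed, out)
  invisible(out)
}

if (requireNamespace("pdfoxide", quietly = TRUE)) {
  src <- tempfile(fileext = ".pdf")
  pdf <- pdfoxide::pdf_from_text("Trim my margins")
  pdfoxide::pdf_save(pdf, src)
  pdfoxide::pdf_close(pdf)
  trim_pages(src, tempfile(fileext = ".pdf"), margin = 36)   # half-inch margins
}
```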
Editor redaction
pdf_editor_apply_all_redactions(editor) # apply all pending redactions
pdf_editor_apply_page_redactions(editor, page) # apply redactions on one page
pdf_editor_is_page_marked_for_redaction(editor, page) # TRUE if page has pending redactions
pdf_editor_unmark_page_for_redaction(editor, page) # clear pending redactions on a page
pdf_editor_erase_region(editor, page, x, y, w, h) # erase a rectangle on a page
pdf_editor_erase_regions(editor, page, rects) # erase several rectangles on a page
pdf_editor_clear_erase_regions(editor, page) # clear pending erase regions
Standalone redaction workflow
pdf_redaction_add(editor, page, x1, y1, x2, y2, r = 0, g = 0, b = 0) # add a redaction box with a fill color
pdf_redaction_count(editor, page) # pending redaction count on a page
pdf_redaction_apply(editor, scrub_metadata = FALSE, r = 0, g = 0, b = 0) # burn in all redactions
pdf_redaction_scrub_metadata(editor) # scrub metadata for redaction hygiene
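This part of the API uses two rectangle conventions: pdf_editor_erase_region() takes x, y, w, h, while pdf_redaction_add() takes two corners x1, y1, x2, y2. A sketch that converts one to the other, queues a black box, and burns it in with metadata scrubbing. xywh_to_corners() and redact_box() are hypothetical helpers, and the rectangle is assumed to use the same PDF-point, bottom-left-origin coordinates as region extraction.

```r
# Hypothetical helper: convert an (x, y, w, h) rectangle to (x1, y1, x2, y2) corners.
xywh_to_corners <- function(x, y, w, h) {
  c(x1 = x, y1 = y, x2 = x + w, y2 = y + h)
}

# Hypothetical wrapper: redact one rectangle on 0-based page `page`, save to `out`.
redact_box <- function(path, out, page, x, y, w, h) {
  ed <- pdfoxide::pdf_editor_open(path)
  on.exit(pdfoxide::pdf_editor_close(ed), add = TRUE)
  k <- xywh_to_corners(x, y, w, h)
  pdfoxide::pdf_redaction_add(ed, page, k[["x1"]], k[["y1"]], k[["x2"]], k[["y2"]],
                              r = 0, g = 0, b = 0)             # black fill
  message("pending redactions: ", pdfoxide::pdf_redaction_count(ed, page))
  pdfoxide::pdf_redaction_apply(ed, scrub_metadata = TRUE)     # burn in and scrub
  pdfoxide::pdf_editor_save(ed, out)
  invisible(out)
}

if (requireNamespace("pdfoxide", quietly = TRUE)) {
  src <- tempfile(fileext = ".pdf")
  pdf <- pdfoxide::pdf_from_text("Account number: 1234-5678")
  pdfoxide::pdf_save(pdf, src)
  pdfoxide::pdf_close(pdf)
  redact_box(src, tempfile(fileext = ".pdf"), page = 0L,
             x = 72, y = 700, w = 300, h = 30)
}
```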
Forms and annotations (editor)
pdf_editor_flatten_forms(editor) # flatten all form fields into content
pdf_editor_flatten_forms_on_page(editor, page) # flatten forms on one page
pdf_editor_set_form_field_value(editor, name, value) # set a form-field value by name
pdf_editor_flatten_annotations(editor, page) # flatten annotations on a page
pdf_editor_flatten_all_annotations(editor) # flatten all annotations
pdf_editor_flatten_warnings_count(editor) # number of flatten warnings
pdf_editor_flatten_warning(editor, index) # one flatten warning message
pdf_editor_is_page_marked_for_flatten(editor, page) # TRUE if page is marked for flatten
pdf_editor_unmark_page_for_flatten(editor, page) # clear flatten mark on a page
pdf_editor_import_fdf_bytes(editor, bytes) # import FDF form data
pdf_editor_import_xfdf_bytes(editor, bytes) # import XFDF form data
Document operations (editor)
pdf_editor_merge_from(editor, source_path) # append pages from another PDF file
pdf_editor_merge_from_bytes(editor, bytes) # append pages from PDF bytes
pdf_editor_convert_to_pdf_a(editor, level) # convert in place to PDF/A
pdf_editor_embed_file(editor, name, bytes) # attach an embedded file
pdf_editor_extract_pages_to_bytes(editor, pages) # extract selected pages to a new PDF (bytes)
Saving (editor)
pdf_editor_save(editor, path) # save to a file
pdf_editor_save_to_bytes(editor) # save to a raw vector
pdf_editor_save_to_bytes_with_options(editor, compress = TRUE,
garbage_collect = TRUE, linearize = FALSE) # save with options
pdf_editor_save_encrypted(editor, path, user_password, owner_password) # save AES-encrypted to a file
pdf_editor_save_encrypted_to_bytes(editor, user_password, owner_password) # save AES-encrypted to bytes
DocumentBuilder (programmatic creation)
Build a PDF page by page. pdf_builder_create() returns a pdfoxide_builder; the page constructors return a chainable pdfoxide_page.
pdf_builder_create() # start a new DocumentBuilder
pdf_builder_close(builder) # free a builder
Builder document metadata
pdf_builder_set_title(builder, value) # set document title
pdf_builder_set_author(builder, value) # set document author
pdf_builder_set_subject(builder, value) # set document subject
pdf_builder_set_keywords(builder, value) # set document keywords
pdf_builder_set_creator(builder, value) # set document creator
pdf_builder_on_open(builder, script) # set a document-open JavaScript action
pdf_builder_language(builder, lang) # set the document language (e.g. "en-US")
pdf_builder_tagged_pdf_ua1(builder) # enable Tagged PDF / PDF-UA-1 output
pdf_builder_role_map(builder, custom, standard) # map a custom structure tag to a standard role
pdf_builder_register_embedded_font(builder, name, font) # register an embedded font for use on pages
Builder pages and output
pdf_builder_page(builder, width, height) # start a custom-size page
pdf_builder_a4_page(builder) # start an A4 page
pdf_builder_letter_page(builder) # start a US Letter page
pdf_builder_build(builder) # finish and return the PDF as bytes
pdf_builder_save(builder, path) # finish and write to a file
pdf_builder_save_encrypted(builder, path, user_password, owner_password) # finish and write AES-encrypted
pdf_builder_to_bytes_encrypted(builder, user_password, owner_password) # finish and return encrypted bytes
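To place several lines at fixed positions, step the y coordinate down by a line height (leading) for each line, because PDF y values grow upwards from the bottom of the page. A sketch with line_positions() as a hypothetical helper; the 14-point leading for 12-point text and the invoice lines are arbitrary illustration values.

```r
# Hypothetical helper: y coordinates for `n` lines starting at `top` and moving
# down the page by `leading` points per line (PDF y grows upwards).
line_positions <- function(n, top, leading) {
  top - (seq_len(n) - 1) * leading
}

if (requireNamespace("pdfoxide", quietly = TRUE)) {
  items <- c("2 x Widget @ $10.00", "1 x Gadget @ $25.00", "Total: $45.00")
  b <- pdfoxide::pdf_builder_create()
  pdfoxide::pdf_builder_set_title(b, "Invoice #1001")
  pdfoxide::pdf_builder_set_author(b, "Billing")
  page <- pdfoxide::pdf_builder_letter_page(b)
  pdfoxide::pdf_page_font(page, "Helvetica", 12)
  ys <- line_positions(length(items), top = 720, leading = 14)
  for (k in seq_along(items)) {
    pdfoxide::pdf_page_at(page, 72, ys[k])        # 1-inch left margin
    pdfoxide::pdf_page_builder_text(page, items[k])
  }
  pdfoxide::pdf_page_done(page)
  bytes <- pdfoxide::pdf_builder_build(b)         # finished PDF as a raw vector
  pdfoxide::pdf_builder_close(b)
  writeBin(bytes, tempfile(fileext = ".pdf"))
}
```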
Embedded fonts
pdf_embedded_font_from_file(path) # load an embedded font from a TTF/OTF file
pdf_embedded_font_from_bytes(bytes, name = NULL) # load an embedded font from bytes
pdf_embedded_font_close(font) # free an embedded font handle
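Page builder (flow layout)
All of the following functions operate on the pdfoxide_page returned by pdf_builder_page() (or pdf_builder_a4_page() / pdf_builder_letter_page()) and return that page invisibly for chaining. Finish a page with pdf_page_done().
Text flow and layout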
pdf_page_font(page, name, size) # set the active font and size
pdf_page_at(page, x, y) # move the text cursor to a coordinate
pdf_page_builder_text(page, text) # draw text at the cursor
pdf_page_heading(page, level, text) # add a heading (level 1-6)
pdf_page_paragraph(page, text) # add a wrapped paragraph
pdf_page_space(page, points) # add vertical space
pdf_page_horizontal_rule(page) # draw a horizontal rule
pdf_page_newline(page) # advance to the next line
pdf_page_footnote(page, ref_mark, note_text) # add a footnote with a reference mark
pdf_page_columns(page, column_count, gap_pt, text) # flow text into multiple columns
pdf_page_text_in_rect(page, x, y, w, h, text, align = 0L) # flow text inside a rectangle
pdf_page_new_page_same_size(page) # start a new page of the same size
pdf_page_done(page) # finish the page and return to the builder
pdf_page_close(page) # free a page handle
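Because each of these calls returns the page invisibly, a page can be written as one |> pipeline (R 4.1 or later). A sketch that lays out a heading followed by paragraphs taken from a blank-line-separated string; split_paragraphs() is a hypothetical helper, and base R's Reduce() folds the paragraphs onto the page so their number need not be known in advance.

```r
# Hypothetical helper: split text on blank lines into trimmed, non-empty paragraphs.
split_paragraphs <- function(text) {
  parts <- trimws(strsplit(text, "\n[ \t]*\n")[[1]])
  parts[nzchar(parts)]
}

if (requireNamespace("pdfoxide", quietly = TRUE)) {
  body <- "First paragraph.\n\nSecond paragraph,\nstill the second.\n\n\nThird."
  b <- pdfoxide::pdf_builder_create()
  page <- pdfoxide::pdf_builder_a4_page(b) |>
    pdfoxide::pdf_page_font("Helvetica", 11) |>
    pdfoxide::pdf_page_heading(1L, "Notes") |>
    pdfoxide::pdf_page_horizontal_rule()
  # Fold each paragraph into the page; every call returns the page handle.
  page <- Reduce(function(p, txt) pdfoxide::pdf_page_paragraph(p, txt),
                 split_paragraphs(body), page)
  pdfoxide::pdf_page_done(page)
  pdfoxide::pdf_builder_save(b, tempfile(fileext = ".pdf"))
  pdfoxide::pdf_builder_close(b)
}
```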
Inline styled text
pdf_page_inline(page, text) # append an inline text run
pdf_page_inline_bold(page, text) # append a bold inline run
pdf_page_inline_italic(page, text) # append an italic inline run
pdf_page_inline_color(page, r, g, b, text) # append a colored inline run
Links and JavaScript actions
pdf_page_link_url(page, url) # add a URL link
pdf_page_link_page(page, index) # add an internal page link
pdf_page_link_named(page, destination) # add a named-destination link
pdf_page_link_javascript(page, script) # add a JavaScript-action link
pdf_page_on_open(page, script) # page-open JavaScript action
pdf_page_on_close(page, script) # page-close JavaScript action
pdf_page_field_keystroke(page, script) # field keystroke JavaScript action
pdf_page_field_format(page, script) # field format JavaScript action
pdf_page_field_validate(page, script) # field validate JavaScript action
pdf_page_field_calculate(page, script) # field calculate JavaScript action
Annotations and markup
pdf_page_highlight(page, r, g, b) # highlight markup at the current run
pdf_page_underline(page, r, g, b) # underline markup
pdf_page_strikeout(page, r, g, b) # strikeout markup
pdf_page_squiggly(page, r, g, b) # squiggly underline markup
pdf_page_sticky_note(page, text) # sticky note at the cursor
pdf_page_sticky_note_at(page, x, y, text) # sticky note at a coordinate
pdf_page_watermark(page, text) # add a text watermark
pdf_page_watermark_confidential(page) # add a CONFIDENTIAL watermark
pdf_page_watermark_draft(page) # add a DRAFT watermark
pdf_page_stamp(page, type_name) # add a rubber stamp (e.g. "Approved")
pdf_page_freetext(page, x, y, w, h, text) # add a free-text annotation
AcroForm widgets
pdf_page_text_field(page, name, x, y, w, h, default_value = NULL) # text field
pdf_page_checkbox(page, name, x, y, w, h, checked = FALSE) # checkbox
pdf_page_combo_box(page, name, x, y, w, h, options, selected = NULL) # combo box
pdf_page_radio_group(page, name, values, xs, ys, ws, hs, selected = NULL) # radio-button group
pdf_page_push_button(page, name, x, y, w, h, caption) # push button
pdf_page_signature_field(page, name, x, y, w, h) # signature field
Barcodes (page builder)
pdf_page_barcode_1d(page, barcode_type, data, x, y, w, h) # draw a 1D barcode
pdf_page_barcode_qr(page, data, x, y, size) # draw a QR code
Images
pdf_page_image(page, bytes, x, y, w, h) # place an image
pdf_page_image_with_alt(page, bytes, x, y, w, h, alt_text) # place an image with alt text
pdf_page_image_artifact(page, bytes, x, y, w, h) # place an image tagged as an artifact
Vector graphics
pdf_page_rect(page, x, y, w, h) # draw a rectangle outline
pdf_page_filled_rect(page, x, y, w, h, r, g, b) # draw a filled rectangle
pdf_page_line(page, x1, y1, x2, y2) # draw a line
pdf_page_stroke_rect(page, x, y, w, h, width, r, g, b) # stroke a rectangle with width and color
pdf_page_stroke_line(page, x1, y1, x2, y2, width, r, g, b) # stroke a line with width and color
pdf_page_stroke_rect_dashed(page, x, y, w, h, width, r, g, b, dash = numeric(0), phase = 0) # dashed rectangle
pdf_page_stroke_line_dashed(page, x1, y1, x2, y2, width, r, g, b, dash = numeric(0), phase = 0) # dashed line
Tables
pdf_page_table(page, widths, aligns, cells, has_header = FALSE,
n_columns = length(widths), n_rows = NULL) # render a static table
Streaming tables for large or incremental data.
pdf_page_streaming_table_begin(page, headers, widths, aligns,
repeat_header = FALSE, n_columns = length(headers)) # begin a streaming table
pdf_page_streaming_table_begin_v2(page, headers, widths, aligns,
repeat_header = FALSE, mode = 0L, sample_rows = 0L,
min_col_width_pt = 0, max_col_width_pt = 0,
max_rowspan = 0L, n_columns = length(headers)) # streaming table with autosize/rowspan
pdf_page_streaming_table_set_batch_size(page, batch_size) # set the flush batch size
pdf_page_streaming_table_pending_row_count(page) # rows buffered but not yet flushed
pdf_page_streaming_table_batch_count(page) # number of flushed batches
pdf_page_streaming_table_flush(page) # flush buffered rows
pdf_page_streaming_table_push_row(page, cells) # push one row of cells
pdf_page_streaming_table_push_row_v2(page, cells, rowspans = NULL) # push a row with per-cell rowspans
pdf_page_streaming_table_finish(page) # finish and lay out the streaming table
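Streaming tables suit data frames too large to lay out in one call: rows are buffered and flushed in batches. This sketch rests on several assumptions: cells is a character vector with one entry per column, column widths are in points, and 0L is the left-alignment code (0L is also the default align of pdf_page_text_in_rect()). proportional_widths() and row_cells() are hypothetical helpers.

```r
# Hypothetical helper: split `total_pt` across columns in proportion to the
# longest formatted value (or header) in each column.
proportional_widths <- function(df, total_pt) {
  chars <- vapply(names(df), function(col) {
    max(nchar(c(col, format(df[[col]]))))
  }, numeric(1))
  total_pt * unname(chars) / sum(chars)
}

# Hypothetical helper: row `i` of a data frame as a character vector of cells.
row_cells <- function(df, i) {
  unname(vapply(df[i, , drop = FALSE], function(v) format(v), character(1)))
}

if (requireNamespace("pdfoxide", quietly = TRUE)) {
  df <- head(mtcars[, c("mpg", "cyl", "hp")], 20)
  df <- cbind(model = rownames(df), df)
  widths <- proportional_widths(df, total_pt = 450)
  aligns <- rep(0L, ncol(df))                    # assumed: 0L = left-aligned
  b <- pdfoxide::pdf_builder_create()
  page <- pdfoxide::pdf_builder_a4_page(b)
  pdfoxide::pdf_page_streaming_table_begin(page, names(df), widths, aligns,
                                           repeat_header = TRUE)
  pdfoxide::pdf_page_streaming_table_set_batch_size(page, 8L)
  for (i in seq_len(nrow(df))) {
    pdfoxide::pdf_page_streaming_table_push_row(page, row_cells(df, i))
  }
  pdfoxide::pdf_page_streaming_table_finish(page)
  pdfoxide::pdf_page_done(page)
  pdfoxide::pdf_builder_save(b, tempfile(fileext = ".pdf"))
  pdfoxide::pdf_builder_close(b)
}
```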
Digital signatures
Certificates
pdf_certificate_load_from_bytes(bytes, password = NULL) # load a PKCS#12 / DER certificate from bytes
pdf_certificate_load_from_pem(cert_pem, key_pem) # load a certificate + key from PEM
pdf_certificate_subject(cert) # certificate subject DN
pdf_certificate_issuer(cert) # certificate issuer DN
pdf_certificate_serial(cert) # certificate serial number
pdf_certificate_validity(cert) # validity window
pdf_certificate_is_valid(cert) # TRUE if currently within the validity window
pdf_certificate_close(cert) # free a certificate handle
Signing
pdf_sign_bytes(pdf, cert, reason = NULL, location = NULL) # sign PDF bytes (basic CMS signature)
pdf_sign_bytes_pades(pdf, cert, level = 0L, tsa_url = NULL, ...) # sign PDF bytes with a PAdES profile
pdf_sign_bytes_pades_opts(pdf, cert, level = 0L, tsa_url = NULL, ...) # PAdES signing with extended options
pdf_sign(doc, certificate, reason = NULL, location = NULL) # sign a loaded document
pdf_add_timestamp(pdf_data, sig_index, tsa_url) # add a TSA timestamp to a signature in bytes
Signature inspection and verification
pdf_signature_count(doc) # number of signatures
pdf_get_signature(doc, index) # signature handle by index
pdf_signature_signer_name(sig) # signer common name
pdf_signature_signing_reason(sig) # signing reason
pdf_signature_signing_location(sig) # signing location
pdf_signature_signing_time(sig) # signing time
pdf_signature_certificate(sig) # signer certificate handle
pdf_signature_pades_level(sig) # PAdES level of the signature
pdf_signature_has_timestamp(sig) # TRUE if the signature is timestamped
pdf_signature_timestamp(sig) # embedded timestamp handle
pdf_signature_add_timestamp(sig, timestamp) # attach a timestamp to a signature
pdf_signature_verify(sig) # verify the signature, returns a status
pdf_signature_verify_detached(sig, pdf) # verify with a detached message digest check
pdf_signature_close(sig) # free a signature handle
pdf_verify_all_signatures(doc) # verify every signature in the document
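To audit a signed document, walk the signatures by 0-based index and collect the fields you need into a data frame. In this sketch records_to_df() and signature_table() are hypothetical helpers; they assume each accessor returns a length-one value, and the result of pdf_signature_verify() is stored as text because its status type is not documented here. Each signature handle is closed as soon as it has been read.

```r
# Hypothetical helper: bind a list of named lists (one per record) into a data frame.
records_to_df <- function(records) {
  if (length(records) == 0) return(data.frame())
  do.call(rbind, lapply(records, as.data.frame))
}

# Hypothetical helper: one row per signature in an open pdfoxide_document.
signature_table <- function(doc) {
  n <- pdfoxide::pdf_signature_count(doc)
  rows <- lapply(seq_len(n) - 1L, function(i) {
    sig <- pdfoxide::pdf_get_signature(doc, i)
    on.exit(pdfoxide::pdf_signature_close(sig), add = TRUE)
    list(index        = i,
         signer       = pdfoxide::pdf_signature_signer_name(sig),
         signing_time = as.character(pdfoxide::pdf_signature_signing_time(sig)),
         timestamped  = pdfoxide::pdf_signature_has_timestamp(sig),
         status       = paste(format(pdfoxide::pdf_signature_verify(sig)),
                              collapse = " "))
  })
  records_to_df(rows)
}

# Usage (needs a signed PDF; "signed.pdf" is a placeholder path):
# doc <- pdfoxide::pdf_open("signed.pdf")
# print(signature_table(doc))
# pdfoxide::pdf_close(doc)
```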
Timestamps
pdf_timestamp_parse(bytes) # parse a timestamp token (TST)
pdf_timestamp_token(timestamp) # raw timestamp token bytes
pdf_timestamp_message_imprint(timestamp) # message imprint of the timestamp
pdf_timestamp_time(timestamp) # timestamp time
pdf_timestamp_serial(timestamp) # timestamp serial number
pdf_timestamp_tsa_name(timestamp) # TSA name
pdf_timestamp_policy_oid(timestamp) # timestamp policy OID
pdf_timestamp_hash_algorithm(timestamp) # hash algorithm used
pdf_timestamp_verify(timestamp) # verify the timestamp token
pdf_timestamp_close(timestamp) # free a timestamp handle
TSA client
pdf_tsa_client_create(url, username = NULL, password = NULL, timeout = 30L,
hash_algo = 0L, use_nonce = TRUE, cert_req = TRUE) # create a TSA client
pdf_tsa_request_timestamp(client, data) # request a timestamp over data
pdf_tsa_request_timestamp_hash(client, hash, hash_algo = 0L) # request a timestamp over a precomputed hash
pdf_tsa_client_close(client) # free a TSA client
Document Security Store (DSS)
pdf_get_dss(doc) # get the document's DSS handle
pdf_dss_cert_count(dss) # number of certificates in the DSS
pdf_dss_crl_count(dss) # number of CRLs
pdf_dss_ocsp_count(dss) # number of OCSP responses
pdf_dss_vri_count(dss) # number of VRI entries
pdf_dss_get_cert(dss, index) # one DSS certificate
pdf_dss_get_crl(dss, index) # one DSS CRL
pdf_dss_get_ocsp(dss, index) # one DSS OCSP response
pdf_dss_close(dss) # free a DSS handle
Compliance validation
PDF/A
pdf_validate_pdf_a(doc, level = 0L) # validate against a PDF/A level, returns a results handle
pdf_a_is_compliant(results) # TRUE if compliant
pdf_a_errors(results) # list of validation errors
pdf_a_warning_count(results) # number of warnings
pdf_a_results_close(results) # free the results handle
pdf_convert_to_pdf_a(doc, level = 2L) # convert a document to PDF/A bytes
PDF/UA (accessibility)
pdf_validate_pdf_ua(doc, level = 0L) # validate against PDF/UA, returns a results handle
pdf_ua_is_accessible(results) # TRUE if accessible
pdf_ua_errors(results) # list of accessibility errors
pdf_ua_warnings(results) # list of accessibility warnings
pdf_ua_stats(results) # accessibility statistics
pdf_ua_results_close(results) # free the results handle
PDF/X (print)
pdf_validate_pdf_x(doc, level = 0L) # validate against PDF/X, returns a results handle
pdf_x_is_compliant(results) # TRUE if compliant
pdf_x_errors(results) # list of validation errors
pdf_x_results_close(results) # free the results handle
Barcodes
Standalone barcode generation and decoding.
pdf_generate_qr_code(data, error_correction = 1L, size_px = 256L) # generate a QR code, returns a barcode handle
pdf_generate_barcode(data, format = 0L, size_px = 256L) # generate a barcode in a given format
pdf_barcode_get_data(barcode) # decoded data string
pdf_barcode_get_format(barcode) # barcode format
pdf_barcode_get_confidence(barcode) # decode confidence
pdf_barcode_get_image_png(barcode, size_px = 256L) # rendered PNG bytes
pdf_barcode_get_svg(barcode, size_px = 256L) # rendered SVG string
pdf_barcode_close(barcode) # free a barcode handle
pdf_editor_add_barcode_to_page(editor, page, barcode, x, y, width, ...) # stamp a barcode onto an editor page
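A barcode handle can be rendered to PNG bytes and saved, or stamped onto a page through the editor. PNG files begin with the fixed 8-byte signature 89 50 4E 47 0D 0A 1A 0A, which is_png() (a hypothetical helper) checks before the bytes are written. The sketch leaves error_correction at its default of 1L because the mapping of codes to QR error-correction levels is not listed here; the URL is a placeholder.

```r
# Hypothetical helper: TRUE if a raw vector starts with the PNG file signature.
is_png <- function(bytes) {
  sig <- as.raw(c(0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a))
  length(bytes) >= 8 && identical(bytes[1:8], sig)
}

if (requireNamespace("pdfoxide", quietly = TRUE)) {
  qr <- pdfoxide::pdf_generate_qr_code("https://example.com/invoice/1001",
                                       size_px = 512L)
  png <- pdfoxide::pdf_barcode_get_image_png(qr, size_px = 512L)
  if (is_png(png)) writeBin(png, tempfile(fileext = ".png"))
  pdfoxide::pdf_barcode_close(qr)
}
```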
OCR
Requires the underlying library to be built with the ocr feature.
pdf_ocr_engine_create(det_model_path, rec_model_path, dict_path) # build an OCR engine from model paths
pdf_ocr_engine_close(engine) # free an OCR engine
pdf_ocr_page_needs_ocr(doc, page) # TRUE if a page has no extractable text layer
pdf_ocr_extract_text(doc, page, engine = NULL) # OCR a page (uses the default engine when NULL)
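OCR is far slower than native extraction, so a common pattern is to OCR only the pages that lack a text layer. A sketch, assuming a build with the ocr feature: text_or_fallback() is a hypothetical helper that treats empty or whitespace-only text as missing and calls the fallback lazily, and engine = NULL relies on the default engine described above.

```r
# Hypothetical helper: return `text` unless it is empty or whitespace-only,
# in which case call `fallback()` (evaluated only when needed).
text_or_fallback <- function(text, fallback) {
  if (length(text) == 1 && nzchar(trimws(text))) text else fallback()
}

# Hypothetical helper: per-page text, using OCR only where pdfoxide reports
# that a page has no extractable text layer.
document_text_with_ocr <- function(doc, engine = NULL) {
  vapply(seq_len(pdfoxide::pdf_page_count(doc)) - 1L, function(i) {
    native <- if (pdfoxide::pdf_ocr_page_needs_ocr(doc, i)) {
      ""
    } else {
      pdfoxide::pdf_extract_text(doc, i)
    }
    text_or_fallback(native, function() {
      pdfoxide::pdf_ocr_extract_text(doc, i, engine = engine)
    })
  }, character(1))
}

# Usage ("scanned.pdf" is a placeholder path):
# doc <- pdfoxide::pdf_open("scanned.pdf")
# txt <- document_text_with_ocr(doc)
# pdfoxide::pdf_close(doc)
```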
OCR models and runtime configuration
pdf_model_manifest() # available OCR model manifest
pdf_prefetch_available() # TRUE if model prefetching is available
pdf_prefetch_models(languages_csv = NULL) # prefetch OCR models for given languages
pdf_set_max_ops_per_stream(limit) # cap content-stream operations (DoS guard)
pdf_set_preserve_unmapped_glyphs(preserve) # keep glyphs with no Unicode mapping
Crypto provider / FIPS
pdf_crypto_active_provider() # name of the active crypto provider
pdf_crypto_fips_available() # TRUE if a FIPS provider is available
pdf_crypto_use_fips() # switch to the FIPS provider
pdf_crypto_set_policy(spec) # set the crypto policy from a spec string
pdf_crypto_policy() # current crypto policy
pdf_crypto_inventory() # cryptographic algorithm inventory
pdf_crypto_cbom() # Cryptographic Bill of Materials (CBOM)
Logging
pdf_set_log_level(level) # set the library log level
pdf_get_log_level() # get the current log level
Complete example
library(pdfoxide)
# --- Create ---
pdf <- pdf_from_markdown("# Report\n\nGenerated by **PDF Oxide**.\n")
pdf_save(pdf, "report.pdf")
pdf_close(pdf)
# --- Extract ---
doc <- pdf_open("report.pdf")
cat("Pages:", pdf_page_count(doc), "\n")
for (i in seq_len(pdf_page_count(doc)) - 1L) { # 0-based indices
txt <- pdf_extract_text(doc, i)
cat(sprintf("Page %d: %d characters\n", i + 1L, nchar(txt)))
}
chars <- pdf_extract_chars(doc, 0) # per-character data frame
results <- pdf_search_all(doc, "PDF Oxide", case_sensitive = FALSE)
pdf_close(doc)
# --- Edit ---
ed <- pdf_editor_open("report.pdf")
pdf_editor_set_producer(ed, "PDF Oxide")
pdf_editor_rotate_all_pages(ed, 90L)
pdf_editor_save(ed, "rotated.pdf")
pdf_editor_close(ed)
# --- Build programmatically ---
b <- pdf_builder_create()
pdf_builder_set_title(b, "Invoice")
page <- pdf_builder_letter_page(b)
pdf_page_font(page, "Helvetica", 24)
pdf_page_at(page, 72, 720)
pdf_page_builder_text(page, "Invoice #1001")
pdf_page_done(page)
pdf_builder_save(b, "invoice.pdf")
pdf_builder_close(b)
Other Language Bindings
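PDF Oxide provides native bindings for every major ecosystem: Rust, Python, Node.js, WASM, C#, Golang, Java, PHP, Ruby, C++, Swift, Kotlin, Dart, Julia, Zig, Scala, Clojure, Objective-C, Elixir.
Next steps
- Types and Enums: all shared types and enums
- Page API Reference: consistent per-page iteration across bindings
- R Quick Start: tutorial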