What is the fastest Python PDF library?

PDF Oxide is the fastest Python PDF library, with 0.8ms mean text extraction time — 5.8× faster than PyMuPDF (4.6ms) and 15× faster than pypdf (12.1ms). Benchmarked on 3,830 real-world PDFs with 100% pass rate.

Is PDF Oxide free for commercial use?

Yes. PDF Oxide is MIT licensed — free for all uses including commercial products, SaaS, and proprietary software. No license fees, no sales calls, no AGPL restrictions.

Can PDF Oxide handle scanned PDFs with OCR?

Yes. PDF Oxide includes built-in OCR via PaddleOCR and ONNX Runtime. No Tesseract installation needed — just pip install pdf_oxide and use extract_text_ocr(). Supports PP-OCRv3, v4, and v5 models.

Does PDF Oxide support XFA forms?

Yes. PDF Oxide is the only Python PDF library that can detect, analyze, and extract data from XFA forms (XML Forms Architecture). PyMuPDF, pypdf, pdfplumber, and pdfminer cannot read XFA form data.

How does PDF Oxide compare to PyMuPDF?

PDF Oxide is 5.8× faster than PyMuPDF (0.8ms vs 4.6ms mean), has a 100% pass rate vs 99.3%, and is MIT licensed vs PyMuPDF's AGPL-3.0. PDF Oxide also has built-in Markdown/HTML output and XFA form support that PyMuPDF lacks.

Can PDF Oxide convert PDF to Markdown?

Yes. PDF Oxide has built-in PDF to Markdown conversion with heading detection, table preservation, and list formatting — ideal for LLM and RAG pipelines. No separate package needed, unlike PyMuPDF which requires pymupdf4llm (69× slower).

R API リファレンス

PDF Oxide は、慣用的な R バインディングを pdfoxide パッケージとして提供します。このパッケージは R ネイティブの .Call インターフェイスを通じて pdf_oxide の C ABI をラップしているため、Java や Python、外部ランタイムへの依存は一切ありません。必要なのはコンパイル済みの共有ライブラリだけです。

# install from source (requires a C toolchain)
R CMD INSTALL pdfoxide

library(pdfoxide)

Rust API については Rust API リファレンスを参照してください。Python は Python API リファレンス、JavaScript は Node.js API リファレンスまたは WASM API リファレンスを参照してください。

R API は、R6/S4 クラスではなく不透明なハンドルオブジェクトを中心に構築された フラットな関数型 API です。主なハンドル型は次のとおりです。

ハンドル	生成元	用途
`pdfoxide_pdf`	`pdf_from_markdown()`、`pdf_from_html()`、…	生成されたばかりの PDF。保存または変換できる状態
`pdfoxide_document`	`pdf_open()`、`pdf_open_from_bytes()`	読み取り専用の抽出・レンダリング用に読み込んだ PDF
`pdfoxide_editor`	`pdf_editor_open()`	編集・結合・保存が可能なミュータブルな PDF
`pdfoxide_builder`	`pdf_builder_create()`	プログラムによるページ構築のための `DocumentBuilder`
`pdfoxide_page`（ビルダー）	`pdf_builder_page()`、`pdf_builder_a4_page()`、…	レイアウト中の流暢なページ
`pdfoxide_page`（遅延）	`pdf_page()`	ドキュメントの単一ページに対する遅延読み取りハンドル
`pdfoxide_renderer`、`pdfoxide_rendered_image`	レンダリング関数	再利用可能なレンダラーとラスター出力
`pdfoxide_certificate`、`pdfoxide_signature`、`pdfoxide_timestamp`、`pdfoxide_tsa_client`、`pdfoxide_dss`	署名／検証関数	電子署名のプリミティブ

すべてのページインデックスは 0 始まり です。ビルダー／ページ／エディターを変更する関数はハンドルを不可視で返すため、パイプ（|>）で呼び出しを連結できます。ハンドルはガベージコレクション時に自動的にクローズされますが、pdf_close() / *_close() を使えば即座に解放できます。

PDF を作成する

ソース形式からの手軽なワンショット生成です。いずれも pdfoxide_pdf ハンドルを返します。

pdf_from_markdown(markdown)                              # build a PDF from a Markdown string
pdf_from_html(html)                                      # build a PDF from an HTML string
pdf_from_text(text)                                      # build a PDF from plain text
pdf_from_image(path)                                     # build a single-page PDF from an image file
pdf_from_image_bytes(bytes)                              # build a single-page PDF from raw image bytes
pdf_from_html_css(html, css, font_bytes = NULL)          # build a PDF from HTML + CSS (optional embedded font)
pdf_from_html_css_with_fonts(html, css, families, font_bytes)  # HTML + CSS with multiple named font families
pdf_merge(paths)                                         # merge several PDF files into one new PDF

生成した PDF の保存／シリアライズ

pdf_save(pdf, path)            # write the PDF to a file path
pdf_to_bytes(pdf)             # serialize the PDF to a raw vector
pdf_get_page_count(pdf)       # number of pages in a built pdfoxide_pdf

ドキュメントを開く

既存の PDF を抽出・レンダリング用に開きます。pdfoxide_document を返します。

pdf_open(path)                          # open a PDF file from disk
pdf_open_with_password(path, password)  # open an encrypted PDF with a password
pdf_open_from_bytes(bytes)              # open a PDF from an in-memory raw vector
pdf_close(x)                            # close any pdfoxide handle and free it

Office 形式から開く

Word／PowerPoint／Excel のドキュメントを変換し、pdfoxide_document として直接開きます。

pdf_open_from_docx_bytes(bytes)   # convert DOCX bytes and open as a document
pdf_open_from_pptx_bytes(bytes)   # convert PPTX bytes and open as a document
pdf_open_from_xlsx_bytes(bytes)   # convert XLSX bytes and open as a document

ドキュメントの検査

pdf_page_count(doc)            # number of pages
pdf_version(doc)               # PDF version as a list(major, minor)
pdf_is_encrypted(doc)          # TRUE if the document is encrypted
pdf_has_structure_tree(doc)    # TRUE if the document is a Tagged PDF
pdf_authenticate(doc, password)  # authenticate an encrypted document after opening
pdf_has_xfa(doc)               # TRUE if the document contains XFA forms
pdf_has_timestamp(doc)         # TRUE if the document carries a document timestamp

テキストとコンテンツの抽出

単一ページの抽出（ページインデックスは 0 始まり）。

pdf_extract_text(doc, page)              # reading-order plain text for one page
pdf_to_plain_text(doc, page)             # layout-aware plain text for one page
pdf_to_markdown(doc, page)               # Markdown for one page
pdf_to_html(doc, page)                   # HTML for one page
pdf_extract_structured_json(doc, page)   # structured layout JSON for one page

ドキュメント全体の抽出。

pdf_to_markdown_all(doc)      # Markdown for the entire document
pdf_to_html_all(doc)          # HTML for the entire document
pdf_to_plain_text_all(doc)    # plain text for the entire document
pdf_extract_all_text(doc)     # concatenated reading-order text for all pages

構造化／要素単位の抽出。これらはデータフレームまたはレコードのリストを返します。

pdf_extract_chars(doc, page)        # per-character records (glyph, bbox, font, size, color)
pdf_extract_words(doc, page)        # word records with bounding boxes
pdf_extract_text_lines(doc, page)   # text-line records with bounding boxes
pdf_extract_tables(doc, page)       # detected tables with rows and cells
pdf_extract_paths(doc, page)        # vector path (line/curve/shape) records
pdf_embedded_fonts(doc, page)       # embedded font records used on a page
pdf_embedded_images(doc, page)      # embedded image records on a page
pdf_page_annotations(doc, page)     # annotation records on a page

自動判定による抽出（ネイティブ抽出と OCR 風ヒューリスティックを自動的に選択）。

pdf_extract_text_auto(doc, page)                  # best-effort text for one page
pdf_extract_page_auto(doc, page, options_json = NULL)  # best-effort structured page extraction

領域（クリップ矩形）抽出

抽出範囲を PDF ポイント単位の矩形に限定します（原点は左下）。

pdf_extract_text_in_rect(doc, page, x, y, width, height)    # text inside a rectangle
pdf_extract_words_in_rect(doc, page, x, y, width, height)   # words inside a rectangle
pdf_extract_lines_in_rect(doc, page, x, y, width, height)   # lines inside a rectangle
pdf_extract_tables_in_rect(doc, page, x, y, width, height)  # tables inside a rectangle
pdf_extract_images_in_rect(doc, page, x, y, width, height)  # images inside a rectangle

遅延ページハンドル

pdf_page() は、単一ページにバインドされた軽量な pdfoxide_page を返します。テキストのゲッターはオンデマンドで抽出を行います。

pdf_page(doc, index)        # lazy handle for one page
pdf_page_text(page)         # plain text of the page
pdf_page_markdown(page)     # Markdown of the page
pdf_page_html(page)         # HTML of the page
pdf_page_plain_text(page)   # layout-aware plain text of the page

ページのジオメトリと生の要素

pdf_page_get_width(doc, page)      # page width in PDF points
pdf_page_get_height(doc, page)     # page height in PDF points
pdf_page_get_rotation(doc, page)   # page rotation in degrees (0/90/180/270)
pdf_page_get_elements(doc, page)   # raw element records for the page

検索

pdf_search(doc, page, term, case_sensitive = FALSE)        # search one page
pdf_search_all(doc, term, case_sensitive = FALSE)          # search the whole document
pdf_search_results_to_json(doc, page, term, case_sensitive = FALSE)  # page search results as JSON

ページの分類とクリーンアップ

繰り返し現れるヘッダー、フッター、アーティファクトを検出して除去します。

pdf_classify_page(doc, page)              # classify the layout/content of one page
pdf_classify_document(doc)                # classify the whole document
pdf_remove_headers(doc, threshold = 0.5)  # detect and remove repeating headers
pdf_remove_footers(doc, threshold = 0.5)  # detect and remove repeating footers
pdf_remove_artifacts(doc, threshold = 0.5)  # detect and remove page artifacts
pdf_erase_header(doc, page)               # erase the header region on a page
pdf_erase_footer(doc, page)               # erase the footer region on a page
pdf_erase_artifacts(doc, page)            # erase artifact regions on a page

Office 変換（エクスポート）

読み込んだ PDF を Office 形式へ変換し直します。raw ベクトルを返します。

pdf_to_docx(doc)   # convert the document to DOCX bytes
pdf_to_pptx(doc)   # convert the document to PPTX bytes
pdf_to_xlsx(doc)   # convert the document to XLSX bytes

フォーム

pdf_get_form_fields(doc)                          # list of form-field records
pdf_export_form_data_to_bytes(doc, format_type = 0L)  # export form data (0 = FDF, 1 = XFDF) to bytes
pdf_import_form_data(doc, data_path)              # import form data from a file path
pdf_form_import_from_file(doc, filename)          # import form data from a named file

エディター側のフォーム補助関数は PDF の編集に記載しています。

ドキュメント構造とメタデータ

pdf_get_outline(doc)        # document outline / bookmarks tree
pdf_get_page_labels(doc)    # page-label ranges
pdf_get_xmp_metadata(doc)   # XMP metadata as a list
pdf_get_source_bytes(doc)   # the original source bytes of the document
pdf_plan_split_by_bookmarks(doc, options_json = NULL)  # plan a split of the document by top-level bookmarks

注釈の詳細

個々の注釈をページとインデックスで検査します。

pdf_annotation_get_color(doc, page, index)              # annotation RGB color
pdf_annotation_get_creation_date(doc, page, index)      # creation date string
pdf_annotation_get_modification_date(doc, page, index)  # modification date string
pdf_annotation_is_hidden(doc, page, index)              # TRUE if the annotation is hidden
pdf_annotation_is_marked_deleted(doc, page, index)      # TRUE if marked deleted
pdf_annotation_is_printable(doc, page, index)           # TRUE if the annotation prints
pdf_annotation_is_read_only(doc, page, index)           # TRUE if read-only
pdf_link_annotation_get_uri(doc, page, index)           # URI of a link annotation
pdf_text_annotation_get_icon_name(doc, page, index)     # icon name of a text annotation
pdf_highlight_annotation_quad_points_count(doc, page, index)        # number of highlight quad points
pdf_highlight_annotation_quad_point(doc, page, index, quad_index)   # one highlight quad point
pdf_annotations_to_json(doc, page)                      # all annotations on a page as JSON

フォント・要素の JSON 補助関数

pdf_font_get_size(doc, page, index)   # size of a font record on a page
pdf_fonts_to_json(doc, page)          # page fonts as JSON
pdf_elements_to_json(doc, page)       # page elements as JSON

レンダリング

ページをラスター画像にレンダリングします。format: 0 = PNG、1 = JPEG。座標と DPI は関数ごとにドキュメント化されています。

pdf_render_page(doc, page, format = 0L)                 # render a page at default DPI
pdf_render_page_zoom(doc, page, zoom, format = 0L)      # render a page at a zoom factor
pdf_render_page_thumbnail(doc, page, size, format = 0L) # render a fitted thumbnail
pdf_render_page_fit(doc, page, w, h, format = 0L)       # render fitted into w x h pixels
pdf_render_page_raw(doc, page, dpi = 150L)              # render to a raw RGBA buffer
pdf_render_page_region(doc, page, crop_x, crop_y, crop_width, crop_height, format = 0L)  # render a sub-region

RenderOptions の全機能（背景 RGBA、透過、注釈の切り替え、JPEG 品質、レイヤー除外）。

pdf_render_page_with_options(doc, page, dpi = 150L, format = 0L,
                             bg_r = 1, bg_g = 1, bg_b = 1, bg_a = 1,
                             transparent_background = FALSE,
                             render_annotations = TRUE, jpeg_quality = 85L)

pdf_render_page_with_options_ex(doc, page, dpi = 150L, format = 0L,
                                bg_r = 1, bg_g = 1, bg_b = 1, bg_a = 1,
                                transparent_background = FALSE,
                                render_annotations = TRUE, jpeg_quality = 85L,
                                excluded_layers = NULL)

再利用可能なレンダラーと処理時間の見積もり。

pdf_create_renderer(dpi = 150L, format = 0L, quality = 85L, anti_alias = TRUE)  # build a reusable renderer
pdf_renderer_close(renderer)            # free a renderer
pdf_estimate_render_time(doc, page)     # estimate render time for a page

レンダリング済み画像ハンドルの補助関数。

pdf_rendered_image_save(image, path)    # write a rendered image to a file
pdf_rendered_image_close(image)         # free a rendered image

PDF の編集

PDF を変更用に開きます。pdfoxide_editor を返します。

pdf_editor_open(path)               # open a PDF for editing
pdf_editor_open_from_bytes(bytes)   # open an editor from a raw vector
pdf_editor_close(editor)            # close the editor and free it

エディターの検査とメタデータ

pdf_editor_page_count(editor)               # page count
pdf_editor_version(editor)                  # PDF version as list(major, minor)
pdf_editor_is_modified(editor)              # TRUE if the editor has unsaved changes
pdf_editor_source_path(editor)              # original source path, if any
pdf_editor_get_producer(editor)             # Producer metadata string
pdf_editor_set_producer(editor, value)      # set the Producer metadata
pdf_editor_get_creation_date(editor)        # CreationDate string
pdf_editor_set_creation_date(editor, value) # set the CreationDate

ページ操作

pdf_editor_delete_page(editor, page)               # delete a page
pdf_editor_move_page(editor, from, to)             # move a page to a new index
pdf_editor_rotate_page_by(editor, page, degrees)   # rotate a page by a relative angle
pdf_editor_rotate_all_pages(editor, degrees)       # rotate every page
pdf_editor_get_page_rotation(editor, page)         # current page rotation
pdf_editor_set_page_rotation(editor, page, degrees)  # set absolute page rotation
pdf_editor_crop_margins(editor, left, right, top, bottom)  # crop margins on all pages
pdf_editor_get_page_crop_box(editor, page)         # get CropBox as c(x, y, w, h)
pdf_editor_set_page_crop_box(editor, page, x, y, w, h)  # set CropBox
pdf_editor_get_page_media_box(editor, page)        # get MediaBox as c(x, y, w, h)
pdf_editor_set_page_media_box(editor, page, x, y, w, h) # set MediaBox

墨消し（エディター）

pdf_editor_apply_all_redactions(editor)                  # apply all pending redactions
pdf_editor_apply_page_redactions(editor, page)           # apply redactions on one page
pdf_editor_is_page_marked_for_redaction(editor, page)    # TRUE if page has pending redactions
pdf_editor_unmark_page_for_redaction(editor, page)       # clear pending redactions on a page
pdf_editor_erase_region(editor, page, x, y, w, h)        # erase a rectangle on a page
pdf_editor_erase_regions(editor, page, rects)            # erase several rectangles on a page
pdf_editor_clear_erase_regions(editor, page)             # clear pending erase regions

独立した墨消しワークフロー

pdf_redaction_add(editor, page, x1, y1, x2, y2, r = 0, g = 0, b = 0)  # add a redaction box with a fill color
pdf_redaction_count(editor, page)                                    # pending redaction count on a page
pdf_redaction_apply(editor, scrub_metadata = FALSE, r = 0, g = 0, b = 0)  # burn in all redactions
pdf_redaction_scrub_metadata(editor)                                 # scrub metadata for redaction hygiene

フォームと注釈（エディター）

pdf_editor_flatten_forms(editor)                       # flatten all form fields into content
pdf_editor_flatten_forms_on_page(editor, page)         # flatten forms on one page
pdf_editor_set_form_field_value(editor, name, value)   # set a form-field value by name
pdf_editor_flatten_annotations(editor, page)           # flatten annotations on a page
pdf_editor_flatten_all_annotations(editor)             # flatten all annotations
pdf_editor_flatten_warnings_count(editor)              # number of flatten warnings
pdf_editor_flatten_warning(editor, index)              # one flatten warning message
pdf_editor_is_page_marked_for_flatten(editor, page)    # TRUE if page is marked for flatten
pdf_editor_unmark_page_for_flatten(editor, page)       # clear flatten mark on a page
pdf_editor_import_fdf_bytes(editor, bytes)             # import FDF form data
pdf_editor_import_xfdf_bytes(editor, bytes)            # import XFDF form data

ドキュメント操作（エディター）

pdf_editor_merge_from(editor, source_path)             # append pages from another PDF file
pdf_editor_merge_from_bytes(editor, bytes)             # append pages from PDF bytes
pdf_editor_convert_to_pdf_a(editor, level)             # convert in place to PDF/A
pdf_editor_embed_file(editor, name, bytes)             # attach an embedded file
pdf_editor_extract_pages_to_bytes(editor, pages)       # extract selected pages to a new PDF (bytes)

保存（エディター）

pdf_editor_save(editor, path)                          # save to a file
pdf_editor_save_to_bytes(editor)                       # save to a raw vector
pdf_editor_save_to_bytes_with_options(editor, compress = TRUE,
                                      garbage_collect = TRUE, linearize = FALSE)  # save with options
pdf_editor_save_encrypted(editor, path, user_password, owner_password)            # save AES-encrypted to a file
pdf_editor_save_encrypted_to_bytes(editor, user_password, owner_password)         # save AES-encrypted to bytes

DocumentBuilder（プログラムによる生成）

PDF をページ単位で構築します。pdf_builder_create() は pdfoxide_builder を返し、ページコンストラクターは流暢な pdfoxide_page を返します。

pdf_builder_create()         # start a new DocumentBuilder
pdf_builder_close(builder)   # free a builder

ビルダーのドキュメントメタデータ

pdf_builder_set_title(builder, value)      # set document title
pdf_builder_set_author(builder, value)     # set document author
pdf_builder_set_subject(builder, value)    # set document subject
pdf_builder_set_keywords(builder, value)   # set document keywords
pdf_builder_set_creator(builder, value)    # set document creator
pdf_builder_on_open(builder, script)       # set a document-open JavaScript action
pdf_builder_language(builder, lang)        # set the document language (e.g. "en-US")
pdf_builder_tagged_pdf_ua1(builder)        # enable Tagged PDF / PDF-UA-1 output
pdf_builder_role_map(builder, custom, standard)        # map a custom structure tag to a standard role
pdf_builder_register_embedded_font(builder, name, font)  # register an embedded font for use on pages

ビルダーのページと出力

pdf_builder_page(builder, width, height)   # start a custom-size page
pdf_builder_a4_page(builder)               # start an A4 page
pdf_builder_letter_page(builder)           # start a US Letter page
pdf_builder_build(builder)                 # finish and return the PDF as bytes
pdf_builder_save(builder, path)            # finish and write to a file
pdf_builder_save_encrypted(builder, path, user_password, owner_password)     # finish and write AES-encrypted
pdf_builder_to_bytes_encrypted(builder, user_password, owner_password)       # finish and return encrypted bytes

埋め込みフォント

pdf_embedded_font_from_file(path)                 # load an embedded font from a TTF/OTF file
pdf_embedded_font_from_bytes(bytes, name = NULL)  # load an embedded font from bytes
pdf_embedded_font_close(font)                     # free an embedded font handle

ページビルダー（流暢なレイアウト）

以下はすべて、pdf_builder_page() が返す pdfoxide_page に対して動作し、連結のためにページを不可視で返します。ページは pdf_page_done() で完了します。

テキストフローとタイポグラフィ

pdf_page_font(page, name, size)        # set the active font and size
pdf_page_at(page, x, y)                # move the text cursor to a coordinate
pdf_page_builder_text(page, text)      # draw text at the cursor
pdf_page_heading(page, level, text)    # add a heading (level 1-6)
pdf_page_paragraph(page, text)         # add a wrapped paragraph
pdf_page_space(page, points)           # add vertical space
pdf_page_horizontal_rule(page)         # draw a horizontal rule
pdf_page_newline(page)                 # advance to the next line
pdf_page_footnote(page, ref_mark, note_text)        # add a footnote with a reference mark
pdf_page_columns(page, column_count, gap_pt, text)  # flow text into multiple columns
pdf_page_text_in_rect(page, x, y, w, h, text, align = 0L)  # flow text inside a rectangle
pdf_page_new_page_same_size(page)      # start a new page of the same size
pdf_page_done(page)                    # finish the page and return to the builder
pdf_page_close(page)                   # free a page handle

インラインのスタイル付きラン

pdf_page_inline(page, text)               # append an inline text run
pdf_page_inline_bold(page, text)          # append a bold inline run
pdf_page_inline_italic(page, text)        # append an italic inline run
pdf_page_inline_color(page, r, g, b, text)  # append a colored inline run

リンクと JavaScript アクション

pdf_page_link_url(page, url)              # add a URL link
pdf_page_link_page(page, index)           # add an internal page link
pdf_page_link_named(page, destination)    # add a named-destination link
pdf_page_link_javascript(page, script)    # add a JavaScript-action link
pdf_page_on_open(page, script)            # page-open JavaScript action
pdf_page_on_close(page, script)           # page-close JavaScript action
pdf_page_field_keystroke(page, script)    # field keystroke JavaScript action
pdf_page_field_format(page, script)       # field format JavaScript action
pdf_page_field_validate(page, script)     # field validate JavaScript action
pdf_page_field_calculate(page, script)    # field calculate JavaScript action

注釈とマークアップ

pdf_page_highlight(page, r, g, b)         # highlight markup at the current run
pdf_page_underline(page, r, g, b)         # underline markup
pdf_page_strikeout(page, r, g, b)         # strikeout markup
pdf_page_squiggly(page, r, g, b)          # squiggly underline markup
pdf_page_sticky_note(page, text)          # sticky note at the cursor
pdf_page_sticky_note_at(page, x, y, text) # sticky note at a coordinate
pdf_page_watermark(page, text)            # add a text watermark
pdf_page_watermark_confidential(page)     # add a CONFIDENTIAL watermark
pdf_page_watermark_draft(page)            # add a DRAFT watermark
pdf_page_stamp(page, type_name)           # add a rubber stamp (e.g. "Approved")
pdf_page_freetext(page, x, y, w, h, text) # add a free-text annotation

AcroForm ウィジェット

pdf_page_text_field(page, name, x, y, w, h, default_value = NULL)        # text field
pdf_page_checkbox(page, name, x, y, w, h, checked = FALSE)               # checkbox
pdf_page_combo_box(page, name, x, y, w, h, options, selected = NULL)     # combo box
pdf_page_radio_group(page, name, values, xs, ys, ws, hs, selected = NULL)  # radio-button group
pdf_page_push_button(page, name, x, y, w, h, caption)                    # push button
pdf_page_signature_field(page, name, x, y, w, h)                         # signature field

バーコード（ページビルダー）

pdf_page_barcode_1d(page, barcode_type, data, x, y, w, h)  # draw a 1D barcode
pdf_page_barcode_qr(page, data, x, y, size)                # draw a QR code

画像

pdf_page_image(page, bytes, x, y, w, h)                  # place an image
pdf_page_image_with_alt(page, bytes, x, y, w, h, alt_text)  # place an image with alt text
pdf_page_image_artifact(page, bytes, x, y, w, h)         # place an image tagged as an artifact

ベクターグラフィックス

pdf_page_rect(page, x, y, w, h)                          # draw a rectangle outline
pdf_page_filled_rect(page, x, y, w, h, r, g, b)          # draw a filled rectangle
pdf_page_line(page, x1, y1, x2, y2)                      # draw a line
pdf_page_stroke_rect(page, x, y, w, h, width, r, g, b)   # stroke a rectangle with width and color
pdf_page_stroke_line(page, x1, y1, x2, y2, width, r, g, b)  # stroke a line with width and color
pdf_page_stroke_rect_dashed(page, x, y, w, h, width, r, g, b, dash = numeric(0), phase = 0)    # dashed rectangle
pdf_page_stroke_line_dashed(page, x1, y1, x2, y2, width, r, g, b, dash = numeric(0), phase = 0)  # dashed line

テーブル

pdf_page_table(page, widths, aligns, cells, has_header = FALSE,
               n_columns = length(widths), n_rows = NULL)  # render a static table

大規模／逐次的なデータ向けのストリーミングテーブル。

pdf_page_streaming_table_begin(page, headers, widths, aligns,
                               repeat_header = FALSE, n_columns = length(headers))  # begin a streaming table
pdf_page_streaming_table_begin_v2(page, headers, widths, aligns,
                                  repeat_header = FALSE, mode = 0L, sample_rows = 0L,
                                  min_col_width_pt = 0, max_col_width_pt = 0,
                                  max_rowspan = 0L, n_columns = length(headers))  # streaming table with autosize/rowspan
pdf_page_streaming_table_set_batch_size(page, batch_size)      # set the flush batch size
pdf_page_streaming_table_pending_row_count(page)               # rows buffered but not yet flushed
pdf_page_streaming_table_batch_count(page)                     # number of flushed batches
pdf_page_streaming_table_flush(page)                           # flush buffered rows
pdf_page_streaming_table_push_row(page, cells)                 # push one row of cells
pdf_page_streaming_table_push_row_v2(page, cells, rowspans = NULL)  # push a row with per-cell rowspans
pdf_page_streaming_table_finish(page)                          # finish and lay out the streaming table

電子署名

証明書

pdf_certificate_load_from_bytes(bytes, password = NULL)  # load a PKCS#12 / DER certificate from bytes
pdf_certificate_load_from_pem(cert_pem, key_pem)         # load a certificate + key from PEM
pdf_certificate_subject(cert)    # certificate subject DN
pdf_certificate_issuer(cert)     # certificate issuer DN
pdf_certificate_serial(cert)     # certificate serial number
pdf_certificate_validity(cert)   # validity window
pdf_certificate_is_valid(cert)   # TRUE if currently within the validity window
pdf_certificate_close(cert)      # free a certificate handle

署名

pdf_sign_bytes(pdf, cert, reason = NULL, location = NULL)            # sign PDF bytes (basic CMS signature)
pdf_sign_bytes_pades(pdf, cert, level = 0L, tsa_url = NULL, ...)     # sign PDF bytes with a PAdES profile
pdf_sign_bytes_pades_opts(pdf, cert, level = 0L, tsa_url = NULL, ...)  # PAdES signing with extended options
pdf_sign(doc, certificate, reason = NULL, location = NULL)          # sign a loaded document
pdf_add_timestamp(pdf_data, sig_index, tsa_url)                     # add a TSA timestamp to a signature in bytes

署名の検査と検証

pdf_signature_count(doc)                  # number of signatures
pdf_get_signature(doc, index)             # signature handle by index
pdf_signature_signer_name(sig)            # signer common name
pdf_signature_signing_reason(sig)         # signing reason
pdf_signature_signing_location(sig)       # signing location
pdf_signature_signing_time(sig)           # signing time
pdf_signature_certificate(sig)            # signer certificate handle
pdf_signature_pades_level(sig)            # PAdES level of the signature
pdf_signature_has_timestamp(sig)          # TRUE if the signature is timestamped
pdf_signature_timestamp(sig)              # embedded timestamp handle
pdf_signature_add_timestamp(sig, timestamp)  # attach a timestamp to a signature
pdf_signature_verify(sig)                 # verify the signature, returns a status
pdf_signature_verify_detached(sig, pdf)   # verify with a detached message digest check
pdf_signature_close(sig)                  # free a signature handle
pdf_verify_all_signatures(doc)            # verify every signature in the document

タイムスタンプ

pdf_timestamp_parse(bytes)               # parse a timestamp token (TST)
pdf_timestamp_token(timestamp)           # raw timestamp token bytes
pdf_timestamp_message_imprint(timestamp) # message imprint of the timestamp
pdf_timestamp_time(timestamp)            # timestamp time
pdf_timestamp_serial(timestamp)          # timestamp serial number
pdf_timestamp_tsa_name(timestamp)        # TSA name
pdf_timestamp_policy_oid(timestamp)      # timestamp policy OID
pdf_timestamp_hash_algorithm(timestamp)  # hash algorithm used
pdf_timestamp_verify(timestamp)          # verify the timestamp token
pdf_timestamp_close(timestamp)           # free a timestamp handle

TSA クライアント

pdf_tsa_client_create(url, username = NULL, password = NULL, timeout = 30L,
                      hash_algo = 0L, use_nonce = TRUE, cert_req = TRUE)  # create a TSA client
pdf_tsa_request_timestamp(client, data)              # request a timestamp over data
pdf_tsa_request_timestamp_hash(client, hash, hash_algo = 0L)  # request a timestamp over a precomputed hash
pdf_tsa_client_close(client)                         # free a TSA client

Document Security Store（DSS）

pdf_get_dss(doc)              # get the document's DSS handle
pdf_dss_cert_count(dss)       # number of certificates in the DSS
pdf_dss_crl_count(dss)        # number of CRLs
pdf_dss_ocsp_count(dss)       # number of OCSP responses
pdf_dss_vri_count(dss)        # number of VRI entries
pdf_dss_get_cert(dss, index)  # one DSS certificate
pdf_dss_get_crl(dss, index)   # one DSS CRL
pdf_dss_get_ocsp(dss, index)  # one DSS OCSP response
pdf_dss_close(dss)            # free a DSS handle

コンプライアンス検証

PDF/A

pdf_validate_pdf_a(doc, level = 0L)   # validate against a PDF/A level, returns a results handle
pdf_a_is_compliant(results)           # TRUE if compliant
pdf_a_errors(results)                 # list of validation errors
pdf_a_warning_count(results)          # number of warnings
pdf_a_results_close(results)          # free the results handle
pdf_convert_to_pdf_a(doc, level = 2L) # convert a document to PDF/A bytes

PDF/UA（アクセシビリティ）

pdf_validate_pdf_ua(doc, level = 0L)  # validate against PDF/UA, returns a results handle
pdf_ua_is_accessible(results)         # TRUE if accessible
pdf_ua_errors(results)                # list of accessibility errors
pdf_ua_warnings(results)              # list of accessibility warnings
pdf_ua_stats(results)                 # accessibility statistics
pdf_ua_results_close(results)         # free the results handle

PDF/X（印刷）

pdf_validate_pdf_x(doc, level = 0L)   # validate against PDF/X, returns a results handle
pdf_x_is_compliant(results)           # TRUE if compliant
pdf_x_errors(results)                 # list of validation errors
pdf_x_results_close(results)          # free the results handle

バーコード

独立したバーコードの生成とデコード。

pdf_generate_qr_code(data, error_correction = 1L, size_px = 256L)  # generate a QR code, returns a barcode handle
pdf_generate_barcode(data, format = 0L, size_px = 256L)            # generate a barcode in a given format
pdf_barcode_get_data(barcode)             # decoded data string
pdf_barcode_get_format(barcode)           # barcode format
pdf_barcode_get_confidence(barcode)       # decode confidence
pdf_barcode_get_image_png(barcode, size_px = 256L)  # rendered PNG bytes
pdf_barcode_get_svg(barcode, size_px = 256L)        # rendered SVG string
pdf_barcode_close(barcode)                # free a barcode handle
pdf_editor_add_barcode_to_page(editor, page, barcode, x, y, width, ...)  # stamp a barcode onto an editor page

OCR

基盤のビルドで ocr 機能が有効になっている必要があります。

pdf_ocr_engine_create(det_model_path, rec_model_path, dict_path)  # build an OCR engine from model paths
pdf_ocr_engine_close(engine)             # free an OCR engine
pdf_ocr_page_needs_ocr(doc, page)        # TRUE if a page has no extractable text layer
pdf_ocr_extract_text(doc, page, engine = NULL)  # OCR a page (uses the default engine when NULL)

OCR モデルとランタイム設定

pdf_model_manifest()                       # available OCR model manifest
pdf_prefetch_available()                   # TRUE if model prefetching is available
pdf_prefetch_models(languages_csv = NULL)  # prefetch OCR models for given languages
pdf_set_max_ops_per_stream(limit)          # cap content-stream operations (DoS guard)
pdf_set_preserve_unmapped_glyphs(preserve) # keep glyphs with no Unicode mapping

暗号プロバイダー／FIPS

pdf_crypto_active_provider()   # name of the active crypto provider
pdf_crypto_fips_available()    # TRUE if a FIPS provider is available
pdf_crypto_use_fips()          # switch to the FIPS provider
pdf_crypto_set_policy(spec)    # set the crypto policy from a spec string
pdf_crypto_policy()            # current crypto policy
pdf_crypto_inventory()         # cryptographic algorithm inventory
pdf_crypto_cbom()              # Cryptographic Bill of Materials (CBOM)

ロギング

pdf_set_log_level(level)   # set the library log level
pdf_get_log_level()        # get the current log level

完全な例

library(pdfoxide)

# --- Create ---
pdf <- pdf_from_markdown("# Report\n\nGenerated by **PDF Oxide**.\n")
pdf_save(pdf, "report.pdf")
pdf_close(pdf)

# --- Extract ---
doc <- pdf_open("report.pdf")
cat("Pages:", pdf_page_count(doc), "\n")

for (i in seq_len(pdf_page_count(doc)) - 1L) {     # 0-based indices
  txt <- pdf_extract_text(doc, i)
  cat(sprintf("Page %d: %d characters\n", i + 1L, nchar(txt)))
}

chars <- pdf_extract_chars(doc, 0)                 # per-character data frame
results <- pdf_search_all(doc, "PDF Oxide", case_sensitive = FALSE)
pdf_close(doc)

# --- Edit ---
ed <- pdf_editor_open("report.pdf")
pdf_editor_set_producer(ed, "PDF Oxide")
pdf_editor_rotate_all_pages(ed, 90L)
pdf_editor_save(ed, "rotated.pdf")
pdf_editor_close(ed)

# --- Build programmatically ---
b <- pdf_builder_create()
pdf_builder_set_title(b, "Invoice")
page <- pdf_builder_letter_page(b)
pdf_page_font(page, "Helvetica", 24)
pdf_page_at(page, 72, 720)
pdf_page_builder_text(page, "Invoice #1001")
pdf_page_done(page)
pdf_builder_save(b, "invoice.pdf")
pdf_builder_close(b)

他の言語のバインディング

PDF Oxide はあらゆる主要なエコシステム向けにネイティブバインディングを提供しています： Rust, Python, Node.js, WASM, C#, Golang, Java, PHP, Ruby, C++, Swift, Kotlin, Dart, Julia, Zig, Scala, Clojure, Objective-C, Elixir。

次のステップ

型と列挙型 — すべての共有型と列挙型
Page API リファレンス — バインディング間で一貫したページ単位の反復処理
R 入門 — チュートリアル