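R API Reference

PDF Oxide provides idiomatic R bindings through the pdfoxide package. The package wraps the pdf_oxide C ABI via R's native .Call interface, so it needs only the compiled shared library and has no external runtime dependency such as Java or Python.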

# install from source (requires a C toolchain)
R CMD INSTALL pdfoxide

library(pdfoxide)

For the Rust API, see the Rust API Reference; for Python, the Python API Reference; for JavaScript, the Node.js API Reference or the WASM API Reference.

The R API is a flat, functional API built around opaque handle objects rather than R6/S4 classes. The main handle types are:

Handle	Created by	Purpose
`pdfoxide_pdf`	`pdf_from_markdown()`, `pdf_from_html()`, …	A newly created PDF, ready to be saved or converted
`pdfoxide_document`	`pdf_open()`, `pdf_open_from_bytes()`	A loaded PDF for read-only extraction and rendering
`pdfoxide_editor`	`pdf_editor_open()`	A mutable PDF that can be edited, merged, and saved
`pdfoxide_builder`	`pdf_builder_create()`	A `DocumentBuilder` for programmatic page construction
`pdfoxide_page` (builder)	`pdf_builder_page()`, `pdf_builder_a4_page()`, …	A page laid out in fluent style
`pdfoxide_page` (lazy)	`pdf_page()`	A lazy read handle for a single page
`pdfoxide_renderer`, `pdfoxide_rendered_image`	Rendering functions	A reusable renderer and rendered raster output
`pdfoxide_certificate`, `pdfoxide_signature`, `pdfoxide_timestamp`, `pdfoxide_tsa_client`, `pdfoxide_dss`	Signing/verification functions	Digital-signature primitives

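All page indices are 0-based. Functions that mutate a builder, page, or editor return the handle invisibly, so calls can be chained with the pipe (|>). Handles are closed automatically on garbage collection, but you can also release them immediately with pdf_close() / *_close().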
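The index and cleanup conventions above can be sketched in plain R. The two helpers below, page_indices() and with_handle(), are hypothetical and not part of pdfoxide; the only package functions used are pdf_open(), pdf_page_count(), pdf_extract_text(), and pdf_close() from this reference, and "report.pdf" is a placeholder path. The package-dependent part runs only when pdfoxide is installed and the file exists.

```r
# Hypothetical helper: 0-based page indices for a document with n pages.
page_indices <- function(n) seq_len(n) - 1L

# Hypothetical helper: run fn(handle) and always release the handle afterwards,
# even if fn() raises an error, instead of waiting for garbage collection.
with_handle <- function(handle, close_fn, fn) {
  on.exit(close_fn(handle), add = TRUE)
  fn(handle)
}

if (requireNamespace("pdfoxide", quietly = TRUE) && file.exists("report.pdf")) {
  library(pdfoxide)
  # Assumes pdf_extract_text() returns a single string per page.
  texts <- with_handle(pdf_open("report.pdf"), pdf_close, function(doc) {
    vapply(page_indices(pdf_page_count(doc)),
           function(i) pdf_extract_text(doc, i), character(1))
  })
}
```

Wrapping the close call in on.exit() is a design choice for long-running sessions, where relying on the garbage collector could keep large documents in memory longer than needed.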

Creating PDFs

Functions that create a PDF directly from a source format. Each returns a pdfoxide_pdf handle.

pdf_from_markdown(markdown)                              # build a PDF from a Markdown string
pdf_from_html(html)                                      # build a PDF from an HTML string
pdf_from_text(text)                                      # build a PDF from plain text
pdf_from_image(path)                                     # build a single-page PDF from an image file
pdf_from_image_bytes(bytes)                              # build a single-page PDF from raw image bytes
pdf_from_html_css(html, css, font_bytes = NULL)          # build a PDF from HTML + CSS (optional embedded font)
pdf_from_html_css_with_fonts(html, css, families, font_bytes)  # HTML + CSS with multiple named font families
pdf_merge(paths)                                         # merge several PDF files into one new PDF

Saving and serializing a created PDF

pdf_save(pdf, path)            # write the PDF to a file path
pdf_to_bytes(pdf)             # serialize the PDF to a raw vector
pdf_get_page_count(pdf)       # number of pages in a built pdfoxide_pdf

Opening documents

Opens an existing PDF for extraction and rendering. Returns a pdfoxide_document.

pdf_open(path)                          # open a PDF file from disk
pdf_open_with_password(path, password)  # open an encrypted PDF with a password
pdf_open_from_bytes(bytes)              # open a PDF from an in-memory raw vector
pdf_close(x)                            # close any pdfoxide handle and free it

Opening from Office formats

Converts a Word, PowerPoint, or Excel document and opens it directly as a pdfoxide_document.

pdf_open_from_docx_bytes(bytes)   # convert DOCX bytes and open as a document
pdf_open_from_pptx_bytes(bytes)   # convert PPTX bytes and open as a document
pdf_open_from_xlsx_bytes(bytes)   # convert XLSX bytes and open as a document

Document inspection

pdf_page_count(doc)            # number of pages
pdf_version(doc)               # PDF version as a list(major, minor)
pdf_is_encrypted(doc)          # TRUE if the document is encrypted
pdf_has_structure_tree(doc)    # TRUE if the document is a Tagged PDF
pdf_authenticate(doc, password)  # authenticate an encrypted document after opening
pdf_has_xfa(doc)               # TRUE if the document contains XFA forms
pdf_has_timestamp(doc)         # TRUE if the document carries a document timestamp

Text and content extraction

Single-page extraction (page indices are 0-based).

pdf_extract_text(doc, page)              # reading-order plain text for one page
pdf_to_plain_text(doc, page)             # layout-aware plain text for one page
pdf_to_markdown(doc, page)               # Markdown for one page
pdf_to_html(doc, page)                   # HTML for one page
pdf_extract_structured_json(doc, page)   # structured layout JSON for one page

Whole-document extraction.

pdf_to_markdown_all(doc)      # Markdown for the entire document
pdf_to_html_all(doc)          # HTML for the entire document
pdf_to_plain_text_all(doc)    # plain text for the entire document
pdf_extract_all_text(doc)     # concatenated reading-order text for all pages

Structured, element-level extraction. Returns a data frame or a list of records.

pdf_extract_chars(doc, page)        # per-character records (glyph, bbox, font, size, color)
pdf_extract_words(doc, page)        # word records with bounding boxes
pdf_extract_text_lines(doc, page)   # text-line records with bounding boxes
pdf_extract_tables(doc, page)       # detected tables with rows and cells
pdf_extract_paths(doc, page)        # vector path (line/curve/shape) records
pdf_embedded_fonts(doc, page)       # embedded font records used on a page
pdf_embedded_images(doc, page)      # embedded image records on a page
pdf_page_annotations(doc, page)     # annotation records on a page

Auto-detecting extraction (heuristically picks whichever of the native and OCR paths is appropriate).

pdf_extract_text_auto(doc, page)                  # best-effort text for one page
pdf_extract_page_auto(doc, page, options_json = NULL)  # best-effort structured page extraction

Region (clip-rectangle) extraction

Restricts extraction to a rectangle given in PDF points (origin at the bottom-left).

pdf_extract_text_in_rect(doc, page, x, y, width, height)    # text inside a rectangle
pdf_extract_words_in_rect(doc, page, x, y, width, height)   # words inside a rectangle
pdf_extract_lines_in_rect(doc, page, x, y, width, height)   # lines inside a rectangle
pdf_extract_tables_in_rect(doc, page, x, y, width, height)  # tables inside a rectangle
pdf_extract_images_in_rect(doc, page, x, y, width, height)  # images inside a rectangle
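
Many tools report boxes with a top-left origin, while the functions above take PDF points measured from the bottom-left. A minimal sketch of the conversion follows. rect_from_top_left() is a hypothetical helper, and it assumes that x and y name the rectangle's lower-left corner, which this reference does not state explicitly. "report.pdf" is a placeholder path, and pdf_page_get_height() is the page-geometry function listed in this reference.

```r
# Hypothetical helper: convert a box given from the top-left corner of the page
# into bottom-left-origin coordinates (assumes x, y is the lower-left corner).
rect_from_top_left <- function(x, top, width, height, page_height) {
  stopifnot(width >= 0, height >= 0)
  list(x = x, y = page_height - top - height, width = width, height = height)
}

# A 200 x 50 pt box whose top edge sits 72 pt below the top of a 792 pt tall page.
example_rect <- rect_from_top_left(72, 72, 200, 50, page_height = 792)

if (requireNamespace("pdfoxide", quietly = TRUE) && file.exists("report.pdf")) {
  library(pdfoxide)
  doc <- pdf_open("report.pdf")
  box <- rect_from_top_left(72, 72, 200, 50, pdf_page_get_height(doc, 0))
  header_text <- pdf_extract_text_in_rect(doc, 0, box$x, box$y, box$width, box$height)
  pdf_close(doc)
}
```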

Lazy page handles

pdf_page() returns a lightweight pdfoxide_page bound to a single page; its text getters run the extraction at call time.

pdf_page(doc, index)        # lazy handle for one page
pdf_page_text(page)         # plain text of the page
pdf_page_markdown(page)     # Markdown of the page
pdf_page_html(page)         # HTML of the page
pdf_page_plain_text(page)   # layout-aware plain text of the page

Page geometry and raw elements

pdf_page_get_width(doc, page)      # page width in PDF points
pdf_page_get_height(doc, page)     # page height in PDF points
pdf_page_get_rotation(doc, page)   # page rotation in degrees (0/90/180/270)
pdf_page_get_elements(doc, page)   # raw element records for the page

Search

pdf_search(doc, page, term, case_sensitive = FALSE)        # search one page
pdf_search_all(doc, term, case_sensitive = FALSE)          # search the whole document
pdf_search_results_to_json(doc, page, term, case_sensitive = FALSE)  # page search results as JSON

Page classification and cleanup

Detects and removes repeating headers, footers, and artifacts.

pdf_classify_page(doc, page)              # classify the layout/content of one page
pdf_classify_document(doc)                # classify the whole document
pdf_remove_headers(doc, threshold = 0.5)  # detect and remove repeating headers
pdf_remove_footers(doc, threshold = 0.5)  # detect and remove repeating footers
pdf_remove_artifacts(doc, threshold = 0.5)  # detect and remove page artifacts
pdf_erase_header(doc, page)               # erase the header region on a page
pdf_erase_footer(doc, page)               # erase the footer region on a page
pdf_erase_artifacts(doc, page)            # erase artifact regions on a page

Office conversion (export)

Converts a loaded PDF back to an Office format. Returns a raw vector.

pdf_to_docx(doc)   # convert the document to DOCX bytes
pdf_to_pptx(doc)   # convert the document to PPTX bytes
pdf_to_xlsx(doc)   # convert the document to XLSX bytes

Forms

pdf_get_form_fields(doc)                          # list of form-field records
pdf_export_form_data_to_bytes(doc, format_type = 0L)  # export form data (0 = FDF, 1 = XFDF) to bytes
pdf_import_form_data(doc, data_path)              # import form data from a file path
pdf_form_import_from_file(doc, filename)          # import form data from a named file

Editor-side form helpers are listed under PDF editing.

Document structure and metadata

pdf_get_outline(doc)        # document outline / bookmarks tree
pdf_get_page_labels(doc)    # page-label ranges
pdf_get_xmp_metadata(doc)   # XMP metadata as a list
pdf_get_source_bytes(doc)   # the original source bytes of the document
pdf_plan_split_by_bookmarks(doc, options_json = NULL)  # plan a split of the document by top-level bookmarks

Annotation details

Looks up an individual annotation by page and index.

pdf_annotation_get_color(doc, page, index)              # annotation RGB color
pdf_annotation_get_creation_date(doc, page, index)      # creation date string
pdf_annotation_get_modification_date(doc, page, index)  # modification date string
pdf_annotation_is_hidden(doc, page, index)              # TRUE if the annotation is hidden
pdf_annotation_is_marked_deleted(doc, page, index)      # TRUE if marked deleted
pdf_annotation_is_printable(doc, page, index)           # TRUE if the annotation prints
pdf_annotation_is_read_only(doc, page, index)           # TRUE if read-only
pdf_link_annotation_get_uri(doc, page, index)           # URI of a link annotation
pdf_text_annotation_get_icon_name(doc, page, index)     # icon name of a text annotation
pdf_highlight_annotation_quad_points_count(doc, page, index)        # number of highlight quad points
pdf_highlight_annotation_quad_point(doc, page, index, quad_index)   # one highlight quad point
pdf_annotations_to_json(doc, page)                      # all annotations on a page as JSON

Font and element JSON helpers

pdf_font_get_size(doc, page, index)   # size of a font record on a page
pdf_fonts_to_json(doc, page)          # page fonts as JSON
pdf_elements_to_json(doc, page)       # page elements as JSON

Rendering

Renders pages to raster images. format: 0 = PNG, 1 = JPEG. Coordinates and DPI are documented per function.

pdf_render_page(doc, page, format = 0L)                 # render a page at default DPI
pdf_render_page_zoom(doc, page, zoom, format = 0L)      # render a page at a zoom factor
pdf_render_page_thumbnail(doc, page, size, format = 0L) # render a fitted thumbnail
pdf_render_page_fit(doc, page, w, h, format = 0L)       # render fitted into w x h pixels
pdf_render_page_raw(doc, page, dpi = 150L)              # render to a raw RGBA buffer
pdf_render_page_region(doc, page, crop_x, crop_y, crop_width, crop_height, format = 0L)  # render a sub-region

The full set of RenderOptions (background RGBA, transparency, annotation visibility, JPEG quality, layer exclusion).

pdf_render_page_with_options(doc, page, dpi = 150L, format = 0L,
                             bg_r = 1, bg_g = 1, bg_b = 1, bg_a = 1,
                             transparent_background = FALSE,
                             render_annotations = TRUE, jpeg_quality = 85L)

pdf_render_page_with_options_ex(doc, page, dpi = 150L, format = 0L,
                                bg_r = 1, bg_g = 1, bg_b = 1, bg_a = 1,
                                transparent_background = FALSE,
                                render_annotations = TRUE, jpeg_quality = 85L,
                                excluded_layers = NULL)

Reusable renderers and render-time estimation.

pdf_create_renderer(dpi = 150L, format = 0L, quality = 85L, anti_alias = TRUE)  # build a reusable renderer
pdf_renderer_close(renderer)            # free a renderer
pdf_estimate_render_time(doc, page)     # estimate render time for a page

Helpers for rendered-image handles.

pdf_rendered_image_save(image, path)    # write a rendered image to a file
pdf_rendered_image_close(image)         # free a rendered image

PDF editing

Opens a PDF for modification. Returns a pdfoxide_editor.

pdf_editor_open(path)               # open a PDF for editing
pdf_editor_open_from_bytes(bytes)   # open an editor from a raw vector
pdf_editor_close(editor)            # close the editor and free it

Editor inspection and metadata

pdf_editor_page_count(editor)               # page count
pdf_editor_version(editor)                  # PDF version as list(major, minor)
pdf_editor_is_modified(editor)              # TRUE if the editor has unsaved changes
pdf_editor_source_path(editor)              # original source path, if any
pdf_editor_get_producer(editor)             # Producer metadata string
pdf_editor_set_producer(editor, value)      # set the Producer metadata
pdf_editor_get_creation_date(editor)        # CreationDate string
pdf_editor_set_creation_date(editor, value) # set the CreationDate

Page operations

pdf_editor_delete_page(editor, page)               # delete a page
pdf_editor_move_page(editor, from, to)             # move a page to a new index
pdf_editor_rotate_page_by(editor, page, degrees)   # rotate a page by a relative angle
pdf_editor_rotate_all_pages(editor, degrees)       # rotate every page
pdf_editor_get_page_rotation(editor, page)         # current page rotation
pdf_editor_set_page_rotation(editor, page, degrees)  # set absolute page rotation
pdf_editor_crop_margins(editor, left, right, top, bottom)  # crop margins on all pages
pdf_editor_get_page_crop_box(editor, page)         # get CropBox as c(x, y, w, h)
pdf_editor_set_page_crop_box(editor, page, x, y, w, h)  # set CropBox
pdf_editor_get_page_media_box(editor, page)        # get MediaBox as c(x, y, w, h)
pdf_editor_set_page_media_box(editor, page, x, y, w, h) # set MediaBox

Redaction (editor)

pdf_editor_apply_all_redactions(editor)                  # apply all pending redactions
pdf_editor_apply_page_redactions(editor, page)           # apply redactions on one page
pdf_editor_is_page_marked_for_redaction(editor, page)    # TRUE if page has pending redactions
pdf_editor_unmark_page_for_redaction(editor, page)       # clear pending redactions on a page
pdf_editor_erase_region(editor, page, x, y, w, h)        # erase a rectangle on a page
pdf_editor_erase_regions(editor, page, rects)            # erase several rectangles on a page
pdf_editor_clear_erase_regions(editor, page)             # clear pending erase regions

Standalone redaction workflow

pdf_redaction_add(editor, page, x1, y1, x2, y2, r = 0, g = 0, b = 0)  # add a redaction box with a fill color
pdf_redaction_count(editor, page)                                    # pending redaction count on a page
pdf_redaction_apply(editor, scrub_metadata = FALSE, r = 0, g = 0, b = 0)  # burn in all redactions
pdf_redaction_scrub_metadata(editor)                                 # scrub metadata for redaction hygiene

Forms and annotations (editor)

pdf_editor_flatten_forms(editor)                       # flatten all form fields into content
pdf_editor_flatten_forms_on_page(editor, page)         # flatten forms on one page
pdf_editor_set_form_field_value(editor, name, value)   # set a form-field value by name
pdf_editor_flatten_annotations(editor, page)           # flatten annotations on a page
pdf_editor_flatten_all_annotations(editor)             # flatten all annotations
pdf_editor_flatten_warnings_count(editor)              # number of flatten warnings
pdf_editor_flatten_warning(editor, index)              # one flatten warning message
pdf_editor_is_page_marked_for_flatten(editor, page)    # TRUE if page is marked for flatten
pdf_editor_unmark_page_for_flatten(editor, page)       # clear flatten mark on a page
pdf_editor_import_fdf_bytes(editor, bytes)             # import FDF form data
pdf_editor_import_xfdf_bytes(editor, bytes)            # import XFDF form data

Document operations (editor)

pdf_editor_merge_from(editor, source_path)             # append pages from another PDF file
pdf_editor_merge_from_bytes(editor, bytes)             # append pages from PDF bytes
pdf_editor_convert_to_pdf_a(editor, level)             # convert in place to PDF/A
pdf_editor_embed_file(editor, name, bytes)             # attach an embedded file
pdf_editor_extract_pages_to_bytes(editor, pages)       # extract selected pages to a new PDF (bytes)

Saving (editor)

pdf_editor_save(editor, path)                          # save to a file
pdf_editor_save_to_bytes(editor)                       # save to a raw vector
pdf_editor_save_to_bytes_with_options(editor, compress = TRUE,
                                      garbage_collect = TRUE, linearize = FALSE)  # save with options
pdf_editor_save_encrypted(editor, path, user_password, owner_password)            # save AES-encrypted to a file
pdf_editor_save_encrypted_to_bytes(editor, user_password, owner_password)         # save AES-encrypted to bytes

DocumentBuilder (programmatic creation)

Builds a PDF page by page. pdf_builder_create() returns a pdfoxide_builder, and the page constructors return a fluent pdfoxide_page.

pdf_builder_create()         # start a new DocumentBuilder
pdf_builder_close(builder)   # free a builder

Builder document metadata

pdf_builder_set_title(builder, value)      # set document title
pdf_builder_set_author(builder, value)     # set document author
pdf_builder_set_subject(builder, value)    # set document subject
pdf_builder_set_keywords(builder, value)   # set document keywords
pdf_builder_set_creator(builder, value)    # set document creator
pdf_builder_on_open(builder, script)       # set a document-open JavaScript action
pdf_builder_language(builder, lang)        # set the document language (e.g. "en-US")
pdf_builder_tagged_pdf_ua1(builder)        # enable Tagged PDF / PDF-UA-1 output
pdf_builder_role_map(builder, custom, standard)        # map a custom structure tag to a standard role
pdf_builder_register_embedded_font(builder, name, font)  # register an embedded font for use on pages

Builder pages and output

pdf_builder_page(builder, width, height)   # start a custom-size page
pdf_builder_a4_page(builder)               # start an A4 page
pdf_builder_letter_page(builder)           # start a US Letter page
pdf_builder_build(builder)                 # finish and return the PDF as bytes
pdf_builder_save(builder, path)            # finish and write to a file
pdf_builder_save_encrypted(builder, path, user_password, owner_password)     # finish and write AES-encrypted
pdf_builder_to_bytes_encrypted(builder, user_password, owner_password)       # finish and return encrypted bytes

Embedded fonts

pdf_embedded_font_from_file(path)                 # load an embedded font from a TTF/OTF file
pdf_embedded_font_from_bytes(bytes, name = NULL)  # load an embedded font from bytes
pdf_embedded_font_close(font)                     # free an embedded font handle

Page builder (fluent layout)

All of the functions below operate on the pdfoxide_page returned by pdf_builder_page() and return the page invisibly for chaining. Finish work on a page with pdf_page_done().

Text flow and typography

pdf_page_font(page, name, size)        # set the active font and size
pdf_page_at(page, x, y)                # move the text cursor to a coordinate
pdf_page_builder_text(page, text)      # draw text at the cursor
pdf_page_heading(page, level, text)    # add a heading (level 1-6)
pdf_page_paragraph(page, text)         # add a wrapped paragraph
pdf_page_space(page, points)           # add vertical space
pdf_page_horizontal_rule(page)         # draw a horizontal rule
pdf_page_newline(page)                 # advance to the next line
pdf_page_footnote(page, ref_mark, note_text)        # add a footnote with a reference mark
pdf_page_columns(page, column_count, gap_pt, text)  # flow text into multiple columns
pdf_page_text_in_rect(page, x, y, w, h, text, align = 0L)  # flow text inside a rectangle
pdf_page_new_page_same_size(page)      # start a new page of the same size
pdf_page_done(page)                    # finish the page and return to the builder
pdf_page_close(page)                   # free a page handle

Inline styled runs

pdf_page_inline(page, text)               # append an inline text run
pdf_page_inline_bold(page, text)          # append a bold inline run
pdf_page_inline_italic(page, text)        # append an italic inline run
pdf_page_inline_color(page, r, g, b, text)  # append a colored inline run

Links and JavaScript actions

pdf_page_link_url(page, url)              # add a URL link
pdf_page_link_page(page, index)           # add an internal page link
pdf_page_link_named(page, destination)    # add a named-destination link
pdf_page_link_javascript(page, script)    # add a JavaScript-action link
pdf_page_on_open(page, script)            # page-open JavaScript action
pdf_page_on_close(page, script)           # page-close JavaScript action
pdf_page_field_keystroke(page, script)    # field keystroke JavaScript action
pdf_page_field_format(page, script)       # field format JavaScript action
pdf_page_field_validate(page, script)     # field validate JavaScript action
pdf_page_field_calculate(page, script)    # field calculate JavaScript action

Annotations and markup

pdf_page_highlight(page, r, g, b)         # highlight markup at the current run
pdf_page_underline(page, r, g, b)         # underline markup
pdf_page_strikeout(page, r, g, b)         # strikeout markup
pdf_page_squiggly(page, r, g, b)          # squiggly underline markup
pdf_page_sticky_note(page, text)          # sticky note at the cursor
pdf_page_sticky_note_at(page, x, y, text) # sticky note at a coordinate
pdf_page_watermark(page, text)            # add a text watermark
pdf_page_watermark_confidential(page)     # add a CONFIDENTIAL watermark
pdf_page_watermark_draft(page)            # add a DRAFT watermark
pdf_page_stamp(page, type_name)           # add a rubber stamp (e.g. "Approved")
pdf_page_freetext(page, x, y, w, h, text) # add a free-text annotation

AcroForm widgets

pdf_page_text_field(page, name, x, y, w, h, default_value = NULL)        # text field
pdf_page_checkbox(page, name, x, y, w, h, checked = FALSE)               # checkbox
pdf_page_combo_box(page, name, x, y, w, h, options, selected = NULL)     # combo box
pdf_page_radio_group(page, name, values, xs, ys, ws, hs, selected = NULL)  # radio-button group
pdf_page_push_button(page, name, x, y, w, h, caption)                    # push button
pdf_page_signature_field(page, name, x, y, w, h)                         # signature field

Barcodes (page builder)

pdf_page_barcode_1d(page, barcode_type, data, x, y, w, h)  # draw a 1D barcode
pdf_page_barcode_qr(page, data, x, y, size)                # draw a QR code

Images

pdf_page_image(page, bytes, x, y, w, h)                  # place an image
pdf_page_image_with_alt(page, bytes, x, y, w, h, alt_text)  # place an image with alt text
pdf_page_image_artifact(page, bytes, x, y, w, h)         # place an image tagged as an artifact

Vector graphics

pdf_page_rect(page, x, y, w, h)                          # draw a rectangle outline
pdf_page_filled_rect(page, x, y, w, h, r, g, b)          # draw a filled rectangle
pdf_page_line(page, x1, y1, x2, y2)                      # draw a line
pdf_page_stroke_rect(page, x, y, w, h, width, r, g, b)   # stroke a rectangle with width and color
pdf_page_stroke_line(page, x1, y1, x2, y2, width, r, g, b)  # stroke a line with width and color
pdf_page_stroke_rect_dashed(page, x, y, w, h, width, r, g, b, dash = numeric(0), phase = 0)    # dashed rectangle
pdf_page_stroke_line_dashed(page, x1, y1, x2, y2, width, r, g, b, dash = numeric(0), phase = 0)  # dashed line

Tables

pdf_page_table(page, widths, aligns, cells, has_header = FALSE,
               n_columns = length(widths), n_rows = NULL)  # render a static table

Streaming tables for large or incremental data.

pdf_page_streaming_table_begin(page, headers, widths, aligns,
                               repeat_header = FALSE, n_columns = length(headers))  # begin a streaming table
pdf_page_streaming_table_begin_v2(page, headers, widths, aligns,
                                  repeat_header = FALSE, mode = 0L, sample_rows = 0L,
                                  min_col_width_pt = 0, max_col_width_pt = 0,
                                  max_rowspan = 0L, n_columns = length(headers))  # streaming table with autosize/rowspan
pdf_page_streaming_table_set_batch_size(page, batch_size)      # set the flush batch size
pdf_page_streaming_table_pending_row_count(page)               # rows buffered but not yet flushed
pdf_page_streaming_table_batch_count(page)                     # number of flushed batches
pdf_page_streaming_table_flush(page)                           # flush buffered rows
pdf_page_streaming_table_push_row(page, cells)                 # push one row of cells
pdf_page_streaming_table_push_row_v2(page, cells, rowspans = NULL)  # push a row with per-cell rowspans
pdf_page_streaming_table_finish(page)                          # finish and lay out the streaming table
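
The begin, push, and finish sequence above might be driven from a data frame roughly as follows. This is a sketch under several assumptions that this reference does not confirm: that cells is a character vector with one entry per column, that widths are in PDF points, and that aligns takes integer codes with 0 meaning left-aligned. df_to_rows() is a hypothetical helper, not part of pdfoxide, and the output is written to a temporary file.

```r
# Hypothetical helper: turn a data frame into a list of character-vector rows.
df_to_rows <- function(df) {
  lapply(seq_len(nrow(df)), function(i) {
    vapply(df[i, , drop = FALSE], as.character, character(1), USE.NAMES = FALSE)
  })
}

df <- data.frame(id = 1:3, name = c("alpha", "beta", "gamma"),
                 stringsAsFactors = FALSE)
rows <- df_to_rows(df)

if (requireNamespace("pdfoxide", quietly = TRUE)) {
  library(pdfoxide)
  b <- pdf_builder_create()
  page <- pdf_builder_a4_page(b)
  # Assumption: widths in points, aligns as integer codes (0 = left).
  pdf_page_streaming_table_begin(page, headers = names(df),
                                 widths = c(80, 200), aligns = c(0L, 0L),
                                 repeat_header = TRUE)
  for (row in rows) pdf_page_streaming_table_push_row(page, row)
  pdf_page_streaming_table_finish(page)
  pdf_page_done(page)
  pdf_builder_save(b, file.path(tempdir(), "table.pdf"))
  pdf_builder_close(b)
}
```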

Digital signatures

Certificates

pdf_certificate_load_from_bytes(bytes, password = NULL)  # load a PKCS#12 / DER certificate from bytes
pdf_certificate_load_from_pem(cert_pem, key_pem)         # load a certificate + key from PEM
pdf_certificate_subject(cert)    # certificate subject DN
pdf_certificate_issuer(cert)     # certificate issuer DN
pdf_certificate_serial(cert)     # certificate serial number
pdf_certificate_validity(cert)   # validity window
pdf_certificate_is_valid(cert)   # TRUE if currently within the validity window
pdf_certificate_close(cert)      # free a certificate handle

Signing

pdf_sign_bytes(pdf, cert, reason = NULL, location = NULL)            # sign PDF bytes (basic CMS signature)
pdf_sign_bytes_pades(pdf, cert, level = 0L, tsa_url = NULL, ...)     # sign PDF bytes with a PAdES profile
pdf_sign_bytes_pades_opts(pdf, cert, level = 0L, tsa_url = NULL, ...)  # PAdES signing with extended options
pdf_sign(doc, certificate, reason = NULL, location = NULL)          # sign a loaded document
pdf_add_timestamp(pdf_data, sig_index, tsa_url)                     # add a TSA timestamp to a signature in bytes

Signature inspection and verification

pdf_signature_count(doc)                  # number of signatures
pdf_get_signature(doc, index)             # signature handle by index
pdf_signature_signer_name(sig)            # signer common name
pdf_signature_signing_reason(sig)         # signing reason
pdf_signature_signing_location(sig)       # signing location
pdf_signature_signing_time(sig)           # signing time
pdf_signature_certificate(sig)            # signer certificate handle
pdf_signature_pades_level(sig)            # PAdES level of the signature
pdf_signature_has_timestamp(sig)          # TRUE if the signature is timestamped
pdf_signature_timestamp(sig)              # embedded timestamp handle
pdf_signature_add_timestamp(sig, timestamp)  # attach a timestamp to a signature
pdf_signature_verify(sig)                 # verify the signature, returns a status
pdf_signature_verify_detached(sig, pdf)   # verify with a detached message digest check
pdf_signature_close(sig)                  # free a signature handle
pdf_verify_all_signatures(doc)            # verify every signature in the document

Timestamps

pdf_timestamp_parse(bytes)               # parse a timestamp token (TST)
pdf_timestamp_token(timestamp)           # raw timestamp token bytes
pdf_timestamp_message_imprint(timestamp) # message imprint of the timestamp
pdf_timestamp_time(timestamp)            # timestamp time
pdf_timestamp_serial(timestamp)          # timestamp serial number
pdf_timestamp_tsa_name(timestamp)        # TSA name
pdf_timestamp_policy_oid(timestamp)      # timestamp policy OID
pdf_timestamp_hash_algorithm(timestamp)  # hash algorithm used
pdf_timestamp_verify(timestamp)          # verify the timestamp token
pdf_timestamp_close(timestamp)           # free a timestamp handle

TSA client

pdf_tsa_client_create(url, username = NULL, password = NULL, timeout = 30L,
                      hash_algo = 0L, use_nonce = TRUE, cert_req = TRUE)  # create a TSA client
pdf_tsa_request_timestamp(client, data)              # request a timestamp over data
pdf_tsa_request_timestamp_hash(client, hash, hash_algo = 0L)  # request a timestamp over a precomputed hash
pdf_tsa_client_close(client)                         # free a TSA client

Document Security Store (DSS)

pdf_get_dss(doc)              # get the document's DSS handle
pdf_dss_cert_count(dss)       # number of certificates in the DSS
pdf_dss_crl_count(dss)        # number of CRLs
pdf_dss_ocsp_count(dss)       # number of OCSP responses
pdf_dss_vri_count(dss)        # number of VRI entries
pdf_dss_get_cert(dss, index)  # one DSS certificate
pdf_dss_get_crl(dss, index)   # one DSS CRL
pdf_dss_get_ocsp(dss, index)  # one DSS OCSP response
pdf_dss_close(dss)            # free a DSS handle

Standards compliance validation

PDF/A

pdf_validate_pdf_a(doc, level = 0L)   # validate against a PDF/A level, returns a results handle
pdf_a_is_compliant(results)           # TRUE if compliant
pdf_a_errors(results)                 # list of validation errors
pdf_a_warning_count(results)          # number of warnings
pdf_a_results_close(results)          # free the results handle
pdf_convert_to_pdf_a(doc, level = 2L) # convert a document to PDF/A bytes
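
The validation functions above follow a validate, inspect, close pattern, which could be wrapped as shown below. pdfa_report() is a hypothetical helper, not part of pdfoxide; it uses only the functions listed in this section and keeps the documented default level = 0L, whose meaning this reference does not spell out. "report.pdf" is a placeholder path.

```r
# Hypothetical helper: validate, collect the outcome, and always free the
# results handle, even if one of the accessors raises an error.
pdfa_report <- function(doc, level = 0L) {
  res <- pdf_validate_pdf_a(doc, level = level)
  on.exit(pdf_a_results_close(res), add = TRUE)
  list(
    compliant = isTRUE(pdf_a_is_compliant(res)),
    errors    = pdf_a_errors(res),
    warnings  = pdf_a_warning_count(res)
  )
}

if (requireNamespace("pdfoxide", quietly = TRUE) && file.exists("report.pdf")) {
  library(pdfoxide)
  doc <- pdf_open("report.pdf")
  report <- pdfa_report(doc)
  if (!report$compliant) print(report$errors)
  pdf_close(doc)
}
```

The same shape should carry over to the PDF/UA and PDF/X functions, since they expose matching validate and close calls.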

PDF/UA (accessibility)

pdf_validate_pdf_ua(doc, level = 0L)  # validate against PDF/UA, returns a results handle
pdf_ua_is_accessible(results)         # TRUE if accessible
pdf_ua_errors(results)                # list of accessibility errors
pdf_ua_warnings(results)              # list of accessibility warnings
pdf_ua_stats(results)                 # accessibility statistics
pdf_ua_results_close(results)         # free the results handle

PDF/X (print)

pdf_validate_pdf_x(doc, level = 0L)   # validate against PDF/X, returns a results handle
pdf_x_is_compliant(results)           # TRUE if compliant
pdf_x_errors(results)                 # list of validation errors
pdf_x_results_close(results)          # free the results handle

Barcodes

Standalone barcode generation and decoding.

pdf_generate_qr_code(data, error_correction = 1L, size_px = 256L)  # generate a QR code, returns a barcode handle
pdf_generate_barcode(data, format = 0L, size_px = 256L)            # generate a barcode in a given format
pdf_barcode_get_data(barcode)             # decoded data string
pdf_barcode_get_format(barcode)           # barcode format
pdf_barcode_get_confidence(barcode)       # decode confidence
pdf_barcode_get_image_png(barcode, size_px = 256L)  # rendered PNG bytes
pdf_barcode_get_svg(barcode, size_px = 256L)        # rendered SVG string
pdf_barcode_close(barcode)                # free a barcode handle
pdf_editor_add_barcode_to_page(editor, page, barcode, x, y, width, ...)  # stamp a barcode onto an editor page

OCR

The underlying build must include the ocr feature.

pdf_ocr_engine_create(det_model_path, rec_model_path, dict_path)  # build an OCR engine from model paths
pdf_ocr_engine_close(engine)             # free an OCR engine
pdf_ocr_page_needs_ocr(doc, page)        # TRUE if a page has no extractable text layer
pdf_ocr_extract_text(doc, page, engine = NULL)  # OCR a page (uses the default engine when NULL)
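
A per-page fallback built from the functions above might look like the sketch below. page_text() is a hypothetical helper, not part of pdfoxide, and "scan.pdf" is a placeholder path. It relies on the default engine (engine = NULL), so it assumes an OCR-enabled build with its models available. pdf_extract_text_auto() is listed earlier as the built-in way to make this choice; the explicit version only makes the decision visible.

```r
# Hypothetical helper: OCR a page only when it has no extractable text layer,
# otherwise use native reading-order extraction.
page_text <- function(doc, page) {
  if (pdf_ocr_page_needs_ocr(doc, page)) {
    pdf_ocr_extract_text(doc, page)   # engine = NULL uses the default engine
  } else {
    pdf_extract_text(doc, page)
  }
}

if (requireNamespace("pdfoxide", quietly = TRUE) && file.exists("scan.pdf")) {
  library(pdfoxide)
  doc <- pdf_open("scan.pdf")
  for (i in seq_len(pdf_page_count(doc)) - 1L) {   # 0-based indices
    cat(sprintf("Page %d: %d characters\n", i + 1L, nchar(page_text(doc, i))))
  }
  pdf_close(doc)
}
```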

OCR models and runtime settings

pdf_model_manifest()                       # available OCR model manifest
pdf_prefetch_available()                   # TRUE if model prefetching is available
pdf_prefetch_models(languages_csv = NULL)  # prefetch OCR models for given languages
pdf_set_max_ops_per_stream(limit)          # cap content-stream operations (DoS guard)
pdf_set_preserve_unmapped_glyphs(preserve) # keep glyphs with no Unicode mapping

Crypto providers / FIPS

pdf_crypto_active_provider()   # name of the active crypto provider
pdf_crypto_fips_available()    # TRUE if a FIPS provider is available
pdf_crypto_use_fips()          # switch to the FIPS provider
pdf_crypto_set_policy(spec)    # set the crypto policy from a spec string
pdf_crypto_policy()            # current crypto policy
pdf_crypto_inventory()         # cryptographic algorithm inventory
pdf_crypto_cbom()              # Cryptographic Bill of Materials (CBOM)

Logging

pdf_set_log_level(level)   # set the library log level
pdf_get_log_level()        # get the current log level

Complete example

library(pdfoxide)

# --- Create ---
pdf <- pdf_from_markdown("# Report\n\nGenerated by **PDF Oxide**.\n")
pdf_save(pdf, "report.pdf")
pdf_close(pdf)

# --- Extract ---
doc <- pdf_open("report.pdf")
cat("Pages:", pdf_page_count(doc), "\n")

for (i in seq_len(pdf_page_count(doc)) - 1L) {     # 0-based indices
  txt <- pdf_extract_text(doc, i)
  cat(sprintf("Page %d: %d characters\n", i + 1L, nchar(txt)))
}

chars <- pdf_extract_chars(doc, 0)                 # per-character data frame
results <- pdf_search_all(doc, "PDF Oxide", case_sensitive = FALSE)
pdf_close(doc)

# --- Edit ---
ed <- pdf_editor_open("report.pdf")
pdf_editor_set_producer(ed, "PDF Oxide")
pdf_editor_rotate_all_pages(ed, 90L)
pdf_editor_save(ed, "rotated.pdf")
pdf_editor_close(ed)

# --- Build programmatically ---
b <- pdf_builder_create()
pdf_builder_set_title(b, "Invoice")
page <- pdf_builder_letter_page(b)
pdf_page_font(page, "Helvetica", 24)
pdf_page_at(page, 72, 720)
pdf_page_builder_text(page, "Invoice #1001")
pdf_page_done(page)
pdf_builder_save(b, "invoice.pdf")
pdf_builder_close(b)

Other Language Bindings

PDF Oxide provides native bindings for every major ecosystem: Rust, Python, Node.js, WASM, C#, Golang, Java, PHP, Ruby, C++, Swift, Kotlin, Dart, Julia, Zig, Scala, Clojure, Objective-C, Elixir

Next steps

Types & Enums: all shared types and enums
Page API Reference: consistent per-page iteration across bindings
Getting Started with R: tutorial