What is the fastest Python PDF library?

PDF Oxide is the fastest Python PDF library, with 0.8ms mean text extraction time — 5.8× faster than PyMuPDF (4.6ms) and 15× faster than pypdf (12.1ms). Benchmarked on 3,830 real-world PDFs with 100% pass rate.

Is PDF Oxide free for commercial use?

Yes. PDF Oxide is MIT licensed — free for all uses including commercial products, SaaS, and proprietary software. No license fees, no sales calls, no AGPL restrictions.

Can PDF Oxide handle scanned PDFs with OCR?

Yes. PDF Oxide includes built-in OCR via PaddleOCR and ONNX Runtime. No Tesseract installation needed — just pip install pdf_oxide and use extract_text_ocr(). Supports PP-OCRv3, v4, and v5 models.

Does PDF Oxide support XFA forms?

Yes. PDF Oxide is the only Python PDF library that can detect, analyze, and extract data from XFA forms (XML Forms Architecture). PyMuPDF, pypdf, pdfplumber, and pdfminer cannot read XFA form data.

How does PDF Oxide compare to PyMuPDF?

PDF Oxide is 5.8× faster than PyMuPDF (0.8ms vs 4.6ms mean), has a 100% pass rate vs 99.3%, and is MIT licensed vs PyMuPDF's AGPL-3.0. PDF Oxide also has built-in Markdown/HTML output and XFA form support that PyMuPDF lacks.

Can PDF Oxide convert PDF to Markdown?

Yes. PDF Oxide has built-in PDF to Markdown conversion with heading detection, table preservation, and list formatting — ideal for LLM and RAG pipelines. No separate package needed, unlike PyMuPDF which requires pymupdf4llm (69× slower).

Java API Reference

PDF Oxide provides native Java bindings through a JNI layer on top of the Rust core. The bundled native library is loaded automatically at class-loading time, and prebuilt native binaries ship for Linux, macOS, and Windows (x86_64 and ARM64).

<dependency>
  <groupId>fyi.oxide</groupId>
  <artifactId>pdf-oxide</artifactId>
  <version>0.3.69</version>
</dependency>

All classes live in the fyi.oxide.pdf package and its subpackages (fyi.oxide.pdf.geometry, fyi.oxide.pdf.text, fyi.oxide.pdf.form, and others).

import fyi.oxide.pdf.PdfDocument;
import fyi.oxide.pdf.DocumentEditor;
import fyi.oxide.pdf.Pdf;

Lifecycle. PdfDocument, DocumentEditor, and Pdf own native memory and implement AutoCloseable. Always use try-with-resources. close() is idempotent, and a Cleaner backstop releases leaked handles, but do not rely on it for timely cleanup.
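The lifecycle contract above (an idempotent close() plus a Cleaner backstop) can be illustrated with a stand-in class. This is a minimal sketch, not PDF Oxide's implementation: NativeHandleSketch and its release counter are hypothetical, and only java.lang.ref.Cleaner is a real JDK API.

```java
import java.lang.ref.Cleaner;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a class that owns native memory.
final class NativeHandleSketch implements AutoCloseable {
    private static final Cleaner CLEANER = Cleaner.create();

    // Counts how many times the simulated native free has run.
    static final AtomicInteger RELEASES = new AtomicInteger();

    // The cleanup action must not reference the owning object,
    // otherwise the owner could never become unreachable.
    private static final class Release implements Runnable {
        @Override
        public void run() {
            RELEASES.incrementAndGet(); // a real binding would free native memory here
        }
    }

    private final Cleaner.Cleanable cleanable;
    private volatile boolean open = true;

    NativeHandleSketch() {
        this.cleanable = CLEANER.register(this, new Release());
    }

    boolean isOpen() {
        return open;
    }

    @Override
    public void close() {
        open = false;
        cleanable.clean(); // Cleanable.clean() runs the action at most once
    }
}

final class LifecycleDemo {
    public static void main(String[] args) {
        NativeHandleSketch handle = new NativeHandleSketch();
        try (handle) {
            System.out.println("open inside block: " + handle.isOpen());
        }
        handle.close(); // a second close is a no-op
        System.out.println("releases: " + NativeHandleSketch.RELEASES.get());
    }
}
```

The explicit close() gives deterministic release; the Cleaner only runs at some point after the object becomes unreachable, which is why it is unsuitable for timely cleanup.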

Thread safety. Document instances are not thread-safe, so open one per worker. The stateless static helpers (MarkdownConverter, PdfValidator, PdfPolicy) are thread-safe.
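One way to follow the one-handle-per-worker rule is to open, use, and close the handle inside each task. The sketch below uses a hypothetical WorkerDocSketch in place of a real document class so that it runs without the library; the thread-ownership guard is an illustration only and is not something the bindings are documented to enforce.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical stand-in for a document handle that is not thread-safe.
final class WorkerDocSketch implements AutoCloseable {
    private final String source;
    private final Thread owner = Thread.currentThread();

    WorkerDocSketch(String source) {
        this.source = source;
    }

    String extractText(int pageIndex) {
        // Guard that makes accidental cross-thread use visible in this sketch.
        if (Thread.currentThread() != owner) {
            throw new IllegalStateException("handle used from a different thread");
        }
        return source + "#page" + pageIndex;
    }

    @Override
    public void close() {
        // A real handle would release native memory here.
    }
}

final class PerWorkerDemo {
    // Each task opens, uses, and closes its own handle; nothing is shared.
    static List<String> extractAll(List<String> sources, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (String source : sources) {
                Callable<String> task = () -> {
                    try (WorkerDocSketch doc = new WorkerDocSketch(source)) {
                        return doc.extractText(0);
                    }
                };
                futures.add(pool.submit(task));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> future : futures) {
                try {
                    results.add(future.get()); // collected in submission order
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while waiting", e);
                } catch (ExecutionException e) {
                    throw new IllegalStateException("extraction task failed", e.getCause());
                }
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(extractAll(List.of("a.pdf", "b.pdf", "c.pdf"), 2));
    }
}
```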

For the Rust API, see the Rust API reference; for the Python API, see the Python API reference.

PdfDocument

The primary read-only entry point to a PDF, covering opening, extraction, rendering, and conversion. Implements AutoCloseable.

import fyi.oxide.pdf.PdfDocument;
import java.nio.file.Paths;

try (PdfDocument doc = PdfDocument.open(Paths.get("invoice.pdf"))) {
    System.out.println(doc.extractText(0));
}

Opening (Static Factory Methods)

static PdfDocument open(Path path)

Opens a PDF from a file system path.

static PdfDocument open(String path)

Opens a PDF from a path string.

static PdfDocument open(byte[] bytes)

Opens a PDF from in-memory bytes (for example, data downloaded from S3 or over HTTP).

static PdfDocument open(Path path, String password)

Opens an encrypted PDF from a path using the user or owner password.

static PdfDocument open(String path, String password)

Opens an encrypted PDF from a path string using a password.

static PdfDocument open(byte[] bytes, String password)

Opens an encrypted PDF from bytes using a password.

static PdfDocument open(InputStream stream)

Reads all bytes from an InputStream and opens the PDF.

One-Shot Static Helpers

static String extractText(String path)

Opens the file, extracts all text, and closes it in a single call (path string).

static String extractText(Path path)

Opens the file, extracts all text, and closes it in a single call (Path).

Authentication

boolean authenticate(String password)

Authenticates after an encrypted PDF has been opened. Returns true on success.

boolean authenticate(byte[] password)

Authenticates with a raw byte password.

Document Info

int pageCount()

Returns the number of pages in the document.

boolean isOpen()

Returns true if the document handle is still open.

Text Extraction

String extractText(int pageIndex)

Extracts plain text from a single page (0-based index).

String extractTextAuto(int pageIndex)

Extracts text from a page, falling back to OCR automatically for scanned pages.

String extractStructured(int page)

Extracts structured page content (spans, lines, layout) as a JSON string.

Conversion

String toMarkdown()

Converts the entire document to Markdown.

String toMarkdown(int pageIndex)

Converts a single page to Markdown.

String toHtml()

Converts the entire document to HTML.

String toHtml(int pageIndex)

Converts a single page to HTML.

DOM Access

PdfPage page(int index)

Returns a lazy PdfPage handle for the given 0-based index.

Rendering

byte[] render(int pageIndex)

Renders a page to PNG bytes at the default DPI.

byte[] render(int pageIndex, int dpi)

Renders a page to PNG bytes at the specified DPI.
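As a rough guide to choosing a DPI, page dimensions in PDF points (1/72 inch in default user space) map to pixels as points × dpi / 72. The helper below is a worked-arithmetic sketch; RenderSizeSketch is hypothetical, and rounding to the nearest pixel is an assumption, since the renderer's rounding policy is not stated here.

```java
// Worked arithmetic: page size in points -> raster size in pixels.
final class RenderSizeSketch {
    static final double POINTS_PER_INCH = 72.0;

    // Rounding to the nearest pixel is an assumption of this sketch.
    static int pixels(double points, int dpi) {
        return (int) Math.round(points * dpi / POINTS_PER_INCH);
    }

    public static void main(String[] args) {
        // US Letter is 612 x 792 points (8.5 x 11 inches).
        System.out.println(pixels(612, 150) + " x " + pixels(792, 150)); // prints "1275 x 1650"
    }
}
```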

Lifecycle

void close()

Releases the underlying native handle. Idempotent.

DocumentEditor

A mutable editing session on a PDF that handles form filling, redaction, metadata scrubbing, and saving. Mutating methods return this for fluent chaining. Implements AutoCloseable.

import fyi.oxide.pdf.DocumentEditor;
import java.nio.file.Paths;

try (DocumentEditor editor = DocumentEditor.open("form.pdf")) {
    editor.setFormField("name", "Jane Doe")
          .setFormField("subscribe", true)
          .saveTo(Paths.get("filled.pdf"));
}

Opening (Static Factory Methods)

static DocumentEditor open(Path path)

Opens a PDF for editing from a Path.

static DocumentEditor open(String path)

Opens a PDF for editing from a path string.

static DocumentEditor open(byte[] bytes)

Opens a PDF for editing from in-memory bytes.

Form Fields

DocumentEditor setFormField(String name, String value)

Sets the value of a text or choice form field by name. Returns this.

DocumentEditor setFormField(String name, boolean checked)

Sets a checkbox or radio form field by name. Returns this.

Redaction

DocumentEditor addRedaction(int pageIndex, BBox region)

Queues a redaction for a rectangular region on a page. Returns this.

int redactionCount(int pageIndex)

Returns the number of pending redactions on one page.

int redactionCount()

Returns the total number of pending redactions across the document.

RedactResult applyRedactionsDestructive()

Applies all pending redactions, permanently removing the covered content. Returns a RedactResult.

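Metadata

DocumentEditor scrubMetadata()

Removes the document info and XMP metadata. Returns this.

Saving

byte[] save()

Serializes the edited document to a new byte array (full rewrite).

void saveTo(Path out)

Writes the edited document to a file (full rewrite).

byte[] saveIncremental()

Serializes as an incremental update, preserving the original bytes.

void saveIncrementalTo(Path out)

Writes an incremental update to a file.

Lifecycle

boolean isOpen()
void close()

isOpen() reports whether the editor is still open; close() releases the native handle.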

Pdf

Creates new PDFs from Markdown, HTML, or images, and splits existing PDFs. Implements AutoCloseable.

import fyi.oxide.pdf.Pdf;
import java.nio.file.Paths;

try (Pdf pdf = Pdf.fromMarkdown("# Report\n\nGenerated by PDF Oxide.")) {
    pdf.saveTo(Paths.get("report.pdf"));
}

Creation (Static Factory Methods)

static Pdf fromMarkdown(String markdown)

Creates a PDF from Markdown content.

static Pdf fromHtml(String html)

Creates a PDF from HTML content.

static Pdf fromImages(List<byte[]> images)

Creates a multi-page PDF with one page per image (JPEG/PNG bytes).

Splitting

List<BookmarkSegment> planSplitByBookmarks(SplitByBookmarksOptions opts)

Computes the BookmarkSegment plan for splitting at a given bookmark level without writing any output files.

List<byte[]> splitByBookmarks(SplitByBookmarksOptions opts)

Splits the PDF at the configured bookmark level and returns one byte array per segment.

static int planSplitByBookmarksCount(byte[] sourcePdf, int level)

Returns how many segments a bookmark-level split would produce, without opening a Pdf.

static byte[][] splitByBookmarksFromBytes(byte[] sourcePdf, int level)

Splits the source PDF bytes at the given bookmark level in a single static call.
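To make the idea of a split plan concrete, the sketch below derives segments from a list of bookmark titles and their start pages. Everything in it is an assumption for illustration: SegmentSketch only mirrors the documented BookmarkSegment field names, page ranges are treated as 0-based and inclusive, pages before the first bookmark are ignored, and the filename pattern is invented. The library's actual planning rules may differ.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in with the documented field names.
record SegmentSketch(String title, int firstPage, int lastPage, String filename) {}

final class SplitPlanSketch {
    // titles.get(i) starts at startPages.get(i); start pages are 0-based and strictly increasing.
    static List<SegmentSketch> plan(List<String> titles, List<Integer> startPages,
                                    int pageCount, String prefix) {
        List<SegmentSketch> segments = new ArrayList<>();
        for (int i = 0; i < titles.size(); i++) {
            int first = startPages.get(i);
            // A segment ends just before the next bookmark, or at the last page.
            int last = (i + 1 < titles.size()) ? startPages.get(i + 1) - 1 : pageCount - 1;
            String filename = String.format("%s-%02d.pdf", prefix, i + 1);
            segments.add(new SegmentSketch(titles.get(i), first, last, filename));
        }
        return segments;
    }

    public static void main(String[] args) {
        List<SegmentSketch> segments =
            plan(List.of("Intro", "Methods", "Results"), List.of(0, 3, 7), 10, "report");
        for (SegmentSketch segment : segments) {
            System.out.println(segment);
        }
    }
}
```

Computing a plan first makes it possible to inspect the segment boundaries before any output bytes are produced.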

Saving and Lifecycle

byte[] save()

Serializes the PDF to a byte array.

void saveTo(Path out)

Writes the PDF to a file.

boolean isOpen()
void close()

isOpen() reports whether the handle is still open; close() releases the native resources.

AutoExtractor

An adaptive extractor that classifies each page (text layer versus scan) and applies OCR where needed. Create it from an open PdfDocument.

import fyi.oxide.pdf.AutoExtractor;
import fyi.oxide.pdf.PdfDocument;
import fyi.oxide.pdf.auto.AutoResult;

try (PdfDocument doc = PdfDocument.open("scan.pdf")) {
    AutoExtractor extractor = AutoExtractor.balanced(doc);
    AutoResult result = extractor.extractDocument();
    System.out.println(result.text());
}

Creation (Static Factory Methods)

static AutoExtractor of(PdfDocument doc)

Creates an extractor with the default configuration.

static AutoExtractor of(PdfDocument doc, AutoExtractConfig config)

Creates an extractor with an explicit AutoExtractConfig.

static AutoExtractor fast(PdfDocument doc)

Creates an extractor tuned for speed (text layer first).

static AutoExtractor balanced(PdfDocument doc)

Creates an extractor with a preset that balances speed and accuracy.

static AutoExtractor highFidelity(PdfDocument doc)

Creates an extractor tuned for maximum accuracy (aggressive OCR).

Extraction

String extractText()

Extracts plain text from the entire document.

String extractTextForPage(int pageIndex)

Extracts plain text from a single page.

AutoResult extractDocument()

Runs full adaptive extraction over the entire document. Returns an AutoResult.

AutoResult extractAutoDocument()

Alias for extractDocument(); returns the full document result.

AutoResult extractPage(int pageIndex)

Runs adaptive extraction on a single page.

AutoResult extractAutoPage(int pageIndex)

Alias for extractPage() on a single page.

Classification

ClassifyResult classifyDocument()

Classifies every page without extracting. Returns a ClassifyResult.

ClassifyResult classifyPage(int pageIndex)

Classifies a single page.

JSON Output

String extractDocumentJson()

Extracts the entire document and serializes the result as JSON.

String extractPageJson(int pageIndex)

Extracts a page and serializes the result as JSON.

Accessors

PdfDocument document()

Returns the underlying PdfDocument.

AutoExtractConfig config()

Returns the active configuration.

MarkdownConverter

Stateless, thread-safe static helpers for Markdown and HTML conversion.

static String toMarkdown(PdfDocument doc, int pageIndex)

Converts a single page to Markdown.

static String toMarkdown(PdfDocument doc)

Converts the entire document to Markdown.

static String toHtml(PdfDocument doc, int pageIndex)

Converts a single page to HTML.

static String toHtml(PdfDocument doc)

Converts the entire document to HTML.

PdfSigner

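Digital signing and verification using a PKCS#12 keystore.

import fyi.oxide.pdf.PdfSigner;
import fyi.oxide.pdf.signature.SignOptions;
import java.nio.file.Paths;

// pdfBytes holds the unsigned PDF as a byte[], for example read with Files.readAllBytes(...).
PdfSigner signer = PdfSigner.fromPkcs12(Paths.get("cert.p12"), "keystore-pw");
byte[] signed = signer.sign(pdfBytes, SignOptions.builder().withReason("Approved").build());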

static PdfSigner fromPkcs12(Path keystore, String password)

Loads a signer from a PKCS#12 keystore file.

static PdfSigner fromPkcs12(byte[] keystoreBytes, String password)

Loads a signer from in-memory PKCS#12 keystore bytes.

byte[] sign(byte[] pdf, SignOptions opts)

Signs the PDF bytes with the configured certificate and SignOptions. Returns the signed PDF.

boolean verify(byte[] pdf)

Verifies the signatures embedded in a PDF. Returns true if they are valid.

static SignatureLevel classifyLevel(byte[] pdf)

Classifies the PAdES signature level of a signed PDF. Returns a SignatureLevel.

PdfValidator

Stateless, thread-safe compliance validation for PDF/A, PDF/X, and PDF/UA.

static boolean isPdfA(PdfDocument doc, PdfALevel level)

Quick boolean check of PDF/A conformance at the given level.

static boolean isPdfUa(PdfDocument doc, PdfUaLevel level)

Quick boolean check of PDF/UA conformance at the given level.

static ValidationResult validatePdfA(PdfDocument doc, PdfALevel level)

Validates against a PDF/A level. Returns a ValidationResult containing any violations.

static ValidationResult validatePdfX(PdfDocument doc, PdfXLevel level)

Validates against a PDF/X level.

static ValidationResult validatePdfUa(PdfDocument doc, PdfUaLevel level)

Validates against a PDF/UA level.

PdfPolicy

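A process-wide security policy that controls which cryptographic algorithms are permitted. Provides thread-safe static accessors.

static PolicyMode current()

Returns the currently active policy mode.

static void set(PolicyMode mode)

Sets the process-wide policy mode.

static PolicyMode compat()

Returns the permissive compatibility mode constant.

static PolicyMode strict()

Returns the strict mode constant.

static PolicyMode fipsStrict()

Returns the FIPS strict mode constant.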

PdfPage

A lazy page handle returned by PdfDocument.page(int). Properties are dispatched to the parent document at access time.

PdfDocument parent()

Returns the owning document.

int index()

Returns the 0-based page index.

BBox mediaBox()

Returns the page MediaBox as a BBox.

BBox cropBox()

Returns the page CropBox.

double width()
double height()

Return the page width and height in PDF points.

int rotation()

Returns the page rotation in degrees (0, 90, 180, or 270).

String text()

Extracts all plain text on the page.

String text(BBox region)

Extracts the text inside a rectangular region.

List<TextWord> words()

Returns word-level text (TextWord) with bounding boxes.

List<TextLine> lines()

Returns line-level text (TextLine).

List<TextChar> chars()

Returns character-level data (TextChar).

Geometry Types

BBox

An immutable axis-aligned bounding box in PDF coordinate space.

BBox(double x0, double y0, double x1, double y1)
double x0()
double y0()
double x1()
double y1()
double width()
double height()

Rect

A rectangle defined by position and size (origin plus width/height).

Rect(double x, double y, double width, double height)
double x()
double y()
double width()
double height()
BBox toBBox()
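The relationship between the two rectangle representations can be sketched with stand-in records. The mapping shown (far corner = origin + size) is the natural reading of the accessor names but is an assumption here; BBoxSketch and RectSketch are hypothetical and are not the library's classes.

```java
// Hypothetical stand-ins mirroring the documented accessor shapes.
record BBoxSketch(double x0, double y0, double x1, double y1) {
    double width() {
        return x1 - x0;
    }

    double height() {
        return y1 - y0;
    }
}

record RectSketch(double x, double y, double width, double height) {
    // Assumed mapping: the origin becomes (x0, y0) and the far corner is origin + size.
    BBoxSketch toBBox() {
        return new BBoxSketch(x, y, x + width, y + height);
    }
}

final class GeometryDemo {
    public static void main(String[] args) {
        BBoxSketch box = new RectSketch(72, 100, 200, 50).toBBox();
        System.out.println(box);
        System.out.println(box.width() + " x " + box.height());
    }
}
```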

Point

A 2D point.

Point(double x, double y)
double x()
double y()

Color

An 8-bit RGBA color.

Color(int r, int g, int b, int a)
Color(int r, int g, int b)
int r()
int g()
int b()
int a()

Text Types

TextChar

A single decoded character with its position and OCR confidence.

TextChar(int codepoint, BBox bbox, float confidence)
int codepoint()
BBox bbox()
float confidence()
String asString()

TextWord

A whitespace-delimited word with its bounds and confidence.

TextWord(String text, BBox bbox, float confidence)
String text()
BBox bbox()
float confidence()

TextLine

A line of text made up of words.

TextLine(String text, BBox bbox, List<TextWord> words)
String text()
BBox bbox()
List<TextWord> words()

TextSpan

A run of contiguous text that shares the same style.

TextSpan(String text, BBox bbox, TextStyle style)
String text()
BBox bbox()
TextStyle style()

TextStyle

Font and style metadata for a span.

TextStyle(String font, double size, Color color, boolean bold, boolean italic)
double size()
Color color()
boolean bold()
boolean italic()

Table Types

Table

A detected table with a grid of cells.

Table(BBox bbox, int rows, int cols, List<TableCell> cells)
BBox bbox()
int rows()
int cols()
List<TableCell> cells()

TableCell

A single cell, including span (merge) information.

TableCell(String text, BBox bbox, int row, int col, int rowSpan, int colSpan)
String text()
BBox bbox()
int row()
int col()
int rowSpan()
int colSpan()

Image Types

ExtractedImage

A raster image extracted from a page.

ExtractedImage(byte[] bytes, ImageFormat format, BBox bbox, int width, int height)
byte[] bytes()
ImageFormat format()
BBox bbox()
int width()
int height()

ImageFormat (enum)

JPEG, PNG, CCITT, RAW.

Search Types

SearchOptions

Search parameters configured through a builder.

boolean caseSensitive()
boolean wholeWord()
boolean regex()
Optional<Integer> maxResults()
static SearchOptions.Builder builder()

Builder: withCaseSensitive(boolean), withWholeWord(boolean), withRegex(boolean), withMaxResults(Integer), withMaxResults(int), build().

SearchResult

The complete result of a query.

SearchResult(String query, List<SearchMatch> matches)
String query()
List<SearchMatch> matches()
int count()
boolean isEmpty()

SearchMatch

A single match with its page and position.

SearchMatch(int pageIndex, BBox bbox, String text)
int pageIndex()
BBox bbox()
String text()

Form Types

FormField

An AcroForm field.

FormField(String name, FormFieldType type, String value, BBox bbox, int pageIndex)
String name()
FormFieldType type()
Optional<String> value()
Optional<BBox> bbox()
int pageIndex()

FormFieldType (enum)

TEXT, CHECKBOX, RADIO, CHOICE.

Annotation Types

Annotation

A page annotation.

Annotation(AnnotationType type, int pageIndex, BBox bbox, String contents, String uri)
AnnotationType type()
int pageIndex()
BBox bbox()
Optional<String> contents()
Optional<String> uri()

AnnotationType (enum)

HIGHLIGHT, TEXT, LINK, STAMP, UNDERLINE, STRIKEOUT, SQUIGGLY, FREE_TEXT, LINE, SQUARE, CIRCLE, FILE_ATTACHMENT.

Metadata Types

DocumentInfo

The standard document information dictionary fields.

Optional<String> title()
Optional<String> author()
Optional<String> subject()
Optional<String> keywords()
Optional<String> creator()
Optional<String> producer()
Optional<String> creationDate()
Optional<String> modificationDate()

XmpMetadata

The raw XMP metadata packet.

XmpMetadata(String xml)
String xml()
boolean isEmpty()

Auto-Extraction Types

AutoExtractConfig

Immutable, builder-based configuration for AutoExtractor.

Optional<ExtractMode> mode()
Optional<List<Integer>> forceOcrPages()
Optional<Double> minOcrConfidence()
Optional<List<String>> ocrLanguages()
Optional<List<String>> passwords()
Optional<Double> topMarginFraction()
Optional<Double> bottomMarginFraction()
Optional<Boolean> allowSingleColumnTables()
Optional<Boolean> ocrInlineImages()
Optional<String> cancelToken()
static AutoExtractConfig.Builder builder()
AutoExtractConfig.Builder toBuilder()

Builder methods: withMode(ExtractMode), withForceOcrPages(List<Integer>), withMinOcrConfidence(Double), withOcrLanguages(List<String>), withOcrLanguages(String...), withPasswords(List<String>), withPasswords(String...), withTopMarginFraction(Double), withTopMarginFraction(double), withBottomMarginFraction(Double), withBottomMarginFraction(double), withAllowSingleColumnTables(Boolean), withAllowSingleColumnTables(boolean), withOcrInlineImages(Boolean), withOcrInlineImages(boolean), withCancelToken(String), build().
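The immutable builder shape described here (Optional accessors, builder(), and toBuilder() for deriving a modified copy) can be sketched with a reduced stand-in. ConfigSketch is hypothetical and keeps only two of the documented options; it shows the pattern, not the library's source.

```java
import java.util.Optional;

// Hypothetical two-field config illustrating the immutable builder shape.
final class ConfigSketch {
    private final Double minOcrConfidence; // null means "not set"
    private final Boolean ocrInlineImages; // null means "not set"

    private ConfigSketch(Builder b) {
        this.minOcrConfidence = b.minOcrConfidence;
        this.ocrInlineImages = b.ocrInlineImages;
    }

    Optional<Double> minOcrConfidence() {
        return Optional.ofNullable(minOcrConfidence);
    }

    Optional<Boolean> ocrInlineImages() {
        return Optional.ofNullable(ocrInlineImages);
    }

    static Builder builder() {
        return new Builder();
    }

    // Copies the current values into a fresh builder; the original stays unchanged.
    Builder toBuilder() {
        return new Builder()
            .withMinOcrConfidence(minOcrConfidence)
            .withOcrInlineImages(ocrInlineImages);
    }

    static final class Builder {
        private Double minOcrConfidence;
        private Boolean ocrInlineImages;

        Builder withMinOcrConfidence(Double value) {
            this.minOcrConfidence = value;
            return this;
        }

        Builder withOcrInlineImages(Boolean value) {
            this.ocrInlineImages = value;
            return this;
        }

        ConfigSketch build() {
            return new ConfigSketch(this);
        }
    }
}

final class ConfigDemo {
    public static void main(String[] args) {
        ConfigSketch base = ConfigSketch.builder().withMinOcrConfidence(0.6).build();
        ConfigSketch derived = base.toBuilder().withOcrInlineImages(true).build();
        System.out.println(base.ocrInlineImages().isPresent());    // false: base is untouched
        System.out.println(derived.minOcrConfidence().isPresent()); // true: value carried over
    }
}
```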

AutoResult

The result of adaptive extraction.

String text()
Optional<String> markdown()
Optional<String> html()
ExtractReason reason()
double confidence()
boolean ocrUsed()
List<RegionResult> regions()
List<Integer> pagesNeedingOcr()

RegionResult

A per-region extraction result.

int pageIndex()
BBox bbox()
String text()
ExtractReason reason()
double confidence()
boolean ocrUsed()
Optional<Table> table()

ClassifyResult

The result of page classification.

List<PageClass> pages()
List<Integer> pagesNeedingOcr()
List<Integer> pagesWithChart()
List<Integer> pagesEncrypted()

ExtractMode (enum)

TEXT_ONLY, AUTO.

PageClass (enum)

TEXT_LAYER, SCANNED, MIXED.

ExtractReason (enum)

OK, SCANNED_NO_TEXT_LAYER, GLYPH_MAPPING_MISSING, ENCRYPTED_NO_EXTRACT_PERMISSION, IMAGE_TABLE_NO_STRUCTURE, CHART_NOT_TRANSCRIBED, OCR_REQUESTED_BUT_UNAVAILABLE, OCR_LOW_CONFIDENCE, EMPTY.

Signature Types

SignOptions

Builder-based signing parameters for PdfSigner.

SignatureLevel level()
Optional<String> reason()
Optional<String> location()
Optional<String> contactInfo()
Optional<String> tsaUrl()
static SignOptions.Builder builder()

Builder: withLevel(SignatureLevel), withReason(String), withLocation(String), withContactInfo(String), withTsaUrl(String), build().

SignatureLevel (enum)

PAdES baseline levels: B_B (basic), B_T (with a trusted timestamp).

Split Types

SplitByBookmarksOptions

Builder-based options for bookmark-based splitting.

int level()
Optional<String> filenamePrefix()
static SplitByBookmarksOptions.Builder builder()

Builder: withLevel(int), withFilenamePrefix(String), build().

BookmarkSegment

A planned output segment from a bookmark split.

BookmarkSegment(String title, int firstPage, int lastPage, String filename)
String title()
int firstPage()
int lastPage()
String filename()

Redaction Types

RedactResult

The result of DocumentEditor.applyRedactionsDestructive().

RedactResult(int regionsApplied, boolean oracleVerified)
int regionsApplied()
boolean oracleVerified()

Compliance Types

ValidationResult

The result of a PdfValidator check.

ValidationResult(boolean valid, List<ValidationViolation> violations)
boolean valid()
List<ValidationViolation> violations()

ValidationViolation

A single conformance violation.

ValidationViolation(String ruleId, String description, Integer pageIndex)
String ruleId()
String description()
Optional<Integer> pageIndex()

PdfALevel (enum)

A_1B, A_1A, A_2B, A_2A, A_2U, A_3B, A_3A, A_3U, A_4, A_4E.

PdfXLevel (enum)

X_1A_2001, X_1A_2003, X_3_2002, X_3_2003, X_4, X_4P, X_5G, X_5N, X_5PG, X_6, X_6P.

PdfUaLevel (enum)

UA_1, UA_2. Each exposes int code().

Policy Types

PolicyMode (enum)

COMPAT, STRICT.

SecurityPolicy

A builder-based type for per-operation security policy.

PolicyMode mode()
List<String> additionalAllow()
List<String> additionalDeny()
static SecurityPolicy.Builder builder()

Builder: withMode(PolicyMode), allow(String algId), deny(String algId), build().

Rendering Types

PixelFormat (enum)

RGBA_8888, RGB_888, GRAY_8.

Error Handling

All PDF-related failures throw PdfException (an unchecked RuntimeException) or one of its subclasses. Every exception carries a PdfErrorKind, available through kind().

import fyi.oxide.pdf.PdfDocument;
import fyi.oxide.pdf.exception.PdfException;

try (PdfDocument doc = PdfDocument.open("file.pdf")) {
    String text = doc.extractText(0);
} catch (PdfException e) {
    System.err.println(e.kind() + ": " + e.getMessage());
}

PdfException

PdfException(String message)
PdfException(PdfErrorKind kind, String message)
PdfException(PdfErrorKind kind, String message, Throwable cause)
PdfErrorKind kind()

Subclasses

Exception	Thrown when
`PdfParseException`	The file is corrupt or is not a valid PDF
`PdfEncryptedException`	The PDF is encrypted and the password is missing or wrong
`PdfPermissionException`	The requested operation is denied by the document permissions
`PdfIoException`	An underlying I/O error occurs
`PdfOcrUnavailableException`	OCR is requested but no OCR backend is available
`PdfSignatureException`	A signing or verification operation fails
`PdfInvalidStateException`	An operation is called on a closed or invalid handle
`PdfUnsupportedException`	The requested feature is not supported

PdfErrorKind (enum)

PARSE, ENCRYPTED, PERMISSION, IO, OCR_UNAVAILABLE, SIGNATURE, INVALID_STATE, UNSUPPORTED.
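Because every exception exposes a kind, callers can branch on it rather than on the subclass. The sketch below shows one possible application-level policy. ErrorKindSketch and PdfExceptionSketch are stand-ins that only mirror the documented names so the example runs without the library, and the recovery choices are hypothetical.

```java
// Stand-ins that mirror the documented names; not the library's classes.
enum ErrorKindSketch {
    PARSE, ENCRYPTED, PERMISSION, IO, OCR_UNAVAILABLE, SIGNATURE, INVALID_STATE, UNSUPPORTED
}

final class PdfExceptionSketch extends RuntimeException {
    private final ErrorKindSketch kind;

    PdfExceptionSketch(ErrorKindSketch kind, String message) {
        super(message);
        this.kind = kind;
    }

    ErrorKindSketch kind() {
        return kind;
    }
}

final class ErrorHandlingDemo {
    // Hypothetical application policy: pick a recovery action per error kind.
    static String decide(PdfExceptionSketch e) {
        switch (e.kind()) {
            case ENCRYPTED:
                return "ask for a password";
            case OCR_UNAVAILABLE:
                return "fall back to text-layer extraction";
            case IO:
                return "retry";
            default:
                return "skip file: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(decide(new PdfExceptionSketch(ErrorKindSketch.ENCRYPTED, "no password")));
        System.out.println(decide(new PdfExceptionSketch(ErrorKindSketch.PARSE, "bad header")));
    }
}
```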

Complete Example

import fyi.oxide.pdf.PdfDocument;
import fyi.oxide.pdf.DocumentEditor;
import fyi.oxide.pdf.Pdf;
import fyi.oxide.pdf.AutoExtractor;
import fyi.oxide.pdf.auto.AutoResult;
import java.nio.file.Paths;

public class Example {
    public static void main(String[] args) throws Exception {
        // --- Extraction ---
        try (PdfDocument doc = PdfDocument.open(Paths.get("input.pdf"))) {
            System.out.println("Pages: " + doc.pageCount());
            for (int i = 0; i < doc.pageCount(); i++) {
                System.out.println(doc.extractText(i));
            }
            String markdown = doc.toMarkdown();

            // Adaptive extraction with OCR fallback
            AutoResult auto = AutoExtractor.balanced(doc).extractDocument();
            System.out.println("OCR used: " + auto.ocrUsed());
        }

        // --- Creation ---
        try (Pdf pdf = Pdf.fromMarkdown("# Report\n\nGenerated by PDF Oxide.")) {
            pdf.saveTo(Paths.get("report.pdf"));
        }

        // --- Editing ---
        try (DocumentEditor editor = DocumentEditor.open("form.pdf")) {
            editor.setFormField("name", "Jane Doe")
                  .setFormField("subscribe", true)
                  .scrubMetadata()
                  .saveTo(Paths.get("filled.pdf"));
        }
    }
}

Other Language Bindings

PDF Oxide provides native bindings for every major ecosystem: Rust, Python, Node.js, WASM, C#, Golang, PHP, Ruby, C++, Swift, Kotlin, Dart, R, Julia, Zig, Scala, Clojure, Objective-C, Elixir

Next Steps

Types & Enums: all shared types and enums
Page API Reference: consistent page-level traversal across bindings
Getting Started with Java: tutorial