
Types and Enums

All public types in pdf_oxide, organized by category. For method-level documentation, see the Rust API reference.


Geometry

Point

pub struct Point {
    pub x: f64,
    pub y: f64,
}

A 2D point in PDF coordinate space (origin at the bottom-left corner, with the Y axis increasing upward).

Rect

pub struct Rect {
    pub x0: f64,
    pub y0: f64,
    pub x1: f64,
    pub y1: f64,
}

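An axis-aligned rectangle. (x0, y0) is the bottom-left corner and (x1, y1) is the top-right corner. All coordinates are in PDF points (1 point = 1/72 inch).

Python: accessible as a tuple (x0, y0, x1, y1).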
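As a small illustration of the corner convention and the point unit, the sketch below derives a rectangle's size and converts it to inches. The `Rect` here is a local copy of the layout shown above, not an import from pdf_oxide, and `rect_width`, `rect_height`, and `points_to_inches` are hypothetical helper names.

```rust
/// Local mirror of the `Rect` layout shown above (not imported from pdf_oxide).
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Rect {
    pub x0: f64,
    pub y0: f64,
    pub x1: f64,
    pub y1: f64,
}

/// Width in points: right edge minus left edge (hypothetical helper).
pub fn rect_width(r: &Rect) -> f64 {
    r.x1 - r.x0
}

/// Height in points: top edge minus bottom edge (hypothetical helper).
pub fn rect_height(r: &Rect) -> f64 {
    r.y1 - r.y0
}

/// Convert PDF points to inches using 1 pt = 1/72 in.
pub fn points_to_inches(pt: f64) -> f64 {
    pt / 72.0
}

fn main() {
    // A US Letter media box is 612 x 792 pt.
    let letter = Rect { x0: 0.0, y0: 0.0, x1: 612.0, y1: 792.0 };
    println!(
        "{} x {} in",
        points_to_inches(rect_width(&letter)),
        points_to_inches(rect_height(&letter))
    );
}
```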

Matrix

pub struct Matrix {
    pub a: f64,
    pub b: f64,
    pub c: f64,
    pub d: f64,
    pub e: f64,
    pub f: f64,
}

A 3x3 affine transformation matrix in the layout used by the PDF specification:

| a  b  0 |
| c  d  0 |
| e  f  1 |

Used for text positioning, image placement, and Form XObject transformations.
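With this layout a point is treated as the row vector [x y 1] and multiplied on the left of the matrix. The sketch below works that arithmetic out on a local copy of the struct; `transform_point` and `concat` are hypothetical free functions written for illustration, not methods documented for pdf_oxide.

```rust
/// Local mirror of the six-field `Matrix` shown above (not imported from pdf_oxide).
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Matrix {
    pub a: f64,
    pub b: f64,
    pub c: f64,
    pub d: f64,
    pub e: f64,
    pub f: f64,
}

/// Apply the matrix to a point: [x' y' 1] = [x y 1] * M.
pub fn transform_point(m: &Matrix, x: f64, y: f64) -> (f64, f64) {
    (m.a * x + m.c * y + m.e, m.b * x + m.d * y + m.f)
}

/// Matrix product `first * second`. With row vectors, the result
/// applies `first` to a point and then `second`.
pub fn concat(first: &Matrix, second: &Matrix) -> Matrix {
    Matrix {
        a: first.a * second.a + first.b * second.c,
        b: first.a * second.b + first.b * second.d,
        c: first.c * second.a + first.d * second.c,
        d: first.c * second.b + first.d * second.d,
        e: first.e * second.a + first.f * second.c + second.e,
        f: first.e * second.b + first.f * second.d + second.f,
    }
}

fn main() {
    let scale = Matrix { a: 2.0, b: 0.0, c: 0.0, d: 2.0, e: 0.0, f: 0.0 };
    let shift = Matrix { a: 1.0, b: 0.0, c: 0.0, d: 1.0, e: 10.0, f: 5.0 };
    // Scale first, then translate.
    let combined = concat(&scale, &shift);
    println!("{:?}", transform_point(&combined, 1.0, 1.0));
}
```

Because the product is not commutative, swapping the arguments of `concat` would translate first and then scale, which moves the same point somewhere else.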


Text

TextSpan

pub struct TextSpan {
    pub text: String,
    pub x: f64,
    pub y: f64,
    pub font_name: String,
    pub font_size: f64,
    pub bbox: Rect,
}

A run of contiguous text that shares the same style. Returned by extract_spans().

TextChar

pub struct TextChar {
    pub char: char,
    pub x: f64,
    pub y: f64,
    pub font_size: f64,
    pub font_name: String,
    pub bbox: Rect,
}

A single character with its precise position. Returned by extract_chars().

Python Fields:

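| Field | Type | Description |
| --- | --- | --- |
| char | str | The character |
| bbox | tuple[float, float, float, float] | Bounding box (x0, y0, x1, y1) |
| font_name | str | Font name |
| font_size | float | Font size in points |
| origin_x | float | Baseline origin X |
| origin_y | float | Baseline origin Y |
| rotation_degrees | float | Rotation angle (0-360) |
| advance_width | float | Distance to the next character position |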

FontWeight

pub enum FontWeight {
    Thin,       // 100
    ExtraLight, // 200
    Light,      // 300
    Normal,     // 400
    Medium,     // 500
    SemiBold,   // 600
    Bold,       // 700
    ExtraBold,  // 800
    Black,      // 900
}

Color

pub struct Color {
    pub r: f64,
    pub g: f64,
    pub b: f64,
}

An RGB color whose components range from 0.0 to 1.0.


Pages

PageSize

pub enum PageSize {
    Letter,         // 612 x 792 pt (8.5 x 11 in)
    A4,             // 595.28 x 841.89 pt (210 x 297 mm)
    Legal,          // 612 x 1008 pt (8.5 x 14 in)
    A3,             // 841.89 x 1190.55 pt
    A5,             // 419.53 x 595.28 pt
    Custom(f32, f32), // Custom width x height in points
}

Methods:

| Method | Returns | Description |
| --- | --- | --- |
| dimensions() | (f32, f32) | Width and height in points |
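The sizes listed in the enum comments follow from the 72 points per inch definition. The sketch below is one way such a lookup could be written on a local copy of the enum; the free function `dimensions` and the helper `mm_to_points` are written for illustration and are not the library's actual implementation.

```rust
/// Local mirror of the `PageSize` enum shown above (not imported from pdf_oxide).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum PageSize {
    Letter,
    A4,
    Legal,
    A3,
    A5,
    Custom(f32, f32),
}

/// Width and height in points, using the values from the enum comments.
pub fn dimensions(size: &PageSize) -> (f32, f32) {
    match *size {
        PageSize::Letter => (612.0, 792.0),
        PageSize::A4 => (595.28, 841.89),
        PageSize::Legal => (612.0, 1008.0),
        PageSize::A3 => (841.89, 1190.55),
        PageSize::A5 => (419.53, 595.28),
        PageSize::Custom(width, height) => (width, height),
    }
}

/// Convert millimetres to points: 25.4 mm per inch, 72 pt per inch.
pub fn mm_to_points(mm: f32) -> f32 {
    mm / 25.4 * 72.0
}

fn main() {
    let (w, h) = dimensions(&PageSize::A4);
    println!("A4 from the table: {w} x {h} pt");
    println!("A4 from 210 x 297 mm: {} x {} pt", mm_to_points(210.0), mm_to_points(297.0));
}
```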

PageInfo

pub struct PageInfo {
    pub width: f64,
    pub height: f64,
    pub rotation: i32,
    pub media_box: Rect,
    pub crop_box: Option<Rect>,
    pub trim_box: Option<Rect>,
    pub bleed_box: Option<Rect>,
    pub art_box: Option<Rect>,
}

PageLabelStyle

pub enum PageLabelStyle {
    Decimal,        // 1, 2, 3, ...
    UpperRoman,     // I, II, III, ...
    LowerRoman,     // i, ii, iii, ...
    UpperAlpha,     // A, B, C, ...
    LowerAlpha,     // a, b, c, ...
    None,           // No numbering
}

PageLabelRange

pub struct PageLabelRange {
    pub start_page: usize,
    pub style: PageLabelStyle,
    pub prefix: Option<String>,
    pub start_number: Option<usize>,
}

Defines a page label range. For example, a range that starts at page 0 with the LowerRoman style and a start number of 1 labels the pages i, ii, iii, and so on.
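To make the labeling rule concrete, the sketch below formats the label for a page inside a range. The types are local copies of the declarations above. `label_for_page` is a hypothetical helper, and it assumes that a missing `start_number` means 1 and that alphabetic labels repeat the letter after Z (AA, BB, ...), as the PDF specification describes for page labels.

```rust
/// Local mirrors of the types shown above (not imported from pdf_oxide).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum PageLabelStyle { Decimal, UpperRoman, LowerRoman, UpperAlpha, LowerAlpha, None }

pub struct PageLabelRange {
    pub start_page: usize,
    pub style: PageLabelStyle,
    pub prefix: Option<String>,
    pub start_number: Option<usize>,
}

/// Uppercase Roman numeral for n (empty for 0).
fn to_roman(mut n: usize) -> String {
    const TABLE: [(usize, &str); 13] = [
        (1000, "M"), (900, "CM"), (500, "D"), (400, "CD"), (100, "C"), (90, "XC"),
        (50, "L"), (40, "XL"), (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I"),
    ];
    let mut out = String::new();
    for &(value, symbol) in TABLE.iter() {
        while n >= value {
            out.push_str(symbol);
            n -= value;
        }
    }
    out
}

/// A..Z for 1..=26, then AA..ZZ for 27..=52, and so on (empty for 0).
fn to_alpha(n: usize) -> String {
    if n == 0 {
        return String::new();
    }
    let letter = (b'A' + ((n - 1) % 26) as u8) as char;
    std::iter::repeat(letter).take((n - 1) / 26 + 1).collect()
}

/// Label for a zero-based page index, or None if the page precedes the range.
pub fn label_for_page(range: &PageLabelRange, page_index: usize) -> Option<String> {
    if page_index < range.start_page {
        return None;
    }
    // Assumption: a missing start_number counts from 1.
    let n = range.start_number.unwrap_or(1) + (page_index - range.start_page);
    let body = match range.style {
        PageLabelStyle::Decimal => n.to_string(),
        PageLabelStyle::UpperRoman => to_roman(n),
        PageLabelStyle::LowerRoman => to_roman(n).to_lowercase(),
        PageLabelStyle::UpperAlpha => to_alpha(n),
        PageLabelStyle::LowerAlpha => to_alpha(n).to_lowercase(),
        PageLabelStyle::None => String::new(),
    };
    Some(format!("{}{}", range.prefix.as_deref().unwrap_or(""), body))
}

fn main() {
    let front_matter = PageLabelRange {
        start_page: 0,
        style: PageLabelStyle::LowerRoman,
        prefix: None,
        start_number: Some(1),
    };
    for page in 0..4 {
        println!("page {page}: {:?}", label_for_page(&front_matter, page));
    }
}
```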


Images

ImageFormat

pub enum ImageFormat {
    Jpeg,
    Png,
    Tiff,
    Bmp,
    Gif,
    Jp2,
    Jbig2,
    Ccitt,
    Raw,
    Unknown,
}

ColorSpace

pub enum ColorSpace {
    DeviceRGB,
    DeviceCMYK,
    DeviceGray,
    ICCBased,
    CalRGB,
    CalGray,
    Lab,
    Indexed,
    Pattern,
    Separation,
    DeviceN,
    Unknown,
}

ImageContent

pub struct ImageContent {
    pub width: u32,
    pub height: u32,
    pub bits_per_component: u8,
    pub color_space: ColorSpace,
    pub format: ImageFormat,
    pub data: Vec<u8>,
    pub bbox: Rect,
    pub horizontal_dpi: Option<f32>,
    pub vertical_dpi: Option<f32>,
}

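Methods:

| Method | Returns | Description |
| --- | --- | --- |
| resolution() | Option<(f32, f32)> | DPI as (horizontal, vertical) |
| is_high_resolution() | bool | DPI >= 300 |
| is_medium_resolution() | bool | DPI 150-299 |
| is_low_resolution() | bool | DPI < 150 |
| calculate_dpi() | Option<(f32, f32)> | Computed from the pixel dimensions and the bounding box |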
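The DPI calculation divides the pixel count by the placed size in inches (points / 72). The sketch below shows that arithmetic as standalone functions. Their signatures are assumptions made for illustration, as is the choice to classify an image by the lower of its two axis values and to treat "medium" as anything from 150 up to, but not including, 300.

```rust
/// DPI from pixel dimensions and the placed size in points.
/// Returns None when the bounding box has no positive extent.
pub fn calculate_dpi(
    width_px: u32,
    height_px: u32,
    bbox_width_pt: f64,
    bbox_height_pt: f64,
) -> Option<(f32, f32)> {
    if bbox_width_pt <= 0.0 || bbox_height_pt <= 0.0 {
        return None;
    }
    // pixels / inches, where inches = points / 72.
    let horizontal = width_px as f64 * 72.0 / bbox_width_pt;
    let vertical = height_px as f64 * 72.0 / bbox_height_pt;
    Some((horizontal as f32, vertical as f32))
}

/// Hypothetical three-way classification mirroring the thresholds in the table.
#[derive(Debug, PartialEq)]
pub enum ResolutionClass {
    High,
    Medium,
    Low,
}

/// Assumption: classify by the lower of the two axis values.
pub fn classify(dpi: (f32, f32)) -> ResolutionClass {
    let lowest = dpi.0.min(dpi.1);
    if lowest >= 300.0 {
        ResolutionClass::High
    } else if lowest >= 150.0 {
        ResolutionClass::Medium
    } else {
        ResolutionClass::Low
    }
}

fn main() {
    // A 1200 x 1800 px image placed at 4 x 6 inches (288 x 432 pt).
    if let Some(dpi) = calculate_dpi(1200, 1800, 288.0, 432.0) {
        println!("{:?} -> {:?}", dpi, classify(dpi));
    }
}
```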

Paths

PathContent

pub struct PathContent {
    pub operations: Vec<PathOperation>,
    pub stroke_color: Option<Color>,
    pub fill_color: Option<Color>,
    pub stroke_width: f32,
    pub line_cap: LineCap,
    pub line_join: LineJoin,
    pub bbox: Rect,
}

PathOperation

pub enum PathOperation {
    MoveTo { x: f64, y: f64 },
    LineTo { x: f64, y: f64 },
    CurveTo { x1: f64, y1: f64, x2: f64, y2: f64, x3: f64, y3: f64 },
    ClosePath,
    Rect { x: f64, y: f64, width: f64, height: f64 },
}
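As an example of consuming a list of path operations, the sketch below folds them into a bounding box. It works on a local copy of the enum, and `control_point_bounds` is a hypothetical helper. Because it uses the control points of each curve rather than the curve itself, the result is a conservative box that may be slightly larger than the drawn shape.

```rust
/// Local mirror of the `PathOperation` enum shown above (not imported from pdf_oxide).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum PathOperation {
    MoveTo { x: f64, y: f64 },
    LineTo { x: f64, y: f64 },
    CurveTo { x1: f64, y1: f64, x2: f64, y2: f64, x3: f64, y3: f64 },
    ClosePath,
    Rect { x: f64, y: f64, width: f64, height: f64 },
}

/// Returns (x0, y0, x1, y1) covering every point named by the operations,
/// or None when the path names no points at all.
pub fn control_point_bounds(ops: &[PathOperation]) -> Option<(f64, f64, f64, f64)> {
    let mut points: Vec<(f64, f64)> = Vec::new();
    for op in ops {
        match *op {
            PathOperation::MoveTo { x, y } | PathOperation::LineTo { x, y } => {
                points.push((x, y));
            }
            PathOperation::CurveTo { x1, y1, x2, y2, x3, y3 } => {
                points.extend([(x1, y1), (x2, y2), (x3, y3)]);
            }
            PathOperation::ClosePath => {}
            PathOperation::Rect { x, y, width, height } => {
                // Both corners, so negative widths or heights are handled too.
                points.extend([(x, y), (x + width, y + height)]);
            }
        }
    }
    let (first_x, first_y) = *points.first()?;
    Some(points.iter().fold(
        (first_x, first_y, first_x, first_y),
        |(x0, y0, x1, y1), &(x, y)| (x0.min(x), y0.min(y), x1.max(x), y1.max(y)),
    ))
}

fn main() {
    let ops = [
        PathOperation::MoveTo { x: 10.0, y: 10.0 },
        PathOperation::LineTo { x: 50.0, y: 20.0 },
        PathOperation::Rect { x: 0.0, y: 5.0, width: 20.0, height: 40.0 },
        PathOperation::ClosePath,
    ];
    println!("{:?}", control_point_bounds(&ops));
}
```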

LineCap

pub enum LineCap {
    Butt,    // Squared off at the endpoint (default)
    Round,   // Semicircle at the endpoint
    Square,  // Square that extends half the line width past the endpoint
}

LineJoin

pub enum LineJoin {
    Miter,   // Sharp corner (default)
    Round,   // Rounded corner
    Bevel,   // Flattened corner
}

BlendMode

pub enum BlendMode {
    Normal,
    Multiply,
    Screen,
    Overlay,
    Darken,
    Lighten,
    ColorDodge,
    ColorBurn,
    HardLight,
    SoftLight,
    Difference,
    Exclusion,
}

Used for transparency compositing in the graphics state. Set through ExtGState or the graphics API.


Content Elements

ContentElement

A union type covering every kind of page content element:

pub enum ContentElement {
    Text(TextContent),
    Image(ImageContent),
    Path(PathContent),
    Table(TableContent),
    Structure(StructureElement),
}
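A typical way to consume such a union is an exhaustive `match`, recursing into structure elements to reach nested content. The sketch below uses heavily simplified local stand-ins (payload-free `Image`, `Path`, and `Table` variants and a one-field `TextContent`) so that it compiles on its own. `collect_text` is a hypothetical helper, not part of pdf_oxide.

```rust
/// Simplified local stand-ins for the types on this page (not imported from pdf_oxide).
pub struct TextContent {
    pub text: String,
}

pub struct StructureElement {
    pub structure_type: String,
    pub children: Vec<ContentElement>,
}

/// Image, Path, and Table carry no payload here to keep the sketch short.
pub enum ContentElement {
    Text(TextContent),
    Image,
    Path,
    Table,
    Structure(StructureElement),
}

/// Append every text run to `out` in document order, descending into structure elements.
pub fn collect_text(elements: &[ContentElement], out: &mut Vec<String>) {
    for element in elements {
        match element {
            ContentElement::Text(t) => out.push(t.text.clone()),
            ContentElement::Structure(s) => collect_text(&s.children, out),
            ContentElement::Image | ContentElement::Path | ContentElement::Table => {}
        }
    }
}

fn main() {
    let page = vec![
        ContentElement::Text(TextContent { text: "Title".to_string() }),
        ContentElement::Structure(StructureElement {
            structure_type: "P".to_string(),
            children: vec![
                ContentElement::Text(TextContent { text: "Body".to_string() }),
                ContentElement::Image,
            ],
        }),
        ContentElement::Path,
    ];
    let mut runs = Vec::new();
    collect_text(&page, &mut runs);
    println!("{:?}", runs);
}
```

Listing the ignored variants explicitly instead of using a wildcard arm means the compiler flags this function if a new variant is added later.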

TextContent

pub struct TextContent {
    pub text: String,
    pub font_name: String,
    pub font_size: f64,
    pub font_weight: FontWeight,
    pub font_style: FontStyle,
    pub color: Color,
    pub bbox: Rect,
}

FontStyle

pub enum FontStyle {
    Normal,
    Italic,
    Oblique,
}

TextStyle

pub struct TextStyle {
    pub font_name: String,
    pub font_size: f64,
    pub font_weight: FontWeight,
    pub font_style: FontStyle,
    pub color: Color,
}

TableContent

pub struct TableContent {
    pub rows: Vec<TableRowContent>,
    pub column_count: usize,
    pub has_header: bool,
    pub caption: Option<String>,
    pub style: Option<TableContentStyle>,
    pub source: TableSource,
    pub bbox: Rect,
}

TableRowContent

pub struct TableRowContent {
    pub cells: Vec<TableCellContent>,
    pub is_header: bool,
}

TableCellContent

pub struct TableCellContent {
    pub text: String,
    pub row_span: usize,
    pub col_span: usize,
    pub alignment: TableCellAlign,
    pub vertical_alignment: TableCellVAlign,
    pub bbox: Rect,
}

TableCellAlign / TableCellVAlign

pub enum TableCellAlign {
    Left,
    Center,
    Right,
}

pub enum TableCellVAlign {
    Top,
    Middle,
    Bottom,
}

TableSource

pub enum TableSource {
    StructureTree,    // From tagged PDF structure
    Geometric,        // Detected from layout analysis
    Manual,           // User-created
}

StructureElement

pub struct StructureElement {
    pub structure_type: String,
    pub children: Vec<ContentElement>,
    pub alt_text: Option<String>,
    pub actual_text: Option<String>,
    pub language: Option<String>,
}

Annotations

AnnotationSubtype

pub enum AnnotationSubtype {
    Text,
    Link,
    FreeText,
    Line,
    Square,
    Circle,
    Polygon,
    PolyLine,
    Highlight,
    Underline,
    Squiggly,
    StrikeOut,
    Stamp,
    Caret,
    Ink,
    Popup,
    FileAttachment,
    Sound,
    Movie,
    Screen,
    Widget,
    PrinterMark,
    TrapNet,
    Watermark,
    ThreeD,
    Redact,
    RichMedia,
    Unknown,
}

TextAnnotationIcon

pub enum TextAnnotationIcon {
    Comment,
    Key,
    Note,
    Help,
    NewParagraph,
    Paragraph,
    Insert,
    Check,
    Circle,
    Cross,
    RightArrow,
    RightPointer,
    Star,
    UpArrow,
    UpLeftArrow,
}

StampType

pub enum StampType {
    Approved,
    Experimental,
    NotApproved,
    AsIs,
    Expired,
    NotForPublicRelease,
    Confidential,
    Final,
    Sold,
    Departmental,
    ForComment,
    TopSecret,
    Draft,
    ForPublicRelease,
}

Barcodes

BarcodeType

Requires the barcodes feature flag.

pub enum BarcodeType {
    QrCode,
    Code128,
    Ean13,
    UpcA,
    Code39,
    Itf,
}

Python: barcode types are passed as strings: "code128", "ean13", "upca", "code39", "ean8", "itf".


Forms

FieldType / FormFieldType

pub enum FormFieldType {
    Text,
    Button,
    Choice,
    Signature,
}

FormField

pub struct FormField {
    pub name: String,
    pub field_type: FormFieldType,
    pub value: FormFieldValue,
    pub rect: Option<Rect>,
    pub page_index: Option<usize>,
    pub readonly: bool,
    pub required: bool,
}

FormFieldValue

pub enum FormFieldValue {
    None,
    Text(String),
    Boolean(bool),
    Choice(String),
    MultiChoice(Vec<String>),
}

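Methods:

| Method | Returns | Description |
| --- | --- | --- |
| is_none() | bool | Checks whether the value is None |
| as_text() | Option<&str> | Gets the text value |
| as_bool() | Option<bool> | Gets the boolean value |
| as_choice() | Option<&str> | Gets the choice value |
| as_multi_choice() | Option<&[String]> | Gets the multi-choice values |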
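Each accessor returns a value only when the enum holds the matching variant. The sketch below reimplements the listed methods on a local copy of the enum to show that behavior. The method bodies are a plausible reading of the table, not the library's source.

```rust
/// Local mirror of the `FormFieldValue` enum shown above (not imported from pdf_oxide).
#[derive(Debug, Clone, PartialEq)]
pub enum FormFieldValue {
    None,
    Text(String),
    Boolean(bool),
    Choice(String),
    MultiChoice(Vec<String>),
}

impl FormFieldValue {
    /// True only for the `None` variant.
    pub fn is_none(&self) -> bool {
        matches!(self, FormFieldValue::None)
    }

    pub fn as_text(&self) -> Option<&str> {
        match self {
            FormFieldValue::Text(s) => Some(s.as_str()),
            _ => Option::None,
        }
    }

    pub fn as_bool(&self) -> Option<bool> {
        match self {
            FormFieldValue::Boolean(b) => Some(*b),
            _ => Option::None,
        }
    }

    pub fn as_choice(&self) -> Option<&str> {
        match self {
            FormFieldValue::Choice(s) => Some(s.as_str()),
            _ => Option::None,
        }
    }

    pub fn as_multi_choice(&self) -> Option<&[String]> {
        match self {
            FormFieldValue::MultiChoice(items) => Some(items.as_slice()),
            _ => Option::None,
        }
    }
}

fn main() {
    let name = FormFieldValue::Text("Ada".to_string());
    // A text field answers as_text() but not as_bool().
    println!("{:?} {:?}", name.as_text(), name.as_bool());
}
```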

Search

SearchOptions

pub struct SearchOptions {
    pub case_sensitive: bool,
    pub literal: bool,
    pub whole_word: bool,
    pub max_results: Option<usize>,
    pub page_range: Option<(usize, usize)>,
}

SearchResult

pub struct SearchResult {
    pub page_index: usize,
    pub text: String,
    pub bbox: Rect,
    pub context: Option<String>,
}

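Returned by Pdf::search() and PdfDocument::search(). Each result contains the page index where the match was found, the matched text, its bounding-box coordinates, and optional surrounding context.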


Configuration

ConversionOptions

pub struct ConversionOptions {
    pub preserve_layout: bool,
    pub detect_headings: bool,
    pub extract_tables: bool,
    pub include_images: bool,
    pub image_output_dir: Option<String>,
    pub embed_images: bool,
    // ... additional fields
}

TextConfig

pub struct TextConfig {
    pub detect_headings: bool,
    pub detect_lists: bool,
    pub detect_tables: bool,
    pub merge_spans: bool,
}

RenderOptions

Requires the rendering feature flag.

pub struct RenderOptions {
    pub dpi: f32,
    pub background_color: Option<Color>,
    pub format: ImageFormat,
}

EncryptionConfig

pub struct EncryptionConfig {
    pub user_password: String,
    pub owner_password: String,
    pub algorithm: EncryptionAlgorithm,
    pub permissions: Permissions,
}

Constructor:

EncryptionConfig::new("user_pass", "owner_pass")
    .with_algorithm(EncryptionAlgorithm::Aes256)
    .with_permissions(Permissions::read_only())

EncryptionAlgorithm

pub enum EncryptionAlgorithm {
    Rc4_40,     // V=1, R=2, 40-bit (legacy)
    Rc4_128,    // V=2, R=3, 128-bit (legacy)
    Aes128,     // V=4, R=4, 128-bit
    Aes256,     // V=5, R=6, 256-bit (default, recommended)
}

Permissions

pub struct Permissions {
    pub print: bool,
    pub copy: bool,
    pub modify: bool,
    pub annotate: bool,
    pub fill_forms: bool,
    pub extract: bool,
}

Factory methods:

| Method | Description |
| --- | --- |
| Permissions::all() | All permissions enabled |
| Permissions::read_only() | Viewing only |
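The two factory methods can be thought of as the extremes of the flag set. The sketch below reimplements them on a local copy of the struct, assuming that "viewing only" means every flag is false. That reading is an assumption, not something taken from the library's source. Struct update syntax then gives a compact way to enable one capability on top of a preset.

```rust
/// Local mirror of the `Permissions` struct shown above (not imported from pdf_oxide).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Permissions {
    pub print: bool,
    pub copy: bool,
    pub modify: bool,
    pub annotate: bool,
    pub fill_forms: bool,
    pub extract: bool,
}

impl Permissions {
    /// Every capability enabled.
    pub fn all() -> Self {
        Permissions {
            print: true,
            copy: true,
            modify: true,
            annotate: true,
            fill_forms: true,
            extract: true,
        }
    }

    /// Assumption: "viewing only" disables every flag.
    pub fn read_only() -> Self {
        Permissions {
            print: false,
            copy: false,
            modify: false,
            annotate: false,
            fill_forms: false,
            extract: false,
        }
    }
}

fn main() {
    // Start from the restrictive preset and allow printing only.
    let print_only = Permissions { print: true, ..Permissions::read_only() };
    println!("{:?}", print_only);
}
```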

PdfConfig

pub struct PdfConfig {
    pub page_size: PageSize,
    pub margins: (f32, f32, f32, f32),  // left, right, top, bottom
    pub font_size: f32,
    pub line_height: f32,
    pub title: Option<String>,
    pub author: Option<String>,
    pub subject: Option<String>,
    pub keywords: Option<String>,
}

Text Layout

TextAlign

pub enum TextAlign {
    Left,
    Center,
    Right,
    Justify,
}

RectFilterMode

Used for spatial text extraction:

pub enum RectFilterMode {
    Intersects,             // Any overlap (default)
    FullyContained,         // Entirely inside the bounds
    MinOverlap(f32),        // Minimum overlap ratio (0.0-1.0)
}
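The three modes differ only in how much of an item must fall inside the query region. The sketch below spells out one possible set of predicates on local copies of `Rect` and `RectFilterMode`. It assumes the overlap ratio is measured as intersection area divided by the item's own area. `passes` and `overlap_area` are hypothetical helpers.

```rust
/// Local mirrors of the types on this page (not imported from pdf_oxide).
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Rect {
    pub x0: f64,
    pub y0: f64,
    pub x1: f64,
    pub y1: f64,
}

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum RectFilterMode {
    Intersects,
    FullyContained,
    MinOverlap(f32),
}

/// Area shared by two rectangles, or 0.0 if they do not overlap.
fn overlap_area(a: &Rect, b: &Rect) -> f64 {
    let w = a.x1.min(b.x1) - a.x0.max(b.x0);
    let h = a.y1.min(b.y1) - a.y0.max(b.y0);
    if w > 0.0 && h > 0.0 { w * h } else { 0.0 }
}

/// Decide whether `item` is kept for the query `region` under `mode`.
pub fn passes(mode: &RectFilterMode, item: &Rect, region: &Rect) -> bool {
    match *mode {
        RectFilterMode::Intersects => overlap_area(item, region) > 0.0,
        RectFilterMode::FullyContained => {
            item.x0 >= region.x0
                && item.y0 >= region.y0
                && item.x1 <= region.x1
                && item.y1 <= region.y1
        }
        RectFilterMode::MinOverlap(min_ratio) => {
            // Assumption: the ratio is relative to the item's own area.
            let item_area = (item.x1 - item.x0) * (item.y1 - item.y0);
            item_area > 0.0 && overlap_area(item, region) / item_area >= min_ratio as f64
        }
    }
}

fn main() {
    let region = Rect { x0: 0.0, y0: 0.0, x1: 100.0, y1: 100.0 };
    // A 20 x 20 box straddling the region's top-right corner.
    let word = Rect { x0: 90.0, y0: 90.0, x1: 110.0, y1: 110.0 };
    for mode in [
        RectFilterMode::Intersects,
        RectFilterMode::FullyContained,
        RectFilterMode::MinOverlap(0.5),
    ] {
        println!("{:?}: {}", mode, passes(&mode, &word, &region));
    }
}
```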

Compliance Types

See the PDF/A, PDF/UA, and PDF/X pages for complete compliance documentation.

PdfALevel

pub enum PdfALevel {
    A1a, A1b,
    A2a, A2b, A2u,
    A3a, A3b, A3u,
}

PdfUaLevel

pub enum PdfUaLevel {
    UA1,
    UA2,
}

PdfXLevel

pub enum PdfXLevel {
    X1a2001, X1a2003,
    X3_2002, X3_2003,
    X4, X4p,
    X5g, X5n, X5pg,
    X6, X6n, X6p,
}

Error Types

PdfError

pub enum PdfError {
    Io(std::io::Error),
    Parse(String),
    Password,
    PageOutOfRange { index: usize, count: usize },
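}

| Variant | Description |
| --- | --- |
| Io | File system or I/O error |
| Parse | Malformed PDF structure |
| Password | The document is encrypted but no password was supplied |
| PageOutOfRange | The requested page index exceeds the page count |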
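Callers usually branch on the variant to decide how to recover. The sketch below defines a local copy of the enum with `Display`, `Error`, and `From<io::Error>` implementations, plus a hypothetical `check_page` helper that assumes zero-based page indices. None of these impls or messages are taken from pdf_oxide itself.

```rust
use std::fmt;
use std::io;

/// Local mirror of the `PdfError` enum shown above (not imported from pdf_oxide).
#[derive(Debug)]
pub enum PdfError {
    Io(io::Error),
    Parse(String),
    Password,
    PageOutOfRange { index: usize, count: usize },
}

impl fmt::Display for PdfError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            PdfError::Io(err) => write!(f, "I/O error: {err}"),
            PdfError::Parse(msg) => write!(f, "malformed PDF: {msg}"),
            PdfError::Password => write!(f, "document is encrypted and no password was supplied"),
            PdfError::PageOutOfRange { index, count } => {
                write!(f, "page index {index} out of range (document has {count} pages)")
            }
        }
    }
}

impl std::error::Error for PdfError {}

/// Lets `?` convert I/O failures into `PdfError::Io`.
impl From<io::Error> for PdfError {
    fn from(err: io::Error) -> Self {
        PdfError::Io(err)
    }
}

/// Hypothetical bounds check, assuming zero-based page indices.
pub fn check_page(index: usize, count: usize) -> Result<usize, PdfError> {
    if index < count {
        Ok(index)
    } else {
        Err(PdfError::PageOutOfRange { index, count })
    }
}

fn main() {
    match check_page(3, 3) {
        Ok(index) => println!("page {index} is valid"),
        Err(PdfError::PageOutOfRange { count, .. }) => {
            println!("only {count} pages available")
        }
        Err(other) => println!("unexpected error: {other}"),
    }
}
```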

Next Steps