PDF/A Validation

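PDF/A (ISO 19005) is the international standard for long-term archiving of electronic documents. PDF Oxide validates all major PDF/A levels and can convert non-compliant documents into compliant ones.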

Supported Levels

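Level | Standard | Structure | Unicode | Transparency | Embedded Files
1a | ISO 19005-1 | Required | Required | No | No
1b | ISO 19005-1 | No | No | No | No
2a | ISO 19005-2 | Required | Required | Yes | No
2b | ISO 19005-2 | No | No | Yes | No
2u | ISO 19005-2 | No | Required | Yes | No
3a | ISO 19005-3 | Required | Required | Yes | Yes
3b | ISO 19005-3 | No | No | Yes | Yes
3u | ISO 19005-3 | No | Required | Yes | Yes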

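Level “a” (accessible) requires a tagged structure tree and Unicode character mapping. Level “b” (basic) requires only visual reproducibility. Level “u” (Unicode) requires Unicode text mapping but not a full structure tree.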
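
The table and the three conformance letters can be expressed as a small lookup. The sketch below is a stand-alone illustration, not part of PDF Oxide: `LevelRules` and `rules_for` are hypothetical names, and the rules are copied directly from the table above.

```rust
/// Requirements for one PDF/A level, mirroring the columns of the table above.
#[derive(Debug, PartialEq, Eq)]
pub struct LevelRules {
    pub structure_required: bool,
    pub unicode_required: bool,
    pub transparency_allowed: bool,
    pub embedded_files_allowed: bool,
}

/// Hypothetical helper: look up the rules for a level string such as "2b".
/// Returns None for combinations the table does not list (for example "1u").
pub fn rules_for(level: &str) -> Option<LevelRules> {
    let mut chars = level.chars();
    let part = chars.next()?.to_digit(10)?;
    let conformance = chars.next()?.to_ascii_lowercase();
    if chars.next().is_some() {
        return None; // trailing characters, e.g. "2bx"
    }
    // Level "u" only appears for parts 2 and 3 in the table.
    let known = match (part, conformance) {
        (1, 'a') | (1, 'b') => true,
        (2, 'a') | (2, 'b') | (2, 'u') => true,
        (3, 'a') | (3, 'b') | (3, 'u') => true,
        _ => false,
    };
    if !known {
        return None;
    }
    Some(LevelRules {
        structure_required: conformance == 'a',
        unicode_required: conformance == 'a' || conformance == 'u',
        transparency_allowed: part >= 2,
        embedded_files_allowed: part == 3,
    })
}
```

For instance, `rules_for("2u")` reports that Unicode mapping is required and transparency is allowed, while `rules_for("1u")` returns `None` because the table lists no such level.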

Quick Validation

Use the convenience functions for a one-off check:

Python:

from pdf_oxide import PdfDocument

doc = PdfDocument("document.pdf")
result = doc.validate_pdf_a("2b")
print(f"Valid: {result.valid}")
print(f"Level: {result.level}")
for error in result.errors:
    print(f"  Error: {error}")

JavaScript (WASM):

const doc = new WasmPdfDocument(bytes);
const result = doc.validatePdfA("2b");
console.log(`Valid: ${result.valid}`);
console.log(`Errors: ${result.errors.length}`);
doc.free();

Rust:

use pdf_oxide::PdfDocument;
use pdf_oxide::compliance::{validate_pdf_a, PdfALevel};

let mut doc = PdfDocument::open("archive.pdf")?;
let result = validate_pdf_a(&mut doc, PdfALevel::A1b)?;

if result.has_errors() {
    println!("Not PDF/A-1b compliant:");
    for error in &result.errors {
        println!("  [{}] {} (clause {})",
            error.code, error.message,
            error.clause.as_deref().unwrap_or("n/a"));
    }
} else {
    println!("Document is PDF/A-1b compliant");
}

Validator API

PdfAValidator provides a builder pattern for fine-grained control:

use pdf_oxide::PdfDocument;
use pdf_oxide::compliance::{PdfAValidator, PdfALevel};

let mut doc = PdfDocument::open("report.pdf")?;

let result = PdfAValidator::new()
    .stop_on_first_error(false)
    .include_warnings(true)
    .validate(&mut doc, PdfALevel::A2b)?;

println!("Errors: {}", result.errors.len());
println!("Warnings: {}", result.warnings.len());

Targeted Checks

Run individual validation categories instead of the full suite:

use pdf_oxide::PdfDocument;
use pdf_oxide::compliance::{PdfAValidator, PdfALevel};

let mut doc = PdfDocument::open("report.pdf")?;
let validator = PdfAValidator::new();

// Check only metadata
let result = validator.check_metadata(&mut doc, PdfALevel::A1b)?;

// Check only fonts
let result = validator.check_fonts(&mut doc, PdfALevel::A1b)?;

// Check only color spaces
let result = validator.check_colors(&mut doc, PdfALevel::A1b)?;

// Check only transparency
let result = validator.check_transparency(&mut doc, PdfALevel::A2b)?;

// Check only structure tags
let result = validator.check_structure(&mut doc, PdfALevel::A1a)?;

Standalone Validators

Each validation category is also available as a standalone function for maximum flexibility:

use pdf_oxide::PdfDocument;
use pdf_oxide::compliance::validators::*;
use pdf_oxide::compliance::{PdfALevel, ValidationResult};

let mut doc = PdfDocument::open("document.pdf")?;
let mut result = ValidationResult::new(PdfALevel::A1b);

// Run each validator independently
validate_xmp_metadata(&mut doc, PdfALevel::A1b, &mut result)?;
validate_fonts(&mut doc, PdfALevel::A1b, &mut result)?;
validate_colors(&mut doc, PdfALevel::A1b, &mut result)?;
validate_encryption(&mut doc, PdfALevel::A1b, &mut result)?;
validate_transparency(&mut doc, PdfALevel::A1b, &mut result)?;
validate_structure(&mut doc, PdfALevel::A1b, &mut result)?;
validate_javascript(&mut doc, PdfALevel::A1b, &mut result)?;
validate_embedded_files(&mut doc, PdfALevel::A1b, &mut result)?;
validate_annotations(&mut doc, PdfALevel::A1b, &mut result)?;

println!("Total errors: {}", result.errors.len());
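
The pattern used above, several independent checks appending to one shared result, can be illustrated without the library. The following is only a toy model under stated assumptions: `ToyDoc`, `Findings`, `Check`, and `run_checks` are hypothetical names, and the two checks inspect plain boolean flags rather than a real PDF.

```rust
/// Stand-in for the shared result that every check appends to.
#[derive(Default)]
pub struct Findings {
    pub errors: Vec<String>,
}

/// Toy document model with just enough state for two checks.
pub struct ToyDoc {
    pub encrypted: bool,
    pub has_javascript: bool,
}

/// Every check has the same shape: read the document, append findings.
pub type Check = fn(&ToyDoc, &mut Findings);

pub fn check_encryption(doc: &ToyDoc, out: &mut Findings) {
    if doc.encrypted {
        out.errors.push("EncryptionPresent".to_string());
    }
}

pub fn check_javascript(doc: &ToyDoc, out: &mut Findings) {
    if doc.has_javascript {
        out.errors.push("JavaScriptPresent".to_string());
    }
}

/// Run only the selected checks, in order, collecting into one result.
pub fn run_checks(doc: &ToyDoc, checks: &[Check]) -> Findings {
    let mut findings = Findings::default();
    for check in checks {
        check(doc, &mut findings);
    }
    findings
}
```

Because every check shares one signature, a caller can assemble any subset and still read a single combined error list at the end.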

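Validator Summary

Function | What It Checks
validate_xmp_metadata() | XMP stream is present, pdfaid:part and pdfaid:conformance entries exist, metadata is consistent
validate_fonts() | All fonts are embedded, glyph widths are present, Unicode mapping is available (for levels “a” and “u”)
validate_colors() | No device-dependent color operators (rg, RG, k, K, g, G) without an output intent
validate_encryption() | Encryption is not allowed in PDF/A documents
validate_transparency() | No transparency in PDF/A-1; allowed in PDF/A-2 and later
validate_structure() | A tagged structure tree with a valid role map is present (required for level “a”)
validate_javascript() | No JavaScript actions or triggers are present
validate_embedded_files() | Not allowed in PDF/A-1 and PDF/A-2; PDF/A-3 requires the AFRelationship key on every file specification
validate_annotations() | Annotation types are restricted by the applicable part of ISO 19005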
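
To make the validate_colors() row concrete, here is a deliberately simplified sketch of a device-color scan. It is not how PDF Oxide implements the check: `find_device_color_ops` and `colors_ok` are hypothetical names. The input is assumed to be an already-decoded content stream with no string literals, comments, or inline images. The sketch also ignores whether the output intent's profile matches the color model in use.

```rust
/// Operators that set a device-dependent color: rg/RG (DeviceRGB),
/// k/K (DeviceCMYK), g/G (DeviceGray).
const DEVICE_COLOR_OPS: [&str; 6] = ["rg", "RG", "k", "K", "g", "G"];

/// Simplified scan: split the content stream on whitespace and return
/// the device color operators found, in order of appearance.
/// A real validator needs a proper content stream tokenizer.
pub fn find_device_color_ops(content: &str) -> Vec<&str> {
    content
        .split_whitespace()
        .filter(|token| DEVICE_COLOR_OPS.contains(token))
        .collect()
}

/// Device colors are accepted only when an output intent is declared.
pub fn colors_ok(content: &str, has_output_intent: bool) -> bool {
    has_output_intent || find_device_color_ops(content).is_empty()
}
```

With the stream `1 0 0 rg 0 0 100 100 re f 0.5 G`, the scan reports `rg` and `G`, so `colors_ok` is false unless an output intent is present.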

ValidationResult

The ValidationResult struct holds the complete outcome of a validation run:

pub struct ValidationResult {
    pub level: PdfALevel,
    pub errors: Vec<ComplianceError>,
    pub warnings: Vec<ComplianceWarning>,
    pub stats: ValidationStats,
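}

Field | Type | Description
level | PdfALevel | Target compliance level
errors | Vec<ComplianceError> | Violations that prevent compliance
warnings | Vec<ComplianceWarning> | Non-blocking issues that may affect quality
stats | ValidationStats | Counts of pages, fonts, and objects checked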

ComplianceError

pub struct ComplianceError {
    pub code: ErrorCode,
    pub message: String,
    pub location: Option<String>,
    pub clause: Option<String>,
}

The code field uses the ErrorCode enum, which includes categories such as MissingXmpMetadata, FontNotEmbedded, DeviceDependentColor, EncryptionPresent, TransparencyNotAllowed, MissingStructureTree, JavaScriptPresent, and InvalidEmbeddedFile.

ComplianceWarning

pub struct ComplianceWarning {
    pub code: WarningCode,
    pub message: String,
    pub location: Option<String>,
}
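
One common use of these fields is a flat text report. The sketch below assumes stand-in types that copy only the fields shown above. `ErrorEntry`, `WarningEntry`, and `render_report` are hypothetical names, and the real ErrorCode and WarningCode enums are replaced by plain strings so the example compiles on its own.

```rust
/// Stand-in for ComplianceError with the code stored as a string.
pub struct ErrorEntry {
    pub code: String,
    pub message: String,
    pub location: Option<String>,
    pub clause: Option<String>,
}

/// Stand-in for ComplianceWarning with the code stored as a string.
pub struct WarningEntry {
    pub code: String,
    pub message: String,
    pub location: Option<String>,
}

/// Render one line per finding, errors first, with "n/a" for missing fields.
pub fn render_report(errors: &[ErrorEntry], warnings: &[WarningEntry]) -> Vec<String> {
    let mut lines = Vec::new();
    for e in errors {
        lines.push(format!(
            "ERROR [{}] {} (clause {}, at {})",
            e.code,
            e.message,
            e.clause.as_deref().unwrap_or("n/a"),
            e.location.as_deref().unwrap_or("n/a"),
        ));
    }
    for w in warnings {
        lines.push(format!(
            "WARN  [{}] {} (at {})",
            w.code,
            w.message,
            w.location.as_deref().unwrap_or("n/a"),
        ));
    }
    lines
}
```

Both location and clause are optional, so the report substitutes a placeholder instead of unwrapping them.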

PDF/A Conversion

Convert a non-compliant document to PDF/A compliance:

use pdf_oxide::PdfDocument;
use pdf_oxide::compliance::{convert_to_pdf_a, PdfALevel};

let mut doc = PdfDocument::open("input.pdf")?;
let result = convert_to_pdf_a(&mut doc, PdfALevel::A1b)?;

println!("Conversion actions taken:");
for action in &result.actions {
    println!("  - {}: {}", action.action_type, action.description);
}

if result.remaining_errors.is_empty() {
    println!("Document is now PDF/A-1b compliant");
} else {
    println!("{} issues could not be resolved automatically",
        result.remaining_errors.len());
}

Conversion Config

Fine-tune the conversion process:

use pdf_oxide::PdfDocument;
use pdf_oxide::compliance::{PdfAConverter, PdfALevel, ConversionConfig};

let mut doc = PdfDocument::open("input.pdf")?;

let config = ConversionConfig::new()
    .embed_fonts(true)
    .remove_javascript(true)
    .flatten_transparency(true)
    .add_structure(true);

let result = PdfAConverter::new(PdfALevel::A2b)
    .with_config(config)
    .convert(&mut doc)?;

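The converter performs the following operations automatically:

  1. XMP metadata injection – adds the pdfaid:part and pdfaid:conformance entries
  2. Font embedding – embeds every font that is referenced but not embedded
  3. JavaScript removal – removes JavaScript actions and triggers
  4. Transparency flattening – renders transparent elements as opaque (PDF/A-1 only)
  5. ICC profile conversion – converts device-dependent colors to ICC-based color spaces
  6. Structure tagging – adds basic structure tags (for level “a” targets)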
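
As an illustration of the first step, the sketch below builds the XMP fragment that carries the two pdfaid entries. It is a stand-alone approximation, not the converter's code: `pdfaid_description` is a hypothetical helper. It emits only the identification element, using the standard pdfaid namespace, and leaves out the surrounding XMP packet wrapper.

```rust
/// Build the rdf:Description element that identifies a PDF/A level in XMP.
/// `part` is 1, 2, or 3; `conformance` is 'a', 'b', or 'u' in either case.
/// Returns None for values outside those ranges, and for level 1u,
/// which ISO 19005-1 does not define.
pub fn pdfaid_description(part: u8, conformance: char) -> Option<String> {
    if !(1..=3).contains(&part) {
        return None;
    }
    // XMP stores the conformance letter in upper case.
    let letter = conformance.to_ascii_uppercase();
    let valid_letter = letter == 'A' || letter == 'B' || letter == 'U';
    if !valid_letter || (part == 1 && letter == 'U') {
        return None;
    }
    let mut xml = String::new();
    xml.push_str(
        "<rdf:Description rdf:about=\"\" xmlns:pdfaid=\"http://www.aiim.org/pdfa/ns/id/\">\n",
    );
    xml.push_str(&format!("  <pdfaid:part>{}</pdfaid:part>\n", part));
    xml.push_str(&format!(
        "  <pdfaid:conformance>{}</pdfaid:conformance>\n",
        letter
    ));
    xml.push_str("</rdf:Description>");
    Some(xml)
}
```

Calling `pdfaid_description(2, 'b')` yields an element with `pdfaid:part` set to 2 and `pdfaid:conformance` set to B.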

Workflow: Validate, Fix, Re-validate

A typical archival workflow validates first, attempts automatic conversion, and then re-validates:

use pdf_oxide::PdfDocument;
use pdf_oxide::compliance::{validate_pdf_a, convert_to_pdf_a, PdfALevel};

let level = PdfALevel::A2b;
let mut doc = PdfDocument::open("input.pdf")?;

// Step 1: Initial validation
let result = validate_pdf_a(&mut doc, level)?;
if !result.has_errors() {
    println!("Already compliant");
    return Ok(());
}

println!("{} errors found, attempting conversion...", result.errors.len());

// Step 2: Automatic conversion
let conversion = convert_to_pdf_a(&mut doc, level)?;
println!("{} actions taken", conversion.actions.len());

// Step 3: Re-validate
let result = validate_pdf_a(&mut doc, level)?;
if result.has_errors() {
    println!("{} errors remain after conversion:", result.errors.len());
    for e in &result.errors {
        println!("  {} -- {}", e.code, e.message);
    }
} else {
    println!("Document is now PDF/A-2b compliant");
}
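
When this workflow runs over many files, it can help to reduce each run to a single outcome value. The sketch below shows one possible shape, with everything in it hypothetical: the `Archivable` trait stands in for the validate and convert calls above, and `MockDoc` replaces a real document so the logic can be exercised in isolation.

```rust
/// Possible outcomes of the validate, convert, re-validate loop.
#[derive(Debug, PartialEq, Eq)]
pub enum ArchiveOutcome {
    AlreadyCompliant,
    Converted { actions: usize },
    NeedsManualFix { remaining: usize },
}

/// Hypothetical abstraction over the two operations the workflow needs.
pub trait Archivable {
    /// Number of compliance errors currently present.
    fn error_count(&mut self) -> usize;
    /// Attempt automatic conversion; returns the number of actions taken.
    fn convert(&mut self) -> usize;
}

/// Validate, convert only if needed, then validate again.
pub fn archive<D: Archivable>(doc: &mut D) -> ArchiveOutcome {
    if doc.error_count() == 0 {
        return ArchiveOutcome::AlreadyCompliant;
    }
    let actions = doc.convert();
    match doc.error_count() {
        0 => ArchiveOutcome::Converted { actions },
        remaining => ArchiveOutcome::NeedsManualFix { remaining },
    }
}

/// Mock document: `fixable` errors vanish on conversion, `stubborn` ones stay.
pub struct MockDoc {
    pub fixable: usize,
    pub stubborn: usize,
}

impl Archivable for MockDoc {
    fn error_count(&mut self) -> usize {
        self.fixable + self.stubborn
    }

    fn convert(&mut self) -> usize {
        let actions = self.fixable;
        self.fixable = 0;
        actions
    }
}
```

A batch job could then count `NeedsManualFix` outcomes to decide which files to route to manual review.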

PdfALevel Methods

The PdfALevel enum includes helper methods for querying a level's capabilities:

Method | Returns | Description
part() | PdfAPart | ISO 19005 part (Part1, Part2, Part3)
conformance() | char | Conformance letter ('a', 'b', or 'u')
requires_structure() | bool | Whether a tagged structure tree is mandatory
requires_unicode() | bool | Whether Unicode mapping is mandatory
allows_transparency() | bool | Whether transparency is permitted
allows_jpeg2000() | bool | Whether JPEG 2000 images are permitted
allows_embedded_files() | bool | Whether file attachments are permitted
xmp_part() | &str | XMP pdfaid:part value
xmp_conformance() | &str | XMP pdfaid:conformance value
from_xmp(part, conformance) | Option<Self> | Parse level from XMP metadata values
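
The relationship between the last three methods can be sketched as a round trip. The enum below is a stand-in written for this example, not the library's definition. The variant names follow the ones used in the code above, and the exact strings PDF Oxide returns are an assumption based on the XMP convention of an upper-case conformance letter.

```rust
/// Stand-in enum with the eight levels from the table of supported levels.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Level {
    A1a, A1b, A2a, A2b, A2u, A3a, A3b, A3u,
}

impl Level {
    const ALL: [Level; 8] = [
        Level::A1a, Level::A1b, Level::A2a, Level::A2b,
        Level::A2u, Level::A3a, Level::A3b, Level::A3u,
    ];

    /// Value written to pdfaid:part.
    pub fn xmp_part(self) -> &'static str {
        match self {
            Level::A1a | Level::A1b => "1",
            Level::A2a | Level::A2b | Level::A2u => "2",
            Level::A3a | Level::A3b | Level::A3u => "3",
        }
    }

    /// Value written to pdfaid:conformance (upper case by XMP convention).
    pub fn xmp_conformance(self) -> &'static str {
        match self {
            Level::A1a | Level::A2a | Level::A3a => "A",
            Level::A1b | Level::A2b | Level::A3b => "B",
            Level::A2u | Level::A3u => "U",
        }
    }

    /// Inverse of the two methods above; tolerant of case and whitespace.
    pub fn from_xmp(part: &str, conformance: &str) -> Option<Level> {
        Level::ALL.iter().copied().find(|level| {
            level.xmp_part() == part.trim()
                && level.xmp_conformance().eq_ignore_ascii_case(conformance.trim())
        })
    }
}
```

Deriving `from_xmp` from the two accessor methods keeps the mapping in one place, so an unsupported pair such as part 1 with conformance U simply finds no match.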

Next Steps