What is the fastest Python PDF library?

PDF Oxide is the fastest Python PDF library, with 0.8ms mean text extraction time — 5.8× faster than PyMuPDF (4.6ms) and 15× faster than pypdf (12.1ms). Benchmarked on 3,830 real-world PDFs with 100% pass rate.

Is PDF Oxide free for commercial use?

Yes. PDF Oxide is MIT licensed — free for all uses including commercial products, SaaS, and proprietary software. No license fees, no sales calls, no AGPL restrictions.

Can PDF Oxide handle scanned PDFs with OCR?

Yes. PDF Oxide includes built-in OCR via PaddleOCR and ONNX Runtime. No Tesseract installation needed — just pip install pdf_oxide and use extract_text_ocr(). Supports PP-OCRv3, v4, and v5 models.

Does PDF Oxide support XFA forms?

Yes. PDF Oxide is the only Python PDF library that can detect, analyze, and extract data from XFA forms (XML Forms Architecture). PyMuPDF, pypdf, pdfplumber, and pdfminer cannot read XFA form data.

How does PDF Oxide compare to PyMuPDF?

PDF Oxide is 5.8× faster than PyMuPDF (0.8ms vs 4.6ms mean), has a 100% pass rate vs 99.3%, and is MIT licensed vs PyMuPDF's AGPL-3.0. PDF Oxide also has built-in Markdown/HTML output and XFA form support that PyMuPDF lacks.

Can PDF Oxide convert PDF to Markdown?

Yes. PDF Oxide has built-in PDF to Markdown conversion with heading detection, table preservation, and list formatting — ideal for LLM and RAG pipelines. No separate package needed, unlike PyMuPDF which requires pymupdf4llm (69× slower).

Convert PDF and Office documents in both directions

Convert Microsoft Office documents (Word, Excel, PowerPoint) to PDF, and turn a PDF back into DOCX, PPTX, or XLSX, all without Microsoft Office or LibreOffice. PDF Oxide parses the OOXML format directly to produce PDF output, and renders PDF pages back into editable Office documents.

Conversion runs in two directions:

Office → PDF: the OfficeConverter class (and the open_from_*_bytes constructors) parses DOCX/XLSX/PPTX and produces a PDF.
PDF → Office: the to_docx / to_pptx / to_xlsx methods on an opened document export back to Office formats.

Quick example

Python

from pdf_oxide import OfficeConverter

# Auto-detect format from extension
pdf = OfficeConverter.convert("report.docx")
pdf.save("report.pdf")

Rust

use pdf_oxide::converters::office::OfficeConverter;

let converter = OfficeConverter::new();
let pdf_bytes = converter.convert("report.docx")?;
std::fs::write("report.pdf", pdf_bytes)?;

C++

#include <pdf_oxide/pdf_oxide.hpp>
#include <fstream>

std::ifstream in("report.docx", std::ios::binary);
std::vector<std::uint8_t> docx((std::istreambuf_iterator<char>(in)), {});

auto doc = pdf_oxide::Document::open_from_docx_bytes(docx);
auto pdf = doc.get_source_bytes();
std::ofstream("report.pdf", std::ios::binary)
    .write(reinterpret_cast<const char*>(pdf.data()), pdf.size());

Dart

import 'dart:io';
import 'package:pdf_oxide/pdf_oxide.dart';

final docx = File('report.docx').readAsBytesSync();
final doc = PdfDocument.openFromDocxBytes(docx);
File('report.pdf').writeAsBytesSync(doc.getSourceBytes());

R

library(pdfoxide)

docx <- readBin("report.docx", "raw", file.info("report.docx")$size)
doc  <- pdf_open_from_docx_bytes(docx)
writeBin(pdf_get_source_bytes(doc), "report.pdf")

Julia

using PdfOxide

docx = read("report.docx")
doc  = open_from_docx_bytes(docx)
write("report.pdf", get_source_bytes(doc))

Zig

const pdf_oxide = @import("pdf_oxide");
const a = std.heap.page_allocator;

const docx = try std.fs.cwd().readFileAlloc("report.docx", a, .unlimited);
var doc = try pdf_oxide.Document.openFromDocxBytes(docx);
const pdf = try doc.sourceBytes(a);
try std.fs.cwd().writeFile(.{ .sub_path = "report.pdf", .data = pdf });

Objective-C

#import "POXPdfOxide.h"
NSError *err = nil;

NSData *docx = [NSData dataWithContentsOfFile:@"report.docx"];
POXDocument *doc = [POXDocument openFromDocxBytes:docx error:&err];
NSData *pdf = [doc sourceBytesWithError:&err];
[pdf writeToFile:@"report.pdf" atomically:YES];

Elixir

docx = File.read!("report.docx")
{:ok, doc} = PdfOxide.open_from_docx_bytes(docx)
{:ok, pdf} = PdfOxide.source_bytes(doc)
File.write!("report.pdf", pdf)

Supported formats

Format	Extension	Description
DOCX	`.docx`	Word documents: paragraphs, headings, lists, text formatting
XLSX	`.xlsx`, `.xls`	Excel spreadsheets: multiple sheets, automatic column width, cell types
PPTX	`.pptx`	PowerPoint presentations: slides, titles, text boxes
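The table above lines up with the explicit `from_docx` / `from_xlsx` / `from_pptx` methods listed in the API reference. The following is a minimal sketch of that mapping, for cases where you want to choose the method yourself instead of relying on `OfficeConverter.convert`. The helper name `converter_method_for` is hypothetical and not part of PDF Oxide, and the sketch only covers the three OOXML extensions used in the batch example.

```python
from pathlib import Path

# Extension -> name of the explicit OfficeConverter method documented on this page.
_METHODS = {
    ".docx": "from_docx",
    ".xlsx": "from_xlsx",
    ".pptx": "from_pptx",
}


def converter_method_for(path: str) -> str:
    """Return the OfficeConverter method name for a file path (hypothetical helper)."""
    suffix = Path(path).suffix.lower()  # compare case-insensitively
    try:
        return _METHODS[suffix]
    except KeyError:
        raise ValueError(f"unsupported Office extension: {suffix!r}") from None
```

With PDF Oxide installed, the result could be used as `getattr(OfficeConverter, converter_method_for(path))(path)`, which fails early with a clear error for unsupported files.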

Word documents (DOCX)

Convert Word documents while preserving headings, paragraphs, lists, and text formatting (bold, italic, underline, colors, font sizes).

Python

from pdf_oxide import OfficeConverter

pdf = OfficeConverter.from_docx("document.docx")
pdf.save("document.pdf")

Rust

use pdf_oxide::converters::office::OfficeConverter;

let converter = OfficeConverter::new();
let pdf_bytes = converter.convert_docx("document.docx")?;
std::fs::write("document.pdf", pdf_bytes)?;

C++

#include <pdf_oxide/pdf_oxide.hpp>
#include <fstream>

std::ifstream in("document.docx", std::ios::binary);
std::vector<std::uint8_t> docx((std::istreambuf_iterator<char>(in)), {});

auto doc = pdf_oxide::Document::open_from_docx_bytes(docx);
auto pdf = doc.get_source_bytes();
std::ofstream("document.pdf", std::ios::binary)
    .write(reinterpret_cast<const char*>(pdf.data()), pdf.size());

Dart

import 'dart:io';
import 'package:pdf_oxide/pdf_oxide.dart';

final docx = File('document.docx').readAsBytesSync();
final doc = PdfDocument.openFromDocxBytes(docx);
File('document.pdf').writeAsBytesSync(doc.getSourceBytes());

R

library(pdfoxide)

docx <- readBin("document.docx", "raw", file.info("document.docx")$size)
doc  <- pdf_open_from_docx_bytes(docx)
writeBin(pdf_get_source_bytes(doc), "document.pdf")

Julia

using PdfOxide

docx = read("document.docx")
doc  = open_from_docx_bytes(docx)
write("document.pdf", get_source_bytes(doc))

Zig

const pdf_oxide = @import("pdf_oxide");
const a = std.heap.page_allocator;

const docx = try std.fs.cwd().readFileAlloc("document.docx", a, .unlimited);
var doc = try pdf_oxide.Document.openFromDocxBytes(docx);
const pdf = try doc.sourceBytes(a);
try std.fs.cwd().writeFile(.{ .sub_path = "document.pdf", .data = pdf });

Objective-C

#import "POXPdfOxide.h"
NSError *err = nil;

NSData *docx = [NSData dataWithContentsOfFile:@"document.docx"];
POXDocument *doc = [POXDocument openFromDocxBytes:docx error:&err];
NSData *pdf = [doc sourceBytesWithError:&err];
[pdf writeToFile:@"document.pdf" atomically:YES];

Elixir

docx = File.read!("document.docx")
{:ok, doc} = PdfOxide.open_from_docx_bytes(docx)
{:ok, pdf} = PdfOxide.source_bytes(doc)
File.write!("document.pdf", pdf)

From bytes

Python

from pdf_oxide import OfficeConverter

with open("document.docx", "rb") as f:
    pdf = OfficeConverter.from_docx_bytes(f.read())
pdf.save("document.pdf")

Rust

let docx_bytes = std::fs::read("document.docx")?;
let converter = OfficeConverter::new();
let pdf_bytes = converter.convert_docx_bytes(&docx_bytes)?;
std::fs::write("document.pdf", pdf_bytes)?;

C++

std::ifstream in("document.docx", std::ios::binary);
std::vector<std::uint8_t> docx_bytes((std::istreambuf_iterator<char>(in)), {});

auto doc = pdf_oxide::Document::open_from_docx_bytes(docx_bytes);
auto pdf_bytes = doc.get_source_bytes();
std::ofstream("document.pdf", std::ios::binary)
    .write(reinterpret_cast<const char*>(pdf_bytes.data()), pdf_bytes.size());

Dart

final docxBytes = File('document.docx').readAsBytesSync();
final doc = PdfDocument.openFromDocxBytes(docxBytes);
File('document.pdf').writeAsBytesSync(doc.getSourceBytes());

R

docx_bytes <- readBin("document.docx", "raw", file.info("document.docx")$size)
doc <- pdf_open_from_docx_bytes(docx_bytes)
writeBin(pdf_get_source_bytes(doc), "document.pdf")

Julia

docx_bytes = read("document.docx")
doc = open_from_docx_bytes(docx_bytes)
write("document.pdf", get_source_bytes(doc))

Zig

const docx_bytes = try std.fs.cwd().readFileAlloc("document.docx", a, .unlimited);
var doc = try pdf_oxide.Document.openFromDocxBytes(docx_bytes);
const pdf_bytes = try doc.sourceBytes(a);
try std.fs.cwd().writeFile(.{ .sub_path = "document.pdf", .data = pdf_bytes });

Objective-C

NSData *docxBytes = [NSData dataWithContentsOfFile:@"document.docx"];
POXDocument *doc = [POXDocument openFromDocxBytes:docxBytes error:&err];
NSData *pdfBytes = [doc sourceBytesWithError:&err];
[pdfBytes writeToFile:@"document.pdf" atomically:YES];

Elixir

docx_bytes = File.read!("document.docx")
{:ok, doc} = PdfOxide.open_from_docx_bytes(docx_bytes)
{:ok, pdf_bytes} = PdfOxide.source_bytes(doc)
File.write!("document.pdf", pdf_bytes)

Supported DOCX features

Paragraphs with alignment (left, center, right, justified)
Headings (heading styles 1–9)
Text formatting: bold, italic, underline, strikethrough
Font sizes and colors
Numbered and bulleted lists with indentation
Metadata extraction (title, author from docProps/core.xml)
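To make the last item concrete: a DOCX file is a ZIP package, and `docProps/core.xml` stores the title and author as Dublin Core elements. The sketch below reads them with the Python standard library only. It is independent of PDF Oxide and is not a description of how the library does it internally; the function name `read_core_properties` is made up for this illustration.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Dublin Core namespace used for title and creator in docProps/core.xml.
NS = {"dc": "http://purl.org/dc/elements/1.1/"}


def read_core_properties(docx_bytes: bytes) -> dict:
    """Return title and author from docProps/core.xml, or None where absent."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as package:
        try:
            xml = package.read("docProps/core.xml")
        except KeyError:
            # The core properties part is optional in a package.
            return {"title": None, "author": None}
    root = ET.fromstring(xml)
    return {
        "title": root.findtext("dc:title", namespaces=NS),
        "author": root.findtext("dc:creator", namespaces=NS),
    }
```

Calling `read_core_properties(open("document.docx", "rb").read())` returns a dictionary with the two values, which is a quick way to check what metadata a source file actually carries before converting it.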

Excel spreadsheets (XLSX)

Convert spreadsheets to PDF with automatic column-width calculation and multi-sheet support. Each sheet is rendered as a separate section.

Python

from pdf_oxide import OfficeConverter

pdf = OfficeConverter.from_xlsx("data.xlsx")
pdf.save("data.pdf")

Rust

let converter = OfficeConverter::new();
let pdf_bytes = converter.convert_xlsx("data.xlsx")?;
std::fs::write("data.pdf", pdf_bytes)?;

C++

std::ifstream in("data.xlsx", std::ios::binary);
std::vector<std::uint8_t> xlsx((std::istreambuf_iterator<char>(in)), {});

auto doc = pdf_oxide::Document::open_from_xlsx_bytes(xlsx);
auto pdf = doc.get_source_bytes();
std::ofstream("data.pdf", std::ios::binary)
    .write(reinterpret_cast<const char*>(pdf.data()), pdf.size());

Dart

final xlsx = File('data.xlsx').readAsBytesSync();
final doc = PdfDocument.openFromXlsxBytes(xlsx);
File('data.pdf').writeAsBytesSync(doc.getSourceBytes());

R

xlsx <- readBin("data.xlsx", "raw", file.info("data.xlsx")$size)
doc  <- pdf_open_from_xlsx_bytes(xlsx)
writeBin(pdf_get_source_bytes(doc), "data.pdf")

Julia

xlsx = read("data.xlsx")
doc  = open_from_xlsx_bytes(xlsx)
write("data.pdf", get_source_bytes(doc))

Zig

const xlsx = try std.fs.cwd().readFileAlloc("data.xlsx", a, .unlimited);
var doc = try pdf_oxide.Document.openFromXlsxBytes(xlsx);
const pdf = try doc.sourceBytes(a);
try std.fs.cwd().writeFile(.{ .sub_path = "data.pdf", .data = pdf });

Objective-C

NSData *xlsx = [NSData dataWithContentsOfFile:@"data.xlsx"];
POXDocument *doc = [POXDocument openFromXlsxBytes:xlsx error:&err];
NSData *pdf = [doc sourceBytesWithError:&err];
[pdf writeToFile:@"data.pdf" atomically:YES];

Elixir

xlsx = File.read!("data.xlsx")
{:ok, doc} = PdfOxide.open_from_xlsx_bytes(xlsx)
{:ok, pdf} = PdfOxide.source_bytes(doc)
File.write!("data.pdf", pdf)

Supported XLSX features

Multi-page rendering with sheet titles
Cell types: strings, integers, floats, booleans, dates, errors
Automatic column-width calculation
Automatic page breaks when content exceeds the available space
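The page does not say how column widths are computed. As a rough illustration of what an automatic calculation can look like, the sketch below sizes each column from its longest cell text. This is a generic heuristic, not PDF Oxide's algorithm; the function name and the values for `char_width_pt`, `padding_pt`, and `max_width_pt` are assumptions chosen for the example.

```python
def auto_column_widths(rows, char_width_pt=5.5, padding_pt=8.0, max_width_pt=200.0):
    """Estimate a width in points for each column from its longest cell text."""
    n_cols = max((len(row) for row in rows), default=0)
    widths = [0.0] * n_cols
    for row in rows:
        for i, cell in enumerate(row):
            text = "" if cell is None else str(cell)
            # Approximate text width as character count times an average glyph width.
            widths[i] = max(widths[i], len(text) * char_width_pt + padding_pt)
    # Cap very wide columns so a single long cell cannot push the table off the page.
    return [min(w, max_width_pt) for w in widths]
```

A real renderer would measure glyph widths from the font metrics instead of using a fixed average, but the shape of the computation (per-column maximum, padding, upper bound) stays the same.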

PowerPoint presentations (PPTX)

Convert presentations to PDF. Each slide becomes one page, with its titles and text boxes extracted.

Python

from pdf_oxide import OfficeConverter

pdf = OfficeConverter.from_pptx("slides.pptx")
pdf.save("slides.pdf")

Rust

let converter = OfficeConverter::new();
let pdf_bytes = converter.convert_pptx("slides.pptx")?;
std::fs::write("slides.pdf", pdf_bytes)?;

C++

std::ifstream in("slides.pptx", std::ios::binary);
std::vector<std::uint8_t> pptx((std::istreambuf_iterator<char>(in)), {});

auto doc = pdf_oxide::Document::open_from_pptx_bytes(pptx);
auto pdf = doc.get_source_bytes();
std::ofstream("slides.pdf", std::ios::binary)
    .write(reinterpret_cast<const char*>(pdf.data()), pdf.size());

Dart

final pptx = File('slides.pptx').readAsBytesSync();
final doc = PdfDocument.openFromPptxBytes(pptx);
File('slides.pdf').writeAsBytesSync(doc.getSourceBytes());

R

pptx <- readBin("slides.pptx", "raw", file.info("slides.pptx")$size)
doc  <- pdf_open_from_pptx_bytes(pptx)
writeBin(pdf_get_source_bytes(doc), "slides.pdf")

Julia

pptx = read("slides.pptx")
doc  = open_from_pptx_bytes(pptx)
write("slides.pdf", get_source_bytes(doc))

Zig

const pptx = try std.fs.cwd().readFileAlloc("slides.pptx", a, .unlimited);
var doc = try pdf_oxide.Document.openFromPptxBytes(pptx);
const pdf = try doc.sourceBytes(a);
try std.fs.cwd().writeFile(.{ .sub_path = "slides.pdf", .data = pdf });

Objective-C

NSData *pptx = [NSData dataWithContentsOfFile:@"slides.pptx"];
POXDocument *doc = [POXDocument openFromPptxBytes:pptx error:&err];
NSData *pdf = [doc sourceBytesWithError:&err];
[pdf writeToFile:@"slides.pdf" atomically:YES];

Elixir

pptx = File.read!("slides.pptx")
{:ok, doc} = PdfOxide.open_from_pptx_bytes(pptx)
{:ok, pdf} = PdfOxide.source_bytes(doc)
File.write!("slides.pdf", pdf)

How do I convert a PDF to DOCX, PPTX, or XLSX?

The reverse direction, PDF → Office, is available as methods on an opened PDF document, not on OfficeConverter. Open a PDF with PdfDocument (Python/Rust), OpenFromBytes/Open (Go/C#), or Document.open (Swift), then call to_docx / to_pptx / to_xlsx.

PDF Oxide picks the output strategy automatically from the page count. Documents up to the layout threshold (30 pages for DOCX/PPTX, 200 for XLSX) use the layout-preserving path, which emits every text span as a positioned, editable element close to its original position. Larger documents fall back to the flowing-text path, which reflows the content so that Word, PowerPoint, or Excel can open the result immediately. Each PDF page becomes a DOCX section, a PPTX slide, or an XLSX worksheet; page dimensions and embedded fonts are preserved, so a PDF → Office → PDF round trip keeps the original layout.
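The selection rule can be written down in a few lines. This sketch only restates the thresholds given in the text so you can predict which path a document will take; the function name `export_strategy` is hypothetical, and treating the threshold as inclusive ("up to" 30 or 200 pages) is an assumption.

```python
# Layout thresholds as stated in the text: documents at or below the limit use
# the layout-preserving path, anything larger falls back to flowing text.
LAYOUT_PAGE_LIMITS = {"docx": 30, "pptx": 30, "xlsx": 200}


def export_strategy(target_format: str, page_count: int) -> str:
    """Return "layout" or "flow" for a target format and page count (illustrative)."""
    limit = LAYOUT_PAGE_LIMITS[target_format.lower()]
    return "layout" if page_count <= limit else "flow"
```

For example, a 45-page report would be expected to take the flowing-text path when exported to DOCX but the layout-preserving path when exported to XLSX.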

PDF to Word (DOCX)

Rust

use pdf_oxide::document::PdfDocument;

let doc = PdfDocument::open("report.pdf")?;

// Write straight to disk
doc.to_docx("report.docx")?;

// Or get the bytes in memory
let docx_bytes: Vec<u8> = doc.to_docx_bytes()?;
std::fs::write("report.docx", docx_bytes)?;

Python

from pdf_oxide import PdfDocument

doc = PdfDocument("report.pdf")

# Write straight to disk
doc.to_docx("report.docx")

# Or get the bytes in memory
docx_bytes = doc.to_docx_bytes()
with open("report.docx", "wb") as f:
    f.write(docx_bytes)

Go

doc, err := pdfoxide.Open("report.pdf")
if err != nil {
    log.Fatal(err)
}
defer doc.Close()

docxBytes, err := doc.ToDocxBytes()
if err != nil {
    log.Fatal(err)
}
os.WriteFile("report.docx", docxBytes, 0o644)

C#

using PdfOxide.Core;

using var doc = PdfDocument.Open("report.pdf");
byte[] docxBytes = doc.ToDocxBytes();
File.WriteAllBytes("report.docx", docxBytes);

Swift

import PdfOxide

let doc = try Document.open("report.pdf")
let docxBytes = try doc.toDocx()
try Data(docxBytes).write(to: URL(fileURLWithPath: "report.docx"))

C++

#include <pdf_oxide/pdf_oxide.hpp>
#include <fstream>

auto doc = pdf_oxide::Document::open("report.pdf");
auto docx_bytes = doc.to_docx();
std::ofstream("report.docx", std::ios::binary)
    .write(reinterpret_cast<const char*>(docx_bytes.data()), docx_bytes.size());

Dart

import 'dart:io';
import 'package:pdf_oxide/pdf_oxide.dart';

final doc = PdfDocument.open('report.pdf');
File('report.docx').writeAsBytesSync(doc.toDocx());

R

library(pdfoxide)

doc <- pdf_open("report.pdf")
writeBin(pdf_to_docx(doc), "report.docx")

Julia

using PdfOxide

doc = open_document("report.pdf")
write("report.docx", to_docx(doc))

Zig

const pdf_oxide = @import("pdf_oxide");
const a = std.heap.page_allocator;

var doc = try pdf_oxide.Document.open("report.pdf");
const docx_bytes = try doc.toDocx(a);
try std.fs.cwd().writeFile(.{ .sub_path = "report.docx", .data = docx_bytes });

Objective-C

#import "POXPdfOxide.h"
NSError *err = nil;

POXDocument *doc = [POXDocument openPath:@"report.pdf" error:&err];
NSData *docxBytes = [doc toDocxWithError:&err];
[docxBytes writeToFile:@"report.docx" atomically:YES];

Elixir

{:ok, doc} = PdfOxide.open("report.pdf")
{:ok, docx_bytes} = PdfOxide.to_docx(doc)
File.write!("report.docx", docx_bytes)

PDF to PowerPoint (PPTX)

Rust

use pdf_oxide::document::PdfDocument;

let doc = PdfDocument::open("deck.pdf")?;
doc.to_pptx("deck.pptx")?;            // to disk
let pptx_bytes = doc.to_pptx_bytes()?; // or in memory

Python

from pdf_oxide import PdfDocument

doc = PdfDocument("deck.pdf")
doc.to_pptx("deck.pptx")           # to disk
pptx_bytes = doc.to_pptx_bytes()   # or in memory

Go

doc, err := pdfoxide.Open("deck.pdf")
if err != nil {
    log.Fatal(err) // do not continue with a nil document
}
defer doc.Close()

pptxBytes, err := doc.ToPptxBytes()
if err != nil {
    log.Fatal(err)
}
os.WriteFile("deck.pptx", pptxBytes, 0o644)

C#

using var doc = PdfDocument.Open("deck.pdf");
File.WriteAllBytes("deck.pptx", doc.ToPptxBytes());

Swift

let doc = try Document.open("deck.pdf")
let pptxBytes = try doc.toPptx()
try Data(pptxBytes).write(to: URL(fileURLWithPath: "deck.pptx"))

C++

auto doc = pdf_oxide::Document::open("deck.pdf");
auto pptx_bytes = doc.to_pptx();
std::ofstream("deck.pptx", std::ios::binary)
    .write(reinterpret_cast<const char*>(pptx_bytes.data()), pptx_bytes.size());

Dart

final doc = PdfDocument.open('deck.pdf');
File('deck.pptx').writeAsBytesSync(doc.toPptx());

R

doc <- pdf_open("deck.pdf")
writeBin(pdf_to_pptx(doc), "deck.pptx")

Julia

doc = open_document("deck.pdf")
write("deck.pptx", to_pptx(doc))

Zig

var doc = try pdf_oxide.Document.open("deck.pdf");
const pptx_bytes = try doc.toPptx(a);
try std.fs.cwd().writeFile(.{ .sub_path = "deck.pptx", .data = pptx_bytes });

Objective-C

POXDocument *doc = [POXDocument openPath:@"deck.pdf" error:&err];
NSData *pptxBytes = [doc toPptxWithError:&err];
[pptxBytes writeToFile:@"deck.pptx" atomically:YES];

Elixir

{:ok, doc} = PdfOxide.open("deck.pdf")
{:ok, pptx_bytes} = PdfOxide.to_pptx(doc)
File.write!("deck.pptx", pptx_bytes)

PDF to Excel (XLSX)

Rust

use pdf_oxide::document::PdfDocument;

let doc = PdfDocument::open("table.pdf")?;
doc.to_xlsx("table.xlsx")?;            // to disk
let xlsx_bytes = doc.to_xlsx_bytes()?; // or in memory

Python

from pdf_oxide import PdfDocument

doc = PdfDocument("table.pdf")
doc.to_xlsx("table.xlsx")          # to disk
xlsx_bytes = doc.to_xlsx_bytes()   # or in memory

Go

doc, err := pdfoxide.Open("table.pdf")
if err != nil {
    log.Fatal(err) // do not continue with a nil document
}
defer doc.Close()

xlsxBytes, err := doc.ToXlsxBytes()
if err != nil {
    log.Fatal(err)
}
os.WriteFile("table.xlsx", xlsxBytes, 0o644)

C#

using var doc = PdfDocument.Open("table.pdf");
File.WriteAllBytes("table.xlsx", doc.ToXlsxBytes());

Swift

let doc = try Document.open("table.pdf")
let xlsxBytes = try doc.toXlsx()
try Data(xlsxBytes).write(to: URL(fileURLWithPath: "table.xlsx"))

C++

auto doc = pdf_oxide::Document::open("table.pdf");
auto xlsx_bytes = doc.to_xlsx();
std::ofstream("table.xlsx", std::ios::binary)
    .write(reinterpret_cast<const char*>(xlsx_bytes.data()), xlsx_bytes.size());

Dart

final doc = PdfDocument.open('table.pdf');
File('table.xlsx').writeAsBytesSync(doc.toXlsx());

R

doc <- pdf_open("table.pdf")
writeBin(pdf_to_xlsx(doc), "table.xlsx")

Julia

doc = open_document("table.pdf")
write("table.xlsx", to_xlsx(doc))

Zig

var doc = try pdf_oxide.Document.open("table.pdf");
const xlsx_bytes = try doc.toXlsx(a);
try std.fs.cwd().writeFile(.{ .sub_path = "table.xlsx", .data = xlsx_bytes });

Objective-C

POXDocument *doc = [POXDocument openPath:@"table.pdf" error:&err];
NSData *xlsxBytes = [doc toXlsxWithError:&err];
[xlsxBytes writeToFile:@"table.xlsx" atomically:YES];

Elixir

{:ok, doc} = PdfOxide.open("table.pdf")
{:ok, xlsx_bytes} = PdfOxide.to_xlsx(doc)
File.write!("table.xlsx", xlsx_bytes)

Python note: to_docx/to_pptx/to_xlsx are available on PdfDocument (the extraction/inspection class), not on the OfficeConverter/Pdf builder used for the Office → PDF direction. Use PdfDocument("file.pdf") to open the source PDF.

How do I open an Office file directly as a PDF document?

The native bindings (Go, C#, Swift, and the C ABI) provide open_from_*_bytes constructors that convert DOCX/PPTX/XLSX bytes and return an already opened PdfDocument. This is handy when you want to extract text, render, or re-export right away without storing the intermediate PDF. Each constructor runs OfficeConverter internally and opens the resulting PDF in a single call.

Go

data, err := os.ReadFile("contract.docx")
if err != nil {
    log.Fatal(err)
}

doc, err := pdfoxide.OpenFromDocxBytes(data)
if err != nil {
    log.Fatal(err)
}
defer doc.Close()

// Now work with it as a normal PDF document
text, _ := doc.ExtractText(0)
fmt.Println(text)

C#

using PdfOxide.Core;

byte[] data = File.ReadAllBytes("contract.docx");
using var doc = PdfDocument.OpenFromDocxBytes(data);

// Use it like any other open PDF: extract, render, or re-export
byte[] docxBytes = doc.ToDocxBytes(); // round-trip back to DOCX if you like

Swift

import PdfOxide
import Foundation

let data = try Data(contentsOf: URL(fileURLWithPath: "contract.docx"))
let doc = try Document.openFromDocxBytes([UInt8](data))
let pageCount = try doc.pageCount()
print("Converted DOCX has \(pageCount) page(s)")

C++

#include <pdf_oxide/pdf_oxide.hpp>
#include <fstream>

std::ifstream in("contract.docx", std::ios::binary);
std::vector<std::uint8_t> data((std::istreambuf_iterator<char>(in)), {});

auto doc = pdf_oxide::Document::open_from_docx_bytes(data);
// Now work with it as a normal PDF document
auto text = doc.extract_text(0);

Dart

import 'dart:io';
import 'package:pdf_oxide/pdf_oxide.dart';

final data = File('contract.docx').readAsBytesSync();
final doc = PdfDocument.openFromDocxBytes(data);
final text = doc.extractText(0);

R

library(pdfoxide)

data <- readBin("contract.docx", "raw", file.info("contract.docx")$size)
doc  <- pdf_open_from_docx_bytes(data)
text <- pdf_extract_text(doc, 0)

Julia

using PdfOxide

data = read("contract.docx")
doc  = open_from_docx_bytes(data)
text = extract_text(doc, 0)

Zig

const pdf_oxide = @import("pdf_oxide");
const a = std.heap.page_allocator;

const data = try std.fs.cwd().readFileAlloc("contract.docx", a, .unlimited);
var doc = try pdf_oxide.Document.openFromDocxBytes(data);
const text = try doc.extractText(a, 0);

Objective-C

#import "POXPdfOxide.h"
NSError *err = nil;

NSData *data = [NSData dataWithContentsOfFile:@"contract.docx"];
POXDocument *doc = [POXDocument openFromDocxBytes:data error:&err];
NSString *text = [doc extractText:0 error:&err];

Elixir

data = File.read!("contract.docx")
{:ok, doc} = PdfOxide.open_from_docx_bytes(data)
{:ok, text} = PdfOxide.extract_text(doc, 0)

The corresponding constructors are used for PPTX and XLSX:

Source format	Go	C#	Swift
DOCX	`OpenFromDocxBytes(data)`	`PdfDocument.OpenFromDocxBytes(data)`	`Document.openFromDocxBytes(bytes)`
PPTX	`OpenFromPptxBytes(data)`	`PdfDocument.OpenFromPptxBytes(data)`	`Document.openFromPptxBytes(bytes)`
XLSX	`OpenFromXlsxBytes(data)`	`PdfDocument.OpenFromXlsxBytes(data)`	`Document.openFromXlsxBytes(bytes)`

Rust / Python: the core PdfDocument has no open_from_docx_bytes constructor. In Rust, convert first with OfficeConverter::new().convert_docx_bytes(&data)? and then open the result with PdfDocument::from_bytes(pdf_bytes)?. In Python, use OfficeConverter.from_docx_bytes(data) (documented above), which returns a Pdf.

use pdf_oxide::converters::office::OfficeConverter;
use pdf_oxide::document::PdfDocument;

let data = std::fs::read("contract.docx")?;
let pdf_bytes = OfficeConverter::new().convert_docx_bytes(&data)?;
let doc = PdfDocument::from_bytes(pdf_bytes)?;
println!("{} pages", doc.page_count()?);

Configuration (Rust)

Customize page size, margins, and fonts with OfficeConfig:

use pdf_oxide::converters::office::{OfficeConverter, OfficeConfig};

let config = OfficeConfig::a4(); // A4 page size
let converter = OfficeConverter::with_config(config);
let pdf_bytes = converter.convert_docx("document.docx")?;

OfficeConfig fields

Field	Type	Default	Description
`page_size`	`PageSize`	Letter	Page dimensions
`margins`	`Margins`	1 inch on all sides	Page margins in points (72 pt = 1 inch)
`embed_fonts`	`bool`	`false`	Whether to embed fonts
`default_font`	`String`	`"Helvetica"`	Fallback font
`default_font_size`	`f32`	`11.0`	Default text size in points
`line_height`	`f32`	`1.2`	Line-height multiplier
`include_images`	`bool`	`true`	Whether to include embedded images
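To get a feel for what the defaults mean on the page, the sketch below works out the usable text area and a rough number of body-text lines per page. It assumes a simple model in which the line pitch equals `default_font_size * line_height` and ignores headings, paragraph spacing, and images; the actual layout engine may differ. The function names are made up for this example.

```python
import math

LETTER_PT = (612.0, 792.0)  # 8.5 x 11 inches at 72 pt per inch


def text_area(page=LETTER_PT, margin_pt=72.0):
    """Width and height left for content after uniform margins."""
    width, height = page
    return width - 2 * margin_pt, height - 2 * margin_pt


def lines_per_page(page=LETTER_PT, margin_pt=72.0, font_size=11.0, line_height=1.2):
    """Rough count of text lines that fit, assuming pitch = font_size * line_height."""
    _, height = text_area(page, margin_pt)
    return math.floor(height / (font_size * line_height))
```

With the documented defaults (Letter, 1-inch margins, 11 pt text, 1.2 line height) this gives a 468 × 648 pt text area and about 49 lines per page; halving the margins to 36 pt raises the estimate to 54 lines.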

Page size presets

let config = OfficeConfig::letter(); // 8.5 × 11 inches (default)
let config = OfficeConfig::a4();     // 210 × 297 mm

Custom margins

use pdf_oxide::converters::office::Margins;

let mut config = OfficeConfig::letter();
config.margins = Margins::uniform(36.0);  // 0.5 inch margins
config.margins = Margins::none();          // No margins
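Because margins are given in points, values specified in inches or millimetres need converting first. The helpers below are a small sketch of that arithmetic (72 pt per inch, 25.4 mm per inch); they are not part of PDF Oxide, and the names are chosen for this example.

```python
POINTS_PER_INCH = 72.0
MM_PER_INCH = 25.4


def inches_to_pt(inches: float) -> float:
    """Convert inches to PDF points."""
    return inches * POINTS_PER_INCH


def mm_to_pt(mm: float) -> float:
    """Convert millimetres to PDF points."""
    return mm / MM_PER_INCH * POINTS_PER_INCH
```

`inches_to_pt(0.5)` gives the 36.0 used above, a 20 mm margin comes out at roughly 56.7 pt, and an A4 page (210 × 297 mm) is about 595.28 × 841.89 pt.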

Batch conversion

Python

from pdf_oxide import OfficeConverter
from pathlib import Path

office_dir = Path("documents/")
output_dir = Path("pdfs/")
output_dir.mkdir(exist_ok=True)

extensions = {".docx", ".xlsx", ".pptx"}

for doc_path in office_dir.iterdir():
    if doc_path.suffix.lower() in extensions:
        pdf = OfficeConverter.convert(str(doc_path))
        pdf.save(str(output_dir / doc_path.with_suffix(".pdf").name))
        print(f"Converted: {doc_path.name}")

Rust

use pdf_oxide::converters::office::OfficeConverter;
use std::fs;

let converter = OfficeConverter::new();
fs::create_dir_all("pdfs")?; // make sure the output directory exists

for entry in fs::read_dir("documents/")? {
    let path = entry?.path();
    match path.extension().and_then(|e| e.to_str()) {
        Some("docx" | "xlsx" | "pptx") => {
            let pdf_bytes = converter.convert(&path)?;
            let out = format!("pdfs/{}.pdf", path.file_stem().unwrap().to_str().unwrap());
            fs::write(&out, pdf_bytes)?;
            println!("Converted: {}", path.display());
        }
        _ => {}
    }
}

C++

#include <pdf_oxide/pdf_oxide.hpp>
#include <filesystem>
#include <fstream>
namespace fs = std::filesystem;

fs::create_directories("pdfs");  // make sure the output directory exists

for (const auto& entry : fs::directory_iterator("documents/")) {
    auto path = entry.path();
    auto ext = path.extension().string();

    if (ext != ".docx" && ext != ".xlsx" && ext != ".pptx") continue;

    std::ifstream in(path, std::ios::binary);
    std::vector<std::uint8_t> bytes((std::istreambuf_iterator<char>(in)), {});

    auto doc =
        ext == ".docx" ? pdf_oxide::Document::open_from_docx_bytes(bytes)
        : ext == ".xlsx" ? pdf_oxide::Document::open_from_xlsx_bytes(bytes)
                         : pdf_oxide::Document::open_from_pptx_bytes(bytes);

    auto pdf = doc.get_source_bytes();
    auto out = "pdfs/" + path.stem().string() + ".pdf";
    std::ofstream(out, std::ios::binary)
        .write(reinterpret_cast<const char*>(pdf.data()), pdf.size());
}

Dart

import 'dart:io';
import 'package:pdf_oxide/pdf_oxide.dart';

Directory('pdfs').createSync(recursive: true);

for (final entry in Directory('documents').listSync()) {
  if (entry is! File) continue;
  final ext = entry.path.split('.').last.toLowerCase();
  final bytes = entry.readAsBytesSync();

  final doc = switch (ext) {
    'docx' => PdfDocument.openFromDocxBytes(bytes),
    'xlsx' => PdfDocument.openFromXlsxBytes(bytes),
    'pptx' => PdfDocument.openFromPptxBytes(bytes),
    _ => null,
  };
  if (doc == null) continue;

  final name = entry.uri.pathSegments.last.replaceAll(RegExp(r'\.\w+$'), '');
  File('pdfs/$name.pdf').writeAsBytesSync(doc.getSourceBytes());
}

R

library(pdfoxide)

dir.create("pdfs", showWarnings = FALSE)

for (path in list.files("documents", full.names = TRUE)) {
  ext   <- tolower(tools::file_ext(path))
  bytes <- readBin(path, "raw", file.info(path)$size)

  doc <- switch(ext,
    docx = pdf_open_from_docx_bytes(bytes),
    xlsx = pdf_open_from_xlsx_bytes(bytes),
    pptx = pdf_open_from_pptx_bytes(bytes),
    next)

  out <- file.path("pdfs", paste0(tools::file_path_sans_ext(basename(path)), ".pdf"))
  writeBin(pdf_get_source_bytes(doc), out)
}

Julia

using PdfOxide

mkpath("pdfs")

for path in readdir("documents"; join = true)
    ext   = lowercase(splitext(path)[2])
    bytes = read(path)

    doc = if ext == ".docx"
        open_from_docx_bytes(bytes)
    elseif ext == ".xlsx"
        open_from_xlsx_bytes(bytes)
    elseif ext == ".pptx"
        open_from_pptx_bytes(bytes)
    else
        continue
    end

    name = first(splitext(basename(path)))
    write(joinpath("pdfs", name * ".pdf"), get_source_bytes(doc))
end

Zig

const pdf_oxide = @import("pdf_oxide");
const a = std.heap.page_allocator;

try std.fs.cwd().makePath("pdfs");
var dir = try std.fs.cwd().openDir("documents", .{ .iterate = true });
var it = dir.iterate();
while (try it.next()) |entry| {
    const bytes = try dir.readFileAlloc(entry.name, a, .unlimited);

    var doc = if (std.mem.endsWith(u8, entry.name, ".docx"))
        try pdf_oxide.Document.openFromDocxBytes(bytes)
    else if (std.mem.endsWith(u8, entry.name, ".xlsx"))
        try pdf_oxide.Document.openFromXlsxBytes(bytes)
    else if (std.mem.endsWith(u8, entry.name, ".pptx"))
        try pdf_oxide.Document.openFromPptxBytes(bytes)
    else
        continue;

    const pdf = try doc.sourceBytes(a);
    const stem = entry.name[0 .. std.mem.lastIndexOfScalar(u8, entry.name, '.').?];
    const out = try std.fmt.allocPrint(a, "pdfs/{s}.pdf", .{stem});
    try std.fs.cwd().writeFile(.{ .sub_path = out, .data = pdf });
}

Objective-C

#import "POXPdfOxide.h"
NSError *err = nil;
NSFileManager *fm = [NSFileManager defaultManager];
[fm createDirectoryAtPath:@"pdfs" withIntermediateDirectories:YES attributes:nil error:&err];

for (NSString *name in [fm contentsOfDirectoryAtPath:@"documents" error:&err]) {
    NSString *path = [@"documents" stringByAppendingPathComponent:name];
    NSData *bytes = [NSData dataWithContentsOfFile:path];
    NSString *ext = name.pathExtension.lowercaseString;

    POXDocument *doc;
    if ([ext isEqualToString:@"docx"])      doc = [POXDocument openFromDocxBytes:bytes error:&err];
    else if ([ext isEqualToString:@"xlsx"]) doc = [POXDocument openFromXlsxBytes:bytes error:&err];
    else if ([ext isEqualToString:@"pptx"]) doc = [POXDocument openFromPptxBytes:bytes error:&err];
    else continue;

    NSData *pdf = [doc sourceBytesWithError:&err];
    NSString *out = [@"pdfs" stringByAppendingPathComponent:
        [name.stringByDeletingPathExtension stringByAppendingPathExtension:@"pdf"]];
    [pdf writeToFile:out atomically:YES];
}

Elixir

File.mkdir_p!("pdfs")

for name <- File.ls!("documents") do
  bytes = File.read!(Path.join("documents", name))

  result =
    case Path.extname(name) |> String.downcase() do
      ".docx" -> PdfOxide.open_from_docx_bytes(bytes)
      ".xlsx" -> PdfOxide.open_from_xlsx_bytes(bytes)
      ".pptx" -> PdfOxide.open_from_pptx_bytes(bytes)
      _ -> :skip
    end

  with {:ok, doc} <- result,
       {:ok, pdf} <- PdfOxide.source_bytes(doc) do
    out = Path.join("pdfs", Path.rootname(name) <> ".pdf")
    File.write!(out, pdf)
  end
end
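The loops above stop at the first file that fails to convert. For larger batches it can be more convenient to collect failures and keep going. The following Python sketch shows one way to do that. It uses only `OfficeConverter.convert` and `Pdf.save` from this page; the function `convert_directory` and its `convert` parameter are hypothetical, and catching a broad `Exception` is an assumption, since the exception types raised by pdf_oxide are not documented here.

```python
from pathlib import Path

OFFICE_EXTENSIONS = {".docx", ".xlsx", ".pptx"}


def _default_convert(src: Path, dst: Path) -> None:
    # Imported lazily so the helper can be exercised without pdf_oxide installed.
    from pdf_oxide import OfficeConverter

    OfficeConverter.convert(str(src)).save(str(dst))


def convert_directory(src_dir, out_dir, convert=_default_convert):
    """Convert every Office file in src_dir; return (converted, failed) lists."""
    src_dir, out_dir = Path(src_dir), Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    converted, failed = [], []
    for path in sorted(src_dir.iterdir()):
        if not path.is_file() or path.suffix.lower() not in OFFICE_EXTENSIONS:
            continue
        target = out_dir / (path.stem + ".pdf")
        try:
            convert(path, target)
            converted.append(path.name)
        except Exception as exc:  # assumption: error types are not documented here
            failed.append((path.name, str(exc)))
    return converted, failed
```

`convert_directory("documents", "pdfs")` returns the names of the converted files and a list of `(name, error message)` pairs. Passing the conversion step as a parameter also makes it easy to swap in a stub when testing the surrounding pipeline.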

API reference

Python — OfficeConverter

Method	Returns	Description
`OfficeConverter.convert(path)`	`Pdf`	Auto-detect the format and convert
`OfficeConverter.from_docx(path)`	`Pdf`	Convert a DOCX file
`OfficeConverter.from_docx_bytes(data)`	`Pdf`	Convert DOCX from bytes
`OfficeConverter.from_xlsx(path)`	`Pdf`	Convert an XLSX file
`OfficeConverter.from_xlsx_bytes(data)`	`Pdf`	Convert XLSX from bytes
`OfficeConverter.from_pptx(path)`	`Pdf`	Convert a PPTX file
`OfficeConverter.from_pptx_bytes(data)`	`Pdf`	Convert PPTX from bytes

All methods return a Pdf object. Call pdf.save("output.pdf") or pdf.to_bytes() to get the result.
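The `*_bytes` methods have no file extension to go by, so the caller has to know the format. When that information is missing (for example with uploads), one option is to look inside the payload: OOXML files are ZIP packages, and Office conventionally stores the main part at `word/document.xml`, `xl/workbook.xml`, or `ppt/presentation.xml`. The sketch below uses that convention to pick a method name. The helper `bytes_method_for` is hypothetical, and a strict implementation would resolve the main part through the package relationships instead of relying on these conventional paths.

```python
import io
import zipfile

# Conventional main part of each OOXML package type -> matching bytes method.
_MAIN_PARTS = {
    "word/document.xml": "from_docx_bytes",
    "xl/workbook.xml": "from_xlsx_bytes",
    "ppt/presentation.xml": "from_pptx_bytes",
}


def bytes_method_for(data: bytes) -> str:
    """Guess which from_*_bytes method fits an OOXML payload (hypothetical helper)."""
    if not zipfile.is_zipfile(io.BytesIO(data)):
        raise ValueError("not a ZIP container, so not an OOXML file")
    with zipfile.ZipFile(io.BytesIO(data)) as package:
        names = set(package.namelist())
    for part, method in _MAIN_PARTS.items():
        if part in names:
            return method
    raise ValueError("ZIP container without a recognised OOXML main part")
```

With PDF Oxide installed, the result could be used as `pdf = getattr(OfficeConverter, bytes_method_for(data))(data)`.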

Rust — OfficeConverter

Method	Returns	Description
`OfficeConverter::new()`	`OfficeConverter`	Create with the default configuration
`OfficeConverter::with_config(config)`	`OfficeConverter`	Create with a custom configuration
`convert(path)`	`Result<Vec<u8>>`	Auto-detect the format and convert
`convert_docx(path)`	`Result<Vec<u8>>`	Convert a DOCX file
`convert_docx_bytes(bytes)`	`Result<Vec<u8>>`	Convert DOCX from bytes
`convert_xlsx(path)`	`Result<Vec<u8>>`	Convert an XLSX file
`convert_xlsx_bytes(bytes)`	`Result<Vec<u8>>`	Convert XLSX from bytes
`convert_pptx(path)`	`Result<Vec<u8>>`	Convert a PPTX file
`convert_pptx_bytes(bytes)`	`Result<Vec<u8>>`	Convert PPTX from bytes

PDF → Office — `to_docx` / `to_pptx` / `to_xlsx`

Export from an open PDF document. Available in Rust, Python, Go, C#, and Swift.

Language	Method	Returns	Description
Rust	`PdfDocument::to_docx(path)`	`Result<()>`	Export the PDF to a DOCX file on disk
Rust	`PdfDocument::to_docx_bytes()`	`Result<Vec<u8>>`	Export the PDF to DOCX bytes
Rust	`PdfDocument::to_pptx(path)` / `to_pptx_bytes()`	`Result<()>` / `Result<Vec<u8>>`	Export the PDF to PPTX
Rust	`PdfDocument::to_xlsx(path)` / `to_xlsx_bytes()`	`Result<()>` / `Result<Vec<u8>>`	Export the PDF to XLSX
Python	`PdfDocument.to_docx(path)` / `to_docx_bytes()`	`None` / `bytes`	Export the PDF to DOCX
Python	`PdfDocument.to_pptx(path)` / `to_pptx_bytes()`	`None` / `bytes`	Export the PDF to PPTX
Python	`PdfDocument.to_xlsx(path)` / `to_xlsx_bytes()`	`None` / `bytes`	Export the PDF to XLSX
Go	`(*PdfDocument).ToDocxBytes()`	`([]byte, error)`	Export the PDF to DOCX bytes
Go	`(*PdfDocument).ToPptxBytes()`	`([]byte, error)`	Export the PDF to PPTX bytes
Go	`(*PdfDocument).ToXlsxBytes()`	`([]byte, error)`	Export the PDF to XLSX bytes
C#	`PdfDocument.ToDocxBytes()`	`byte[]`	Export the PDF to DOCX bytes
C#	`PdfDocument.ToPptxBytes()`	`byte[]`	Export the PDF to PPTX bytes
C#	`PdfDocument.ToXlsxBytes()`	`byte[]`	Export the PDF to XLSX bytes
Swift	`Document.toDocx()`	`[UInt8]`	Export the PDF to DOCX bytes
Swift	`Document.toPptx()`	`[UInt8]`	Export the PDF to PPTX bytes
Swift	`Document.toXlsx()`	`[UInt8]`	Export the PDF to XLSX bytes
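
The Python rows of this table can be combined to write all three Office formats from one open document. The sketch below is illustrative only: `export_office_copies` is a hypothetical helper, and `doc` is assumed to be an already opened `PdfDocument` exposing the `to_docx_bytes()`, `to_pptx_bytes()`, and `to_xlsx_bytes()` methods listed above.

```python
from pathlib import Path


def export_office_copies(doc, stem, out_dir):
    """Write <stem>.docx, <stem>.pptx and <stem>.xlsx into out_dir.

    `doc` is assumed to provide the to_*_bytes() methods from the table;
    the list of written paths is returned in docx, pptx, xlsx order.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    exporters = {
        "docx": doc.to_docx_bytes,
        "pptx": doc.to_pptx_bytes,
        "xlsx": doc.to_xlsx_bytes,
    }

    written = []
    for ext, export in exporters.items():
        target = out / f"{stem}.{ext}"
        target.write_bytes(export())  # each exporter returns bytes
        written.append(target)
    return written
```

Using the bytes variants keeps file handling in the caller's hands, which is convenient when the output should go to object storage or an HTTP response rather than to disk.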

Office → PDF document — `open_from_*_bytes`

Convenience constructors in the native bindings that convert Office bytes and return an open PDF document. Available in Go, C#, Swift, and the C ABI. Not available on the Rust core PdfDocument or in Python; use OfficeConverter there (see the OfficeConverter tables above).

Language	Constructor	Returns	Description
Go	`OpenFromDocxBytes(data)`	`(*PdfDocument, error)`	Open a PDF document from DOCX bytes
Go	`OpenFromPptxBytes(data)`	`(*PdfDocument, error)`	Open a PDF document from PPTX bytes
Go	`OpenFromXlsxBytes(data)`	`(*PdfDocument, error)`	Open a PDF document from XLSX bytes
C#	`PdfDocument.OpenFromDocxBytes(data)`	`PdfDocument`	Open a PDF document from DOCX bytes
C#	`PdfDocument.OpenFromPptxBytes(data)`	`PdfDocument`	Open a PDF document from PPTX bytes
C#	`PdfDocument.OpenFromXlsxBytes(data)`	`PdfDocument`	Open a PDF document from XLSX bytes
Swift	`Document.openFromDocxBytes(bytes)`	`Document`	Open a PDF document from DOCX bytes
Swift	`Document.openFromPptxBytes(bytes)`	`Document`	Open a PDF document from PPTX bytes
Swift	`Document.openFromXlsxBytes(bytes)`	`Document`	Open a PDF document from XLSX bytes
C ABI	`pdf_document_open_from_docx_bytes(data, len, error_code)`	`PdfDocument *`	Open a PDF document from DOCX bytes
C ABI	`pdf_document_open_from_pptx_bytes(data, len, error_code)`	`PdfDocument *`	Open a PDF document from PPTX bytes
C ABI	`pdf_document_open_from_xlsx_bytes(data, len, error_code)`	`PdfDocument *`	Open a PDF document from XLSX bytes

Frequently asked questions

Is the layout preserved when converting a PDF to DOCX?

Yes, to a degree. For documents up to the layout threshold (30 pages for DOCX/PPTX, 200 for XLSX), to_docx_bytes / to_pptx_bytes / to_xlsx_bytes use the layout-preserving path, which emits every PDF text span as a positioned, editable element and embeds the source PDF's fonts, so a PDF → Office → PDF round trip keeps the original page dimensions. Larger documents fall back to the flow-text path, which rewrites the text into real paragraphs so that Word/PowerPoint/Excel can open them immediately.
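
To anticipate which path a given export is likely to take, the threshold rule can be written down as a small lookup. This is a sketch of the rule as described in this answer, not the library's internal code: `LAYOUT_THRESHOLDS` and `expected_export_path` are hypothetical names, and treating the threshold as inclusive (a 30-page DOCX export still counts as layout-preserving) is an assumption based on the wording "up to".

```python
# Page-count thresholds quoted in the answer above (assumed inclusive).
LAYOUT_THRESHOLDS = {"docx": 30, "pptx": 30, "xlsx": 200}


def expected_export_path(fmt, page_count):
    """Return "layout" or "flow" for a target format and page count.

    "layout" stands for the layout-preserving path, "flow" for the
    flow-text fallback used for larger documents.
    """
    limit = LAYOUT_THRESHOLDS[fmt.lower()]
    return "layout" if page_count <= limit else "flow"
```

A check like this can be useful before exporting a long report, for example to split it into chunks that stay under the threshold when positioned output matters.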

Can I convert a PDF to PowerPoint or Excel, not just to Word?

Yes. to_pptx/to_pptx_bytes map each PDF page to a slide with the dimensions of the source MediaBox, and to_xlsx/to_xlsx_bytes map each page to a worksheet. Both are available in Rust, Python, Go, C#, and Swift.

Why is there no `open_from_docx_bytes` constructor in Python?

Python exposes the Office → PDF direction through the higher-level OfficeConverter class (OfficeConverter.from_docx_bytes(data) returns a Pdf). The open_from_*_bytes constructors are convenience wrappers added at the native FFI layer (Go, C#, Swift, C ABI), where there is no separate converter class.

Do I need Microsoft Office or LibreOffice installed?

No. PDF Oxide reads and writes the OOXML format (DOCX/XLSX/PPTX) directly in pure Rust. There are no external process calls, no COM automation, and no headless Office instances, so the conversion works identically on Linux, macOS, and Windows.
