Convert PDF to and from Office Documents
Convert Microsoft Office documents (Word, Excel, PowerPoint) to PDF, and convert a PDF back to DOCX, PPTX, or XLSX, without Microsoft Office or LibreOffice installed. PDF Oxide parses the OOXML formats directly to produce PDF output, and renders PDF pages back into editable Office documents.
Conversion runs in two directions:
- Office → PDF: the `OfficeConverter` class (and the `open_from_*_bytes` constructors) parse DOCX/XLSX/PPTX and produce a PDF.
- PDF → Office: the `to_docx` / `to_pptx` / `to_xlsx` methods on an open document export back to Office formats.
Quick Example
Python
from pdf_oxide import OfficeConverter
# Auto-detect format from extension
pdf = OfficeConverter.convert("report.docx")
pdf.save("report.pdf")
Rust
use pdf_oxide::converters::office::OfficeConverter;
let converter = OfficeConverter::new();
let pdf_bytes = converter.convert("report.docx")?;
std::fs::write("report.pdf", pdf_bytes)?;
C++
#include <pdf_oxide/pdf_oxide.hpp>
#include <cstdint>
#include <fstream>
#include <iterator>
#include <vector>
std::ifstream in("report.docx", std::ios::binary);
std::vector<std::uint8_t> docx((std::istreambuf_iterator<char>(in)), {});
auto doc = pdf_oxide::Document::open_from_docx_bytes(docx);
auto pdf = doc.get_source_bytes();
std::ofstream("report.pdf", std::ios::binary)
.write(reinterpret_cast<const char*>(pdf.data()), pdf.size());
Dart
import 'dart:io';
import 'package:pdf_oxide/pdf_oxide.dart';
final docx = File('report.docx').readAsBytesSync();
final doc = PdfDocument.openFromDocxBytes(docx);
File('report.pdf').writeAsBytesSync(doc.getSourceBytes());
R
library(pdfoxide)
docx <- readBin("report.docx", "raw", file.info("report.docx")$size)
doc <- pdf_open_from_docx_bytes(docx)
writeBin(pdf_get_source_bytes(doc), "report.pdf")
Julia
using PdfOxide
docx = read("report.docx")
doc = open_from_docx_bytes(docx)
write("report.pdf", get_source_bytes(doc))
Zig
const std = @import("std");
const pdf_oxide = @import("pdf_oxide");
const a = std.heap.page_allocator;
const docx = try std.fs.cwd().readFileAlloc("report.docx", a, .unlimited);
var doc = try pdf_oxide.Document.openFromDocxBytes(docx);
const pdf = try doc.sourceBytes(a);
try std.fs.cwd().writeFile(.{ .sub_path = "report.pdf", .data = pdf });
Objective-C
#import "POXPdfOxide.h"
NSError *err = nil;
NSData *docx = [NSData dataWithContentsOfFile:@"report.docx"];
POXDocument *doc = [POXDocument openFromDocxBytes:docx error:&err];
NSData *pdf = [doc sourceBytesWithError:&err];
[pdf writeToFile:@"report.pdf" atomically:YES];
Elixir
docx = File.read!("report.docx")
{:ok, doc} = PdfOxide.open_from_docx_bytes(docx)
{:ok, pdf} = PdfOxide.source_bytes(doc)
File.write!("report.pdf", pdf)
Supported Formats
| Format | Extension | Description |
|---|---|---|
| DOCX | `.docx` | Word documents: paragraphs, headings, lists, text formatting |
| XLSX | `.xlsx`, `.xls` | Excel spreadsheets: multi-sheet, auto-sized columns, cell types |
| PPTX | `.pptx` | PowerPoint presentations: slides, titles, text boxes |
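`OfficeConverter.convert` picks the format from the file extension. When files arrive as raw bytes without a trustworthy name, a content-based check can help. The sketch below is a hypothetical helper (`detect_office_format` is not part of PDF Oxide, and this is not how the library detects formats). It assumes the conventional OOXML main-part names; strictly, the main part is located through the `officeDocument` relationship in `_rels/.rels`. Legacy binary `.xls` files are not ZIP packages, so this check reports them as unknown.

```python
import io
import zipfile
from typing import Optional

# Conventional location of each package type's main part (an assumption that
# holds for typical files; the authoritative pointer is _rels/.rels).
MAIN_PARTS = {
    "word/document.xml": "docx",
    "xl/workbook.xml": "xlsx",
    "ppt/presentation.xml": "pptx",
}


def detect_office_format(data: bytes) -> Optional[str]:
    """Guess 'docx', 'xlsx', or 'pptx' from package contents, or return None."""
    if not zipfile.is_zipfile(io.BytesIO(data)):
        # OOXML files are ZIP packages; anything else (including binary .xls)
        # is reported as unknown by this sketch.
        return None
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        names = set(zf.namelist())
    for part, fmt in MAIN_PARTS.items():
        if part in names:
            return fmt
    return None
```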
Word Documents (DOCX)
Convert Word documents while preserving headings, paragraphs, lists, and text formatting (bold, italic, underline, colors, font sizes).
Python
from pdf_oxide import OfficeConverter
pdf = OfficeConverter.from_docx("document.docx")
pdf.save("document.pdf")
Rust
use pdf_oxide::converters::office::OfficeConverter;
let converter = OfficeConverter::new();
let pdf_bytes = converter.convert_docx("document.docx")?;
std::fs::write("document.pdf", pdf_bytes)?;
C++
#include <pdf_oxide/pdf_oxide.hpp>
#include <cstdint>
#include <fstream>
#include <iterator>
#include <vector>
std::ifstream in("document.docx", std::ios::binary);
std::vector<std::uint8_t> docx((std::istreambuf_iterator<char>(in)), {});
auto doc = pdf_oxide::Document::open_from_docx_bytes(docx);
auto pdf = doc.get_source_bytes();
std::ofstream("document.pdf", std::ios::binary)
.write(reinterpret_cast<const char*>(pdf.data()), pdf.size());
Dart
import 'dart:io';
import 'package:pdf_oxide/pdf_oxide.dart';
final docx = File('document.docx').readAsBytesSync();
final doc = PdfDocument.openFromDocxBytes(docx);
File('document.pdf').writeAsBytesSync(doc.getSourceBytes());
R
library(pdfoxide)
docx <- readBin("document.docx", "raw", file.info("document.docx")$size)
doc <- pdf_open_from_docx_bytes(docx)
writeBin(pdf_get_source_bytes(doc), "document.pdf")
Julia
using PdfOxide
docx = read("document.docx")
doc = open_from_docx_bytes(docx)
write("document.pdf", get_source_bytes(doc))
Zig
const std = @import("std");
const pdf_oxide = @import("pdf_oxide");
const a = std.heap.page_allocator;
const docx = try std.fs.cwd().readFileAlloc("document.docx", a, .unlimited);
var doc = try pdf_oxide.Document.openFromDocxBytes(docx);
const pdf = try doc.sourceBytes(a);
try std.fs.cwd().writeFile(.{ .sub_path = "document.pdf", .data = pdf });
Objective-C
#import "POXPdfOxide.h"
NSError *err = nil;
NSData *docx = [NSData dataWithContentsOfFile:@"document.docx"];
POXDocument *doc = [POXDocument openFromDocxBytes:docx error:&err];
NSData *pdf = [doc sourceBytesWithError:&err];
[pdf writeToFile:@"document.pdf" atomically:YES];
Elixir
docx = File.read!("document.docx")
{:ok, doc} = PdfOxide.open_from_docx_bytes(docx)
{:ok, pdf} = PdfOxide.source_bytes(doc)
File.write!("document.pdf", pdf)
From Bytes
Python
from pdf_oxide import OfficeConverter
with open("document.docx", "rb") as f:
pdf = OfficeConverter.from_docx_bytes(f.read())
pdf.save("document.pdf")
Rust
let docx_bytes = std::fs::read("document.docx")?;
let converter = OfficeConverter::new();
let pdf_bytes = converter.convert_docx_bytes(&docx_bytes)?;
std::fs::write("document.pdf", pdf_bytes)?;
C++
std::ifstream in("document.docx", std::ios::binary);
std::vector<std::uint8_t> docx_bytes((std::istreambuf_iterator<char>(in)), {});
auto doc = pdf_oxide::Document::open_from_docx_bytes(docx_bytes);
auto pdf_bytes = doc.get_source_bytes();
std::ofstream("document.pdf", std::ios::binary)
.write(reinterpret_cast<const char*>(pdf_bytes.data()), pdf_bytes.size());
Dart
final docxBytes = File('document.docx').readAsBytesSync();
final doc = PdfDocument.openFromDocxBytes(docxBytes);
File('document.pdf').writeAsBytesSync(doc.getSourceBytes());
R
docx_bytes <- readBin("document.docx", "raw", file.info("document.docx")$size)
doc <- pdf_open_from_docx_bytes(docx_bytes)
writeBin(pdf_get_source_bytes(doc), "document.pdf")
Julia
docx_bytes = read("document.docx")
doc = open_from_docx_bytes(docx_bytes)
write("document.pdf", get_source_bytes(doc))
Zig
const docx_bytes = try std.fs.cwd().readFileAlloc("document.docx", a, .unlimited);
var doc = try pdf_oxide.Document.openFromDocxBytes(docx_bytes);
const pdf_bytes = try doc.sourceBytes(a);
try std.fs.cwd().writeFile(.{ .sub_path = "document.pdf", .data = pdf_bytes });
Objective-C
NSData *docxBytes = [NSData dataWithContentsOfFile:@"document.docx"];
POXDocument *doc = [POXDocument openFromDocxBytes:docxBytes error:&err];
NSData *pdfBytes = [doc sourceBytesWithError:&err];
[pdfBytes writeToFile:@"document.pdf" atomically:YES];
Elixir
docx_bytes = File.read!("document.docx")
{:ok, doc} = PdfOxide.open_from_docx_bytes(docx_bytes)
{:ok, pdf_bytes} = PdfOxide.source_bytes(doc)
File.write!("document.pdf", pdf_bytes)
DOCX Features Supported
- Paragraphs with alignment (left, center, right, justified)
- Headings (Heading 1–9 styles)
- Text formatting: bold, italic, underline, strikethrough
- Font sizes and colors
- Numbered and bulleted lists with nesting
- Metadata extraction (title and author from `docProps/core.xml`)
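To illustrate where the title and author come from, here is a minimal sketch that reads them from `docProps/core.xml` with the standard library. It assumes the conventional part path and the standard core-properties vocabulary (`dc:title` for the title, `dc:creator` for the author); `read_core_properties` is a hypothetical helper, not PDF Oxide's implementation.

```python
import io
import zipfile
import xml.etree.ElementTree as ET
from typing import Dict, Optional

NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def read_core_properties(docx_bytes: bytes) -> Dict[str, Optional[str]]:
    """Return the title and author recorded in a package's core properties."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as zf:
        if "docProps/core.xml" not in zf.namelist():
            return {"title": None, "author": None}
        root = ET.fromstring(zf.read("docProps/core.xml"))

    def first_text(tag: str) -> Optional[str]:
        el = root.find(tag, NS)
        return el.text if el is not None else None

    # Dublin Core elements: dc:title is the title, dc:creator is the author.
    return {"title": first_text("dc:title"), "author": first_text("dc:creator")}
```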
Excel Spreadsheets (XLSX)
Convert spreadsheets to PDF with auto-calculated column widths and multi-sheet support. Each sheet is rendered as a separate section.
Python
from pdf_oxide import OfficeConverter
pdf = OfficeConverter.from_xlsx("data.xlsx")
pdf.save("data.pdf")
Rust
let converter = OfficeConverter::new();
let pdf_bytes = converter.convert_xlsx("data.xlsx")?;
std::fs::write("data.pdf", pdf_bytes)?;
C++
std::ifstream in("data.xlsx", std::ios::binary);
std::vector<std::uint8_t> xlsx((std::istreambuf_iterator<char>(in)), {});
auto doc = pdf_oxide::Document::open_from_xlsx_bytes(xlsx);
auto pdf = doc.get_source_bytes();
std::ofstream("data.pdf", std::ios::binary)
.write(reinterpret_cast<const char*>(pdf.data()), pdf.size());
Dart
final xlsx = File('data.xlsx').readAsBytesSync();
final doc = PdfDocument.openFromXlsxBytes(xlsx);
File('data.pdf').writeAsBytesSync(doc.getSourceBytes());
R
xlsx <- readBin("data.xlsx", "raw", file.info("data.xlsx")$size)
doc <- pdf_open_from_xlsx_bytes(xlsx)
writeBin(pdf_get_source_bytes(doc), "data.pdf")
Julia
xlsx = read("data.xlsx")
doc = open_from_xlsx_bytes(xlsx)
write("data.pdf", get_source_bytes(doc))
Zig
const xlsx = try std.fs.cwd().readFileAlloc("data.xlsx", a, .unlimited);
var doc = try pdf_oxide.Document.openFromXlsxBytes(xlsx);
const pdf = try doc.sourceBytes(a);
try std.fs.cwd().writeFile(.{ .sub_path = "data.pdf", .data = pdf });
Objective-C
NSData *xlsx = [NSData dataWithContentsOfFile:@"data.xlsx"];
POXDocument *doc = [POXDocument openFromXlsxBytes:xlsx error:&err];
NSData *pdf = [doc sourceBytesWithError:&err];
[pdf writeToFile:@"data.pdf" atomically:YES];
Elixir
xlsx = File.read!("data.xlsx")
{:ok, doc} = PdfOxide.open_from_xlsx_bytes(xlsx)
{:ok, pdf} = PdfOxide.source_bytes(doc)
File.write!("data.pdf", pdf)
XLSX Features Supported
- Multi-sheet rendering with sheet titles
- Cell types: strings, integers, floats, booleans, dates, errors
- Automatic column width calculation
- Automatic page breaks when content exceeds available space
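The exact width and pagination rules PDF Oxide uses are not documented here. As a rough illustration of the idea behind automatic column widths and page breaks, this sketch sizes each column from its longest cell and splits rows into pages by available height. Every constant (an average glyph width of 0.5 em, 4 pt padding, the min/max clamps) is an assumption, and both functions are hypothetical.

```python
import math
from typing import List, Sequence


def column_widths(rows: Sequence[Sequence[object]], font_size: float = 11.0,
                  char_width_em: float = 0.5, padding: float = 4.0,
                  min_width: float = 24.0, max_width: float = 300.0) -> List[float]:
    """Estimate each column's width in points from its longest rendered cell."""
    ncols = max((len(r) for r in rows), default=0)
    widths = []
    for c in range(ncols):
        longest = max((len(str(r[c])) for r in rows if c < len(r)), default=0)
        # characters x average glyph width (a fraction of the em) + padding on both sides
        width = longest * char_width_em * font_size + 2 * padding
        widths.append(min(max(width, min_width), max_width))
    return widths


def paginate_rows(n_rows: int, usable_height: float, row_height: float) -> List[range]:
    """Group row indices into pages so that no page overflows the usable height."""
    per_page = max(1, math.floor(usable_height / row_height))
    return [range(start, min(start + per_page, n_rows))
            for start in range(0, n_rows, per_page)]
```

For example, 100 rows at a 13.2 pt row height on a 648 pt tall content area fit 49 rows per page, giving three pages.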
PowerPoint Presentations (PPTX)
Convert presentations to PDF. Each slide becomes one page containing the slide's title and text boxes.
Python
from pdf_oxide import OfficeConverter
pdf = OfficeConverter.from_pptx("slides.pptx")
pdf.save("slides.pdf")
Rust
let converter = OfficeConverter::new();
let pdf_bytes = converter.convert_pptx("slides.pptx")?;
std::fs::write("slides.pdf", pdf_bytes)?;
C++
std::ifstream in("slides.pptx", std::ios::binary);
std::vector<std::uint8_t> pptx((std::istreambuf_iterator<char>(in)), {});
auto doc = pdf_oxide::Document::open_from_pptx_bytes(pptx);
auto pdf = doc.get_source_bytes();
std::ofstream("slides.pdf", std::ios::binary)
.write(reinterpret_cast<const char*>(pdf.data()), pdf.size());
Dart
final pptx = File('slides.pptx').readAsBytesSync();
final doc = PdfDocument.openFromPptxBytes(pptx);
File('slides.pdf').writeAsBytesSync(doc.getSourceBytes());
R
pptx <- readBin("slides.pptx", "raw", file.info("slides.pptx")$size)
doc <- pdf_open_from_pptx_bytes(pptx)
writeBin(pdf_get_source_bytes(doc), "slides.pdf")
Julia
pptx = read("slides.pptx")
doc = open_from_pptx_bytes(pptx)
write("slides.pdf", get_source_bytes(doc))
Zig
const pptx = try std.fs.cwd().readFileAlloc("slides.pptx", a, .unlimited);
var doc = try pdf_oxide.Document.openFromPptxBytes(pptx);
const pdf = try doc.sourceBytes(a);
try std.fs.cwd().writeFile(.{ .sub_path = "slides.pdf", .data = pdf });
Objective-C
NSData *pptx = [NSData dataWithContentsOfFile:@"slides.pptx"];
POXDocument *doc = [POXDocument openFromPptxBytes:pptx error:&err];
NSData *pdf = [doc sourceBytesWithError:&err];
[pdf writeToFile:@"slides.pdf" atomically:YES];
Elixir
pptx = File.read!("slides.pptx")
{:ok, doc} = PdfOxide.open_from_pptx_bytes(pptx)
{:ok, pdf} = PdfOxide.source_bytes(doc)
File.write!("slides.pdf", pdf)
How do I convert a PDF to DOCX, PPTX, or XLSX?
The reverse direction — PDF → Office — lives on an open PDF document, not on OfficeConverter. Open a PDF with PdfDocument (Python/Rust), OpenFromBytes/Open (Go/C#), or Document.open (Swift), then call to_docx / to_pptx / to_xlsx to export back to Office formats.
PDF Oxide picks an emission strategy automatically based on page count: documents at or below the layout threshold (30 pages for DOCX/PPTX, 200 for XLSX) use a layout-preserving path that keeps each text span near its source position; larger documents fall back to a flow path that reflows content so Word/PowerPoint/Excel open it instantly. Each PDF page becomes one DOCX section, one PPTX slide, or one XLSX worksheet, and the source page dimensions and embedded fonts are preserved so a PDF → Office → PDF round-trip keeps the original layout.
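The selection rule in the paragraph above can be restated as a small function. This only re-expresses the documented thresholds ("at or below" means `<=`); PDF Oxide makes this choice automatically, and `emission_strategy`, `PAGE_BECOMES`, and the strings `"layout"`/`"flow"` are hypothetical names, not library API.

```python
# Documented thresholds: at or below -> layout-preserving path, above -> flow path.
LAYOUT_THRESHOLD = {"docx": 30, "pptx": 30, "xlsx": 200}
# What each PDF page becomes in the target format.
PAGE_BECOMES = {"docx": "section", "pptx": "slide", "xlsx": "worksheet"}


def emission_strategy(target: str, page_count: int) -> str:
    """Return 'layout' or 'flow' for a PDF with page_count pages."""
    threshold = LAYOUT_THRESHOLD[target.lower()]
    return "layout" if page_count <= threshold else "flow"
```

So a 30-page PDF exported to DOCX takes the layout path, while a 31-page one is reflowed; for XLSX the cut-off is 200 pages.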
PDF to Word (DOCX)
Rust
use pdf_oxide::document::PdfDocument;
let doc = PdfDocument::open("report.pdf")?;
// Write straight to disk
doc.to_docx("report.docx")?;
// Or get the bytes in memory
let docx_bytes: Vec<u8> = doc.to_docx_bytes()?;
std::fs::write("report.docx", docx_bytes)?;
Python
from pdf_oxide import PdfDocument
doc = PdfDocument("report.pdf")
# Write straight to disk
doc.to_docx("report.docx")
# Or get the bytes in memory
docx_bytes = doc.to_docx_bytes()
with open("report.docx", "wb") as f:
f.write(docx_bytes)
Go
doc, err := pdfoxide.Open("report.pdf")
if err != nil {
log.Fatal(err)
}
defer doc.Close()
docxBytes, err := doc.ToDocxBytes()
if err != nil {
log.Fatal(err)
}
if err := os.WriteFile("report.docx", docxBytes, 0o644); err != nil {
log.Fatal(err)
}
C#
using PdfOxide.Core;
using var doc = PdfDocument.Open("report.pdf");
byte[] docxBytes = doc.ToDocxBytes();
File.WriteAllBytes("report.docx", docxBytes);
Swift
import PdfOxide
let doc = try Document.open("report.pdf")
let docxBytes = try doc.toDocx()
try Data(docxBytes).write(to: URL(fileURLWithPath: "report.docx"))
C++
#include <pdf_oxide/pdf_oxide.hpp>
#include <fstream>
auto doc = pdf_oxide::Document::open("report.pdf");
auto docx_bytes = doc.to_docx();
std::ofstream("report.docx", std::ios::binary)
.write(reinterpret_cast<const char*>(docx_bytes.data()), docx_bytes.size());
Dart
import 'dart:io';
import 'package:pdf_oxide/pdf_oxide.dart';
final doc = PdfDocument.open('report.pdf');
File('report.docx').writeAsBytesSync(doc.toDocx());
R
library(pdfoxide)
doc <- pdf_open("report.pdf")
writeBin(pdf_to_docx(doc), "report.docx")
Julia
using PdfOxide
doc = open_document("report.pdf")
write("report.docx", to_docx(doc))
Zig
const std = @import("std");
const pdf_oxide = @import("pdf_oxide");
const a = std.heap.page_allocator;
var doc = try pdf_oxide.Document.open("report.pdf");
const docx_bytes = try doc.toDocx(a);
try std.fs.cwd().writeFile(.{ .sub_path = "report.docx", .data = docx_bytes });
Objective-C
#import "POXPdfOxide.h"
NSError *err = nil;
POXDocument *doc = [POXDocument openPath:@"report.pdf" error:&err];
NSData *docxBytes = [doc toDocxWithError:&err];
[docxBytes writeToFile:@"report.docx" atomically:YES];
Elixir
{:ok, doc} = PdfOxide.open("report.pdf")
{:ok, docx_bytes} = PdfOxide.to_docx(doc)
File.write!("report.docx", docx_bytes)
PDF to PowerPoint (PPTX)
Rust
use pdf_oxide::document::PdfDocument;
let doc = PdfDocument::open("deck.pdf")?;
doc.to_pptx("deck.pptx")?; // to disk
let pptx_bytes = doc.to_pptx_bytes()?; // or in memory
Python
from pdf_oxide import PdfDocument
doc = PdfDocument("deck.pdf")
doc.to_pptx("deck.pptx") # to disk
pptx_bytes = doc.to_pptx_bytes() # or in memory
Go
doc, err := pdfoxide.Open("deck.pdf")
if err != nil {
log.Fatal(err)
}
defer doc.Close()
pptxBytes, err := doc.ToPptxBytes()
if err != nil {
log.Fatal(err)
}
if err := os.WriteFile("deck.pptx", pptxBytes, 0o644); err != nil {
log.Fatal(err)
}
C#
using var doc = PdfDocument.Open("deck.pdf");
File.WriteAllBytes("deck.pptx", doc.ToPptxBytes());
Swift
let doc = try Document.open("deck.pdf")
let pptxBytes = try doc.toPptx()
try Data(pptxBytes).write(to: URL(fileURLWithPath: "deck.pptx"))
C++
auto doc = pdf_oxide::Document::open("deck.pdf");
auto pptx_bytes = doc.to_pptx();
std::ofstream("deck.pptx", std::ios::binary)
.write(reinterpret_cast<const char*>(pptx_bytes.data()), pptx_bytes.size());
Dart
final doc = PdfDocument.open('deck.pdf');
File('deck.pptx').writeAsBytesSync(doc.toPptx());
R
doc <- pdf_open("deck.pdf")
writeBin(pdf_to_pptx(doc), "deck.pptx")
Julia
doc = open_document("deck.pdf")
write("deck.pptx", to_pptx(doc))
Zig
var doc = try pdf_oxide.Document.open("deck.pdf");
const pptx_bytes = try doc.toPptx(a);
try std.fs.cwd().writeFile(.{ .sub_path = "deck.pptx", .data = pptx_bytes });
Objective-C
POXDocument *doc = [POXDocument openPath:@"deck.pdf" error:&err];
NSData *pptxBytes = [doc toPptxWithError:&err];
[pptxBytes writeToFile:@"deck.pptx" atomically:YES];
Elixir
{:ok, doc} = PdfOxide.open("deck.pdf")
{:ok, pptx_bytes} = PdfOxide.to_pptx(doc)
File.write!("deck.pptx", pptx_bytes)
PDF to Excel (XLSX)
Rust
use pdf_oxide::document::PdfDocument;
let doc = PdfDocument::open("table.pdf")?;
doc.to_xlsx("table.xlsx")?; // to disk
let xlsx_bytes = doc.to_xlsx_bytes()?; // or in memory
Python
from pdf_oxide import PdfDocument
doc = PdfDocument("table.pdf")
doc.to_xlsx("table.xlsx") # to disk
xlsx_bytes = doc.to_xlsx_bytes() # or in memory
Go
doc, err := pdfoxide.Open("table.pdf")
if err != nil {
log.Fatal(err)
}
defer doc.Close()
xlsxBytes, err := doc.ToXlsxBytes()
if err != nil {
log.Fatal(err)
}
if err := os.WriteFile("table.xlsx", xlsxBytes, 0o644); err != nil {
log.Fatal(err)
}
C#
using var doc = PdfDocument.Open("table.pdf");
File.WriteAllBytes("table.xlsx", doc.ToXlsxBytes());
Swift
let doc = try Document.open("table.pdf")
let xlsxBytes = try doc.toXlsx()
try Data(xlsxBytes).write(to: URL(fileURLWithPath: "table.xlsx"))
C++
auto doc = pdf_oxide::Document::open("table.pdf");
auto xlsx_bytes = doc.to_xlsx();
std::ofstream("table.xlsx", std::ios::binary)
.write(reinterpret_cast<const char*>(xlsx_bytes.data()), xlsx_bytes.size());
Dart
final doc = PdfDocument.open('table.pdf');
File('table.xlsx').writeAsBytesSync(doc.toXlsx());
R
doc <- pdf_open("table.pdf")
writeBin(pdf_to_xlsx(doc), "table.xlsx")
Julia
doc = open_document("table.pdf")
write("table.xlsx", to_xlsx(doc))
Zig
var doc = try pdf_oxide.Document.open("table.pdf");
const xlsx_bytes = try doc.toXlsx(a);
try std.fs.cwd().writeFile(.{ .sub_path = "table.xlsx", .data = xlsx_bytes });
Objective-C
POXDocument *doc = [POXDocument openPath:@"table.pdf" error:&err];
NSData *xlsxBytes = [doc toXlsxWithError:&err];
[xlsxBytes writeToFile:@"table.xlsx" atomically:YES];
Elixir
{:ok, doc} = PdfOxide.open("table.pdf")
{:ok, xlsx_bytes} = PdfOxide.to_xlsx(doc)
File.write!("table.xlsx", xlsx_bytes)
Python note: `to_docx` / `to_pptx` / `to_xlsx` are exposed on `PdfDocument` (the extraction/inspection class), not on `OfficeConverter`. Use `PdfDocument("file.pdf")` to open the source PDF.
How do I open an Office file directly as a PDF document?
The native bindings (Go, C#, Swift, and the C ABI) expose open_from_*_bytes constructors that convert DOCX/PPTX/XLSX bytes and hand back an already-open PdfDocument — convenient when you want to immediately extract text, render, or re-export rather than save the intermediate PDF. Each constructor internally runs OfficeConverter and opens the resulting PDF in one call.
Go
data, err := os.ReadFile("contract.docx")
if err != nil {
log.Fatal(err)
}
doc, err := pdfoxide.OpenFromDocxBytes(data)
if err != nil {
log.Fatal(err)
}
defer doc.Close()
// Now work with it as a normal PDF document
text, err := doc.ExtractText(0)
if err != nil {
log.Fatal(err)
}
fmt.Println(text)
C#
using PdfOxide.Core;
byte[] data = File.ReadAllBytes("contract.docx");
using var doc = PdfDocument.OpenFromDocxBytes(data);
// Use it like any other open PDF — extract, render, or re-export
byte[] docxBytes = doc.ToDocxBytes(); // round-trip back to DOCX if you like
Swift
import PdfOxide
import Foundation
let data = try Data(contentsOf: URL(fileURLWithPath: "contract.docx"))
let doc = try Document.openFromDocxBytes([UInt8](data))
let pageCount = try doc.pageCount()
print("Converted DOCX has \(pageCount) page(s)")
C++
#include <pdf_oxide/pdf_oxide.hpp>
#include <cstdint>
#include <fstream>
#include <iterator>
#include <vector>
std::ifstream in("contract.docx", std::ios::binary);
std::vector<std::uint8_t> data((std::istreambuf_iterator<char>(in)), {});
auto doc = pdf_oxide::Document::open_from_docx_bytes(data);
// Now work with it as a normal PDF document
auto text = doc.extract_text(0);
Dart
import 'dart:io';
import 'package:pdf_oxide/pdf_oxide.dart';
final data = File('contract.docx').readAsBytesSync();
final doc = PdfDocument.openFromDocxBytes(data);
final text = doc.extractText(0);
R
library(pdfoxide)
data <- readBin("contract.docx", "raw", file.info("contract.docx")$size)
doc <- pdf_open_from_docx_bytes(data)
text <- pdf_extract_text(doc, 0)
Julia
using PdfOxide
data = read("contract.docx")
doc = open_from_docx_bytes(data)
text = extract_text(doc, 0)
Zig
const std = @import("std");
const pdf_oxide = @import("pdf_oxide");
const a = std.heap.page_allocator;
const data = try std.fs.cwd().readFileAlloc("contract.docx", a, .unlimited);
var doc = try pdf_oxide.Document.openFromDocxBytes(data);
const text = try doc.extractText(a, 0);
Objective-C
#import "POXPdfOxide.h"
NSError *err = nil;
NSData *data = [NSData dataWithContentsOfFile:@"contract.docx"];
POXDocument *doc = [POXDocument openFromDocxBytes:data error:&err];
NSString *text = [doc extractText:0 error:&err];
Elixir
data = File.read!("contract.docx")
{:ok, doc} = PdfOxide.open_from_docx_bytes(data)
{:ok, text} = PdfOxide.extract_text(doc, 0)
PPTX and XLSX use the matching constructors:
| Source format | Go | C# | Swift |
|---|---|---|---|
| DOCX | `OpenFromDocxBytes(data)` | `PdfDocument.OpenFromDocxBytes(data)` | `Document.openFromDocxBytes(bytes)` |
| PPTX | `OpenFromPptxBytes(data)` | `PdfDocument.OpenFromPptxBytes(data)` | `Document.openFromPptxBytes(bytes)` |
| XLSX | `OpenFromXlsxBytes(data)` | `PdfDocument.OpenFromXlsxBytes(data)` | `Document.openFromXlsxBytes(bytes)` |
Rust / Python: there is no `open_from_docx_bytes` constructor on the core `PdfDocument`. In Rust, convert first with `OfficeConverter::new().convert_docx_bytes(&data)?` and then call `PdfDocument::from_bytes(pdf_bytes)?`. In Python, use `OfficeConverter.from_docx_bytes(data)` (documented above), which returns a `Pdf` object.
use pdf_oxide::converters::office::OfficeConverter;
use pdf_oxide::document::PdfDocument;
let data = std::fs::read("contract.docx")?;
let pdf_bytes = OfficeConverter::new().convert_docx_bytes(&data)?;
let doc = PdfDocument::from_bytes(pdf_bytes)?;
println!("{} pages", doc.page_count()?);
Configuration (Rust)
Customize page size, margins, and fonts using OfficeConfig:
use pdf_oxide::converters::office::{OfficeConverter, OfficeConfig};
let config = OfficeConfig::a4(); // A4 page size
let converter = OfficeConverter::with_config(config);
let pdf_bytes = converter.convert_docx("document.docx")?;
OfficeConfig Fields
| Field | Type | Default | Description |
|---|---|---|---|
| `page_size` | `PageSize` | Letter | Page dimensions |
| `margins` | `Margins` | 1 inch on all sides | Page margins in points (72 pt = 1 inch) |
| `embed_fonts` | `bool` | `false` | Whether to embed fonts |
| `default_font` | `String` | `"Helvetica"` | Fallback font |
| `default_font_size` | `f32` | `11.0` | Default text size in points |
| `line_height` | `f32` | `1.2` | Line height multiplier |
| `include_images` | `bool` | `true` | Include embedded images |
Page Size Presets
let config = OfficeConfig::letter(); // 8.5 × 11 inches (default)
let config = OfficeConfig::a4(); // 210 × 297 mm
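Because page geometry is expressed in PDF points (72 pt = 1 inch, as the field table notes), it helps to see what the two presets work out to. The sketch below applies the standard unit definitions (1 inch = 25.4 mm); whether `OfficeConfig::a4()` uses the exact values or rounds them (for example to 595 × 842) is not stated here, so treat the A4 figures as the nominal size.

```python
POINTS_PER_INCH = 72.0
MM_PER_INCH = 25.4


def inches_to_pt(inches: float) -> float:
    return inches * POINTS_PER_INCH


def mm_to_pt(mm: float) -> float:
    return mm / MM_PER_INCH * POINTS_PER_INCH


LETTER_PT = (inches_to_pt(8.5), inches_to_pt(11))  # 612 x 792 points
A4_PT = (mm_to_pt(210), mm_to_pt(297))            # about 595.28 x 841.89 points
```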
Custom Margins
use pdf_oxide::converters::office::Margins;
let mut config = OfficeConfig::letter();
config.margins = Margins::uniform(36.0); // 0.5 inch margins
config.margins = Margins::none(); // No margins
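To see how margins and the text settings interact, here is a worked sketch of the usable content area and how many lines of default-sized text fit on a page. The defaults mirror the field table (72 pt margins, 11 pt text, 1.2 line height). It assumes the line pitch is `default_font_size × line_height`, a common interpretation that this page does not confirm; both helpers are hypothetical.

```python
import math
from typing import Tuple


def content_box(page_w: float, page_h: float, margin: float = 72.0) -> Tuple[float, float]:
    """Usable width and height in points with the same margin on all four sides."""
    return page_w - 2 * margin, page_h - 2 * margin


def lines_per_page(usable_h: float, font_size: float = 11.0, line_height: float = 1.2) -> int:
    """Whole lines that fit, assuming line pitch = font_size * line_height."""
    return math.floor(usable_h / (font_size * line_height))
```

On Letter with the default 1 inch margins the content area is 468 × 648 pt, which holds 49 lines at a 13.2 pt pitch; `Margins::uniform(36.0)` widens it to 540 × 720 pt, or 54 lines.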
Batch Conversion
Python
from pdf_oxide import OfficeConverter
from pathlib import Path
office_dir = Path("documents/")
output_dir = Path("pdfs/")
output_dir.mkdir(exist_ok=True)
extensions = {".docx", ".xlsx", ".pptx"}
for doc_path in office_dir.iterdir():
if doc_path.suffix.lower() in extensions:
pdf = OfficeConverter.convert(str(doc_path))
pdf.save(str(output_dir / doc_path.with_suffix(".pdf").name))
print(f"Converted: {doc_path.name}")
Rust
use pdf_oxide::converters::office::OfficeConverter;
use std::fs;
let converter = OfficeConverter::new();
fs::create_dir_all("pdfs")?; // make sure the output directory exists
for entry in fs::read_dir("documents/")? {
let path = entry?.path();
match path.extension().and_then(|e| e.to_str()) {
Some("docx" | "xlsx" | "pptx") => {
let pdf_bytes = converter.convert(&path)?;
let out = format!("pdfs/{}.pdf", path.file_stem().unwrap().to_string_lossy());
fs::write(&out, pdf_bytes)?;
println!("Converted: {}", path.display());
}
_ => {}
}
}
C++
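#include <pdf_oxide/pdf_oxide.hpp>
#include <cstdint>
#include <filesystem>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>
namespace fs = std::filesystem;
fs::create_directories("pdfs"); // make sure the output directory exists
for (const auto& entry : fs::directory_iterator("documents/")) {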
auto path = entry.path();
auto ext = path.extension().string();
if (ext != ".docx" && ext != ".xlsx" && ext != ".pptx") continue;
std::ifstream in(path, std::ios::binary);
std::vector<std::uint8_t> bytes((std::istreambuf_iterator<char>(in)), {});
auto doc =
ext == ".docx" ? pdf_oxide::Document::open_from_docx_bytes(bytes)
: ext == ".xlsx" ? pdf_oxide::Document::open_from_xlsx_bytes(bytes)
: pdf_oxide::Document::open_from_pptx_bytes(bytes);
auto pdf = doc.get_source_bytes();
auto out = "pdfs/" + path.stem().string() + ".pdf";
std::ofstream(out, std::ios::binary)
.write(reinterpret_cast<const char*>(pdf.data()), pdf.size());
}
Dart
import 'dart:io';
import 'package:pdf_oxide/pdf_oxide.dart';
Directory('pdfs').createSync(recursive: true);
for (final entry in Directory('documents').listSync()) {
if (entry is! File) continue;
final ext = entry.path.split('.').last.toLowerCase();
final bytes = entry.readAsBytesSync();
final doc = switch (ext) {
'docx' => PdfDocument.openFromDocxBytes(bytes),
'xlsx' => PdfDocument.openFromXlsxBytes(bytes),
'pptx' => PdfDocument.openFromPptxBytes(bytes),
_ => null,
};
if (doc == null) continue;
final name = entry.uri.pathSegments.last.replaceAll(RegExp(r'\.\w+$'), '');
File('pdfs/$name.pdf').writeAsBytesSync(doc.getSourceBytes());
}
R
library(pdfoxide)
dir.create("pdfs", showWarnings = FALSE)
for (path in list.files("documents", full.names = TRUE)) {
ext <- tolower(tools::file_ext(path))
bytes <- readBin(path, "raw", file.info(path)$size)
doc <- switch(ext,
docx = pdf_open_from_docx_bytes(bytes),
xlsx = pdf_open_from_xlsx_bytes(bytes),
pptx = pdf_open_from_pptx_bytes(bytes),
next)
out <- file.path("pdfs", paste0(tools::file_path_sans_ext(basename(path)), ".pdf"))
writeBin(pdf_get_source_bytes(doc), out)
}
Julia
using PdfOxide
mkpath("pdfs")
for path in readdir("documents"; join = true)
ext = lowercase(splitext(path)[2])
bytes = read(path)
doc = if ext == ".docx"
open_from_docx_bytes(bytes)
elseif ext == ".xlsx"
open_from_xlsx_bytes(bytes)
elseif ext == ".pptx"
open_from_pptx_bytes(bytes)
else
continue
end
name = first(splitext(basename(path)))
write(joinpath("pdfs", name * ".pdf"), get_source_bytes(doc))
end
Zig
const std = @import("std");
const pdf_oxide = @import("pdf_oxide");
const a = std.heap.page_allocator;
try std.fs.cwd().makePath("pdfs");
var dir = try std.fs.cwd().openDir("documents", .{ .iterate = true });
defer dir.close();
var it = dir.iterate();
while (try it.next()) |entry| {
if (entry.kind != .file) continue; // skip subdirectories and other entries
const bytes = try dir.readFileAlloc(entry.name, a, .unlimited);
var doc = if (std.mem.endsWith(u8, entry.name, ".docx"))
try pdf_oxide.Document.openFromDocxBytes(bytes)
else if (std.mem.endsWith(u8, entry.name, ".xlsx"))
try pdf_oxide.Document.openFromXlsxBytes(bytes)
else if (std.mem.endsWith(u8, entry.name, ".pptx"))
try pdf_oxide.Document.openFromPptxBytes(bytes)
else
continue;
const pdf = try doc.sourceBytes(a);
const stem = entry.name[0 .. std.mem.lastIndexOfScalar(u8, entry.name, '.').?];
const out = try std.fmt.allocPrint(a, "pdfs/{s}.pdf", .{stem});
try std.fs.cwd().writeFile(.{ .sub_path = out, .data = pdf });
}
Objective-C
#import "POXPdfOxide.h"
NSError *err = nil;
NSFileManager *fm = [NSFileManager defaultManager];
[fm createDirectoryAtPath:@"pdfs" withIntermediateDirectories:YES attributes:nil error:&err];
for (NSString *name in [fm contentsOfDirectoryAtPath:@"documents" error:&err]) {
NSString *path = [@"documents" stringByAppendingPathComponent:name];
NSData *bytes = [NSData dataWithContentsOfFile:path];
NSString *ext = name.pathExtension.lowercaseString;
POXDocument *doc;
if ([ext isEqualToString:@"docx"]) doc = [POXDocument openFromDocxBytes:bytes error:&err];
else if ([ext isEqualToString:@"xlsx"]) doc = [POXDocument openFromXlsxBytes:bytes error:&err];
else if ([ext isEqualToString:@"pptx"]) doc = [POXDocument openFromPptxBytes:bytes error:&err];
else continue;
NSData *pdf = [doc sourceBytesWithError:&err];
NSString *out = [@"pdfs" stringByAppendingPathComponent:
[name.stringByDeletingPathExtension stringByAppendingPathExtension:@"pdf"]];
[pdf writeToFile:out atomically:YES];
}
Elixir
File.mkdir_p!("pdfs")
for name <- File.ls!("documents") do
bytes = File.read!(Path.join("documents", name))
result =
case Path.extname(name) |> String.downcase() do
".docx" -> PdfOxide.open_from_docx_bytes(bytes)
".xlsx" -> PdfOxide.open_from_xlsx_bytes(bytes)
".pptx" -> PdfOxide.open_from_pptx_bytes(bytes)
_ -> :skip
end
with {:ok, doc} <- result,
{:ok, pdf} <- PdfOxide.source_bytes(doc) do
out = Path.join("pdfs", Path.rootname(name) <> ".pdf")
File.write!(out, pdf)
end
end
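The loops above stop at the first file that fails to convert (or, in Elixir, skip it silently). For large batches you may prefer to keep going and report failures at the end. This sketch injects the conversion step as a callable so the bookkeeping can be shown without depending on any particular binding; `convert_all` is a hypothetical helper, not part of PDF Oxide.

```python
from pathlib import Path
from typing import Callable, Dict, List, Tuple

OFFICE_EXTENSIONS = {".docx", ".xlsx", ".pptx"}


def convert_all(src_dir: Path, out_dir: Path,
                convert: Callable[[Path], bytes]) -> Tuple[List[Path], Dict[Path, str]]:
    """Convert every Office file in src_dir; collect failures instead of aborting."""
    out_dir.mkdir(parents=True, exist_ok=True)
    done: List[Path] = []
    failed: Dict[Path, str] = {}
    for path in sorted(src_dir.iterdir()):
        if not path.is_file() or path.suffix.lower() not in OFFICE_EXTENSIONS:
            continue
        try:
            pdf_bytes = convert(path)
        except Exception as exc:  # record the error and move on to the next file
            failed[path] = str(exc)
            continue
        (out_dir / (path.stem + ".pdf")).write_bytes(pdf_bytes)
        done.append(path)
    return done, failed
```

With the Python binding, the callable could be `lambda p: OfficeConverter.convert(str(p)).to_bytes()`, using the `to_bytes()` method listed in the API reference.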
API Reference
<a id="office-pdf-officeconverter"></a>
Python — OfficeConverter
| Method | Returns | Description |
|---|---|---|
| `OfficeConverter.convert(path)` | `Pdf` | Auto-detect format and convert |
| `OfficeConverter.from_docx(path)` | `Pdf` | Convert DOCX file |
| `OfficeConverter.from_docx_bytes(data)` | `Pdf` | Convert DOCX from bytes |
| `OfficeConverter.from_xlsx(path)` | `Pdf` | Convert XLSX file |
| `OfficeConverter.from_xlsx_bytes(data)` | `Pdf` | Convert XLSX from bytes |
| `OfficeConverter.from_pptx(path)` | `Pdf` | Convert PPTX file |
| `OfficeConverter.from_pptx_bytes(data)` | `Pdf` | Convert PPTX from bytes |
All methods return a Pdf object. Call pdf.save("output.pdf") or pdf.to_bytes() to get the result.
Rust — OfficeConverter
| Method | Returns | Description |
|---|---|---|
| `OfficeConverter::new()` | `OfficeConverter` | Create with default config |
| `OfficeConverter::with_config(config)` | `OfficeConverter` | Create with custom config |
| `convert(path)` | `Result<Vec<u8>>` | Auto-detect format and convert |
| `convert_docx(path)` | `Result<Vec<u8>>` | Convert DOCX file |
| `convert_docx_bytes(bytes)` | `Result<Vec<u8>>` | Convert DOCX from bytes |
| `convert_xlsx(path)` | `Result<Vec<u8>>` | Convert XLSX file |
| `convert_xlsx_bytes(bytes)` | `Result<Vec<u8>>` | Convert XLSX from bytes |
| `convert_pptx(path)` | `Result<Vec<u8>>` | Convert PPTX file |
| `convert_pptx_bytes(bytes)` | `Result<Vec<u8>>` | Convert PPTX from bytes |
PDF → Office — to_docx / to_pptx / to_xlsx
Methods called on an open PDF document. Signatures for Rust, Python, Go, C#, and Swift:
| Language | Method | Returns | Description |
|---|---|---|---|
| Rust | `PdfDocument::to_docx(path)` | `Result<()>` | Export PDF to a DOCX file on disk |
| Rust | `PdfDocument::to_docx_bytes()` | `Result<Vec<u8>>` | Export PDF to DOCX bytes |
| Rust | `PdfDocument::to_pptx(path)` / `to_pptx_bytes()` | `Result<()>` / `Result<Vec<u8>>` | Export PDF to PPTX |
| Rust | `PdfDocument::to_xlsx(path)` / `to_xlsx_bytes()` | `Result<()>` / `Result<Vec<u8>>` | Export PDF to XLSX |
| Python | `PdfDocument.to_docx(path)` / `to_docx_bytes()` | `None` / `bytes` | Export PDF to DOCX |
| Python | `PdfDocument.to_pptx(path)` / `to_pptx_bytes()` | `None` / `bytes` | Export PDF to PPTX |
| Python | `PdfDocument.to_xlsx(path)` / `to_xlsx_bytes()` | `None` / `bytes` | Export PDF to XLSX |
| Go | `(*PdfDocument).ToDocxBytes()` | `([]byte, error)` | Export PDF to DOCX bytes |
| Go | `(*PdfDocument).ToPptxBytes()` | `([]byte, error)` | Export PDF to PPTX bytes |
| Go | `(*PdfDocument).ToXlsxBytes()` | `([]byte, error)` | Export PDF to XLSX bytes |
| C# | `PdfDocument.ToDocxBytes()` | `byte[]` | Export PDF to DOCX bytes |
| C# | `PdfDocument.ToPptxBytes()` | `byte[]` | Export PDF to PPTX bytes |
| C# | `PdfDocument.ToXlsxBytes()` | `byte[]` | Export PDF to XLSX bytes |
| Swift | `Document.toDocx()` | `[UInt8]` | Export PDF to DOCX bytes |
| Swift | `Document.toPptx()` | `[UInt8]` | Export PDF to PPTX bytes |
| Swift | `Document.toXlsx()` | `[UInt8]` | Export PDF to XLSX bytes |
Office → PDF document — open_from_*_bytes
Native-binding convenience constructors that convert Office bytes and return an open PDF document. Exposed in Go, C#, Swift, and the C ABI. Not available on the Rust core PdfDocument or in Python — use OfficeConverter there (see the table above).
| Language | Constructor | Returns | Description |
|---|---|---|---|
| Go | `OpenFromDocxBytes(data)` | `(*PdfDocument, error)` | Open a PDF document from DOCX bytes |
| Go | `OpenFromPptxBytes(data)` | `(*PdfDocument, error)` | Open a PDF document from PPTX bytes |
| Go | `OpenFromXlsxBytes(data)` | `(*PdfDocument, error)` | Open a PDF document from XLSX bytes |
| C# | `PdfDocument.OpenFromDocxBytes(data)` | `PdfDocument` | Open a PDF document from DOCX bytes |
| C# | `PdfDocument.OpenFromPptxBytes(data)` | `PdfDocument` | Open a PDF document from PPTX bytes |
| C# | `PdfDocument.OpenFromXlsxBytes(data)` | `PdfDocument` | Open a PDF document from XLSX bytes |
| Swift | `Document.openFromDocxBytes(bytes)` | `Document` | Open a PDF document from DOCX bytes |
| Swift | `Document.openFromPptxBytes(bytes)` | `Document` | Open a PDF document from PPTX bytes |
| Swift | `Document.openFromXlsxBytes(bytes)` | `Document` | Open a PDF document from XLSX bytes |
| C ABI | `pdf_document_open_from_docx_bytes(data, len, error_code)` | `PdfDocument *` | Open a PDF document from DOCX bytes |
| C ABI | `pdf_document_open_from_pptx_bytes(data, len, error_code)` | `PdfDocument *` | Open a PDF document from PPTX bytes |
| C ABI | `pdf_document_open_from_xlsx_bytes(data, len, error_code)` | `PdfDocument *` | Open a PDF document from XLSX bytes |
FAQ
Does converting a PDF to DOCX keep the layout?
Yes, within limits. For documents at or below the layout threshold (30 pages for DOCX/PPTX, 200 for XLSX), to_docx_bytes / to_pptx_bytes / to_xlsx_bytes use a layout-preserving path that emits each PDF text span as a positioned, editable element and embeds the source PDF’s fonts, so a PDF → Office → PDF round-trip keeps the original page dimensions. Larger documents fall back to a flow path that reflows text into real paragraphs so Word/PowerPoint/Excel open them instantly.
Can I convert a PDF back to PowerPoint or Excel, not just Word?
Yes. to_pptx/to_pptx_bytes map each PDF page to one slide sized to the source MediaBox, and to_xlsx/to_xlsx_bytes map each page to one worksheet. Both are available in Rust, Python, Go, C#, and Swift.
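Sizing a slide "to the source MediaBox" means converting PDF points into the EMU (English Metric Unit) coordinates PPTX uses: 914,400 EMU per inch, so 12,700 EMU per point. The sketch below shows that arithmetic. The clamp to the ECMA-376 `ST_SlideSizeCoordinate` range (1 to 56 inches) is an assumption about how out-of-range pages might be handled; this page does not say what PDF Oxide does for very small or very large pages, and `slide_size_emu` is a hypothetical helper.

```python
from typing import Sequence, Tuple

EMU_PER_POINT = 12_700  # 914,400 EMU per inch / 72 points per inch
# ECMA-376 ST_SlideSizeCoordinate range: 914,400 to 51,206,400 EMU (1 in to 56 in).
MIN_SLIDE_EMU, MAX_SLIDE_EMU = 914_400, 51_206_400


def slide_size_emu(media_box: Sequence[float]) -> Tuple[int, int]:
    """Map a PDF MediaBox [x0 y0 x1 y1] in points to a PPTX slide size in EMU."""
    x0, y0, x1, y1 = media_box

    def to_emu(length_pt: float) -> int:
        emu = round(abs(length_pt) * EMU_PER_POINT)
        return min(max(emu, MIN_SLIDE_EMU), MAX_SLIDE_EMU)

    return to_emu(x1 - x0), to_emu(y1 - y0)
```

A US Letter page (MediaBox `[0 0 612 792]`) maps to a 7,772,400 × 10,058,400 EMU slide.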
Why is there no open_from_docx_bytes in Python?
Python exposes the Office → PDF direction through the higher-level OfficeConverter class instead (OfficeConverter.from_docx_bytes(data) returns a Pdf). The open_from_*_bytes constructors are convenience wrappers added at the native FFI layer (Go, C#, Swift, C ABI), where there is no separate converter class.
Do I need Microsoft Office or LibreOffice installed?
No. PDF Oxide reads and writes the OOXML (DOCX/XLSX/PPTX) format directly in pure Rust. There are no external process calls, COM automation, or headless Office instances — conversion works the same on Linux, macOS, and Windows.
Related Pages
- Create from Markdown — Convert Markdown text to PDF
- Create from HTML — Convert HTML to PDF
- Create from Images — Convert images to PDF
- Batch Processing — Parallel processing patterns