Convert PDF to and from Office Documents

Convert Microsoft Office documents (Word, Excel, PowerPoint) to PDF — and convert a PDF back to DOCX, PPTX, or XLSX — without Microsoft Office or LibreOffice installed. PDF Oxide parses the OOXML format directly and produces PDF output, and renders PDF pages back into editable Office documents.

Conversion runs in two directions:

  • Office → PDF — the OfficeConverter class (and the open_from_*_bytes constructors) parse DOCX/XLSX/PPTX and produce a PDF.
  • PDF → Office — the to_docx / to_pptx / to_xlsx methods on an open document export back to Office formats.

Quick Example

Python

from pdf_oxide import OfficeConverter

# Auto-detect format from extension
pdf = OfficeConverter.convert("report.docx")
pdf.save("report.pdf")

Rust

use pdf_oxide::converters::office::OfficeConverter;

let converter = OfficeConverter::new();
let pdf_bytes = converter.convert("report.docx")?;
std::fs::write("report.pdf", pdf_bytes)?;

C++

#include <pdf_oxide/pdf_oxide.hpp>
#include <cstdint>
#include <fstream>
#include <iterator>
#include <vector>

std::ifstream in("report.docx", std::ios::binary);
std::vector<std::uint8_t> docx((std::istreambuf_iterator<char>(in)), {});

auto doc = pdf_oxide::Document::open_from_docx_bytes(docx);
auto pdf = doc.get_source_bytes();
std::ofstream("report.pdf", std::ios::binary)
    .write(reinterpret_cast<const char*>(pdf.data()), pdf.size());

Dart

import 'dart:io';
import 'package:pdf_oxide/pdf_oxide.dart';

final docx = File('report.docx').readAsBytesSync();
final doc = PdfDocument.openFromDocxBytes(docx);
File('report.pdf').writeAsBytesSync(doc.getSourceBytes());

R

library(pdfoxide)

docx <- readBin("report.docx", "raw", file.info("report.docx")$size)
doc  <- pdf_open_from_docx_bytes(docx)
writeBin(pdf_get_source_bytes(doc), "report.pdf")

Julia

using PdfOxide

docx = read("report.docx")
doc  = open_from_docx_bytes(docx)
write("report.pdf", get_source_bytes(doc))

Zig

const std = @import("std");
const pdf_oxide = @import("pdf_oxide");
const a = std.heap.page_allocator;

const docx = try std.fs.cwd().readFileAlloc("report.docx", a, .unlimited);
var doc = try pdf_oxide.Document.openFromDocxBytes(docx);
const pdf = try doc.sourceBytes(a);
try std.fs.cwd().writeFile(.{ .sub_path = "report.pdf", .data = pdf });

Objective-C

#import "POXPdfOxide.h"
NSError *err = nil;

NSData *docx = [NSData dataWithContentsOfFile:@"report.docx"];
POXDocument *doc = [POXDocument openFromDocxBytes:docx error:&err];
NSData *pdf = [doc sourceBytesWithError:&err];
[pdf writeToFile:@"report.pdf" atomically:YES];

Elixir

docx = File.read!("report.docx")
{:ok, doc} = PdfOxide.open_from_docx_bytes(docx)
{:ok, pdf} = PdfOxide.source_bytes(doc)
File.write!("report.pdf", pdf)

Supported Formats

Format | Extension | Description
DOCX | .docx | Word documents: paragraphs, headings, lists, text formatting
XLSX | .xlsx, .xls | Excel spreadsheets: multi-sheet, auto-sized columns, cell types
PPTX | .pptx | PowerPoint presentations: slides, titles, text boxes

Word Documents (DOCX)

Convert Word documents preserving headings, paragraphs, lists, and text formatting (bold, italic, underline, colors, font sizes).

Python

from pdf_oxide import OfficeConverter

pdf = OfficeConverter.from_docx("document.docx")
pdf.save("document.pdf")

Rust

use pdf_oxide::converters::office::OfficeConverter;

let converter = OfficeConverter::new();
let pdf_bytes = converter.convert_docx("document.docx")?;
std::fs::write("document.pdf", pdf_bytes)?;

C++

#include <pdf_oxide/pdf_oxide.hpp>
#include <cstdint>
#include <fstream>
#include <iterator>
#include <vector>

std::ifstream in("document.docx", std::ios::binary);
std::vector<std::uint8_t> docx((std::istreambuf_iterator<char>(in)), {});

auto doc = pdf_oxide::Document::open_from_docx_bytes(docx);
auto pdf = doc.get_source_bytes();
std::ofstream("document.pdf", std::ios::binary)
    .write(reinterpret_cast<const char*>(pdf.data()), pdf.size());

Dart

import 'dart:io';
import 'package:pdf_oxide/pdf_oxide.dart';

final docx = File('document.docx').readAsBytesSync();
final doc = PdfDocument.openFromDocxBytes(docx);
File('document.pdf').writeAsBytesSync(doc.getSourceBytes());

R

library(pdfoxide)

docx <- readBin("document.docx", "raw", file.info("document.docx")$size)
doc  <- pdf_open_from_docx_bytes(docx)
writeBin(pdf_get_source_bytes(doc), "document.pdf")

Julia

using PdfOxide

docx = read("document.docx")
doc  = open_from_docx_bytes(docx)
write("document.pdf", get_source_bytes(doc))

Zig

const std = @import("std");
const pdf_oxide = @import("pdf_oxide");
const a = std.heap.page_allocator;

const docx = try std.fs.cwd().readFileAlloc("document.docx", a, .unlimited);
var doc = try pdf_oxide.Document.openFromDocxBytes(docx);
const pdf = try doc.sourceBytes(a);
try std.fs.cwd().writeFile(.{ .sub_path = "document.pdf", .data = pdf });

Objective-C

#import "POXPdfOxide.h"
NSError *err = nil;

NSData *docx = [NSData dataWithContentsOfFile:@"document.docx"];
POXDocument *doc = [POXDocument openFromDocxBytes:docx error:&err];
NSData *pdf = [doc sourceBytesWithError:&err];
[pdf writeToFile:@"document.pdf" atomically:YES];

Elixir

docx = File.read!("document.docx")
{:ok, doc} = PdfOxide.open_from_docx_bytes(docx)
{:ok, pdf} = PdfOxide.source_bytes(doc)
File.write!("document.pdf", pdf)

From Bytes

Python

from pdf_oxide import OfficeConverter

with open("document.docx", "rb") as f:
    pdf = OfficeConverter.from_docx_bytes(f.read())
pdf.save("document.pdf")

Rust

let docx_bytes = std::fs::read("document.docx")?;
let converter = OfficeConverter::new();
let pdf_bytes = converter.convert_docx_bytes(&docx_bytes)?;
std::fs::write("document.pdf", pdf_bytes)?;

C++

std::ifstream in("document.docx", std::ios::binary);
std::vector<std::uint8_t> docx_bytes((std::istreambuf_iterator<char>(in)), {});

auto doc = pdf_oxide::Document::open_from_docx_bytes(docx_bytes);
auto pdf_bytes = doc.get_source_bytes();
std::ofstream("document.pdf", std::ios::binary)
    .write(reinterpret_cast<const char*>(pdf_bytes.data()), pdf_bytes.size());

Dart

final docxBytes = File('document.docx').readAsBytesSync();
final doc = PdfDocument.openFromDocxBytes(docxBytes);
File('document.pdf').writeAsBytesSync(doc.getSourceBytes());

R

docx_bytes <- readBin("document.docx", "raw", file.info("document.docx")$size)
doc <- pdf_open_from_docx_bytes(docx_bytes)
writeBin(pdf_get_source_bytes(doc), "document.pdf")

Julia

docx_bytes = read("document.docx")
doc = open_from_docx_bytes(docx_bytes)
write("document.pdf", get_source_bytes(doc))

Zig

const docx_bytes = try std.fs.cwd().readFileAlloc("document.docx", a, .unlimited);
var doc = try pdf_oxide.Document.openFromDocxBytes(docx_bytes);
const pdf_bytes = try doc.sourceBytes(a);
try std.fs.cwd().writeFile(.{ .sub_path = "document.pdf", .data = pdf_bytes });

Objective-C

NSData *docxBytes = [NSData dataWithContentsOfFile:@"document.docx"];
POXDocument *doc = [POXDocument openFromDocxBytes:docxBytes error:&err];
NSData *pdfBytes = [doc sourceBytesWithError:&err];
[pdfBytes writeToFile:@"document.pdf" atomically:YES];

Elixir

docx_bytes = File.read!("document.docx")
{:ok, doc} = PdfOxide.open_from_docx_bytes(docx_bytes)
{:ok, pdf_bytes} = PdfOxide.source_bytes(doc)
File.write!("document.pdf", pdf_bytes)

DOCX Features Supported

  • Paragraphs with alignment (left, center, right, justified)
  • Headings (Heading 1–9 styles)
  • Text formatting: bold, italic, underline, strikethrough
  • Font sizes and colors
  • Numbered and bulleted lists with nesting
  • Metadata extraction (title, author from docProps/core.xml)
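
To make the metadata item concrete: a DOCX file is a ZIP package, and docProps/core.xml stores the title and author as Dublin Core elements. The sketch below reads them with the Python standard library only. It illustrates the file layout, not PDF Oxide's internal parser; `read_core_metadata`, `make_sample_package`, and the sample title and author are hypothetical.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Dublin Core namespace used for the title and creator elements in core.xml.
NS = {"dc": "http://purl.org/dc/elements/1.1/"}

# Sample core properties part with made-up values, for demonstration only.
CORE_XML = """<?xml version="1.0" encoding="UTF-8"?>
<cp:coreProperties
    xmlns:cp="http://schemas.openxmlformats.org/package/2006/metadata/core-properties"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>Quarterly Report</dc:title>
  <dc:creator>Jane Doe</dc:creator>
</cp:coreProperties>
"""


def read_core_metadata(docx_bytes: bytes) -> dict:
    """Return the title and author stored in docProps/core.xml, if present."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as archive:
        if "docProps/core.xml" not in archive.namelist():
            return {"title": None, "author": None}
        root = ET.fromstring(archive.read("docProps/core.xml"))
    return {
        "title": root.findtext("dc:title", default=None, namespaces=NS),
        "author": root.findtext("dc:creator", default=None, namespaces=NS),
    }


def make_sample_package() -> bytes:
    """Build an in-memory ZIP holding only the sample core.xml (not a full DOCX)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("docProps/core.xml", CORE_XML)
    return buffer.getvalue()


if __name__ == "__main__":
    print(read_core_metadata(make_sample_package()))
```

Passing the bytes of a real .docx file to `read_core_metadata` works the same way, since the function only looks at the docProps/core.xml part.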

Excel Spreadsheets (XLSX)

Convert spreadsheets to PDF with auto-calculated column widths and multi-sheet support. Each sheet is rendered as a separate section.

Python

from pdf_oxide import OfficeConverter

pdf = OfficeConverter.from_xlsx("data.xlsx")
pdf.save("data.pdf")

Rust

let converter = OfficeConverter::new();
let pdf_bytes = converter.convert_xlsx("data.xlsx")?;
std::fs::write("data.pdf", pdf_bytes)?;

C++

std::ifstream in("data.xlsx", std::ios::binary);
std::vector<std::uint8_t> xlsx((std::istreambuf_iterator<char>(in)), {});

auto doc = pdf_oxide::Document::open_from_xlsx_bytes(xlsx);
auto pdf = doc.get_source_bytes();
std::ofstream("data.pdf", std::ios::binary)
    .write(reinterpret_cast<const char*>(pdf.data()), pdf.size());

Dart

final xlsx = File('data.xlsx').readAsBytesSync();
final doc = PdfDocument.openFromXlsxBytes(xlsx);
File('data.pdf').writeAsBytesSync(doc.getSourceBytes());

R

xlsx <- readBin("data.xlsx", "raw", file.info("data.xlsx")$size)
doc  <- pdf_open_from_xlsx_bytes(xlsx)
writeBin(pdf_get_source_bytes(doc), "data.pdf")

Julia

xlsx = read("data.xlsx")
doc  = open_from_xlsx_bytes(xlsx)
write("data.pdf", get_source_bytes(doc))

Zig

const xlsx = try std.fs.cwd().readFileAlloc("data.xlsx", a, .unlimited);
var doc = try pdf_oxide.Document.openFromXlsxBytes(xlsx);
const pdf = try doc.sourceBytes(a);
try std.fs.cwd().writeFile(.{ .sub_path = "data.pdf", .data = pdf });

Objective-C

NSData *xlsx = [NSData dataWithContentsOfFile:@"data.xlsx"];
POXDocument *doc = [POXDocument openFromXlsxBytes:xlsx error:&err];
NSData *pdf = [doc sourceBytesWithError:&err];
[pdf writeToFile:@"data.pdf" atomically:YES];

Elixir

xlsx = File.read!("data.xlsx")
{:ok, doc} = PdfOxide.open_from_xlsx_bytes(xlsx)
{:ok, pdf} = PdfOxide.source_bytes(doc)
File.write!("data.pdf", pdf)

XLSX Features Supported

  • Multi-sheet rendering with sheet titles
  • Cell types: strings, integers, floats, booleans, dates, errors
  • Automatic column width calculation
  • Automatic page breaks when content exceeds available space
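
The last two items can be pictured with a small sketch. This is not PDF Oxide's algorithm: it assumes a fixed average character width, a fixed row height, and a Letter page with 1 inch margins (468 × 648 pt of usable space), all of which are illustrative values, and the function names are hypothetical.

```python
def column_widths(rows, char_width=6.0, padding=8.0, max_total=468.0):
    """Size each column to its longest cell, then shrink all columns to fit."""
    if not rows:
        return []
    n_cols = max(len(row) for row in rows)
    widths = [0.0] * n_cols
    for row in rows:
        for i, cell in enumerate(row):
            # Estimated text width of this cell plus horizontal padding.
            widths[i] = max(widths[i], len(str(cell)) * char_width + padding)
    total = sum(widths)
    if total > max_total:
        # Scale every column by the same factor so the table fits the page.
        scale = max_total / total
        widths = [w * scale for w in widths]
    return widths


def paginate(rows, row_height=14.0, available_height=648.0):
    """Split rows into pages so that each page's rows fit the available height."""
    per_page = max(1, int(available_height // row_height))
    return [rows[i:i + per_page] for i in range(0, len(rows), per_page)]


if __name__ == "__main__":
    sheet = [["Name", "Qty"], ["Widget", 12]]
    print(column_widths(sheet))
    print(len(paginate(list(range(100)))))
```

A real renderer would measure text with actual font metrics rather than a constant per-character width, but the two steps (measure, then break when space runs out) stay the same.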

PowerPoint Presentations (PPTX)

Convert presentations to PDF. Each slide becomes a page with titles and text boxes extracted.

Python

from pdf_oxide import OfficeConverter

pdf = OfficeConverter.from_pptx("slides.pptx")
pdf.save("slides.pdf")

Rust

let converter = OfficeConverter::new();
let pdf_bytes = converter.convert_pptx("slides.pptx")?;
std::fs::write("slides.pdf", pdf_bytes)?;

C++

std::ifstream in("slides.pptx", std::ios::binary);
std::vector<std::uint8_t> pptx((std::istreambuf_iterator<char>(in)), {});

auto doc = pdf_oxide::Document::open_from_pptx_bytes(pptx);
auto pdf = doc.get_source_bytes();
std::ofstream("slides.pdf", std::ios::binary)
    .write(reinterpret_cast<const char*>(pdf.data()), pdf.size());

Dart

final pptx = File('slides.pptx').readAsBytesSync();
final doc = PdfDocument.openFromPptxBytes(pptx);
File('slides.pdf').writeAsBytesSync(doc.getSourceBytes());

R

pptx <- readBin("slides.pptx", "raw", file.info("slides.pptx")$size)
doc  <- pdf_open_from_pptx_bytes(pptx)
writeBin(pdf_get_source_bytes(doc), "slides.pdf")

Julia

pptx = read("slides.pptx")
doc  = open_from_pptx_bytes(pptx)
write("slides.pdf", get_source_bytes(doc))

Zig

const pptx = try std.fs.cwd().readFileAlloc("slides.pptx", a, .unlimited);
var doc = try pdf_oxide.Document.openFromPptxBytes(pptx);
const pdf = try doc.sourceBytes(a);
try std.fs.cwd().writeFile(.{ .sub_path = "slides.pdf", .data = pdf });

Objective-C

NSData *pptx = [NSData dataWithContentsOfFile:@"slides.pptx"];
POXDocument *doc = [POXDocument openFromPptxBytes:pptx error:&err];
NSData *pdf = [doc sourceBytesWithError:&err];
[pdf writeToFile:@"slides.pdf" atomically:YES];

Elixir

pptx = File.read!("slides.pptx")
{:ok, doc} = PdfOxide.open_from_pptx_bytes(pptx)
{:ok, pdf} = PdfOxide.source_bytes(doc)
File.write!("slides.pdf", pdf)

How do I convert a PDF to DOCX, PPTX, or XLSX?

The reverse direction — PDF → Office — lives on an open PDF document, not on OfficeConverter. Open a PDF with PdfDocument (Python/Rust), OpenFromBytes/Open (Go/C#), or Document.open (Swift), then call to_docx / to_pptx / to_xlsx to export back to Office formats.

PDF Oxide picks an emission strategy automatically based on page count: documents at or below the layout threshold (30 pages for DOCX/PPTX, 200 for XLSX) use a layout-preserving path that keeps each text span near its source position; larger documents fall back to a flow path that reflows content so Word/PowerPoint/Excel open it instantly. Each PDF page becomes one DOCX section, one PPTX slide, or one XLSX worksheet, and the source page dimensions and embedded fonts are preserved so a PDF → Office → PDF round-trip keeps the original layout.
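
The selection rule can be written down in a few lines, which is handy for predicting which path a given document will take. The library makes this choice itself; `LAYOUT_THRESHOLDS` and `pick_strategy` below are hypothetical names that only restate the thresholds quoted above.

```python
# Page-count thresholds quoted above, keyed by target format.
LAYOUT_THRESHOLDS = {"docx": 30, "pptx": 30, "xlsx": 200}


def pick_strategy(target_format: str, page_count: int) -> str:
    """Return 'layout' at or below the threshold, 'flow' above it."""
    threshold = LAYOUT_THRESHOLDS[target_format.lower()]
    return "layout" if page_count <= threshold else "flow"


if __name__ == "__main__":
    for fmt, pages in [("docx", 30), ("docx", 31), ("xlsx", 200)]:
        print(fmt, pages, pick_strategy(fmt, pages))
```

Note that the threshold is inclusive: a 30-page PDF exported to DOCX still takes the layout-preserving path, while a 31-page one is reflowed.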

PDF to Word (DOCX)

Rust

use pdf_oxide::document::PdfDocument;

let doc = PdfDocument::open("report.pdf")?;

// Write straight to disk
doc.to_docx("report.docx")?;

// Or get the bytes in memory
let docx_bytes: Vec<u8> = doc.to_docx_bytes()?;
std::fs::write("report.docx", docx_bytes)?;

Python

from pdf_oxide import PdfDocument

doc = PdfDocument("report.pdf")

# Write straight to disk
doc.to_docx("report.docx")

# Or get the bytes in memory
docx_bytes = doc.to_docx_bytes()
with open("report.docx", "wb") as f:
    f.write(docx_bytes)

Go

doc, err := pdfoxide.Open("report.pdf")
if err != nil {
    log.Fatal(err)
}
defer doc.Close()

docxBytes, err := doc.ToDocxBytes()
if err != nil {
    log.Fatal(err)
}
os.WriteFile("report.docx", docxBytes, 0o644)

C#

using PdfOxide.Core;

using var doc = PdfDocument.Open("report.pdf");
byte[] docxBytes = doc.ToDocxBytes();
File.WriteAllBytes("report.docx", docxBytes);

Swift

import PdfOxide

let doc = try Document.open("report.pdf")
let docxBytes = try doc.toDocx()
try Data(docxBytes).write(to: URL(fileURLWithPath: "report.docx"))

C++

#include <pdf_oxide/pdf_oxide.hpp>
#include <fstream>

auto doc = pdf_oxide::Document::open("report.pdf");
auto docx_bytes = doc.to_docx();
std::ofstream("report.docx", std::ios::binary)
    .write(reinterpret_cast<const char*>(docx_bytes.data()), docx_bytes.size());

Dart

import 'dart:io';
import 'package:pdf_oxide/pdf_oxide.dart';

final doc = PdfDocument.open('report.pdf');
File('report.docx').writeAsBytesSync(doc.toDocx());

R

library(pdfoxide)

doc <- pdf_open("report.pdf")
writeBin(pdf_to_docx(doc), "report.docx")

Julia

using PdfOxide

doc = open_document("report.pdf")
write("report.docx", to_docx(doc))

Zig

const std = @import("std");
const pdf_oxide = @import("pdf_oxide");
const a = std.heap.page_allocator;

var doc = try pdf_oxide.Document.open("report.pdf");
const docx_bytes = try doc.toDocx(a);
try std.fs.cwd().writeFile(.{ .sub_path = "report.docx", .data = docx_bytes });

Objective-C

#import "POXPdfOxide.h"
NSError *err = nil;

POXDocument *doc = [POXDocument openPath:@"report.pdf" error:&err];
NSData *docxBytes = [doc toDocxWithError:&err];
[docxBytes writeToFile:@"report.docx" atomically:YES];

Elixir

{:ok, doc} = PdfOxide.open("report.pdf")
{:ok, docx_bytes} = PdfOxide.to_docx(doc)
File.write!("report.docx", docx_bytes)

PDF to PowerPoint (PPTX)

Rust

use pdf_oxide::document::PdfDocument;

let doc = PdfDocument::open("deck.pdf")?;
doc.to_pptx("deck.pptx")?;            // to disk
let pptx_bytes = doc.to_pptx_bytes()?; // or in memory

Python

from pdf_oxide import PdfDocument

doc = PdfDocument("deck.pdf")
doc.to_pptx("deck.pptx")           # to disk
pptx_bytes = doc.to_pptx_bytes()   # or in memory

Go

doc, err := pdfoxide.Open("deck.pdf")
if err != nil {
    log.Fatal(err)
}
defer doc.Close()
pptxBytes, err := doc.ToPptxBytes()
if err != nil {
    log.Fatal(err)
}
os.WriteFile("deck.pptx", pptxBytes, 0o644)

C#

using var doc = PdfDocument.Open("deck.pdf");
File.WriteAllBytes("deck.pptx", doc.ToPptxBytes());

Swift

let doc = try Document.open("deck.pdf")
let pptxBytes = try doc.toPptx()
try Data(pptxBytes).write(to: URL(fileURLWithPath: "deck.pptx"))

C++

auto doc = pdf_oxide::Document::open("deck.pdf");
auto pptx_bytes = doc.to_pptx();
std::ofstream("deck.pptx", std::ios::binary)
    .write(reinterpret_cast<const char*>(pptx_bytes.data()), pptx_bytes.size());

Dart

final doc = PdfDocument.open('deck.pdf');
File('deck.pptx').writeAsBytesSync(doc.toPptx());

R

doc <- pdf_open("deck.pdf")
writeBin(pdf_to_pptx(doc), "deck.pptx")

Julia

doc = open_document("deck.pdf")
write("deck.pptx", to_pptx(doc))

Zig

var doc = try pdf_oxide.Document.open("deck.pdf");
const pptx_bytes = try doc.toPptx(a);
try std.fs.cwd().writeFile(.{ .sub_path = "deck.pptx", .data = pptx_bytes });

Objective-C

POXDocument *doc = [POXDocument openPath:@"deck.pdf" error:&err];
NSData *pptxBytes = [doc toPptxWithError:&err];
[pptxBytes writeToFile:@"deck.pptx" atomically:YES];

Elixir

{:ok, doc} = PdfOxide.open("deck.pdf")
{:ok, pptx_bytes} = PdfOxide.to_pptx(doc)
File.write!("deck.pptx", pptx_bytes)

PDF to Excel (XLSX)

Rust

use pdf_oxide::document::PdfDocument;

let doc = PdfDocument::open("table.pdf")?;
doc.to_xlsx("table.xlsx")?;            // to disk
let xlsx_bytes = doc.to_xlsx_bytes()?; // or in memory

Python

from pdf_oxide import PdfDocument

doc = PdfDocument("table.pdf")
doc.to_xlsx("table.xlsx")          # to disk
xlsx_bytes = doc.to_xlsx_bytes()   # or in memory

Go

doc, err := pdfoxide.Open("table.pdf")
if err != nil {
    log.Fatal(err)
}
defer doc.Close()
xlsxBytes, err := doc.ToXlsxBytes()
if err != nil {
    log.Fatal(err)
}
os.WriteFile("table.xlsx", xlsxBytes, 0o644)

C#

using var doc = PdfDocument.Open("table.pdf");
File.WriteAllBytes("table.xlsx", doc.ToXlsxBytes());

Swift

let doc = try Document.open("table.pdf")
let xlsxBytes = try doc.toXlsx()
try Data(xlsxBytes).write(to: URL(fileURLWithPath: "table.xlsx"))

C++

auto doc = pdf_oxide::Document::open("table.pdf");
auto xlsx_bytes = doc.to_xlsx();
std::ofstream("table.xlsx", std::ios::binary)
    .write(reinterpret_cast<const char*>(xlsx_bytes.data()), xlsx_bytes.size());

Dart

final doc = PdfDocument.open('table.pdf');
File('table.xlsx').writeAsBytesSync(doc.toXlsx());

R

doc <- pdf_open("table.pdf")
writeBin(pdf_to_xlsx(doc), "table.xlsx")

Julia

doc = open_document("table.pdf")
write("table.xlsx", to_xlsx(doc))

Zig

var doc = try pdf_oxide.Document.open("table.pdf");
const xlsx_bytes = try doc.toXlsx(a);
try std.fs.cwd().writeFile(.{ .sub_path = "table.xlsx", .data = xlsx_bytes });

Objective-C

POXDocument *doc = [POXDocument openPath:@"table.pdf" error:&err];
NSData *xlsxBytes = [doc toXlsxWithError:&err];
[xlsxBytes writeToFile:@"table.xlsx" atomically:YES];

Elixir

{:ok, doc} = PdfOxide.open("table.pdf")
{:ok, xlsx_bytes} = PdfOxide.to_xlsx(doc)
File.write!("table.xlsx", xlsx_bytes)

Python note: to_docx/to_pptx/to_xlsx are exposed on PdfDocument (the extraction/inspection class), not on the OfficeConverter/Pdf builder used for the Office → PDF direction. Use PdfDocument("file.pdf") to open the source PDF.


How do I open an Office file directly as a PDF document?

The native bindings (Go, C#, Swift, and the C ABI), along with the other bindings shown in the examples below (C++, Dart, R, Julia, Zig, Objective-C, Elixir), expose open_from_*_bytes constructors that convert DOCX/PPTX/XLSX bytes and hand back an already-open PDF document. This is convenient when you want to extract text, render, or re-export immediately rather than save the intermediate PDF. Each constructor internally runs OfficeConverter and opens the resulting PDF in one call.

Go

data, err := os.ReadFile("contract.docx")
if err != nil {
    log.Fatal(err)
}

doc, err := pdfoxide.OpenFromDocxBytes(data)
if err != nil {
    log.Fatal(err)
}
defer doc.Close()

// Now work with it as a normal PDF document
text, _ := doc.ExtractText(0)
fmt.Println(text)

C#

using PdfOxide.Core;

byte[] data = File.ReadAllBytes("contract.docx");
using var doc = PdfDocument.OpenFromDocxBytes(data);

// Use it like any other open PDF — extract, render, or re-export
byte[] docxBytes = doc.ToDocxBytes(); // round-trip back to DOCX if you like

Swift

import PdfOxide
import Foundation

let data = try Data(contentsOf: URL(fileURLWithPath: "contract.docx"))
let doc = try Document.openFromDocxBytes([UInt8](data))
let pageCount = try doc.pageCount()
print("Converted DOCX has \(pageCount) page(s)")

C++

#include <pdf_oxide/pdf_oxide.hpp>
#include <cstdint>
#include <fstream>
#include <iterator>
#include <vector>

std::ifstream in("contract.docx", std::ios::binary);
std::vector<std::uint8_t> data((std::istreambuf_iterator<char>(in)), {});

auto doc = pdf_oxide::Document::open_from_docx_bytes(data);
// Now work with it as a normal PDF document
auto text = doc.extract_text(0);

Dart

import 'dart:io';
import 'package:pdf_oxide/pdf_oxide.dart';

final data = File('contract.docx').readAsBytesSync();
final doc = PdfDocument.openFromDocxBytes(data);
final text = doc.extractText(0);

R

library(pdfoxide)

data <- readBin("contract.docx", "raw", file.info("contract.docx")$size)
doc  <- pdf_open_from_docx_bytes(data)
text <- pdf_extract_text(doc, 0)

Julia

using PdfOxide

data = read("contract.docx")
doc  = open_from_docx_bytes(data)
text = extract_text(doc, 0)

Zig

const std = @import("std");
const pdf_oxide = @import("pdf_oxide");
const a = std.heap.page_allocator;

const data = try std.fs.cwd().readFileAlloc("contract.docx", a, .unlimited);
var doc = try pdf_oxide.Document.openFromDocxBytes(data);
const text = try doc.extractText(a, 0);

Objective-C

#import "POXPdfOxide.h"
NSError *err = nil;

NSData *data = [NSData dataWithContentsOfFile:@"contract.docx"];
POXDocument *doc = [POXDocument openFromDocxBytes:data error:&err];
NSString *text = [doc extractText:0 error:&err];

Elixir

data = File.read!("contract.docx")
{:ok, doc} = PdfOxide.open_from_docx_bytes(data)
{:ok, text} = PdfOxide.extract_text(doc, 0)

PPTX and XLSX use the matching constructors:

Source format | Go | C# | Swift
DOCX | OpenFromDocxBytes(data) | PdfDocument.OpenFromDocxBytes(data) | Document.openFromDocxBytes(bytes)
PPTX | OpenFromPptxBytes(data) | PdfDocument.OpenFromPptxBytes(data) | Document.openFromPptxBytes(bytes)
XLSX | OpenFromXlsxBytes(data) | PdfDocument.OpenFromXlsxBytes(data) | Document.openFromXlsxBytes(bytes)

Rust / Python: there is no open_from_docx_bytes constructor on the core PdfDocument. In Rust, convert first with OfficeConverter::new().convert_docx_bytes(&data)? and then PdfDocument::from_bytes(pdf_bytes)?. In Python, use OfficeConverter.from_docx_bytes(data) (documented above), which returns a Pdf.

use pdf_oxide::converters::office::OfficeConverter;
use pdf_oxide::document::PdfDocument;

let data = std::fs::read("contract.docx")?;
let pdf_bytes = OfficeConverter::new().convert_docx_bytes(&data)?;
let doc = PdfDocument::from_bytes(pdf_bytes)?;
println!("{} pages", doc.page_count()?);

Configuration (Rust)

Customize page size, margins, and fonts using OfficeConfig:

use pdf_oxide::converters::office::{OfficeConverter, OfficeConfig};

let config = OfficeConfig::a4(); // A4 page size
let converter = OfficeConverter::with_config(config);
let pdf_bytes = converter.convert_docx("document.docx")?;

OfficeConfig Fields

Field | Type | Default | Description
page_size | PageSize | Letter | Page dimensions
margins | Margins | 1 inch all sides | Page margins in points (72pt = 1 inch)
embed_fonts | bool | false | Whether to embed fonts
default_font | String | "Helvetica" | Fallback font
default_font_size | f32 | 11.0 | Default text size in points
line_height | f32 | 1.2 | Line height multiplier
include_images | bool | true | Include embedded images

Page Size Presets

let config = OfficeConfig::letter(); // 8.5 × 11 inches (default)
let config = OfficeConfig::a4();     // 210 × 297 mm

Custom Margins

use pdf_oxide::converters::office::Margins;

let mut config = OfficeConfig::letter();
config.margins = Margins::uniform(36.0);  // 0.5 inch margins
// Or remove the margins entirely (this overwrites the line above):
config.margins = Margins::none();
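
Since margins are given in points, it helps to see how much room a preset leaves for content. The sketch below converts the two page presets to points (72 pt per inch, 25.4 mm per inch) and subtracts a uniform margin. `PAGE_SIZES` and `content_box` are hypothetical helpers for the arithmetic only, not part of the OfficeConfig API.

```python
POINTS_PER_INCH = 72.0
MM_PER_INCH = 25.4


def inches_to_points(inches: float) -> float:
    return inches * POINTS_PER_INCH


def mm_to_points(mm: float) -> float:
    return mm / MM_PER_INCH * POINTS_PER_INCH


# Page presets from this section, expressed as (width, height) in points.
PAGE_SIZES = {
    "letter": (inches_to_points(8.5), inches_to_points(11.0)),
    "a4": (mm_to_points(210.0), mm_to_points(297.0)),
}


def content_box(page: str, margin_pt: float = 72.0):
    """Return (width, height) in points left after a uniform margin on all sides."""
    width, height = PAGE_SIZES[page]
    return (width - 2 * margin_pt, height - 2 * margin_pt)


if __name__ == "__main__":
    print(content_box("letter"))        # default 1 inch margins
    print(content_box("letter", 36.0))  # 0.5 inch margins
    print(content_box("a4", 0.0))       # no margins
```

With the default 1 inch margins a Letter page leaves 468 × 648 pt for content, and halving the margins to 36 pt widens that to 540 × 720 pt.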

Batch Conversion

Python

from pdf_oxide import OfficeConverter
from pathlib import Path

office_dir = Path("documents/")
output_dir = Path("pdfs/")
output_dir.mkdir(exist_ok=True)

extensions = {".docx", ".xlsx", ".pptx"}

for doc_path in office_dir.iterdir():
    if doc_path.suffix.lower() in extensions:
        pdf = OfficeConverter.convert(str(doc_path))
        pdf.save(str(output_dir / doc_path.with_suffix(".pdf").name))
        print(f"Converted: {doc_path.name}")

Rust

use pdf_oxide::converters::office::OfficeConverter;
use std::fs;

let converter = OfficeConverter::new();
fs::create_dir_all("pdfs")?; // make sure the output directory exists

for entry in fs::read_dir("documents/")? {
    let path = entry?.path();
    match path.extension().and_then(|e| e.to_str()) {
        Some("docx" | "xlsx" | "pptx") => {
            let pdf_bytes = converter.convert(&path)?;
            let out = format!("pdfs/{}.pdf", path.file_stem().unwrap().to_str().unwrap());
            fs::write(&out, pdf_bytes)?;
            println!("Converted: {}", path.display());
        }
        _ => {}
    }
}

C++

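#include <pdf_oxide/pdf_oxide.hpp>
#include <cstdint>
#include <filesystem>
#include <fstream>
#include <iterator>
#include <vector>
namespace fs = std::filesystem;

fs::create_directories("pdfs");  // make sure the output directory exists

for (const auto& entry : fs::directory_iterator("documents/")) {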
    auto path = entry.path();
    auto ext = path.extension().string();

    if (ext != ".docx" && ext != ".xlsx" && ext != ".pptx") continue;

    std::ifstream in(path, std::ios::binary);
    std::vector<std::uint8_t> bytes((std::istreambuf_iterator<char>(in)), {});

    auto doc =
        ext == ".docx" ? pdf_oxide::Document::open_from_docx_bytes(bytes)
        : ext == ".xlsx" ? pdf_oxide::Document::open_from_xlsx_bytes(bytes)
                         : pdf_oxide::Document::open_from_pptx_bytes(bytes);

    auto pdf = doc.get_source_bytes();
    auto out = "pdfs/" + path.stem().string() + ".pdf";
    std::ofstream(out, std::ios::binary)
        .write(reinterpret_cast<const char*>(pdf.data()), pdf.size());
}

Dart

import 'dart:io';
import 'package:pdf_oxide/pdf_oxide.dart';

Directory('pdfs').createSync(recursive: true);

for (final entry in Directory('documents').listSync()) {
  if (entry is! File) continue;
  final ext = entry.path.split('.').last.toLowerCase();
  final bytes = entry.readAsBytesSync();

  final doc = switch (ext) {
    'docx' => PdfDocument.openFromDocxBytes(bytes),
    'xlsx' => PdfDocument.openFromXlsxBytes(bytes),
    'pptx' => PdfDocument.openFromPptxBytes(bytes),
    _ => null,
  };
  if (doc == null) continue;

  final name = entry.uri.pathSegments.last.replaceAll(RegExp(r'\.\w+$'), '');
  File('pdfs/$name.pdf').writeAsBytesSync(doc.getSourceBytes());
}

R

library(pdfoxide)

dir.create("pdfs", showWarnings = FALSE)

for (path in list.files("documents", full.names = TRUE)) {
  ext   <- tolower(tools::file_ext(path))
  bytes <- readBin(path, "raw", file.info(path)$size)

  doc <- switch(ext,
    docx = pdf_open_from_docx_bytes(bytes),
    xlsx = pdf_open_from_xlsx_bytes(bytes),
    pptx = pdf_open_from_pptx_bytes(bytes),
    next)

  out <- file.path("pdfs", paste0(tools::file_path_sans_ext(basename(path)), ".pdf"))
  writeBin(pdf_get_source_bytes(doc), out)
}

Julia

using PdfOxide

mkpath("pdfs")

for path in readdir("documents"; join = true)
    ext   = lowercase(splitext(path)[2])
    bytes = read(path)

    doc = if ext == ".docx"
        open_from_docx_bytes(bytes)
    elseif ext == ".xlsx"
        open_from_xlsx_bytes(bytes)
    elseif ext == ".pptx"
        open_from_pptx_bytes(bytes)
    else
        continue
    end

    name = first(splitext(basename(path)))
    write(joinpath("pdfs", name * ".pdf"), get_source_bytes(doc))
end

Zig

const std = @import("std");
const pdf_oxide = @import("pdf_oxide");
const a = std.heap.page_allocator;

try std.fs.cwd().makePath("pdfs");
var dir = try std.fs.cwd().openDir("documents", .{ .iterate = true });
var it = dir.iterate();
while (try it.next()) |entry| {
    const bytes = try dir.readFileAlloc(entry.name, a, .unlimited);

    var doc = if (std.mem.endsWith(u8, entry.name, ".docx"))
        try pdf_oxide.Document.openFromDocxBytes(bytes)
    else if (std.mem.endsWith(u8, entry.name, ".xlsx"))
        try pdf_oxide.Document.openFromXlsxBytes(bytes)
    else if (std.mem.endsWith(u8, entry.name, ".pptx"))
        try pdf_oxide.Document.openFromPptxBytes(bytes)
    else
        continue;

    const pdf = try doc.sourceBytes(a);
    const stem = entry.name[0 .. std.mem.lastIndexOfScalar(u8, entry.name, '.').?];
    const out = try std.fmt.allocPrint(a, "pdfs/{s}.pdf", .{stem});
    try std.fs.cwd().writeFile(.{ .sub_path = out, .data = pdf });
}

Objective-C

#import "POXPdfOxide.h"
NSError *err = nil;
NSFileManager *fm = [NSFileManager defaultManager];
[fm createDirectoryAtPath:@"pdfs" withIntermediateDirectories:YES attributes:nil error:&err];

for (NSString *name in [fm contentsOfDirectoryAtPath:@"documents" error:&err]) {
    NSString *path = [@"documents" stringByAppendingPathComponent:name];
    NSData *bytes = [NSData dataWithContentsOfFile:path];
    NSString *ext = name.pathExtension.lowercaseString;

    POXDocument *doc;
    if ([ext isEqualToString:@"docx"])      doc = [POXDocument openFromDocxBytes:bytes error:&err];
    else if ([ext isEqualToString:@"xlsx"]) doc = [POXDocument openFromXlsxBytes:bytes error:&err];
    else if ([ext isEqualToString:@"pptx"]) doc = [POXDocument openFromPptxBytes:bytes error:&err];
    else continue;

    NSData *pdf = [doc sourceBytesWithError:&err];
    NSString *out = [@"pdfs" stringByAppendingPathComponent:
        [name.stringByDeletingPathExtension stringByAppendingPathExtension:@"pdf"]];
    [pdf writeToFile:out atomically:YES];
}

Elixir

File.mkdir_p!("pdfs")

for name <- File.ls!("documents") do
  bytes = File.read!(Path.join("documents", name))

  result =
    case Path.extname(name) |> String.downcase() do
      ".docx" -> PdfOxide.open_from_docx_bytes(bytes)
      ".xlsx" -> PdfOxide.open_from_xlsx_bytes(bytes)
      ".pptx" -> PdfOxide.open_from_pptx_bytes(bytes)
      _ -> :skip
    end

  with {:ok, doc} <- result,
       {:ok, pdf} <- PdfOxide.source_bytes(doc) do
    out = Path.join("pdfs", Path.rootname(name) <> ".pdf")
    File.write!(out, pdf)
  end
end

API Reference

Python — OfficeConverter

Method | Returns | Description
OfficeConverter.convert(path) | Pdf | Auto-detect format and convert
OfficeConverter.from_docx(path) | Pdf | Convert DOCX file
OfficeConverter.from_docx_bytes(data) | Pdf | Convert DOCX from bytes
OfficeConverter.from_xlsx(path) | Pdf | Convert XLSX file
OfficeConverter.from_xlsx_bytes(data) | Pdf | Convert XLSX from bytes
OfficeConverter.from_pptx(path) | Pdf | Convert PPTX file
OfficeConverter.from_pptx_bytes(data) | Pdf | Convert PPTX from bytes

All methods return a Pdf object. Call pdf.save("output.pdf") or pdf.to_bytes() to get the result.

Rust — OfficeConverter

Method | Returns | Description
OfficeConverter::new() | OfficeConverter | Create with default config
OfficeConverter::with_config(config) | OfficeConverter | Create with custom config
convert(path) | Result<Vec<u8>> | Auto-detect format and convert
convert_docx(path) | Result<Vec<u8>> | Convert DOCX file
convert_docx_bytes(bytes) | Result<Vec<u8>> | Convert DOCX from bytes
convert_xlsx(path) | Result<Vec<u8>> | Convert XLSX file
convert_xlsx_bytes(bytes) | Result<Vec<u8>> | Convert XLSX from bytes
convert_pptx(path) | Result<Vec<u8>> | Convert PPTX file
convert_pptx_bytes(bytes) | Result<Vec<u8>> | Convert PPTX from bytes

PDF → Office — to_docx / to_pptx / to_xlsx

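Exported from an open PDF document. The table covers Rust, Python, Go, C#, and Swift; the C++, Dart, R, Julia, Zig, Objective-C, and Elixir examples earlier on this page show the equivalent calls in those bindings.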

Language | Method | Returns | Description
Rust | PdfDocument::to_docx(path) | Result<()> | Export PDF to a DOCX file on disk
Rust | PdfDocument::to_docx_bytes() | Result<Vec<u8>> | Export PDF to DOCX bytes
Rust | PdfDocument::to_pptx(path) / to_pptx_bytes() | Result<()> / Result<Vec<u8>> | Export PDF to PPTX
Rust | PdfDocument::to_xlsx(path) / to_xlsx_bytes() | Result<()> / Result<Vec<u8>> | Export PDF to XLSX
Python | PdfDocument.to_docx(path) / to_docx_bytes() | None / bytes | Export PDF to DOCX
Python | PdfDocument.to_pptx(path) / to_pptx_bytes() | None / bytes | Export PDF to PPTX
Python | PdfDocument.to_xlsx(path) / to_xlsx_bytes() | None / bytes | Export PDF to XLSX
Go | (*PdfDocument).ToDocxBytes() | ([]byte, error) | Export PDF to DOCX bytes
Go | (*PdfDocument).ToPptxBytes() | ([]byte, error) | Export PDF to PPTX bytes
Go | (*PdfDocument).ToXlsxBytes() | ([]byte, error) | Export PDF to XLSX bytes
C# | PdfDocument.ToDocxBytes() | byte[] | Export PDF to DOCX bytes
C# | PdfDocument.ToPptxBytes() | byte[] | Export PDF to PPTX bytes
C# | PdfDocument.ToXlsxBytes() | byte[] | Export PDF to XLSX bytes
Swift | Document.toDocx() | [UInt8] | Export PDF to DOCX bytes
Swift | Document.toPptx() | [UInt8] | Export PDF to PPTX bytes
Swift | Document.toXlsx() | [UInt8] | Export PDF to XLSX bytes

Office → PDF document — open_from_*_bytes

Native-binding convenience constructors that convert Office bytes and return an open PDF document. The table covers Go, C#, Swift, and the C ABI. They are not available on the Rust core PdfDocument or in Python; use OfficeConverter there (see the Python and Rust OfficeConverter tables above).

Language | Constructor | Returns | Description
Go | OpenFromDocxBytes(data) | (*PdfDocument, error) | Open a PDF document from DOCX bytes
Go | OpenFromPptxBytes(data) | (*PdfDocument, error) | Open a PDF document from PPTX bytes
Go | OpenFromXlsxBytes(data) | (*PdfDocument, error) | Open a PDF document from XLSX bytes
C# | PdfDocument.OpenFromDocxBytes(data) | PdfDocument | Open a PDF document from DOCX bytes
C# | PdfDocument.OpenFromPptxBytes(data) | PdfDocument | Open a PDF document from PPTX bytes
C# | PdfDocument.OpenFromXlsxBytes(data) | PdfDocument | Open a PDF document from XLSX bytes
Swift | Document.openFromDocxBytes(bytes) | Document | Open a PDF document from DOCX bytes
Swift | Document.openFromPptxBytes(bytes) | Document | Open a PDF document from PPTX bytes
Swift | Document.openFromXlsxBytes(bytes) | Document | Open a PDF document from XLSX bytes
C ABI | pdf_document_open_from_docx_bytes(data, len, error_code) | PdfDocument * | Open a PDF document from DOCX bytes
C ABI | pdf_document_open_from_pptx_bytes(data, len, error_code) | PdfDocument * | Open a PDF document from PPTX bytes
C ABI | pdf_document_open_from_xlsx_bytes(data, len, error_code) | PdfDocument * | Open a PDF document from XLSX bytes

FAQ

Does converting a PDF to DOCX keep the layout?

Yes, within limits. For documents at or below the layout threshold (30 pages for DOCX/PPTX, 200 for XLSX), to_docx_bytes / to_pptx_bytes / to_xlsx_bytes use a layout-preserving path that emits each PDF text span as a positioned, editable element and embeds the source PDF’s fonts, so a PDF → Office → PDF round-trip keeps the original page dimensions. Larger documents fall back to a flow path that reflows text into real paragraphs so Word/PowerPoint/Excel open them instantly.

Can I convert a PDF back to PowerPoint or Excel, not just Word?

Yes. to_pptx/to_pptx_bytes map each PDF page to one slide sized to the source MediaBox, and to_xlsx/to_xlsx_bytes map each page to one worksheet. Both are available in Rust, Python, Go, C#, and Swift.

Why is there no open_from_docx_bytes in Python?

Python exposes the Office → PDF direction through the higher-level OfficeConverter class instead (OfficeConverter.from_docx_bytes(data) returns a Pdf). The open_from_*_bytes constructors are convenience wrappers added at the native FFI layer (Go, C#, Swift, C ABI), where there is no separate converter class.

Do I need Microsoft Office or LibreOffice installed?

No. PDF Oxide reads and writes the OOXML (DOCX/XLSX/PPTX) format directly in pure Rust. There are no external process calls, COM automation, or headless Office instances — conversion works the same on Linux, macOS, and Windows.