Getting Started with PDF Oxide MCP Server
pdf-oxide-mcp is a Model Context Protocol server that lets AI assistants extract content from PDFs. It runs locally — no files leave your machine.
Install crgx (one-time)
crgx is an npx-like runner for Rust binaries — it auto-downloads pdf_oxide_mcp on first run. No manual MCP install needed.
Linux / macOS
curl -fsSL crgx.dev/install.sh | sh
Windows (PowerShell)
irm crgx.dev/install.ps1 | iex
Configuration
After installing crgx, add the config below to your AI tool. That’s it — crgx handles downloading and updating pdf_oxide_mcp automatically.
Claude Desktop
Add to ~/.config/claude/claude_desktop_config.json (Linux) or ~/Library/Application Support/Claude/claude_desktop_config.json (macOS):
{
  "mcpServers": {
    "pdf-oxide": {
      "command": "crgx",
      "args": ["pdf_oxide_mcp@latest"]
    }
  }
}
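If the config file already lists other MCP servers, the entry has to be merged in rather than pasted over the file. The sketch below shows one way to do that with Python's standard library. The helper name add_pdf_oxide and the "other" server in the demo are hypothetical, used only for illustration; the entry itself is the one shown above.

```python
import json

# Entry copied from the config above; "pdf-oxide" is the key used there.
PDF_OXIDE_ENTRY = {"command": "crgx", "args": ["pdf_oxide_mcp@latest"]}


def add_pdf_oxide(config_text: str) -> str:
    """Return config_text with the pdf-oxide server added, keeping other keys."""
    # An empty file is treated as an empty JSON object.
    config = json.loads(config_text) if config_text.strip() else {}
    servers = config.setdefault("mcpServers", {})
    servers["pdf-oxide"] = {
        "command": PDF_OXIDE_ENTRY["command"],
        "args": list(PDF_OXIDE_ENTRY["args"]),
    }
    return json.dumps(config, indent=2) + "\n"


if __name__ == "__main__":
    # Hypothetical existing config with one unrelated server.
    existing = '{"mcpServers": {"other": {"command": "other-server"}}}'
    print(add_pdf_oxide(existing))
```

Reading the file, passing its text through the helper, and writing the result back leaves any servers that were already configured untouched.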
Claude Code
Add to your project’s .claude/settings.json:
{
  "mcpServers": {
    "pdf-oxide": {
      "command": "crgx",
      "args": ["pdf_oxide_mcp@latest"]
    }
  }
}
Cursor
Add to Cursor MCP settings:
{
  "mcpServers": {
    "pdf-oxide": {
      "command": "crgx",
      "args": ["pdf_oxide_mcp@latest"]
    }
  }
}
Alternative Installation
If you prefer not to use crgx, you can install pdf_oxide_mcp directly:
Homebrew (macOS / Linux)
brew install yfedoseev/tap/pdf-oxide # includes pdf-oxide-mcp
Cargo
cargo install pdf_oxide_mcp
Then point your config at the installed binary. The bare command name works when pdf-oxide-mcp is on your PATH; otherwise use its full path:
{
  "mcpServers": {
    "pdf-oxide": {
      "command": "pdf-oxide-mcp"
    }
  }
}
Available Tools
extract
Extract text, markdown, or HTML from a PDF file.
| Parameter | Type | Required | Description |
|---|---|---|---|
| file_path | string | Yes | Path to the PDF file |
| output_path | string | Yes | Path to write extracted content |
| format | string | No | "text" (default), "markdown", or "html" |
| pages | string | No | Page range, e.g. "1-3,7,10-12" |
| password | string | No | Password for encrypted PDFs |
| images | boolean | No | Extract images to files alongside output |
| embed_images | boolean | No | Embed images as base64 in markdown/html (default: true) |
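To make the pages syntax concrete, here is a minimal sketch of how a range string such as "1-3,7,10-12" could expand into individual page numbers. It assumes 1-based, inclusive ranges; expand_pages is a hypothetical helper, not part of pdf_oxide_mcp, and the server's own parser may handle edge cases differently.

```python
def expand_pages(spec: str) -> list[int]:
    """Expand a spec such as "1-3,7,10-12" into a sorted list of page numbers."""
    pages: set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
        else:
            start = end = int(part)
        # Assumption: pages are 1-based and ranges run low to high.
        if start < 1 or end < start:
            raise ValueError(f"invalid page range: {part!r}")
        pages.update(range(start, end + 1))
    return sorted(pages)


if __name__ == "__main__":
    print(expand_pages("1-3,7,10-12"))
```

A check like this can be useful on the client side to catch a malformed range before the request is sent.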
How It Works
The MCP server communicates over stdio using JSON-RPC 2.0. When an AI assistant needs to read a PDF, it sends a tools/call request; the server writes the extracted content to output_path and returns a short confirmation.
All processing happens locally using the same Rust extraction engine as the library and CLI — no data is sent to external services.
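For readers curious what travels over stdio, the sketch below builds the messages a client would typically send, following the MCP stdio transport, where each JSON-RPC message occupies one newline-terminated line. The client name, the version strings, and the protocol version value are assumptions for illustration; which protocol versions pdf_oxide_mcp accepts is not stated here.

```python
import json


def frame(message: dict) -> bytes:
    """Serialize one JSON-RPC message as a single newline-terminated line."""
    # json.dumps escapes newlines inside strings, so the payload stays on one line.
    return json.dumps(message, separators=(",", ":")).encode("utf-8") + b"\n"


def session_messages(arguments: dict) -> list[dict]:
    """Messages a client sends: initialize, initialized notification, tools/call."""
    return [
        {
            "jsonrpc": "2.0",
            "id": 1,
            "method": "initialize",
            "params": {
                "protocolVersion": "2024-11-05",  # assumed; negotiate with the server
                "capabilities": {},
                "clientInfo": {"name": "example-client", "version": "0.0.0"},
            },
        },
        # Notifications carry no "id" and receive no response.
        {"jsonrpc": "2.0", "method": "notifications/initialized"},
        {
            "jsonrpc": "2.0",
            "id": 2,
            "method": "tools/call",
            "params": {"name": "extract", "arguments": arguments},
        },
    ]


if __name__ == "__main__":
    args = {"file_path": "/path/report.pdf", "output_path": "/path/report.md"}
    for message in session_messages(args):
        print(frame(message).decode("utf-8"), end="")
```

A real client would write these lines to the server's stdin and wait for the initialize response before sending the rest. AI tools such as Claude Desktop or Cursor do this for you, so the sketch is only meant to show the wire format.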
Prompts You Can Give the Assistant
Once the MCP server is wired up, the assistant can call extract on its own. Prompts that work well:
- “Pull the markdown of report.pdf into report.md.”
- “Extract pages 4–8 of contract.pdf as HTML with images embedded, save to contract.html.”
- “bank-statement.pdf is password-protected (pw: hunter2). Extract just the transactions table to text.”
Under the hood the assistant issues a JSON-RPC call like:
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "extract",
    "arguments": {
      "file_path": "/path/report.pdf",
      "output_path": "/path/report.md",
      "format": "markdown",
      "pages": "4-8",
      "images": true,
      "embed_images": true
    }
  }
}
The server writes the result to output_path and returns a short confirmation — the assistant can then read that file back into its context.
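The read-back step can be sketched as follows. The function checks the response for a JSON-RPC error or an MCP tool-level error flag and then loads the file the server wrote. read_extracted is a hypothetical helper, and the exact shape of the confirmation that pdf_oxide_mcp returns is an assumption; only the generic JSON-RPC 2.0 "error" member and the MCP "isError" flag are relied on.

```python
import json
from pathlib import Path


def read_extracted(response_line: str, output_path: str) -> str:
    """Return the extracted file's text, or raise if the call failed."""
    response = json.loads(response_line)
    # JSON-RPC 2.0: a response carries either "result" or "error".
    if "error" in response:
        err = response["error"]
        raise RuntimeError(f"JSON-RPC error {err.get('code')}: {err.get('message')}")
    # MCP tool results can flag tool-level failures with "isError".
    if response.get("result", {}).get("isError"):
        raise RuntimeError("extract reported a tool error")
    # Assumes text, markdown, or HTML output encoded as UTF-8.
    return Path(output_path).read_text(encoding="utf-8")
```

Keeping the extracted content in a file rather than in the response means the assistant can choose to read only part of a large document.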