Use OkraPDF with Claude

Compatibility

The MCP server uses the Streamable HTTP transport (MCP spec). Not all clients support it yet.

Client	Status	Notes
OpenCode	✅ Verified	Tested and working
Claude Desktop	⚠️ Untested	Should work — uses same streamable-http config. Report issues to us.
Claude Code (CLI)	❌ Known issue	406 due to missing `Accept` header (#54)
Cursor	⚠️ Untested	Config shown below; let us know if it works.
Windsurf	⚠️ Untested	Config shown below; let us know if it works.

We’re working on broader client support. If your MCP client isn’t listed, try the config below and open an issue if it doesn’t connect.

What you get

Connect OkraPDF to Claude and you can:

Upload any PDF by pasting a link — SEC filings, earnings reports, invoices, contracts
Ask questions and get answers with page citations (“What was Q3 revenue?” → “$60.9B, page 47”)
Extract tables as structured data that you can paste into Excel
Compare across documents — run the same question across a portfolio of filings at once

No coding required. Just talk to Claude like normal — OkraPDF handles the extraction behind the scenes.

Setup (2 minutes)

Claude Desktop
Claude Code (CLI)
Cursor / Windsurf / Other

Get your API key from okrapdf.com/settings/api-keys
Open Claude Desktop → Settings → Developer → Edit Config
Add OkraPDF to your config file:

{
  "mcpServers": {
    "okrapdf": {
      "url": "https://api.okrapdf.com/mcp",
      "headers": {
        "Authorization": "Bearer okra_YOUR_API_KEY"
      }
    }
  }
}

Replace okra_YOUR_API_KEY with your actual key
Restart Claude Desktop

That’s it. Claude now has access to all OkraPDF tools. Start a new conversation and try: “Upload this 10-K and tell me the revenue: https://sec.gov/…”

Claude Code has a known issue connecting to streamable-http MCP servers (returns 406). Use OpenCode as your MCP client until this is resolved.

claude mcp add -s user -t http okrapdf https://api.okrapdf.com/mcp \
  --header "Authorization: Bearer okra_YOUR_API_KEY"

Add to your MCP config (.cursor/mcp.json or equivalent):

{
  "mcpServers": {
    "okrapdf": {
      "type": "http",
      "url": "https://api.okrapdf.com/mcp",
      "headers": {
        "Authorization": "Bearer okra_YOUR_API_KEY"
      }
    }
  }
}

Tools

The server exposes 7 tools:

Tool	What it does
`upload_document`	Upload a PDF from any URL. Waits for extraction to complete by default.
`get_document_status`	Check if a document is ready (phase, page count, data points extracted).
`list_documents`	List your uploaded documents sorted by most recent.
`read_document`	Get extracted text and tables as markdown, with optional page ranges.
`ask_document`	Ask a question and get an answer with page citations.
`extract_data`	Extract specific data points into structured JSON using a schema you define.
`extract_collection`	Run the same extraction across every document in a collection.

Quick Start

Once connected, just talk to Claude. Here are examples of what you can say:

Upload and ask questions

Upload this 10-K and tell me the total revenue:
https://www.sec.gov/Archives/edgar/data/1045810/000104581024000316/nvda-20240128.htm

Claude uploads the PDF, extracts all the content, and answers your question with the exact page number.

Read specific sections

Read pages 40-45 of my NVIDIA filing

Great for large filings when you know which pages have the financial statements.

Extract data for Excel

Extract revenue, net income, and EPS from this 10-K as a table I can paste into Excel

Claude extracts the data in a structured format you can copy straight into a spreadsheet.

Compare across filings

Compare NVIDIA's revenue across my last 4 uploaded 10-Ks

Claude queries each document in parallel and builds a comparison table — with page citations for every number.

Parallel Queries

MCP clients that support parallel tool calls (like Claude Code) can ask the same question across multiple documents simultaneously. This is the fastest way to compare data across filings.

Example: Operating margins across 3 companies

> What was the operating margin for PepsiCo, Amazon, and AMD?

The agent fires 3 ask_document calls in parallel — one per document — and results come back at the same time:

Company	FY	Operating Margin
PepsiCo	2022	13.3%
Amazon	2019	5.2%
AMD	2022	5.4%

Each answer includes page citations back to the source filing. No sequential waiting — all three run concurrently against their respective Durable Objects.

How it works

Each document lives in its own Durable Object on Cloudflare’s edge. Parallel queries hit separate DOs — there’s no shared bottleneck.

Try It

# 1. Connect (one-time)
claude mcp add -s user -t http okrapdf https://api.okrapdf.com/mcp

# 2. Upload a PDF
> "Upload https://arxiv.org/pdf/2401.04088"
#  → upload_document(url, wait=true)
#  → doc-25725cc7... complete, 853 nodes extracted

# 3. Ask a question
> "What is this paper about?"
#  → ask_document(doc-25725cc7, "What is this paper about?")
#  → "Mixtral 8x7B, a sparse mixture of experts model..." (p.1)

# 4. Parallel queries — the killer feature
> "Compare operating margins for PepsiCo, Amazon, and AMD"
#
#  Agent fires 3 calls at once:
#  ┌─ ask_document(PepsiCo)  ──→  13.3%  (FY2022, p.40)
#  ├─ ask_document(Amazon)   ──→   5.2%  (FY2019, p.38)
#  └─ ask_document(AMD)      ──→   5.4%  (FY2022, p.48)
#
#  Each doc is its own Durable Object — no shared bottleneck.
#  All three return at the same time.

# 5. Structured extraction
> "Extract revenue by segment as JSON"
#  → extract_data(doc, prompt, json_schema)
#  → { "segments": [{ "name": "PBNA", "revenue": 26213 }, ...] }

Tool Reference

upload_document

Parameter	Type	Required	Description
`url`	string	Yes	Public URL of the PDF
`document_id`	string	No	Custom ID (auto-generated if omitted)
`wait`	boolean	No	Wait for extraction to complete (default: `true`)
`page_images`	`"none"` \| `"cover"` \| `"eager"` \| `"lazy"`	No	Page image strategy (default: `"eager"`; `"lazy"` is a deprecated alias for `"eager"`)

read_document

Parameter	Type	Required	Description
`document_id`	string	Yes	Document ID
`pages`	string	No	Page range, e.g. `"1-5"` or `"3"`. Omit for all pages.

ask_document

Parameter	Type	Required	Description
`document_id`	string	Yes	Document ID
`question`	string	Yes	Natural language question
`model`	string	No	OpenRouter ID (e.g. `"anthropic/claude-haiku-4.5"`) or direct provider (e.g. `"anthropic:claude-haiku-4-5-20251001"`, `"openai:gpt-4o"`). Omit for default.
`instructions`	string	No	Extra instructions appended to the system prompt.

extract_data

Parameter	Type	Required	Description
`document_id`	string	Yes	Document ID
`prompt`	string	Yes	Extraction instruction
`json_schema`	object	Yes	JSON Schema for desired output shape
`model`	string	No	OpenRouter ID or direct provider format. Omit for default.
`instructions`	string	No	Extra instructions appended to the system prompt.

Example: 10-K financial extraction

Prompt:

Extract key financial details from this 10-K: revenue, net income,
total assets, total liabilities, cash and equivalents, and EPS diluted.

Schema:

{
  "type": "object",
  "properties": {
    "company": { "type": "string" },
    "fiscal_year_ended": { "type": "string" },
    "revenue": { "type": "string" },
    "net_income": { "type": "string" },
    "total_assets": { "type": "string" },
    "total_liabilities": { "type": "string" },
    "cash_and_equivalents": { "type": "string" },
    "eps_diluted": { "type": "string" }
  }
}

Response:

{
  "company": "NVIDIA Corporation",
  "fiscal_year_ended": "January 25, 2026",
  "revenue": "$215,938 million",
  "net_income": "$120,067 million",
  "total_assets": "$206,803 million",
  "total_liabilities": "$49,510 million",
  "cash_and_equivalents": "$10,605 million",
  "eps_diluted": "$4.90"
}

Nested schemas work too — group fields into income_statement, balance_sheet, and cash_flow objects for cleaner output.

get_document_status

Parameter	Type	Required	Description
`document_id`	string	Yes	Document ID

extract_collection

Parameter	Type	Required	Description
`collection_id`	string	Yes	Collection ID or name
`prompt`	string	Yes	Extraction instruction applied to each document
`json_schema`	object	Yes	JSON Schema for the per-document output shape
`merge_strategy`	`"array"` \| `"merge"`	No	How to combine results (default: `"array"`)
`pages`	string	No	Page range applied to each document (e.g. `"1-10"`)
`model`	string	No	OpenRouter ID or direct provider format — applied to every document.
`instructions`	string	No	Extra instructions — applied to every document.

list_documents

Parameter	Type	Required	Description
`limit`	integer	No	Max documents to return (default: 20, max: 100)

Model & Prompt Configuration

ask_document, extract_data, and extract_collection accept optional model and instructions parameters.

Choose a model

Pass any OpenRouter model ID, or use a direct provider key:

> Ask doc-abc123 "What was total revenue?" using anthropic/claude-haiku-4.5

The agent calls:

{
  "document_id": "doc-abc123",
  "question": "What was total revenue?",
  "model": "anthropic/claude-haiku-4.5"
}

Popular choices:

Model	Speed	Best for
`moonshotai/kimi-k2.5`	Fast	General Q&A (default)
`anthropic/claude-haiku-4.5`	Fast	Precise extraction, concise answers
`anthropic/claude-sonnet-4.5`	Medium	Complex reasoning, multi-step analysis
`google/gemini-2.5-flash`	Fast	Large context, long documents

Use your own API key (direct provider)

Instead of routing through OpenRouter, call Anthropic, OpenAI, or Google directly with your own API key. Use provider:model format (colon, not slash):

Format	Provider	Example
`anthropic:model-id`	Anthropic	`anthropic:claude-haiku-4-5-20251001`
`openai:model-id`	OpenAI	`openai:gpt-4o`
`google:model-id`	Google	`google:gemini-2.5-flash`
`vendor/model` (slash)	OpenRouter	`anthropic/claude-haiku-4.5`

> Ask doc-abc123 "What was total revenue?" using anthropic:claude-haiku-4-5-20251001

To use direct providers, store your API key in OkraPDF (Settings > API Keys > Vendor Keys) or pass it via the X-Vendor-Keys header.

Add instructions

Append domain-specific hints to the system prompt:

> Extract revenue from doc-abc123. Note: amounts are in HKD millions
> and the fiscal year ends in March.

The agent calls:

{
  "document_id": "doc-abc123",
  "prompt": "Extract annual revenue",
  "json_schema": { "type": "object", "properties": { "revenue": { "type": "string" } } },
  "instructions": "Amounts are in HKD millions. Fiscal year ends in March."
}

Use instructions for:

Currency/unit context: “All values are in EUR thousands”
Fiscal year hints: “FY ends June 30”
Output format: “Use ISO dates (YYYY-MM-DD)”
Language: “Respond in Chinese”

curl examples

Ask a question via OpenRouter:

curl -X POST https://api.okrapdf.com/document/doc-abc123/chat/completions \
  -H "Authorization: Bearer okra_YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [{"role": "user", "content": "What was total revenue?"}],
    "agentConfigOverride": {
      "llm": {"chat": {"provider": "openrouter", "model": "anthropic/claude-haiku-4.5"}}
    },
    "systemPromptSuffix": "Respond in one sentence with the page number."
  }'

Ask using your own Anthropic key directly:

curl -X POST https://api.okrapdf.com/document/doc-abc123/chat/completions \
  -H "Authorization: Bearer okra_YOUR_KEY" \
  -H "Content-Type: application/json" \
  -H 'X-Vendor-Keys: {"anthropic": "sk-ant-..."}' \
  -d '{
    "messages": [{"role": "user", "content": "What was total revenue?"}],
    "agentConfigOverride": {
      "llm": {"chat": {"provider": "anthropic", "model": "claude-haiku-4-5-20251001"}}
    }
  }'

Structured extraction with OpenAI direct:

curl -X POST https://api.okrapdf.com/document/doc-abc123/chat/completions \
  -H "Authorization: Bearer okra_YOUR_KEY" \
  -H "Content-Type: application/json" \
  -H 'X-Vendor-Keys: {"openai": "sk-..."}' \
  -d '{
    "messages": [{"role": "user", "content": "Extract key financials"}],
    "response_format": {
      "type": "json_schema",
      "json_schema": {
        "name": "extraction",
        "schema": {
          "type": "object",
          "properties": {
            "revenue": {"type": "string"},
            "net_income": {"type": "string"}
          }
        },
        "strict": true
      }
    },
    "agentConfigOverride": {
      "llm": {"chat": {"provider": "openai", "model": "gpt-4o"}}
    },
    "systemPromptSuffix": "Values should include currency symbol and unit."
  }'

When using the MCP tools (not curl), you don’t need agentConfigOverride or systemPromptSuffix — just pass model and instructions as tool parameters. The MCP server handles the translation.

Documentation Index

​Compatibility

​What you get

​Setup (2 minutes)

​Tools

​Quick Start

​Upload and ask questions

​Read specific sections

​Extract data for Excel

​Compare across filings

​Parallel Queries

​Example: Operating margins across 3 companies

​How it works

​Try It

​Tool Reference

​upload_document

​read_document

​ask_document

​extract_data

​Example: 10-K financial extraction

​get_document_status

​extract_collection

​list_documents

​Model & Prompt Configuration

​Choose a model

​Use your own API key (direct provider)

​Add instructions

​curl examples

Compatibility

What you get

Setup (2 minutes)

Tools

Quick Start

Upload and ask questions

Read specific sections

Extract data for Excel

Compare across filings

Parallel Queries

Example: Operating margins across 3 companies

How it works

Try It

Tool Reference

upload_document

read_document

ask_document

extract_data

Example: 10-K financial extraction

get_document_status

extract_collection

list_documents

Model & Prompt Configuration

Choose a model

Use your own API key (direct provider)

Add instructions

curl examples