A Sandbox Transform lets you push JavaScript to OkraPDF that runs on the edge in an isolated V8 sandbox. Your code gets a read-only DOCS object to query documents, but it cannot modify them, access the network, or escape the sandbox.
Your code is an async function body. Write it like the inside of an async function: use await, return a result, and call DOCS methods to read your documents. No imports, no class boilerplate.
Open Beta: Sandbox Transforms are in open beta. The API surface is stable but
may receive non-breaking additions. Breaking changes will be announced with 30 days' notice.
SDK / API: Pull data to your environment, transform locally. Best when you write and trust the code.
Sandbox Transform: Push code to OkraPDF, run on the edge. Best when your users or an LLM writes the code.
result contains whatever your code returned (any JSON-serializable value, max 10 MB). logs contains console.log() output from your code (see Debugging).
CPU timeout means your code used too much compute. Wall-clock timeout usually means DOCS I/O was slow (large documents, many sequential reads). To fix wall-clock timeouts, reduce the number of docs or read fewer pages per doc.
Parse errors (422) are caught before the sandbox starts, so your code never ran. Runtime errors (500) mean the sandbox executed but your code threw.
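The status codes described above can be folded into a small client-side check. This is a minimal sketch: `classifySandboxStatus` and its labels are hypothetical names chosen for illustration, and it only covers the codes this page mentions (422, 500, and 429 for rate limiting).

```javascript
// Hypothetical helper: maps an HTTP status from /v1/sandbox/run to a short label.
// Only the codes documented on this page are handled explicitly.
function classifySandboxStatus(status) {
  if (status >= 200 && status < 300) return "ok";
  if (status === 422) return "parse_error";   // rejected before the sandbox started
  if (status === 500) return "runtime_error"; // sandbox ran, but the code threw
  if (status === 429) return "rate_limited";  // back off, then retry
  return "other";
}
```

A parse error is worth fixing in the submitted code before retrying; a runtime error may depend on the documents being read, so the logs are the first place to look.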
In public mode, each doc ID is validated when you call a DOCS method:
Valid public doc — returns data normally.
Non-existent ID — DOCS.getNodes("doc-invalid") throws "doc doc-invalid not in allowed set".
Exists but not published — throws "getNodes failed: 403".
Mixed valid/invalid — each call succeeds or fails independently. The sandbox doesn’t abort on a single doc failure. Use try/catch in your code to handle partial results.
In authenticated mode (apiKey), the same behavior applies — individual DOCS calls can fail if a doc is in error phase or was deleted. Always wrap DOCS calls in try/catch when iterating.
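The try/catch advice above might look like the following sketch. `readAllNodes` is a hypothetical helper name, and DOCS is passed in as a parameter only so the function can be exercised outside the sandbox; inside the sandbox you would use the global DOCS directly. It assumes a failed call throws either an Error or a plain string.

```javascript
// Sketch: collect per-document results without letting one failure abort the loop.
async function readAllNodes(DOCS, docIds) {
  const ok = [];
  const failed = [];
  for (const id of docIds) {
    try {
      // getNodes returns a JSON string, so parse it before use.
      const payload = JSON.parse(await DOCS.getNodes(id));
      ok.push({ id, total: payload.total });
    } catch (err) {
      // Record the failure and keep going with the remaining documents.
      const message = err && err.message ? err.message : String(err);
      failed.push({ id, error: message });
    }
  }
  return { ok, failed };
}
```

Returning both lists lets the caller see which documents were skipped instead of silently receiving partial results.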
The apiKey field is placed in the request body (not the Authorization header) because sandbox
calls are often composed programmatically — by LLMs generating tool calls, code editors submitting
forms, or serverless functions building payloads. Body placement simplifies these integrations.
Never send apiKey from browser-side JavaScript — use docIds mode for client-side calls.
The doc IDs above are real public FinanceBench documents — the examples work as-is.
Replace them with your own doc IDs, or add "apiKey": "okra_..." instead of docIds
to access your private documents.
For authenticated mode, replace docIds with "apiKey": "okra_..." in the
request body. The sandbox will have access to all documents your key can read.
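To make the two modes concrete, here is a minimal sketch of a request builder. `buildSandboxRequest` is a hypothetical helper, not part of any OkraPDF SDK, and treating docIds and apiKey as mutually exclusive is an assumption based on the wording above.

```javascript
const SANDBOX_URL = "https://api.okrapdf.com/v1/sandbox/run";

// Hypothetical helper: builds fetch() arguments for public (docIds) or
// authenticated (apiKey) mode. Assumes exactly one of the two is supplied.
function buildSandboxRequest({ code, docIds, apiKey }) {
  if (Boolean(docIds) === Boolean(apiKey)) {
    throw new Error("Pass exactly one of docIds or apiKey");
  }
  // The key travels in the JSON body, not in an Authorization header.
  const body = apiKey ? { apiKey, code } : { docIds, code };
  return {
    url: SANDBOX_URL,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(body),
    },
  };
}
```

Server-side code could then call `fetch(url, init)` with the returned values and read `result` and `logs` from the parsed JSON response.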
Your sandbox code has access to a global DOCS object, a read-only interface to document data.
Use querySql() first when you want deterministic retrieval, filtering, ranking, or cross-document comparison in code. Use getNodes() when you need raw node payloads. Text methods (getMarkdown, getPage) return plain strings directly.
// Structured data → objects
DOCS.list() → Doc[]
DOCS.querySql(docId: string, sql: string) → { rows: object[], count: number, warning?: string, hint?: string }
// Structured payloads serialized by document endpoints
DOCS.getNodes(docId: string, page?: string) → JSON string of { nodes: Node[], total: number }
DOCS.getStatus(docId: string) → JSON string of DocStatus
// Text data → plain strings (use directly)
DOCS.getMarkdown(docId: string) → markdown string
DOCS.getPage(docId: string, page: string) → page markdown string
Scoping: In docIds mode, DOCS.list() returns only the documents you specified, not all public docs. In apiKey mode, it returns all documents your key can access (up to 100).
Pagination: There is currently no cursor or offset parameter. DOCS.list() returns a maximum of 100 documents. If you have more than 100 documents and need to process them all, use the SDK/API to get the full list, then pass specific docIds to the sandbox.
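One way to work around the 100-document cap is to split a full ID list into batches on the client and submit one sandbox request per batch. The sketch below assumes you already have the IDs from the SDK/API; `chunkDocIds` is a hypothetical helper, and the default batch size of 100 simply mirrors the DOCS.list() cap, since this page does not state a maximum for docIds per request.

```javascript
// Hypothetical helper: split a list of doc IDs into fixed-size batches.
// The default size of 100 is an assumption borrowed from the DOCS.list() cap.
function chunkDocIds(ids, size = 100) {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError("size must be a positive integer");
  }
  const batches = [];
  for (let i = 0; i < ids.length; i += size) {
    batches.push(ids.slice(i, i + size));
  }
  return batches;
}
```

Smaller batches also reduce the DOCS I/O done in a single run, which helps with wall-clock timeouts.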
interface Doc {
  id: string;
  file_name: string;
  total_pages: number;
  status: string;
  pages_completed?: number;
  inserted_at?: number | string;
  updated_at?: number | string;
  is_public?: boolean;
}
interface SqlResult {
  rows: Array<Record<string, unknown>>;
  count: number;
  warning?: string;
  hint?: string;
}
interface DocStatus {
  phase: string; // "complete", "extracting", "error", etc.
  file_name: string;
  total_pages: number;
  pages_completed: number;
}
interface Node {
  id: string;
  type: string; // "text", "heading", "table", "key_value", etc.
  label: string | null;
  value: string | null;
  page_number: number | null;
  confidence: number | null;
  bbox_x: number | null;
  bbox_y: number | null;
  bbox_w: number | null;
  bbox_h: number | null;
}
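The return shapes differ by method, which is easy to trip over: querySql() gives an object, getNodes() and getStatus() give JSON strings, and getPage() gives plain text. The sketch below touches each one. `summarizeDoc` is a hypothetical helper, DOCS is passed as a parameter only so it can be run outside the sandbox, and the page value "1" and the SQL text are illustrative assumptions.

```javascript
// Sketch: read one document through each kind of DOCS method.
async function summarizeDoc(DOCS, docId) {
  // JSON string -> DocStatus
  const status = JSON.parse(await DOCS.getStatus(docId));
  // JSON string -> { nodes, total }
  const { nodes, total } = JSON.parse(await DOCS.getNodes(docId, "1"));
  // Already an object: { rows, count, ... }
  const sql = await DOCS.querySql(docId, "SELECT value FROM nodes LIMIT 5");
  // Plain string, usable directly
  const pageText = await DOCS.getPage(docId, "1");

  // Tally node types on the page, e.g. { text: 2, table: 1 }.
  const typeCounts = {};
  for (const node of nodes) {
    typeCounts[node.type] = (typeCounts[node.type] || 0) + 1;
  }
  return {
    phase: status.phase,
    totalNodes: total,
    typeCounts,
    sqlRows: sql.count,
    preview: pageText.slice(0, 40),
  };
}
```

Forgetting the JSON.parse step on getNodes() or getStatus() is a common source of "undefined" fields, since property access on a string does not throw.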
No write methods exist. The sandbox cannot modify, delete, or re-upload documents.
It cannot make network requests (fetch is blocked). It can only read.
Rate limit: 60 requests/min per API key (public: 20/min per IP)
Rate limit headers: Retry-After, X-RateLimit-Remaining on 429
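A client that receives a 429 can use the Retry-After header to decide how long to wait. The sketch below follows the general HTTP definition of Retry-After (either a number of seconds or an HTTP date); which form OkraPDF sends is not stated here, so both are handled. `retryDelayMs` and its 1-second fallback are hypothetical choices.

```javascript
// Hypothetical helper: convert a Retry-After header value into a delay in ms.
// Accepts delta-seconds ("2") or an HTTP date; falls back when absent or unparseable.
function retryDelayMs(retryAfter, fallbackMs = 1000, nowMs = Date.now()) {
  if (typeof retryAfter !== "string" || retryAfter.trim() === "") return fallbackMs;
  const seconds = Number(retryAfter);
  if (Number.isFinite(seconds)) return Math.max(0, Math.round(seconds * 1000));
  const dateMs = Date.parse(retryAfter);
  if (!Number.isNaN(dateMs)) return Math.max(0, dateMs - nowMs);
  return fallbackMs;
}
```

With fetch, the header value comes from `response.headers.get("Retry-After")`, which returns null when the header is missing, so the fallback path applies.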
Wall-clock timeout includes time waiting on DOCS I/O (fetching nodes from storage). If a document is slow to read, that counts against your 60s budget.
Standard Workers built-ins (URL, TextEncoder, crypto, JSON, etc.) are available. Node.js APIs (fs, net, child_process) are not. No import statements: your code runs as a function body, not a module.
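Because slow reads count against the wall-clock limit, sandbox code can stop early and return what it has instead of timing out. The following is a sketch under assumptions: `readPagesWithinBudget` is a hypothetical helper, the 45-second budget is an arbitrary margin below the 60s limit, and it assumes Date.now() advances across awaited DOCS calls. DOCS and the clock are parameters only so the function can be tested outside the sandbox.

```javascript
// Sketch: read pages one by one, stopping once a self-imposed time budget is spent.
async function readPagesWithinBudget(DOCS, docId, pages, budgetMs = 45000, now = Date.now) {
  const start = now();
  const read = [];
  for (const page of pages) {
    // Check the budget before each read so a slow document yields partial output.
    if (now() - start > budgetMs) break;
    read.push({ page, text: await DOCS.getPage(docId, page) });
  }
  return { pages: read, truncated: read.length < pages.length };
}
```

The `truncated` flag tells the caller that a follow-up request for the remaining pages is needed.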
First request incurs isolate startup cost. The platform may cache warm isolates for repeated requests with identical code, but this is not guaranteed — design for cold starts.
Pull the exact lines you want, then compute in code. This is usually better than scanning all nodes.
curl -s https://api.okrapdf.com/v1/sandbox/run \
  -H "Content-Type: application/json" \
  -d '{
    "docIds": ["doc-fac0d10b8ebc4a2ebf13"],
    "code": "const docId = \"doc-fac0d10b8ebc4a2ebf13\";\nconst matches = await DOCS.querySql(docId, \"SELECT page_number, substr(value, 1, 220) AS value FROM nodes_fts WHERE nodes_fts MATCH '\''property'\'' LIMIT 5\");\nreturn matches;"
  }' | jq .result
Readable version:
const docId = "doc-fac0d10b8ebc4a2ebf13";
const matches = await DOCS.querySql(
  docId,
  "SELECT page_number, substr(value, 1, 220) AS value FROM nodes_fts WHERE nodes_fts MATCH 'property' LIMIT 5"
);
return matches;
Custom scoring, multi-field boolean logic, regex patterns — beyond what keyword or semantic search offers.
const docs = await DOCS.list();
const hits = [];
for (const doc of docs) {
  try {
    // Pre-filter in SQL so only candidate nodes are pulled into the sandbox.
    const data = await DOCS.querySql(
      doc.id,
      "SELECT value FROM nodes WHERE value LIKE '%material weakness%' OR value LIKE '%restatement%' OR value LIKE '%risk factor%' LIMIT 200"
    );
    const text = data.rows.map((row) => row.value).filter(Boolean).join(" ");
    // Weighted score per matched term (no g flag needed for a single test()).
    let score = 0;
    if (/risk factor/i.test(text)) score += 2;
    if (/material weakness/i.test(text)) score += 5;
    if (/restatement/i.test(text)) score += 10;
    if (score >= 5) hits.push({ id: doc.id, name: doc.file_name, score });
  } catch {
    // Skip documents that fail to read so one bad doc does not abort the run.
  }
}
// Highest score first.
return hits.sort((a, b) => b.score - a.score);
The /v1/sandbox/run endpoint supports CORS (Access-Control-Allow-Origin: *) for both modes, so browser calls work. However:
Never pass apiKey from browser-side JavaScript — it exposes your credentials to anyone
who inspects network traffic. For authenticated mode, proxy through your own backend.
The docIds mode is safe to call directly from the client since it requires no credentials.
Sandbox Transforms are available on all OkraPDF paid plans. You do not need a separate Cloudflare account; OkraPDF manages the infrastructure.
Included executions and CPU limits vary by plan; see the pricing page for your plan’s allowances. Overage rates:
Dimension     Overage rate
Executions    $0.06 per 1,000
CPU time      $0.02 per 1M ms
During the current open beta (started March 2026), execution fees are waived — you only pay for CPU time that exceeds your plan’s included allowance.
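As a worked example of the overage arithmetic, the sketch below applies the two listed rates to usage beyond a plan's allowance. `estimateOverageUsd` is a hypothetical helper, the included allowances are placeholders you would take from the pricing page, and charging pro rata (rather than in whole blocks of 1,000 executions or 1M ms) is an assumption.

```javascript
const USD_PER_1K_EXECUTIONS = 0.06;
const USD_PER_1M_CPU_MS = 0.02;

// Hypothetical estimator: overage in USD for one billing period.
// betaWaiver = true models the open beta, where execution fees are waived.
function estimateOverageUsd({ executions, cpuMs, includedExecutions, includedCpuMs, betaWaiver = true }) {
  const extraExecutions = Math.max(0, executions - includedExecutions);
  const extraCpuMs = Math.max(0, cpuMs - includedCpuMs);
  const executionFee = betaWaiver ? 0 : (extraExecutions / 1000) * USD_PER_1K_EXECUTIONS;
  const cpuFee = (extraCpuMs / 1000000) * USD_PER_1M_CPU_MS;
  return { executionFee, cpuFee, total: executionFee + cpuFee };
}
```

For instance, 50,000 executions and 20M ms of CPU time over the allowance come to about $0.40 during the beta (CPU only) and about $3.40 once execution fees apply.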