Sandbox Transforms

What is a Sandbox Transform?

A Sandbox Transform lets you push JavaScript to OkraPDF that runs on the edge in an isolated V8 sandbox. Your code gets a read-only DOCS object to query documents, but it cannot modify them, access the network, or escape the sandbox.

Your code is the body of an async function: use await, return a result, and call DOCS methods to read your documents. No imports, no class boilerplate.

Open Beta: Sandbox Transforms are in open beta. The API surface is stable but may receive non-breaking additions. Breaking changes will be announced with 30 days' notice.

SDK / API

Pull data to your environment, transform locally. Best when you write and trust the code.

Sandbox Transform

Push code to OkraPDF, run on the edge. Best when your users or an LLM writes the code.

Quick start

POST /v1/sandbox/run

Request body

Field	Type	Required	Description
`code`	string	Yes	JavaScript async function body. Has access to `DOCS` global. Must `return` a result.
`apiKey`	string	One of	Your OkraPDF API key. `DOCS.list()` returns all your docs.
`docIds`	string[]	One of	Specific document IDs to scope access. No auth needed for public documents that have completed processing. Max 20.

You must provide either apiKey or docIds (not both). Need an API key? Create one here.
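The either/or rule can be checked on the client before a request is sent. The following is a minimal sketch, not part of any OkraPDF SDK: `buildSandboxRequest` and its error messages are hypothetical, and the limit of 20 comes from the table above.

```javascript
// Hypothetical client-side helper; the endpoint performs its own validation.
const MAX_DOC_IDS = 20; // limit taken from the request body table

function buildSandboxRequest({ code, apiKey, docIds }) {
  if (typeof code !== "string" || code.trim() === "") {
    throw new Error("code must be a non-empty string");
  }
  const hasKey = typeof apiKey === "string" && apiKey.length > 0;
  const hasIds = Array.isArray(docIds);
  // Exactly one of the two access modes must be selected.
  if (hasKey === hasIds) {
    throw new Error("provide exactly one of apiKey or docIds");
  }
  if (hasIds && (docIds.length === 0 || docIds.length > MAX_DOC_IDS)) {
    throw new Error(`docIds must contain 1 to ${MAX_DOC_IDS} IDs`);
  }
  return hasKey ? { code, apiKey } : { code, docIds };
}

console.log(buildSandboxRequest({ code: "return 1;", docIds: ["doc-a"] }));
```

The returned object can be passed to JSON.stringify and used as the POST body.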

Response

Success (200):

{
  "ok": true,
  "result": [{ "name": "3M_2018_10K.pdf", "pages": 160 }],
  "logs": ["processed 2 docs"]
}

result contains whatever your code returned (any JSON-serializable value, max 10 MB). logs contains console.log() output from your code (see Debugging).

Error responses

Status	When	Example
400	Missing fields, empty `docIds`, or both `apiKey` and `docIds` provided	`{ "error": "provide apiKey or docIds, not both" }`
401	Invalid or revoked API key	`{ "error": "invalid api key" }`
403	Key lacks read access to a doc	`{ "error": "forbidden: insufficient permissions" }`
422	Code fails to parse (syntax error)	`{ "ok": false, "error": "SyntaxError: Unexpected token '}'" }`
429	Rate limit exceeded	`{ "error": "rate limit exceeded" }` + `Retry-After` header
500	Runtime error (your code threw)	`{ "ok": false, "error": "TypeError: ...", "logs": [...] }`
504	Timeout	See below

Timeout errors (504) have distinct messages depending on the cause:

CPU limit exceeded: { "ok": false, "error": "CPU time limit exceeded (30s)" }
Wall-clock timeout: { "ok": false, "error": "execution timed out (60s wall-clock)" }

CPU timeout means your code used too much compute. Wall-clock timeout usually means DOCS I/O was slow (large documents, many sequential reads). To fix wall-clock timeouts, reduce the number of docs or read fewer pages per doc.

Parse errors (422) are caught before the sandbox starts, so your code never ran. Runtime errors (500) mean the sandbox executed but your code threw.
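A caller may want to turn these statuses into a next step. The sketch below is one possible mapping based only on the table and messages above; `describeFailure` and its return shape are hypothetical, and the 504 branch assumes the error strings keep the wording shown here.

```javascript
// Hypothetical helper: maps a /v1/sandbox/run failure to a suggested action.
function describeFailure(status, body) {
  const message = (body && body.error) || "";
  switch (status) {
    case 400:
      return { retry: false, action: "fix the request body" };
    case 401:
    case 403:
      return { retry: false, action: "check the API key and its permissions" };
    case 422:
      return { retry: false, action: "fix the syntax error; the code never ran" };
    case 429:
      return { retry: true, action: "wait for Retry-After, then resend" };
    case 500:
      return { retry: false, action: "debug the thrown error using logs" };
    case 504:
      // The two timeout causes are distinguished only by message text.
      return message.includes("CPU time limit")
        ? { retry: false, action: "reduce compute in the transform" }
        : { retry: false, action: "read fewer docs or fewer pages per doc" };
    default:
      return { retry: false, action: "unexpected status" };
  }
}

console.log(describeFailure(504, { ok: false, error: "CPU time limit exceeded (30s)" }));
```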

`docIds` behavior

In public mode, each doc ID is validated when you call a DOCS method:

Valid public doc — returns data normally.
Non-existent ID — DOCS.getNodes("doc-invalid") throws "doc doc-invalid not in allowed set".
Exists but not published — throws "getNodes failed: 403".
Mixed valid/invalid — each call succeeds or fails independently. The sandbox doesn’t abort on a single doc failure. Use try/catch in your code to handle partial results.

In authenticated mode (apiKey), the same behavior applies — individual DOCS calls can fail if a doc is in error phase or was deleted. Always wrap DOCS calls in try/catch when iterating.
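The try/catch pattern for partial results might look like the sketch below. To keep it runnable under plain Node, it defines a stand-in `DOCS` object whose error string mimics the message quoted above; in the real sandbox only the body of `collectNodeCounts` would be submitted, and the function name is illustrative.

```javascript
// Stand-in for the sandbox global so the sketch runs locally.
const DOCS = {
  getNodes: async (docId) => {
    if (docId !== "doc-valid") throw new Error(`doc ${docId} not in allowed set`);
    return JSON.stringify({ nodes: [{ id: "n1", value: "Net sales" }], total: 1 });
  },
};

async function collectNodeCounts(docIds) {
  const results = [];
  const failures = [];
  for (const docId of docIds) {
    try {
      const payload = JSON.parse(await DOCS.getNodes(docId));
      results.push({ docId, total: payload.total });
    } catch (err) {
      // One failing doc should not discard the others.
      failures.push({ docId, error: err && err.message ? err.message : String(err) });
    }
  }
  return { results, failures };
}

collectNodeCounts(["doc-valid", "doc-invalid"]).then((out) => console.log(JSON.stringify(out)));
```

Returning the failures alongside the results lets the caller see which documents were skipped and why.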

The apiKey field is placed in the request body (not the Authorization header) because sandbox calls are often composed programmatically — by LLMs generating tool calls, code editors submitting forms, or serverless functions building payloads. Body placement simplifies these integrations. Never send apiKey from browser-side JavaScript — use docIds mode for client-side calls.

Minimal example

Try this against real public FinanceBench docs (these IDs work — no API key needed):

// Requires Node 18+ for global fetch; save as an ES module (.mjs) for top-level await
const code = `
  const docs = await DOCS.list();
  return docs.map(d => ({ name: d.file_name, pages: d.total_pages }));
`;

const res = await fetch("https://api.okrapdf.com/v1/sandbox/run", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    docIds: ["doc-fac0d10b8ebc4a2ebf13", "doc-ea6befd03a6b4a4aaf0d"],
    code,
  }),
});

const { ok, result, error } = await res.json();
console.log(ok ? result : `Error: ${error}`);
// → [{ name: "3M_2018_10K.pdf", pages: 160 }, { name: "ACTIVISIONBLIZZARD_2019_10K.pdf", pages: 198 }]

The doc IDs above are real public FinanceBench documents, so the example works as-is. To run it against your own documents, replace them with your own doc IDs, or switch to authenticated mode by replacing docIds with "apiKey": "okra_..." in the request body. In authenticated mode the sandbox has access to all documents your key can read.

The `DOCS` object

Your sandbox code has access to a global DOCS object — a read-only interface to document data. Use querySql() first when you want deterministic retrieval, filtering, ranking, or cross-document comparison in code. Use getNodes() when you need raw node payloads. Text methods (getMarkdown, getPage) return plain strings directly.

// Structured data → objects
DOCS.list()                               → Doc[]
DOCS.querySql(docId: string, sql: string) → { rows: object[], count: number, warning?: string, hint?: string }

// Structured payloads serialized by document endpoints
DOCS.getNodes(docId: string, page?: string) → JSON string of { nodes: Node[], total: number }
DOCS.getStatus(docId: string)               → JSON string of DocStatus

// Text data → plain strings (use directly)
DOCS.getMarkdown(docId: string)            → markdown string
DOCS.getPage(docId: string, page: string) → page markdown string
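Because getNodes() and getStatus() return JSON strings rather than objects, their output needs a JSON.parse before use. The sketch below shows one way to decode a getNodes() payload and group values by page; `groupNodesByPage` is a hypothetical helper and the sample values are made up.

```javascript
// Hypothetical helper: decodes the JSON string returned by DOCS.getNodes()
// and groups non-null node values by page number.
function groupNodesByPage(nodesJson) {
  const { nodes } = JSON.parse(nodesJson);
  const byPage = {};
  for (const node of nodes) {
    if (node.page_number === null || node.value === null) continue;
    if (!byPage[node.page_number]) byPage[node.page_number] = [];
    byPage[node.page_number].push(node.value);
  }
  return byPage;
}

// Sample payload shaped like { nodes: Node[], total: number }.
const sample = JSON.stringify({
  nodes: [
    { id: "n1", type: "heading", value: "Item 1. Business", page_number: 1 },
    { id: "n2", type: "text", value: "Net sales $ 32,765", page_number: 1 },
    { id: "n3", type: "table", value: null, page_number: 2 },
  ],
  total: 3,
});

console.log(groupNodesByPage(sample));
```

Inside the sandbox the same helper would be called as groupNodesByPage(await DOCS.getNodes(docId)).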

Scoping: In docIds mode, DOCS.list() returns only the documents you specified, not all public docs. In apiKey mode, it returns the documents your key can access, up to a maximum of 100.

Pagination: There is currently no cursor or offset parameter. If you have more than 100 documents and need to process them all, use the SDK/API to get the full list, then pass specific docIds to the sandbox (up to 20 per request).
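When a full document list is fetched outside the sandbox, it has to be split into batches that respect the 20-ID limit on docIds. A minimal sketch, with `chunkDocIds` as a hypothetical helper name:

```javascript
// Splits a list of document IDs into batches of at most `size` IDs.
function chunkDocIds(docIds, size = 20) {
  if (!Number.isInteger(size) || size < 1) {
    throw new Error("size must be a positive integer");
  }
  const batches = [];
  for (let i = 0; i < docIds.length; i += size) {
    batches.push(docIds.slice(i, i + size));
  }
  return batches;
}

// 45 placeholder IDs split into batches of 20, 20, and 5.
const ids = Array.from({ length: 45 }, (_, i) => `doc-${i}`);
console.log(chunkDocIds(ids).map((batch) => batch.length)); // → [ 20, 20, 5 ]
```

Each batch then becomes the docIds field of one sandbox request.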

`querySql` notes

Read-only: SELECT and WITH queries are allowed. Mutations are blocked.
Queryable tables: nodes, nodes_fts, edges, page_ledger, verifications, meta, findings.
nodes_fts auto-initializes on first query for hydrated docs, so FTS works in sandbox mode without a separate warm-up step.
For exact line retrieval, start with nodes_fts MATCH or nodes.value LIKE ..., then do the math in JavaScript.
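The "do the math in JavaScript" step usually starts with turning matched text into numbers. The sketch below is one way to do that, assuming the filing prints negative amounts in parentheses (for example "(1,577)"); `parseAmounts` is a hypothetical helper, and it will also pick up years or other bare numbers present in the text.

```javascript
// Hypothetical helper: extracts numeric amounts from a row value.
// Assumption: parentheses around a number mark it as negative.
function parseAmounts(text) {
  const amounts = [];
  const pattern = /\((\d[\d,]*(?:\.\d+)?)\)|(\d[\d,]*(?:\.\d+)?)/g;
  for (const m of text.matchAll(pattern)) {
    const digits = (m[1] ?? m[2]).replace(/,/g, "");
    amounts.push(m[1] !== undefined ? -Number(digits) : Number(digits));
  }
  return amounts;
}

const row = "Purchases of property, plant and equipment (PP&E) (1,577) (1,373)";
console.log(parseAmounts(row)); // → [ -1577, -1373 ]
```

Requiring a leading digit keeps the comma in "property, plant" and the "(PP&E)" label from being treated as amounts.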

Types

interface Doc {
  id: string;
  file_name: string;
  total_pages: number;
  status: string;
  pages_completed?: number;
  inserted_at?: number | string;
  updated_at?: number | string;
  is_public?: boolean;
}

interface SqlResult {
  rows: Array<Record<string, unknown>>;
  count: number;
  warning?: string;
  hint?: string;
}

interface DocStatus {
  phase: string;             // "complete", "extracting", "error", etc.
  file_name: string;
  total_pages: number;
  pages_completed: number;
}

interface Node {
  id: string;
  type: string;        // "text", "heading", "table", "key_value", etc.
  label: string | null;
  value: string | null;
  page_number: number | null;
  confidence: number | null;
  bbox_x: number | null;
  bbox_y: number | null;
  bbox_w: number | null;
  bbox_h: number | null;
}

No write methods exist. The sandbox cannot modify, delete, or re-upload documents. It cannot make network requests (fetch is blocked). It can only read.

Limits

Limit	Value
CPU time per execution	30 seconds
Wall-clock timeout	60 seconds (includes I/O wait for DOCS calls)
Code size	1 MB
Response payload	10 MB
`docIds` per request	20
Concurrent executions	10 per API key (public mode: 5 per IP)
Rate limit	60 requests/min per API key (public: 20/min per IP)
Rate limit headers	`Retry-After`, `X-RateLimit-Remaining` on 429

Wall-clock timeout includes time waiting on DOCS I/O (fetching nodes from storage). If a document is slow to read, that counts against your 60s budget.

Standard Workers built-ins (URL, TextEncoder, crypto, JSON, etc.) are available. Node.js APIs (fs, net, child_process) are not. No import statements: your code runs as a function body, not a module.
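For the rate-limit rows in the table above, a client needs to turn the Retry-After header into a wait time. This page does not say which form the header takes, so the sketch below accepts both forms allowed by HTTP (a number of seconds or an HTTP date); `retryDelayMs` and the one-second fallback are assumptions.

```javascript
// Hypothetical helper: converts a Retry-After header value to milliseconds.
function retryDelayMs(retryAfter, now = Date.now(), fallbackMs = 1000) {
  if (retryAfter == null || String(retryAfter).trim() === "") return fallbackMs;
  const value = String(retryAfter).trim();
  // Form 1: delay in whole seconds.
  if (/^\d+$/.test(value)) return Number(value) * 1000;
  // Form 2: an HTTP date; wait until that moment, never a negative time.
  const when = Date.parse(value);
  return Number.isNaN(when) ? fallbackMs : Math.max(0, when - now);
}

console.log(retryDelayMs("5")); // → 5000
```

On a 429, the caller would read res.headers.get("Retry-After"), wait for the computed delay, and then resend the same request.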

Performance

Scenario	Typical latency
Cold start (new isolate)	50–200ms
Warm isolate (repeated calls)	5–15ms
`DOCS.querySql()` per doc	20–150ms (exact-match / FTS patterns)
`DOCS.getNodes()` per doc	100–500ms (depends on doc size)
10-doc reduce	1–5s total

First request incurs isolate startup cost. The platform may cache warm isolates for repeated requests with identical code, but this is not guaranteed — design for cold starts.

What the sandbox blocks

Action	Result
`fetch("https://evil.com")`	`"not permitted to access the internet"`
`DOCS.getNodes("doc-not-yours")`	`"doc doc-not-yours not in allowed set"`
Write to any storage	No write bindings exist
`import("node:fs")`	Not available in Workers runtime

SDK vs Sandbox

	SDK / API	Sandbox Transform
Code author	You (trusted)	Users, LLMs, agents (untrusted)
Runs where	Your server	Cloudflare edge isolate
Network	Full	Blocked
Mutation risk	Possible	Impossible — no write methods
Dependencies	Full npm	Built-ins only
Latency	N round-trips for N docs	1 round-trip, code runs next to data
Best for	Pipelines, scripts, CI/CD	Code editors, playgrounds, agent tool calls

Use cases

1. Exact retrieval with `querySql`

Pull the exact lines you want, then compute in code. This is usually better than scanning all nodes.

curl -s https://api.okrapdf.com/v1/sandbox/run \
  -H "Content-Type: application/json" \
  -d '{
    "docIds": ["doc-fac0d10b8ebc4a2ebf13"],
    "code": "const docId = \"doc-fac0d10b8ebc4a2ebf13\";\nconst matches = await DOCS.querySql(docId, \"SELECT page_number, substr(value, 1, 220) AS value FROM nodes_fts WHERE nodes_fts MATCH '\''property'\'' LIMIT 5\");\nreturn matches;"
  }' | jq .result

Readable version:

const docId = "doc-fac0d10b8ebc4a2ebf13";
const matches = await DOCS.querySql(
  docId,
  "SELECT page_number, substr(value, 1, 220) AS value FROM nodes_fts WHERE nodes_fts MATCH 'property' LIMIT 5"
);
return matches;

2. Cross-document reduce

Do deterministic retrieval plus math across one or more docs.

# Copy-paste into terminal — extracts 3M 2018 capex from the filing
curl -s https://api.okrapdf.com/v1/sandbox/run \
  -H "Content-Type: application/json" \
  -d '{
    "docIds": ["doc-fac0d10b8ebc4a2ebf13"],
    "code": "const docs = await DOCS.list();\nconst results = [];\nfor (const doc of docs) {\n  const line = await DOCS.querySql(doc.id, \"SELECT value FROM nodes WHERE value LIKE '\''%Purchases of property, plant and equipment%'\'' LIMIT 1\");\n  const value = line.rows[0]?.value ?? null;\n  const match = typeof value === \"string\" ? value.match(/\\((\\d[\\d,]*)\\)|(\\d[\\d,]*)/) : null;\n  results.push({ doc: doc.file_name, raw: value, extracted: match ? (match[1] || match[2]) : null });\n}\nreturn results;"
  }' | jq .result

The sandbox code (readable version):

const docs = await DOCS.list();
const results = [];
for (const doc of docs) {
  const line = await DOCS.querySql(
    doc.id,
    "SELECT value FROM nodes WHERE value LIKE '%Purchases of property, plant and equipment%' LIMIT 1"
  );
  const value = line.rows[0]?.value ?? null;
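  // Require a leading digit so the comma in "property, plant" or a label like "(PP&E)" is not matched.
  const match = typeof value === "string" ? value.match(/\((\d[\d,]*)\)|(\d[\d,]*)/) : null;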
  results.push({
    doc: doc.file_name,
    raw: value,
    extracted: match ? (match[1] || match[2]) : null,
  });
}
return results;

3. User-defined filters

Custom scoring, multi-field boolean logic, regex patterns — beyond what keyword or semantic search offers.

const docs = await DOCS.list();
const hits = [];
for (const doc of docs) {
  try {
    const data = await DOCS.querySql(
      doc.id,
      "SELECT value FROM nodes WHERE value LIKE '%material weakness%' OR value LIKE '%restatement%' OR value LIKE '%risk factor%' LIMIT 200"
    );
    const text = data.rows.map((row) => row.value).filter(Boolean).join(" ");
    let score = 0;
    if (/risk factor/i.test(text)) score += 2;
    if (/material weakness/i.test(text)) score += 5;
    if (/restatement/i.test(text)) score += 10;
    if (score >= 5) hits.push({ id: doc.id, name: doc.file_name, score });
  } catch { /* skip */ }
}
return hits.sort((a, b) => b.score - a.score);

4. LLM tool calls as code

Instead of an AI agent making 10 sequential API calls, generate one JS function that does the same work in a single sandbox execution.

const docs = await DOCS.list();
const comparison = [];
for (const doc of docs) {
  try {
    const data = await DOCS.querySql(
      doc.id,
      "SELECT value FROM nodes WHERE value LIKE '%risk factor%' OR value LIKE '%artificial intelligence%' OR value LIKE '%machine learning%' LIMIT 200"
    );
    const text = data.rows.map((row) => row.value).filter(Boolean).join(" ");
    comparison.push({
      company: doc.file_name.replace(/_\d{4}.*/, ''),
      year: doc.file_name.match(/\d{4}/)?.[0],
      hasRiskFactors: /risk factor/i.test(text),
      mentionsAI: /artificial intelligence|machine learning/i.test(text),
      pageCount: doc.total_pages,
    });
  } catch { /* skip */ }
}
return comparison;

5. Embeddable code playground

Ship a code editor on your doc viewer. Users write transforms, run them sandboxed — like CodePen for PDFs.

import MonacoEditor from "@monaco-editor/react";
import { useState } from "react";

const STARTER_CODE = `const docs = await DOCS.list();
return docs.map(d => ({ name: d.file_name, pages: d.total_pages }));`;

function TransformEditor({ docIds }: { docIds: string[] }) {
  const [code, setCode] = useState(STARTER_CODE);
  const [result, setResult] = useState<any>(null);
  const [error, setError] = useState<string | null>(null);
  const [loading, setLoading] = useState(false);

  const run = async () => {
    setError(null);
    setLoading(true);
    try {
      const res = await fetch("https://api.okrapdf.com/v1/sandbox/run", {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify({ docIds, code }),
      });
      const json = await res.json();
      if (json.ok) setResult(json.result);
      else setError(json.error);
    } catch (e: any) {
      setError(e.message);
    } finally {
      setLoading(false);
    }
  };

  return (
    <div>
      <MonacoEditor height="300px" language="javascript" theme="vs-dark"
        value={code} onChange={(v) => setCode(v ?? "")} />
      <button onClick={run} disabled={loading}>
        {loading ? "Running..." : "Run in Sandbox"}
      </button>
      {error && <div style={{ color: "red" }}>{error}</div>}
      {result != null && <pre>{JSON.stringify(result, null, 2)}</pre>}
    </div>
  );
}

Debugging

console.log() output from your sandbox code is captured and returned in the logs field:

{
  "ok": true,
  "result": { "totalRevenue": 32765 },
  "logs": ["fetching doc-abc...", "found revenue: 32,765"]
}

If your code throws, the error message is returned in the error field, and the 500 response also carries the logs array.

Local testing

To iterate on your transform without hitting the API, mock the DOCS object locally:

// mock-docs.mjs — run with: node mock-docs.mjs
const DOCS = {
  list: async () => [
    { id: "doc-1", file_name: "3M_2018_10K.pdf", total_pages: 160, status: "complete" },
    { id: "doc-2", file_name: "AAPL_2023_10K.pdf", total_pages: 200, status: "complete" },
  ],
  querySql: async (id, sql) => ({
    rows: [{ value: "Purchases of property, plant and equipment (PP&E) (1,577)" }],
    count: 1,
  }),
  getNodes: async (id) => JSON.stringify({
    nodes: [{ value: "Net sales $ 32,765", type: "text", page_number: 1 }],
    total: 1,
  }),
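  getMarkdown: async (id) => "Net sales $ 32,765 million for the year ended...",
  getPage: async (id, page) => "Net sales $ 32,765 million for the year ended...",
  getStatus: async (id) => JSON.stringify({ phase: "complete", file_name: "test.pdf", total_pages: 1, pages_completed: 1 }),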
};

// ---- paste your sandbox code below this line ----
const docs = await DOCS.list();
const capex = [];
for (const doc of docs) {
  const result = await DOCS.querySql(
    doc.id,
    "SELECT value FROM nodes WHERE value LIKE '%Purchases of property, plant and equipment%' LIMIT 1"
  );
  capex.push({ doc: doc.file_name, row: result.rows[0]?.value ?? null });
}
console.log({ capex, docCount: docs.length });

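Once your logic works locally, paste the code body (everything below the ---- line) into the code field of the API request, replacing the final console.log(...) with a return statement. The sandbox expects your code to return its result, whereas top-level module code cannot use return.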
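Another way to test locally is to run the code string exactly as it will be submitted, return statement included, by wrapping it in an async function. The sketch below assumes this wrapping approximates how the sandbox treats the code; it does not block network access or enforce any limits, and `runLocally` and `mockDocs` are hypothetical names.

```javascript
// The AsyncFunction constructor is not a global; it is obtained from an async function.
const AsyncFunction = Object.getPrototypeOf(async function () {}).constructor;

// Hypothetical local runner that mimics the { ok, result, logs } response shape.
async function runLocally(code, docs) {
  const logs = [];
  // Shadow console inside the code so log output is captured, not printed.
  const fakeConsole = { log: (...args) => logs.push(args.map(String).join(" ")) };
  try {
    const fn = new AsyncFunction("DOCS", "console", code);
    const result = await fn(docs, fakeConsole);
    return { ok: true, result, logs };
  } catch (err) {
    return { ok: false, error: `${err.name}: ${err.message}`, logs };
  }
}

// Mock with a single made-up document.
const mockDocs = {
  list: async () => [{ id: "doc-1", file_name: "3M_2018_10K.pdf", total_pages: 160 }],
};

const code = `
  const docs = await DOCS.list();
  console.log("processed " + docs.length + " docs");
  return docs.map(d => ({ name: d.file_name, pages: d.total_pages }));
`;

runLocally(code, mockDocs).then((out) => console.log(JSON.stringify(out)));
```

Because the string passed to runLocally is the same one sent in the code field, nothing needs editing between the local run and the API request.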

Calling from the browser

The /v1/sandbox/run endpoint supports CORS (Access-Control-Allow-Origin: *) for both modes, so browser calls work. However:

Never pass apiKey from browser-side JavaScript — it exposes your credentials to anyone who inspects network traffic. For authenticated mode, proxy through your own backend. The docIds mode is safe to call directly from the client since it requires no credentials.
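For the backend proxy, the key step is building the upstream body so that the API key comes only from server-side configuration. A minimal sketch under that assumption; `buildUpstreamBody` is a hypothetical helper, and the surrounding HTTP handler is left to whatever server framework is in use.

```javascript
// Hypothetical backend helper: builds the body forwarded to /v1/sandbox/run.
function buildUpstreamBody(clientBody, serverApiKey) {
  if (!serverApiKey) throw new Error("server API key is not configured");
  if (!clientBody || typeof clientBody.code !== "string") {
    throw new Error("client must send a code string");
  }
  // Copy only `code`. Any apiKey or docIds sent by the browser is dropped,
  // since the endpoint rejects requests that carry both fields.
  return { code: clientBody.code, apiKey: serverApiKey };
}

console.log(buildUpstreamBody({ code: "return 1;", apiKey: "from-browser" }, "okra_server_side"));
```

Note that code forwarded this way can read every document the key can access, so the proxy should only accept code from users who are allowed to see those documents.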

Security model

Sandbox Transforms use Cloudflare Dynamic Workers with @cloudflare/codemode for secure RPC dispatch.

No global identifiers beyond the runtime built-ins. The only OkraPDF capability the sandbox can access is DOCS, which routes calls back to the parent worker via RPC.
globalOutbound: null. All fetch() and connect() calls are blocked.
Read-only RPC. DOCS exposes only read methods. Write operations don’t exist.
Doc scoping. In public mode, docIds are checked on every call — you can’t enumerate or guess other documents.
Fresh isolate. Each run call gets a new V8 isolate. No state persists between executions.
No self-loop. DOCS reads from Durable Objects and R2 directly via RPC — no HTTP fetch back to the same worker.

Pricing

Sandbox Transforms are available on all OkraPDF paid plans. You do not need a separate Cloudflare account — OkraPDF manages the infrastructure. Included executions and CPU limits vary by plan — see the pricing page for your plan’s allowances. Overage rates:

Dimension	Overage rate
Executions	$0.06 per 1,000
CPU time	$0.02 per 1M ms

During the current open beta (started March 2026), execution fees are waived — you only pay for CPU time that exceeds your plan’s included allowance.
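The overage rates translate into a simple estimate. The sketch below applies the two rates from the table to usage beyond the plan allowance; `estimateOverageUsd` and the `betaWaiver` flag are hypothetical, and actual invoices may round or prorate differently.

```javascript
// Rates from the overage table, expressed in cents to keep the arithmetic in integers.
const CENTS_PER_1000_EXECUTIONS = 6; // $0.06 per 1,000 executions
const CENTS_PER_1M_CPU_MS = 2;       // $0.02 per 1M ms of CPU time

// extraExecutions and extraCpuMs are usage beyond the plan's included allowance.
function estimateOverageUsd({ extraExecutions, extraCpuMs, betaWaiver = false }) {
  // During the open beta, execution fees are waived.
  const executionCents = betaWaiver ? 0 : (extraExecutions * CENTS_PER_1000_EXECUTIONS) / 1000;
  const cpuCents = (extraCpuMs * CENTS_PER_1M_CPU_MS) / 1_000_000;
  return (executionCents + cpuCents) / 100;
}

// 50,000 extra executions and 10M extra CPU ms: $3.00 + $0.20.
console.log(estimateOverageUsd({ extraExecutions: 50000, extraCpuMs: 10000000 })); // → 3.2
// Same usage during the beta: only the CPU portion is charged.
console.log(estimateOverageUsd({ extraExecutions: 50000, extraCpuMs: 10000000, betaWaiver: true })); // → 0.2
```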
