
What is a Sandbox Transform?

A Sandbox Transform lets you send JavaScript to OkraPDF and run it at the edge in an isolated V8 sandbox. Your code gets a read-only DOCS object for querying documents; it cannot modify them, access the network, or escape the sandbox. Write your code as the body of an async function: use await, call DOCS methods to read your documents, and return a result. No imports, no class boilerplate.
Open Beta: Sandbox Transforms are in open beta. The API surface is stable but may receive non-breaking additions. Breaking changes will be announced with 30 days' notice.

SDK / API

Pull data to your environment, transform locally. Best when you write and trust the code.

Sandbox Transform

Push code to OkraPDF, run on the edge. Best when your users or an LLM writes the code.

Quick start

POST /v1/sandbox/run

Request body

| Field | Type | Required | Description |
|---|---|---|---|
| code | string | Yes | JavaScript async function body. Has access to the DOCS global. Must return a result. |
| apiKey | string | One of apiKey / docIds | Your OkraPDF API key. DOCS.list() returns the documents your key can read (up to 100). |
| docIds | string[] | One of apiKey / docIds | Specific document IDs to scope access to. No auth needed for published documents. Max 20. |

You must provide either apiKey or docIds, but not both.
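Each of these rules produces a 400 when broken, so a client can check them before sending the request. The sketch below is one way to do that. `buildSandboxRequest` is a hypothetical helper, not part of an OkraPDF SDK. It encodes only the rules stated above: `code` is required, exactly one of `apiKey` or `docIds` must be present, and `docIds` must be non-empty with at most 20 entries.

```javascript
// Hypothetical helper (not part of any OkraPDF SDK): enforces the
// request-body rules for POST /v1/sandbox/run before sending.
const MAX_DOC_IDS = 20;

function buildSandboxRequest({ code, apiKey, docIds }) {
  if (typeof code !== "string" || code.trim() === "") {
    throw new Error("code must be a non-empty string");
  }
  const hasKey = typeof apiKey === "string" && apiKey.length > 0;
  const hasIds = Array.isArray(docIds);
  // Exactly one of apiKey / docIds, mirroring the API's 400 response.
  if (hasKey && hasIds) throw new Error("provide apiKey or docIds, not both");
  if (!hasKey && !hasIds) throw new Error("provide apiKey or docIds");
  if (!hasIds) return { code, apiKey };
  if (docIds.length === 0) throw new Error("docIds must not be empty");
  if (docIds.length > MAX_DOC_IDS) {
    throw new Error(`at most ${MAX_DOC_IDS} docIds per request`);
  }
  return { code, docIds: [...docIds] };
}

// Usage: a docIds-mode body for one public document.
const body = buildSandboxRequest({
  code: "return (await DOCS.list()).length;",
  docIds: ["doc-fac0d10b8ebc4a2ebf13"],
});
console.log(JSON.stringify(body));
```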

Response

Success (200):
{
  "ok": true,
  "result": [{ "name": "3M_2018_10K.pdf", "pages": 160 }],
  "logs": ["processed 2 docs"]
}
result contains whatever your code returned (any JSON-serializable value, max 10 MB). logs contains console.log() output from your code (see Debugging).

Error responses

| Status | When | Example |
|---|---|---|
| 400 | Missing fields, empty docIds, or both apiKey and docIds provided | { "error": "provide apiKey or docIds, not both" } |
| 401 | Invalid or revoked API key | { "error": "invalid api key" } |
| 403 | Key lacks read access to a doc | { "error": "forbidden: insufficient permissions" } |
| 422 | Code fails to parse (syntax error) | { "ok": false, "error": "SyntaxError: Unexpected token '}'" } |
| 429 | Rate limit exceeded | { "error": "rate limit exceeded" } plus a Retry-After header |
| 500 | Runtime error (your code threw) | { "ok": false, "error": "TypeError: ...", "logs": [...] } |
| 504 | Timeout | See below |
Timeout errors (504) have distinct messages depending on the cause:
  • CPU limit exceeded: { "ok": false, "error": "CPU time limit exceeded (30s)" }
  • Wall-clock timeout: { "ok": false, "error": "execution timed out (60s wall-clock)" }
A CPU timeout means your code itself did too much computation. A wall-clock timeout usually means DOCS I/O was slow, for example because of large documents or many sequential reads. To fix it, process fewer docs or read fewer pages per doc. Parse errors (422) are caught before the sandbox starts, so your code never ran. Runtime errors (500) mean the sandbox executed your code and it threw.
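A client might handle these statuses like this: retry only 429 responses after the Retry-After interval, and surface every other failure with its status and any logs. This is a sketch under assumptions. `runSandbox`, the retry count, and the injected `fetchFn`/`sleep` parameters are illustrative. Retry-After is read in its seconds form; an HTTP-date value falls back to a 1-second wait.

```javascript
// Sketch: POST to the sandbox and retry only on 429 (rate limited).
// fetchFn and sleep are injectable so the logic can be exercised offline.
async function runSandbox(body, {
  fetchFn = fetch,
  sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms)),
  maxRetries = 2,
} = {}) {
  for (let attempt = 0; ; attempt++) {
    const res = await fetchFn("https://api.okrapdf.com/v1/sandbox/run", {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(body),
    });
    if (res.status === 429 && attempt < maxRetries) {
      // Retry-After as delta-seconds; anything else falls back to 1s.
      const seconds = Number(res.headers.get("Retry-After"));
      await sleep(Number.isFinite(seconds) && seconds > 0 ? seconds * 1000 : 1000);
      continue;
    }
    // Tolerate a non-JSON body rather than masking the HTTP status.
    const json = await res.json().catch(() => ({}));
    if (res.ok && json.ok) return { result: json.result, logs: json.logs ?? [] };
    // 422: parse error, code never ran. 500: your code threw. 504: timeout.
    const err = new Error(json.error ?? `HTTP ${res.status}`);
    err.status = res.status;
    err.logs = json.logs ?? [];
    throw err;
  }
}
```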

docIds behavior

In public (docIds) mode, each doc ID is validated when you call a DOCS method:
  • Valid public doc: returns data normally.
  • Non-existent ID: DOCS.getNodes("doc-invalid") throws "doc doc-invalid not in allowed set".
  • Exists but not published: throws "getNodes failed: 403".
  • Mixed valid/invalid: each call succeeds or fails independently, and the sandbox doesn't abort on a single doc failure. Use try/catch in your code to handle partial results.
In authenticated mode (apiKey), the same behavior applies — individual DOCS calls can fail if a doc is in error phase or was deleted. Always wrap DOCS calls in try/catch when iterating.
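The try/catch pattern for partial results fits in a small function that records each document's outcome instead of aborting. `readAll`, `readPhase`, and the `{ id, ok, value | error }` result shape are hypothetical. The sandbox has no imports, so you would paste the functions into your code and call them with the global DOCS.

```javascript
// Sketch: one DOCS read per document; a failure is recorded, not fatal.
async function readAll(DOCS, docIds, readOne) {
  const out = [];
  for (const id of docIds) {
    try {
      out.push({ id, ok: true, value: await readOne(DOCS, id) });
    } catch (e) {
      out.push({ id, ok: false, error: String(e?.message ?? e) });
    }
  }
  return out;
}

// Example reader: getStatus returns a JSON string, so parse it first.
const readPhase = async (DOCS, id) => JSON.parse(await DOCS.getStatus(id)).phase;

// In the sandbox (function body), for example:
//   const docs = await DOCS.list();
//   return await readAll(DOCS, docs.map((d) => d.id), readPhase);
```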
The apiKey field is placed in the request body (not the Authorization header) because sandbox calls are often composed programmatically — by LLMs generating tool calls, code editors submitting forms, or serverless functions building payloads. Body placement simplifies these integrations. Never send apiKey from browser-side JavaScript — use docIds mode for client-side calls.

Minimal example

Try this against real public FinanceBench docs (these IDs work — no API key needed):
// Requires Node 18+ (global fetch). Save as an .mjs file so top-level await works.
const code = `
  const docs = await DOCS.list();
  return docs.map(d => ({ name: d.file_name, pages: d.total_pages }));
`;

const res = await fetch("https://api.okrapdf.com/v1/sandbox/run", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    docIds: ["doc-fac0d10b8ebc4a2ebf13", "doc-ea6befd03a6b4a4aaf0d"],
    code,
  }),
});

const { ok, result, error } = await res.json();
console.log(ok ? result : `Error: ${error}`);
// → [{ name: "3M_2018_10K.pdf", pages: 160 }, { name: "ACTIVISIONBLIZZARD_2019_10K.pdf", pages: 198 }]
The doc IDs above are real public FinanceBench documents, so the example works as-is. To use your own documents, replace the IDs with yours. For private documents, replace docIds with "apiKey": "okra_..." in the request body; the sandbox then has access to all documents your key can read.

The DOCS object

Your sandbox code has access to a global DOCS object, a read-only interface to document data. Use querySql() first when you want deterministic retrieval, filtering, ranking, or cross-document comparison in code. Use getNodes() when you need raw node payloads. getNodes() and getStatus() return JSON strings, so pass them through JSON.parse() before use. The text methods (getMarkdown, getPage) return plain strings you can use directly.
// Structured data → objects
DOCS.list()                               → Doc[]
DOCS.querySql(docId: string, sql: string) → { rows: object[], count: number, warning?: string, hint?: string }

// Structured payloads serialized by document endpoints
DOCS.getNodes(docId: string, page?: string) → JSON string of { nodes: Node[], total: number }
DOCS.getStatus(docId: string)               → JSON string of DocStatus

// Text data → plain strings (use directly)
DOCS.getMarkdown(docId: string)            → markdown string
DOCS.getPage(docId: string, page: string) → page markdown string
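Because getNodes and getStatus return serialized JSON rather than objects, the first step is JSON.parse. Below is a minimal sketch that counts node types on one page. `countNodeTypes` is a hypothetical helper. It passes the page number as a string, following the signature above.

```javascript
// Sketch: getNodes returns a JSON string of { nodes, total }, so parse it,
// then tally node.type values ("text", "heading", "table", ...).
async function countNodeTypes(DOCS, docId, page) {
  const { nodes, total } = JSON.parse(await DOCS.getNodes(docId, page));
  const counts = {};
  for (const node of nodes) {
    counts[node.type] = (counts[node.type] ?? 0) + 1;
  }
  return { total, counts };
}

// In the sandbox: return await countNodeTypes(DOCS, "doc-fac0d10b8ebc4a2ebf13", "1");
```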
Scoping: In docIds mode, DOCS.list() returns only the documents you specified, not all public docs. In apiKey mode, it returns all documents your key can access, up to 100.

Pagination: There is currently no cursor or offset parameter, so DOCS.list() returns at most 100 documents. To process more than 100 documents, get the full list with the SDK/API, then pass specific docIds to the sandbox in batches of at most 20 (the per-request docIds limit).
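Splitting an ID list to fit the per-request docIds limit takes only a few lines. `chunk` is a hypothetical helper. Its default batch size of 20 is the docIds limit from the request table.

```javascript
// Split doc IDs into batches no larger than the per-request docIds limit.
function chunk(ids, size = 20) {
  const batches = [];
  for (let i = 0; i < ids.length; i += size) {
    batches.push(ids.slice(i, i + size));
  }
  return batches;
}

const ids = Array.from({ length: 45 }, (_, i) => `doc-${i}`);
console.log(chunk(ids).map((b) => b.length)); // [ 20, 20, 5 ]
```

Each batch then becomes the docIds of one sandbox request.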

querySql notes

  • Read-only: SELECT and WITH queries are allowed. Mutations are blocked.
  • Queryable tables: nodes, nodes_fts, edges, page_ledger, verifications, meta, findings.
  • nodes_fts auto-initializes on first query for hydrated docs, so FTS works in sandbox mode without a separate warm-up step.
  • For exact line retrieval, start with nodes_fts MATCH or nodes.value LIKE ..., then do the math in JavaScript.
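querySql takes a raw SQL string, so search terms that come from users need quoting before you interpolate them. The helpers below are a sketch that assumes SQLite semantics: '' escaping in string literals, LIKE ... ESCAPE, and FTS5 phrase syntax for nodes_fts. This page does not state the storage engine. `sqlString`, `likeContains`, and `ftsPhrase` are hypothetical names.

```javascript
// Hypothetical helpers for building querySql strings from user input.
// Assumes SQLite semantics; the storage engine is not documented here.

// SQL string literal: wrap in single quotes, doubling embedded ones.
function sqlString(value) {
  return "'" + String(value).replace(/'/g, "''") + "'";
}

// "column contains term" via LIKE. \, % and _ in the term are escaped so
// they match literally, with backslash declared as the ESCAPE character.
// `column` must be a trusted identifier; it is not escaped.
function likeContains(column, term) {
  const escaped = String(term).replace(/[\\%_]/g, (ch) => "\\" + ch);
  return `${column} LIKE ${sqlString("%" + escaped + "%")} ESCAPE '\\'`;
}

// FTS5 phrase: wrap in double quotes, doubling embedded double quotes.
function ftsPhrase(term) {
  return '"' + String(term).replace(/"/g, '""') + '"';
}

const term = "Purchases of property, plant and equipment";
const likeSql = `SELECT page_number, value FROM nodes WHERE ${likeContains("value", term)} LIMIT 5`;
const ftsSql = `SELECT page_number, value FROM nodes_fts WHERE nodes_fts MATCH ${sqlString(ftsPhrase(term))} LIMIT 5`;
console.log(likeSql);
console.log(ftsSql);
```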

Types

interface Doc {
  id: string;
  file_name: string;
  total_pages: number;
  status: string;
  pages_completed?: number;
  inserted_at?: number | string;
  updated_at?: number | string;
  is_public?: boolean;
}

interface SqlResult {
  rows: Array<Record<string, unknown>>;
  count: number;
  warning?: string;
  hint?: string;
}

interface DocStatus {
  phase: string;             // "complete", "extracting", "error", etc.
  file_name: string;
  total_pages: number;
  pages_completed: number;
}

interface Node {
  id: string;
  type: string;        // "text", "heading", "table", "key_value", etc.
  label: string | null;
  value: string | null;
  page_number: number | null;
  confidence: number | null;
  bbox_x: number | null;
  bbox_y: number | null;
  bbox_w: number | null;
  bbox_h: number | null;
}
No write methods exist. The sandbox cannot modify, delete, or re-upload documents. It cannot make network requests (fetch is blocked). It can only read.

Limits

| Limit | Value |
|---|---|
| CPU time per execution | 30 seconds |
| Wall-clock timeout | 60 seconds (includes I/O wait for DOCS calls) |
| Code size | 1 MB |
| Response payload | 10 MB |
| docIds per request | 20 |
| Concurrent executions | 10 per API key (public mode: 5 per IP) |
| Rate limit | 60 requests/min per API key (public mode: 20/min per IP) |
| Rate limit headers | Retry-After, X-RateLimit-Remaining on 429 |
Wall-clock timeout includes time waiting on DOCS I/O (fetching nodes from storage). If a document is slow to read, that counts against your 60s budget. Standard Workers built-ins (URL, TextEncoder, crypto, JSON, etc.) are available. Node.js APIs (fs, net, child_process) are not. No import statements — your code runs as a function body, not a module.
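If many sequential DOCS reads push you toward the 60s wall-clock limit, overlapping a few reads may help. This assumes the sandbox allows several DOCS calls in flight at once, which this page does not state. `mapLimit` is a hypothetical helper built on plain Promises, so it needs no imports.

```javascript
// Sketch: apply fn to every item with at most `limit` calls in flight,
// keeping results in input order. The first rejection rejects the whole
// call, so wrap fn in try/catch if you want partial results.
async function mapLimit(items, limit, fn) {
  const results = new Array(items.length);
  let next = 0;
  async function worker() {
    while (next < items.length) {
      const i = next++; // safe: JS runs this synchronously between awaits
      results[i] = await fn(items[i], i);
    }
  }
  const workerCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}

// In the sandbox, for example:
//   const docs = await DOCS.list();
//   return await mapLimit(docs, 4, (d) =>
//     DOCS.querySql(d.id, "SELECT count(*) AS n FROM nodes"));
```

Capping concurrency, rather than firing every call at once with Promise.all, keeps the load bounded when a run covers many documents.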

Performance

| Scenario | Typical latency |
|---|---|
| Cold start (new isolate) | 50–200 ms |
| Warm isolate (repeated calls) | 5–15 ms |
| DOCS.querySql() per doc | 20–150 ms (exact-match / FTS patterns) |
| DOCS.getNodes() per doc | 100–500 ms (depends on doc size) |
| 10-doc reduce | 1–5 s total |
First request incurs isolate startup cost. The platform may cache warm isolates for repeated requests with identical code, but this is not guaranteed — design for cold starts.

What the sandbox blocks

| Action | Result |
|---|---|
| fetch("https://evil.com") | "not permitted to access the internet" |
| DOCS.getNodes("doc-not-yours") | "doc doc-not-yours not in allowed set" |
| Write to any storage | No write bindings exist |
| import("node:fs") | Not available in the Workers runtime |

SDK vs Sandbox

| | SDK / API | Sandbox Transform |
|---|---|---|
| Code author | You (trusted) | Users, LLMs, agents (untrusted) |
| Runs where | Your server | Cloudflare edge isolate |
| Network | Full | Blocked |
| Mutation risk | Possible | Impossible (no write methods) |
| Dependencies | Full npm | Built-ins only |
| Latency | N round-trips for N docs | 1 round-trip; code runs next to the data |
| Best for | Pipelines, scripts, CI/CD | Code editors, playgrounds, agent tool calls |

Use cases

1. Exact retrieval with querySql

Pull the exact lines you want, then compute in code. This is usually better than scanning all nodes.
curl -s https://api.okrapdf.com/v1/sandbox/run \
  -H "Content-Type: application/json" \
  -d '{
    "docIds": ["doc-fac0d10b8ebc4a2ebf13"],
    "code": "const docId = \"doc-fac0d10b8ebc4a2ebf13\";\nconst matches = await DOCS.querySql(docId, \"SELECT page_number, substr(value, 1, 220) AS value FROM nodes_fts WHERE nodes_fts MATCH '\''property'\'' LIMIT 5\");\nreturn matches;"
  }' | jq .result
Readable version:
const docId = "doc-fac0d10b8ebc4a2ebf13";
const matches = await DOCS.querySql(
  docId,
  "SELECT page_number, substr(value, 1, 220) AS value FROM nodes_fts WHERE nodes_fts MATCH 'property' LIMIT 5"
);
return matches;

2. Cross-document reduce

Run deterministic retrieval across one or more docs, then post-process the matched values in code.
# Copy-paste into terminal — extracts 3M 2018 capex from the filing
curl -s https://api.okrapdf.com/v1/sandbox/run \
  -H "Content-Type: application/json" \
  -d '{
    "docIds": ["doc-fac0d10b8ebc4a2ebf13"],
    "code": "const docs = await DOCS.list();\nconst results = [];\nfor (const doc of docs) {\n  const line = await DOCS.querySql(doc.id, \"SELECT value FROM nodes WHERE value LIKE '\''%Purchases of property, plant and equipment%'\'' LIMIT 1\");\n  const value = line.rows[0]?.value ?? null;\n  const match = typeof value === \"string\" ? value.match(/\\((\\d[\\d,]*)\\)|(\\d[\\d,]*)/) : null;\n  results.push({ doc: doc.file_name, raw: value, extracted: match ? (match[1] || match[2]) : null });\n}\nreturn results;"
  }' | jq .result
The sandbox code (readable version):
const docs = await DOCS.list();
const results = [];
for (const doc of docs) {
  const line = await DOCS.querySql(
    doc.id,
    "SELECT value FROM nodes WHERE value LIKE '%Purchases of property, plant and equipment%' LIMIT 1"
  );
  const value = line.rows[0]?.value ?? null;
  // First number in the row: "(1,577)" captures 1,577 in group 1, a bare 1,577 in group 2.
  const match = typeof value === "string" ? value.match(/\((\d[\d,]*)\)|(\d[\d,]*)/) : null;
  results.push({
    doc: doc.file_name,
    raw: value,
    extracted: match ? (match[1] || match[2]) : null,
  });
}
return results;
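To compare or sum the extracted values, convert the raw row to a number. Below is a minimal sketch. It assumes the accounting convention where parentheses mark a negative amount, and that the first number in the row is the one you want. `parseAmount` is hypothetical.

```javascript
// Parse the first number in a row such as "1,577", "(1,577)" or
// "$ (1,577)"; parentheses around the number are treated as a minus sign.
function parseAmount(raw) {
  if (typeof raw !== "string") return null;
  const m = raw.match(/(\()?\s*\$?\s*(\d[\d,]*(?:\.\d+)?)\s*(\))?/);
  if (!m) return null;
  const n = Number(m[2].replace(/,/g, ""));
  return m[1] && m[3] ? -n : n;
}

// In the sandbox, after the loop above, for example:
//   return results.map((r) => ({ doc: r.doc, amount: parseAmount(r.raw) }));
```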

3. User-defined filters

Custom scoring, multi-field boolean logic, regex patterns — beyond what keyword or semantic search offers.
const docs = await DOCS.list();
const hits = [];
for (const doc of docs) {
  try {
    const data = await DOCS.querySql(
      doc.id,
      "SELECT value FROM nodes WHERE value LIKE '%material weakness%' OR value LIKE '%restatement%' OR value LIKE '%risk factor%' LIMIT 200"
    );
    const text = data.rows.map((row) => row.value).filter(Boolean).join(" ");
    let score = 0;
    if (/risk factor/i.test(text)) score += 2;
    if (/material weakness/i.test(text)) score += 5;
    if (/restatement/i.test(text)) score += 10;
    if (score >= 5) hits.push({ id: doc.id, name: doc.file_name, score });
  } catch { /* skip */ }
}
return hits.sort((a, b) => b.score - a.score);
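When end users supply the filter, storing the rules as data instead of code can make them easier to validate and save. Below is a sketch with a hypothetical `{ pattern, weight }` rule shape. With the three rules shown, `scoreText` computes the same score as the hard-coded `if` chain above.

```javascript
// Each rule adds its weight once if its pattern appears anywhere in the text.
const RULES = [
  { pattern: /risk factor/i, weight: 2 },
  { pattern: /material weakness/i, weight: 5 },
  { pattern: /restatement/i, weight: 10 },
];

function scoreText(text, rules = RULES) {
  return rules.reduce(
    (sum, { pattern, weight }) => sum + (pattern.test(text) ? weight : 0),
    0
  );
}

console.log(scoreText("Restatement of prior periods. See Risk Factors.")); // 12
```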

4. LLM tool calls as code

Instead of an AI agent making 10 sequential API calls, generate one JS function that does the same work in a single sandbox execution.
const docs = await DOCS.list();
const comparison = [];
for (const doc of docs) {
  try {
    const data = await DOCS.querySql(
      doc.id,
      "SELECT value FROM nodes WHERE value LIKE '%risk factor%' OR value LIKE '%artificial intelligence%' OR value LIKE '%machine learning%' LIMIT 200"
    );
    const text = data.rows.map((row) => row.value).filter(Boolean).join(" ");
    comparison.push({
      company: doc.file_name.replace(/_\d{4}.*/, ''),
      year: doc.file_name.match(/\d{4}/)?.[0],
      hasRiskFactors: /risk factor/i.test(text),
      mentionsAI: /artificial intelligence|machine learning/i.test(text),
      pageCount: doc.total_pages,
    });
  } catch { /* skip */ }
}
return comparison;
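On the agent side, the sandbox can be exposed as a single tool whose only argument is the code. The tool name, description, and JSON Schema layout below are illustrative assumptions; adapt them to your LLM provider's tool-calling format. `handleRunCode` is a hypothetical handler. It forwards the generated code in docIds mode and returns errors and logs to the model so the model can correct its own code.

```javascript
// Hypothetical tool definition; the exact wrapper format depends on your LLM provider.
const runCodeTool = {
  name: "run_docs_code",
  description:
    "Run a JavaScript async function body against the read-only DOCS object. " +
    "Use await, call DOCS methods, and return a JSON-serializable result.",
  parameters: {
    type: "object",
    properties: {
      code: { type: "string", description: "Async function body using DOCS" },
    },
    required: ["code"],
  },
};

// Forward one tool call to the sandbox in docIds mode.
async function handleRunCode(args, docIds, fetchFn = fetch) {
  const res = await fetchFn("https://api.okrapdf.com/v1/sandbox/run", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ code: args.code, docIds }),
  });
  const json = await res.json();
  // Hand errors and logs back to the model so it can repair its code.
  return json.ok
    ? { result: json.result, logs: json.logs ?? [] }
    : { error: json.error, logs: json.logs ?? [] };
}
```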

5. Embeddable code playground

Ship a code editor on your doc viewer. Users write transforms, run them sandboxed — like CodePen for PDFs.
import MonacoEditor from "@monaco-editor/react";
import { useState } from "react";

const STARTER_CODE = `const docs = await DOCS.list();
return docs.map(d => ({ name: d.file_name, pages: d.total_pages }));`;

function TransformEditor({ docIds }: { docIds: string[] }) {
  const [code, setCode] = useState(STARTER_CODE);
  const [result, setResult] = useState<any>(null);
  const [error, setError] = useState<string | null>(null);
  const [loading, setLoading] = useState(false);

  const run = async () => {
    setError(null);
    setResult(null); // clear output from the previous run
    setLoading(true);
    try {
      const res = await fetch("https://api.okrapdf.com/v1/sandbox/run", {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify({ docIds, code }),
      });
      const json = await res.json();
      if (json.ok) setResult(json.result);
      else setError(json.error);
    } catch (e: any) {
      setError(e.message);
    } finally {
      setLoading(false);
    }
  };

  return (
    <div>
      <MonacoEditor height="300px" language="javascript" theme="vs-dark"
        value={code} onChange={(v) => setCode(v ?? "")} />
      <button onClick={run} disabled={loading}>
        {loading ? "Running..." : "Run in Sandbox"}
      </button>
      {error && <div style={{ color: "red" }}>{error}</div>}
      {result !== null && <pre>{JSON.stringify(result, null, 2)}</pre>}
    </div>
  );
}

Debugging

console.log() output from your sandbox code is captured and returned in the logs field:
{
  "ok": true,
  "result": { "totalRevenue": 32765 },
  "logs": ["fetching doc-abc...", "found revenue: 32,765"]
}
If your code throws, the error message is returned in the error field, along with any logs captured before the throw.

Local testing

To iterate on your transform without hitting the API, mock the DOCS object locally:
// mock-docs.mjs — run with: node mock-docs.mjs
const DOCS = {
  list: async () => [
    { id: "doc-1", file_name: "3M_2018_10K.pdf", total_pages: 160, status: "complete" },
    { id: "doc-2", file_name: "AAPL_2023_10K.pdf", total_pages: 200, status: "complete" },
  ],
  querySql: async (id, sql) => ({
    rows: [{ value: "Purchases of property, plant and equipment (PP&E) (1,577)" }],
    count: 1,
  }),
  getNodes: async (id) => JSON.stringify({
    nodes: [{ value: "Net sales $ 32,765", type: "text", page_number: 1 }],
    total: 1,
  }),
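  getMarkdown: async (id) => "Net sales $ 32,765 million for the year ended...",
  getPage: async (id, page) => `Page ${page}: Net sales $ 32,765`,
  getStatus: async (id) => JSON.stringify({ phase: "complete", file_name: "test.pdf", total_pages: 1, pages_completed: 1 }),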
};

// ---- paste your sandbox code below this line ----
const docs = await DOCS.list();
const capex = [];
for (const doc of docs) {
  const result = await DOCS.querySql(
    doc.id,
    "SELECT value FROM nodes WHERE value LIKE '%Purchases of property, plant and equipment%' LIMIT 1"
  );
  capex.push({ doc: doc.file_name, row: result.rows[0]?.value ?? null });
}
console.log({ capex, docCount: docs.length });
Once your logic works locally, paste the code body (everything below the ---- line) into the code field of the API request. Replace the final console.log(...) with a return statement so the value comes back in result.
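You can also run sandbox code unchanged, including its final return, by compiling it locally as an async function body. This sketch uses the AsyncFunction constructor. It is not a global, but it is reachable through an async function's prototype. The mock is passed in as a parameter named DOCS. `runLocally` is hypothetical and only approximates the API's logs format. It also provides none of the sandbox's isolation, so use it only with code you trust.

```javascript
// AsyncFunction is not a global, but it is the constructor of any async function.
const AsyncFunction = Object.getPrototypeOf(async function () {}).constructor;

// Run a sandbox-style body (top-level await + return) against a mock DOCS.
// Returns a { ok, result | error, logs } shape similar to the API response.
async function runLocally(code, DOCS) {
  const logs = [];
  const capture = {
    log: (...args) =>
      logs.push(args.map((a) => (typeof a === "string" ? a : JSON.stringify(a))).join(" ")),
  };
  try {
    // Parameters named DOCS and console shadow the real globals inside the body.
    const fn = new AsyncFunction("DOCS", "console", code);
    return { ok: true, result: await fn(DOCS, capture), logs };
  } catch (e) {
    return { ok: false, error: String(e), logs };
  }
}

// Usage with a tiny inline mock:
const mockDocs = { list: async () => [{ id: "doc-1", file_name: "3M_2018_10K.pdf" }] };
runLocally(
  "const docs = await DOCS.list();\nconsole.log('docs:', docs.length);\nreturn docs.map((d) => d.file_name);",
  mockDocs
).then((out) => console.log(out));
```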

Calling from the browser

The /v1/sandbox/run endpoint supports CORS (Access-Control-Allow-Origin: *) for both modes, so browser calls work. However:
Never pass apiKey from browser-side JavaScript — it exposes your credentials to anyone who inspects network traffic. For authenticated mode, proxy through your own backend. The docIds mode is safe to call directly from the client since it requires no credentials.
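Here is a minimal sketch of that proxy step: the browser sends only `code`, and your server attaches the key. `proxySandboxRun` and the `OKRA_API_KEY` environment variable are hypothetical. Mount the function in whatever HTTP framework you use and return its `status` and `body` to the browser.

```javascript
// Server-side only. The browser sends { code }; the API key stays on the server.
async function proxySandboxRun(clientBody, {
  apiKey = process.env.OKRA_API_KEY,
  fetchFn = fetch,
} = {}) {
  if (!apiKey) throw new Error("OKRA_API_KEY is not set");
  if (typeof clientBody?.code !== "string") {
    return { status: 400, body: { error: "code must be a string" } };
  }
  // Build the upstream body from scratch, so any apiKey or docIds the
  // client tried to send are dropped.
  const upstream = await fetchFn("https://api.okrapdf.com/v1/sandbox/run", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ code: clientBody.code, apiKey }),
  });
  return { status: upstream.status, body: await upstream.json() };
}
```

Anyone who can reach this endpoint can run code that reads every document your key can read, so put your own authentication in front of it.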

Security model

Sandbox Transforms use Cloudflare Dynamic Workers with @cloudflare/codemode for secure RPC dispatch.
  • No bindings beyond DOCS. Apart from standard built-ins, the only global the sandbox can reach is DOCS, which routes calls back to the parent worker via RPC.
  • globalOutbound: null. All fetch() and connect() calls are blocked.
  • Read-only RPC. DOCS exposes only read methods. Write operations don’t exist.
  • Doc scoping. In public mode, docIds are checked on every call — you can’t enumerate or guess other documents.
  • Fresh isolate. Each run call gets a new V8 isolate. No state persists between executions.
  • No self-loop. DOCS reads from Durable Objects and R2 directly via RPC — no HTTP fetch back to the same worker.

Pricing

Sandbox Transforms are available on all OkraPDF paid plans. You do not need a separate Cloudflare account — OkraPDF manages the infrastructure. Included executions and CPU limits vary by plan — see the pricing page for your plan’s allowances. Overage rates:
| Dimension | Overage rate |
|---|---|
| Executions | $0.06 per 1,000 |
| CPU time | $0.02 per 1M ms |
During the current open beta (started March 2026), execution fees are waived — you only pay for CPU time that exceeds your plan’s included allowance.
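Here is a worked example with made-up usage. Suppose you use 250,000 executions and 4,000,000 ms of CPU time beyond your plan's allowance. That costs 250 × $0.06 = $15.00 plus 4 × $0.02 = $0.08, or $15.08 in total. With execution fees waived during the beta, only the $0.08 CPU portion applies. The helper below does the same arithmetic. `overageUsd` is hypothetical and assumes both rates are prorated linearly and rounded to the nearest cent, which this page does not specify.

```javascript
// Hypothetical calculator; inputs are usage *beyond* the plan's allowance.
// Works in cents to avoid floating-point drift, then converts to dollars.
const EXEC_CENTS_PER_1K = 6;   // $0.06 per 1,000 executions
const CPU_CENTS_PER_1M_MS = 2; // $0.02 per 1M ms of CPU time

function overageUsd(extraExecutions, extraCpuMs, { betaWaiver = false } = {}) {
  const execCents = betaWaiver ? 0 : (extraExecutions / 1000) * EXEC_CENTS_PER_1K;
  const cpuCents = (extraCpuMs / 1_000_000) * CPU_CENTS_PER_1M_MS;
  return Math.round(execCents + cpuCents) / 100;
}

console.log(overageUsd(250_000, 4_000_000));                       // 15.08
console.log(overageUsd(250_000, 4_000_000, { betaWaiver: true })); // 0.08
```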