# Sandbox Transforms

> Run custom JavaScript against your documents in a secure V8 sandbox. No network access, no source mutation, read-only by design.

## What is a Sandbox Transform?

A Sandbox Transform lets you push JavaScript to OkraPDF that runs on the edge in an isolated V8 sandbox. Your code gets a read-only `DOCS` object to query documents — but cannot modify them, access the network, or escape the sandbox.

Your code is an **async function body**. Write it like the inside of an `async function` — use `await`, `return` a result, and call `DOCS` methods to read your documents. No imports, no class boilerplate.

<Info>
  **Open Beta** — Sandbox Transforms are in open beta. The API surface is stable but
  may receive non-breaking additions. Breaking changes will be announced with 30 days' notice.
</Info>

<CardGroup cols={2}>
  <Card title="SDK / API" icon="terminal">
    Pull data to your environment, transform locally.
    Best when **you** write and trust the code.
  </Card>

  <Card title="Sandbox Transform" icon="shield-halved">
    Push code to OkraPDF, run on the edge.
    Best when **your users or an LLM** writes the code.
  </Card>
</CardGroup>

***

## Quick start

```bash theme={null}
POST /v1/sandbox/run
```

### Request body

| Field    | Type      | Required | Description                                                                                                  |
| -------- | --------- | -------- | ------------------------------------------------------------------------------------------------------------ |
| `code`   | string    | Yes      | JavaScript async function body. Has access to `DOCS` global. Must `return` a result.                         |
| `apiKey` | string    | One of   | Your OkraPDF API key. `DOCS.list()` returns all your docs.                                                   |
| `docIds` | string\[] | One of   | Specific document IDs to scope access. No auth needed for [published documents](/guides/publishing). Max 20. |

You must provide either `apiKey` or `docIds` (not both). Need an API key? [Create one here](/guides/api-keys).
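
The either/or rule can be checked on the client before any request is sent. The following is a minimal sketch: `buildSandboxRequest` is a hypothetical helper name, not part of an OkraPDF SDK, and it encodes only the constraints listed in the table above.

```javascript
// Hypothetical helper: builds a body for POST /v1/sandbox/run and enforces
// the documented constraints before any network call is made.
function buildSandboxRequest({ code, apiKey, docIds }) {
  if (typeof code !== "string" || code.trim() === "") {
    throw new Error("code is required");
  }
  const hasKey = typeof apiKey === "string" && apiKey.length > 0;
  const hasIds = Array.isArray(docIds);
  // Exactly one of the two access modes must be selected.
  if (hasKey === hasIds) {
    throw new Error("provide exactly one of apiKey or docIds");
  }
  if (hasIds && (docIds.length === 0 || docIds.length > 20)) {
    throw new Error("docIds must contain between 1 and 20 IDs");
  }
  return hasKey ? { code, apiKey } : { code, docIds };
}

// Usage: public mode with a single document ID.
const requestBody = buildSandboxRequest({
  code: "return (await DOCS.list()).length;",
  docIds: ["doc-fac0d10b8ebc4a2ebf13"],
});
console.log(JSON.stringify(requestBody));
```

Validating locally gives a clearer error than a round-trip that ends in a 400.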

### Response

**Success** (200):

```json theme={null}
{
  "ok": true,
  "result": [{ "name": "3M_2018_10K.pdf", "pages": 160 }],
  "logs": ["processed 2 docs"]
}
```

`result` contains whatever your code returned (any JSON-serializable value, max 10 MB). `logs` contains `console.log()` output from your code (see [Debugging](#debugging)).

### Error responses

| Status  | When                                                                   | Example                                                         |
| ------- | ---------------------------------------------------------------------- | --------------------------------------------------------------- |
| **400** | Missing fields, empty `docIds`, or both `apiKey` and `docIds` provided | `{ "error": "provide apiKey or docIds, not both" }`             |
| **401** | Invalid or revoked API key                                             | `{ "error": "invalid api key" }`                                |
| **403** | Key lacks read access to a doc                                         | `{ "error": "forbidden: insufficient permissions" }`            |
| **422** | Code fails to parse (syntax error)                                     | `{ "ok": false, "error": "SyntaxError: Unexpected token '}'" }` |
| **429** | Rate limit exceeded                                                    | `{ "error": "rate limit exceeded" }` + `Retry-After` header     |
| **500** | Runtime error (your code threw)                                        | `{ "ok": false, "error": "TypeError: ...", "logs": [...] }`     |
| **504** | Timeout                                                                | See below                                                       |

**Timeout errors (504)** have distinct messages depending on the cause:

* CPU limit exceeded: `{ "ok": false, "error": "CPU time limit exceeded (30s)" }`
* Wall-clock timeout: `{ "ok": false, "error": "execution timed out (60s wall-clock)" }`

CPU timeout means your code used too much compute. Wall-clock timeout usually means `DOCS` I/O was slow (large documents, many sequential reads). To fix wall-clock timeouts, reduce the number of docs or read fewer pages per doc.

Parse errors (422) are caught before the sandbox starts — your code never ran. Runtime errors (500) mean the sandbox executed but your code threw.
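
One way to act on these statuses in client code is a small classifier. This is a sketch under assumptions: `classifySandboxError` and the action names (`"fix-request"`, `"retry"`, and so on) are illustrative, and the 504 branch relies on the two error messages quoted above.

```javascript
// Hypothetical helper: maps a documented status code to a coarse next step.
function classifySandboxError(status, body = {}) {
  const message = body.error ?? "unknown error";
  switch (status) {
    case 400:
      return { action: "fix-request", message };
    case 401:
    case 403:
      return { action: "check-credentials", message };
    case 422:
      // Parse error: the code never ran, so there are no logs to inspect.
      return { action: "fix-code", message };
    case 429:
      // Wait for the period given in the Retry-After header, then retry.
      return { action: "retry", message };
    case 500:
      return { action: "fix-code", message, logs: body.logs ?? [] };
    case 504: {
      // The two timeout causes are distinguished by the error message.
      const cause = /CPU time limit/.test(message) ? "cpu" : "wall-clock";
      return { action: "reduce-work", cause, message };
    }
    default:
      return { action: "unknown", message };
  }
}

console.log(
  classifySandboxError(504, { ok: false, error: "execution timed out (60s wall-clock)" })
);
```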

### `docIds` behavior

In **public mode** (`docIds`), each doc ID is validated when you call a `DOCS` method:

* **Valid public doc** — returns data normally.
* **Non-existent ID** — `DOCS.getNodes("doc-invalid")` throws `"doc doc-invalid not in allowed set"`.
* **Exists but not published** — throws `"getNodes failed: 403"`.
* **Mixed valid/invalid** — each call succeeds or fails independently. The sandbox doesn't abort on a single doc failure. Use try/catch in your code to handle partial results.

In **authenticated mode** (`apiKey`), the same behavior applies — individual `DOCS` calls can fail if a doc is in `error` phase or was deleted. Always wrap `DOCS` calls in try/catch when iterating.
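
A per-document try/catch might look like the sketch below. The `collectPageCounts` wrapper and the mock `DOCS` object are assumptions that exist only so the snippet runs locally under Node; in a real request, the loop inside the function is what would go in the `code` field, with the ID list written inline.

```javascript
// Collects results per document and records failures instead of aborting.
async function collectPageCounts(DOCS, docIds) {
  const ok = [];
  const failed = [];
  for (const id of docIds) {
    try {
      // getStatus returns a JSON string, so parse it before use.
      const status = JSON.parse(await DOCS.getStatus(id));
      ok.push({ id, pages: status.total_pages });
    } catch (err) {
      // One bad document should not discard the results for the others.
      failed.push({ id, error: String(err?.message ?? err) });
    }
  }
  return { ok, failed };
}

// Mock that fails for one ID, mimicking the error message described above.
const mockDocsForStatus = {
  getStatus: async (id) => {
    if (id === "doc-invalid") throw new Error(`doc ${id} not in allowed set`);
    return JSON.stringify({
      phase: "complete", file_name: "a.pdf", total_pages: 3, pages_completed: 3,
    });
  },
};

collectPageCounts(mockDocsForStatus, ["doc-a", "doc-invalid"])
  .then((summary) => console.log(JSON.stringify(summary)));
```

Returning the `failed` list alongside the results lets the caller see which documents were skipped and why.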

<Note>
  The `apiKey` field is placed in the request body (not the `Authorization` header) because sandbox
  calls are often composed programmatically — by LLMs generating tool calls, code editors submitting
  forms, or serverless functions building payloads. Body placement simplifies these integrations.
  **Never send `apiKey` from browser-side JavaScript** — use `docIds` mode for client-side calls.
</Note>

### Minimal example

Try this against real public FinanceBench docs (these IDs work — no API key needed):

<CodeGroup>
  ```javascript Node.js (save as test.mjs, run: node test.mjs) theme={null}
  // Requires Node 18+ (built-in fetch); top-level await works because the file is an ES module (.mjs)
  const code = `
    const docs = await DOCS.list();
    return docs.map(d => ({ name: d.file_name, pages: d.total_pages }));
  `;

  const res = await fetch("https://api.okrapdf.com/v1/sandbox/run", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      docIds: ["doc-fac0d10b8ebc4a2ebf13", "doc-ea6befd03a6b4a4aaf0d"],
      code,
    }),
  });

  const { ok, result, error } = await res.json();
  console.log(ok ? result : `Error: ${error}`);
  // → [{ name: "3M_2018_10K.pdf", pages: 160 }, { name: "ACTIVISIONBLIZZARD_2019_10K.pdf", pages: 198 }]
  ```

  ```python Python (save as test.py, run: python test.py — no pip install needed) theme={null}
  import json, urllib.error, urllib.request

  code = """
  const docs = await DOCS.list();
  return docs.map(d => ({ name: d.file_name, pages: d.total_pages }));
  """

  body = json.dumps({
      "docIds": ["doc-fac0d10b8ebc4a2ebf13", "doc-ea6befd03a6b4a4aaf0d"],
      "code": code,
  }).encode()

  req = urllib.request.Request(
      "https://api.okrapdf.com/v1/sandbox/run",
      data=body,
      headers={"Content-Type": "application/json"},
  )
  try:
      with urllib.request.urlopen(req) as resp:
          data = json.load(resp)
  except urllib.error.HTTPError as err:
      # urlopen raises on non-2xx responses; the error details are in the JSON body
      data = json.load(err)
  print(data["result"] if data.get("ok") else f"Error: {data.get('error')}")
  # → [{'name': '3M_2018_10K.pdf', 'pages': 160}, {'name': 'ACTIVISIONBLIZZARD_2019_10K.pdf', 'pages': 198}]
  ```

  ```bash cURL (copy-paste into terminal) theme={null}
  curl -s https://api.okrapdf.com/v1/sandbox/run \
    -H "Content-Type: application/json" \
    -d '{
      "docIds": ["doc-fac0d10b8ebc4a2ebf13", "doc-ea6befd03a6b4a4aaf0d"],
      "code": "const docs = await DOCS.list(); return docs.map(d => ({ name: d.file_name, pages: d.total_pages }));"
    }' | jq .
  ```
</CodeGroup>

<Tip>
  The doc IDs above are real public FinanceBench documents, so **the examples work as-is**.
  To query your own published documents, replace them with your own doc IDs. For authenticated
  mode, replace `docIds` with `"apiKey": "okra_..."` in the request body; the sandbox then
  has access to all documents your key can read.
</Tip>

***

## The `DOCS` object

Your sandbox code has access to a global `DOCS` object — a read-only interface to document data.

Use `querySql()` first when you want deterministic retrieval, filtering, ranking, or cross-document comparison in code. Use `getNodes()` when you need raw node payloads. Text methods (`getMarkdown`, `getPage`) return plain strings directly.

```typescript theme={null}
// Structured data → objects
DOCS.list()                               → Doc[]
DOCS.querySql(docId: string, sql: string) → { rows: object[], count: number, warning?: string, hint?: string }

// Structured payloads serialized by document endpoints
DOCS.getNodes(docId: string, page?: string) → JSON string of { nodes: Node[], total: number }
DOCS.getStatus(docId: string)               → JSON string of DocStatus

// Text data → plain strings (use directly)
DOCS.getMarkdown(docId: string)            → markdown string
DOCS.getPage(docId: string, page: string) → page markdown string
```

**Scoping:** In `docIds` mode, `DOCS.list()` returns only the documents you specified — not all public docs. In `apiKey` mode, it returns all documents your key can access (up to 100).

**Pagination:** There is currently no cursor or offset parameter. `DOCS.list()` returns a maximum of 100 documents. If you have more than 100 documents and need to process them all, use the [SDK/API](/api-reference/documents/list) to get the full list, then pass specific `docIds` to the sandbox.
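
Because `docIds` accepts at most 20 IDs per request, a longer list has to be split into batches and sent as separate requests. A minimal sketch, where `chunkDocIds` is a hypothetical helper name and the IDs are placeholders:

```javascript
// Splits a list of document IDs into batches that respect the docIds limit.
function chunkDocIds(ids, size = 20) {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError("size must be a positive integer");
  }
  const batches = [];
  for (let i = 0; i < ids.length; i += size) {
    batches.push(ids.slice(i, i + size));
  }
  return batches;
}

// 45 placeholder IDs become batches of 20, 20, and 5.
const placeholderIds = Array.from({ length: 45 }, (_, i) => `doc-${i}`);
console.log(chunkDocIds(placeholderIds).map((batch) => batch.length)); // → [ 20, 20, 5 ]
```

Each batch is then one `/v1/sandbox/run` request, and the per-batch results are merged by the caller.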

### `querySql` notes

* Read-only: only `SELECT` and `WITH` queries are allowed. Mutations are blocked.
* Queryable tables: `nodes`, `nodes_fts`, `edges`, `page_ledger`, `verifications`, `meta`, `findings`.
* `nodes_fts` auto-initializes on first query for hydrated docs, so FTS works in sandbox mode without a separate warm-up step.
* For exact line retrieval, start with `nodes_fts MATCH` or `nodes.value LIKE ...`, then do the math in JavaScript.
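
The "do the math in JavaScript" step usually starts with turning a matched text row into a number. The sketch below assumes the accounting convention that a figure in parentheses is negative; that is an assumption about the filings, not something the API guarantees, and `parseAccountingNumber` is a hypothetical helper name.

```javascript
// Extracts the first figure from a text row and converts it to a number.
// A parenthesized figure such as "(1,577)" is treated as negative.
function parseAccountingNumber(text) {
  const match = String(text).match(
    /\(\s*\$?\s*([\d,]+(?:\.\d+)?)\s*\)|\$?\s*(\d[\d,]*(?:\.\d+)?)/
  );
  if (!match) return null;
  const digits = (match[1] ?? match[2]).replace(/,/g, "");
  const value = Number(digits);
  return match[1] !== undefined ? -value : value;
}

// Sample rows shaped like the `rows` array returned by querySql.
const sampleRows = [
  { value: "Purchases of property, plant and equipment (1,577)" },
  { value: "Net sales $ 32,765" },
];
console.log(sampleRows.map((row) => parseAccountingNumber(row.value))); // → [ -1577, 32765 ]
```

Requiring a leading digit in the unparenthesized branch keeps the commas in the label text ("property, plant") from being mistaken for a figure.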

### Types

```typescript theme={null}
interface Doc {
  id: string;
  file_name: string;
  total_pages: number;
  status: string;
  pages_completed?: number;
  inserted_at?: number | string;
  updated_at?: number | string;
  is_public?: boolean;
}

interface SqlResult {
  rows: Array<Record<string, unknown>>;
  count: number;
  warning?: string;
  hint?: string;
}

interface DocStatus {
  phase: string;             // "complete", "extracting", "error", etc.
  file_name: string;
  total_pages: number;
  pages_completed: number;
}

interface Node {
  id: string;
  type: string;        // "text", "heading", "table", "key_value", etc.
  label: string | null;
  value: string | null;
  page_number: number | null;
  confidence: number | null;
  bbox_x: number | null;
  bbox_y: number | null;
  bbox_w: number | null;
  bbox_h: number | null;
}
```
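
Since `getNodes()` returns a JSON string rather than an object, the payload has to be parsed before the `Node` fields can be used. A minimal sketch, with `summarizeNodes` as a hypothetical helper name and a made-up sample payload whose nodes carry only a subset of the fields:

```javascript
// Parses the JSON string returned by DOCS.getNodes() and counts nodes by type.
function summarizeNodes(nodesJson) {
  const { nodes, total } = JSON.parse(nodesJson);
  const byType = {};
  for (const node of nodes) {
    byType[node.type] = (byType[node.type] ?? 0) + 1;
  }
  return { total, byType };
}

// Sample payload shaped like { nodes: Node[], total: number }.
const sampleNodesJson = JSON.stringify({
  nodes: [
    { id: "n1", type: "heading", value: "Item 1. Business", page_number: 1 },
    { id: "n2", type: "text", value: "Net sales $ 32,765", page_number: 2 },
    { id: "n3", type: "text", value: null, page_number: 2 },
  ],
  total: 3,
});
console.log(summarizeNodes(sampleNodesJson)); // → { total: 3, byType: { heading: 1, text: 2 } }
```

Inside the sandbox the input would come from `await DOCS.getNodes(docId)`.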

<Warning>
  No write methods exist. The sandbox cannot modify, delete, or re-upload documents.
  It cannot make network requests (`fetch` is blocked). It can only read.
</Warning>

***

## Limits

| Limit                  | Value                                               |
| ---------------------- | --------------------------------------------------- |
| CPU time per execution | 30 seconds                                          |
| Wall-clock timeout     | 60 seconds (includes I/O wait for DOCS calls)       |
| Code size              | 1 MB                                                |
| Response payload       | 10 MB                                               |
| `docIds` per request   | 20                                                  |
| Concurrent executions  | 10 per API key (public mode: 5 per IP)              |
| Rate limit             | 60 requests/min per API key (public: 20/min per IP) |
| Rate limit headers     | `Retry-After`, `X-RateLimit-Remaining` on 429       |

Wall-clock timeout includes time waiting on `DOCS` I/O (fetching nodes from storage). If a document is slow to read, that counts against your 60s budget.

Standard Workers built-ins (`URL`, `TextEncoder`, `crypto`, `JSON`, etc.) are available. Node.js APIs (`fs`, `net`, `child_process`) are not. No `import` statements — your code runs as a function body, not a module.
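
For the rate-limit rows in the table above, a client can use the `Retry-After` header to decide how long to wait after a 429. This page does not say whether the header carries a number of seconds or an HTTP date, so the sketch below handles both forms allowed by the HTTP specification; `retryDelayMs` and the one-second fallback are assumptions.

```javascript
// Computes how long to wait (in milliseconds) before retrying a 429 response.
function retryDelayMs(retryAfter, now = Date.now(), fallbackMs = 1000) {
  if (retryAfter == null || retryAfter === "") return fallbackMs;
  // Form 1: delay in seconds, e.g. "3".
  const seconds = Number(retryAfter);
  if (Number.isFinite(seconds)) return Math.max(0, seconds * 1000);
  // Form 2: HTTP date, e.g. "Wed, 21 Oct 2015 07:28:00 GMT".
  const date = Date.parse(retryAfter);
  return Number.isNaN(date) ? fallbackMs : Math.max(0, date - now);
}

console.log(retryDelayMs("3")); // → 3000
```

With `fetch`, the header value is available as `res.headers.get("Retry-After")`, which returns `null` when the header is absent.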

### Performance

| Scenario                      | Typical latency                       |
| ----------------------------- | ------------------------------------- |
| Cold start (new isolate)      | 50–200ms                              |
| Warm isolate (repeated calls) | 5–15ms                                |
| `DOCS.querySql()` per doc     | 20–150ms (exact-match / FTS patterns) |
| `DOCS.getNodes()` per doc     | 100–500ms (depends on doc size)       |
| 10-doc reduce                 | 1–5s total                            |

First request incurs isolate startup cost. The platform may cache warm isolates for repeated requests with identical code, but this is not guaranteed — design for cold starts.

***

## What the sandbox blocks

| Action                           | Result                                   |
| -------------------------------- | ---------------------------------------- |
| `fetch("https://evil.com")`      | `"not permitted to access the internet"` |
| `DOCS.getNodes("doc-not-yours")` | `"doc doc-not-yours not in allowed set"` |
| Write to any storage             | No write bindings exist                  |
| `import("node:fs")`              | Not available in Workers runtime         |

***

## SDK vs Sandbox

|                   | SDK / API                 | Sandbox Transform                           |
| ----------------- | ------------------------- | ------------------------------------------- |
| **Code author**   | You (trusted)             | Users, LLMs, agents (untrusted)             |
| **Runs where**    | Your server               | Cloudflare edge isolate                     |
| **Network**       | Full                      | Blocked                                     |
| **Mutation risk** | Possible                  | Impossible — no write methods               |
| **Dependencies**  | Full npm                  | Built-ins only                              |
| **Latency**       | N round-trips for N docs  | 1 round-trip, code runs next to data        |
| **Best for**      | Pipelines, scripts, CI/CD | Code editors, playgrounds, agent tool calls |

***

## Use cases

### 1. Exact retrieval with `querySql`

Pull the exact lines you want, then compute in code. This is usually faster than fetching every node with `getNodes()` and filtering in JavaScript, because only the matching rows are returned.

```bash theme={null}
curl -s https://api.okrapdf.com/v1/sandbox/run \
  -H "Content-Type: application/json" \
  -d '{
    "docIds": ["doc-fac0d10b8ebc4a2ebf13"],
    "code": "const docId = \"doc-fac0d10b8ebc4a2ebf13\";\nconst matches = await DOCS.querySql(docId, \"SELECT page_number, substr(value, 1, 220) AS value FROM nodes_fts WHERE nodes_fts MATCH '\''property'\'' LIMIT 5\");\nreturn matches;"
  }' | jq .result
```

Readable version:

```javascript theme={null}
const docId = "doc-fac0d10b8ebc4a2ebf13";
const matches = await DOCS.querySql(
  docId,
  "SELECT page_number, substr(value, 1, 220) AS value FROM nodes_fts WHERE nodes_fts MATCH 'property' LIMIT 5"
);
return matches;
```

### 2. Cross-document reduce

Do deterministic retrieval plus math across one or more docs.

```bash theme={null}
# Copy-paste into terminal — extracts 3M 2018 capex from the filing
curl -s https://api.okrapdf.com/v1/sandbox/run \
  -H "Content-Type: application/json" \
  -d '{
    "docIds": ["doc-fac0d10b8ebc4a2ebf13"],
    "code": "const docs = await DOCS.list();\nconst results = [];\nfor (const doc of docs) {\n  const line = await DOCS.querySql(doc.id, \"SELECT value FROM nodes WHERE value LIKE '\''%Purchases of property, plant and equipment%'\'' LIMIT 1\");\n  const value = line.rows[0]?.value ?? null;\n  const match = typeof value === \"string\" ? value.match(/\\(([\\d,.]+)\\)|(\\d[\\d,]*)/) : null;\n  results.push({ doc: doc.file_name, raw: value, extracted: match ? (match[1] || match[2]) : null });\n}\nreturn results;"
  }' | jq .result
```

The sandbox code (readable version):

```javascript theme={null}
const docs = await DOCS.list();
const results = [];
for (const doc of docs) {
  const line = await DOCS.querySql(
    doc.id,
    "SELECT value FROM nodes WHERE value LIKE '%Purchases of property, plant and equipment%' LIMIT 1"
  );
  const value = line.rows[0]?.value ?? null;
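  // Match a parenthesized figure like "(1,577)" or a bare figure like "1,577".
  // Requiring a digit avoids matching the comma in "property, plant".
  const match = typeof value === "string" ? value.match(/\(([\d,.]+)\)|(\d[\d,]*)/) : null;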
  results.push({
    doc: doc.file_name,
    raw: value,
    extracted: match ? (match[1] || match[2]) : null,
  });
}
return results;
```

### 3. User-defined filters

Custom scoring, multi-field boolean logic, regex patterns — beyond what keyword or semantic search offers.

```javascript theme={null}
const docs = await DOCS.list();
const hits = [];
for (const doc of docs) {
  try {
    const data = await DOCS.querySql(
      doc.id,
      "SELECT value FROM nodes WHERE value LIKE '%material weakness%' OR value LIKE '%restatement%' OR value LIKE '%risk factor%' LIMIT 200"
    );
    const text = data.rows.map((row) => row.value).filter(Boolean).join(" ");
    let score = 0;
    if (/risk factor/i.test(text)) score += 2;
    if (/material weakness/i.test(text)) score += 5;
    if (/restatement/i.test(text)) score += 10;
    if (score >= 5) hits.push({ id: doc.id, name: doc.file_name, score });
  } catch { /* skip */ }
}
return hits.sort((a, b) => b.score - a.score);
```

### 4. LLM tool calls as code

Instead of an AI agent making 10 sequential API calls, generate one JS function that does the same work in a single sandbox execution.

```javascript theme={null}
const docs = await DOCS.list();
const comparison = [];
for (const doc of docs) {
  try {
    const data = await DOCS.querySql(
      doc.id,
      "SELECT value FROM nodes WHERE value LIKE '%risk factor%' OR value LIKE '%artificial intelligence%' OR value LIKE '%machine learning%' LIMIT 200"
    );
    const text = data.rows.map((row) => row.value).filter(Boolean).join(" ");
    comparison.push({
      company: doc.file_name.replace(/_\d{4}.*/, ''),
      year: doc.file_name.match(/\d{4}/)?.[0],
      hasRiskFactors: /risk factor/i.test(text),
      mentionsAI: /artificial intelligence|machine learning/i.test(text),
      pageCount: doc.total_pages,
    });
  } catch { /* skip */ }
}
return comparison;
```

### 5. Embeddable code playground

Ship a code editor on your doc viewer. Users write transforms, run them sandboxed — like CodePen for PDFs.

```tsx theme={null}
import MonacoEditor from "@monaco-editor/react";
import { useState } from "react";

const STARTER_CODE = `const docs = await DOCS.list();
return docs.map(d => ({ name: d.file_name, pages: d.total_pages }));`;

function TransformEditor({ docIds }: { docIds: string[] }) {
  const [code, setCode] = useState(STARTER_CODE);
  const [result, setResult] = useState<any>(null);
  const [error, setError] = useState<string | null>(null);
  const [loading, setLoading] = useState(false);

  const run = async () => {
    setError(null);
    setLoading(true);
    try {
      const res = await fetch("https://api.okrapdf.com/v1/sandbox/run", {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify({ docIds, code }),
      });
      const json = await res.json();
      if (json.ok) setResult(json.result);
      else setError(json.error);
    } catch (e: any) {
      setError(e.message);
    } finally {
      setLoading(false);
    }
  };

  return (
    <div>
      <MonacoEditor height="300px" language="javascript" theme="vs-dark"
        value={code} onChange={(v) => setCode(v ?? "")} />
      <button onClick={run} disabled={loading}>
        {loading ? "Running..." : "Run in Sandbox"}
      </button>
      {error && <div style={{ color: "red" }}>{error}</div>}
      {result !== null && <pre>{JSON.stringify(result, null, 2)}</pre>}
    </div>
  );
}
```

***

## Debugging

`console.log()` output from your sandbox code is captured and returned in the `logs` field:

```json theme={null}
{
  "ok": true,
  "result": { "totalRevenue": 32765 },
  "logs": ["fetching doc-abc...", "found revenue: 32,765"]
}
```

If your code throws, the error message is returned in the `error` field.

### Local testing

To iterate on your transform without hitting the API, mock the `DOCS` object locally:

```javascript theme={null}
// mock-docs.mjs — run with: node mock-docs.mjs
const DOCS = {
  list: async () => [
    { id: "doc-1", file_name: "3M_2018_10K.pdf", total_pages: 160, status: "complete" },
    { id: "doc-2", file_name: "AAPL_2023_10K.pdf", total_pages: 200, status: "complete" },
  ],
  querySql: async (id, sql) => ({
    rows: [{ value: "Purchases of property, plant and equipment (PP&E) (1,577)" }],
    count: 1,
  }),
  getNodes: async (id) => JSON.stringify({
    nodes: [{ value: "Net sales $ 32,765", type: "text", page_number: 1 }],
    total: 1,
  }),
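  getMarkdown: async (id) => "Net sales $ 32,765 million for the year ended...",
  getPage: async (id, page) => "Net sales $ 32,765 million for the year ended...",
  getStatus: async (id) => JSON.stringify({ phase: "complete", file_name: "test.pdf", total_pages: 1, pages_completed: 1 }),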
};

// ---- paste your sandbox code below this line ----
const docs = await DOCS.list();
const capex = [];
for (const doc of docs) {
  const result = await DOCS.querySql(
    doc.id,
    "SELECT value FROM nodes WHERE value LIKE '%Purchases of property, plant and equipment%' LIMIT 1"
  );
  capex.push({ doc: doc.file_name, row: result.rows[0]?.value ?? null });
}
console.log({ capex, docCount: docs.length });
```

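Once your logic works locally, paste the code body (everything below the `----` line) into the `code` field of the API request, replacing the final `console.log(...)` with a `return` statement so the sandbox returns the result.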
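
Another option is to run the code string unchanged, `return` included, by compiling it as an async function body. The sketch below only approximates the sandbox: `runLocally` is a hypothetical helper, the response shape mimics the `ok` / `result` / `error` / `logs` fields described on this page, and nothing here blocks network access or enforces limits.

```javascript
// The AsyncFunction constructor is not a global; it is obtained from an
// existing async function.
const AsyncFunction = Object.getPrototypeOf(async function () {}).constructor;

// Runs a sandbox-style code string against a mock DOCS object.
async function runLocally(code, DOCS) {
  const logs = [];
  // Shadow `console` so console.log output is captured like the `logs` field.
  const fakeConsole = {
    log: (...args) => logs.push(args.map(String).join(" ")),
  };
  try {
    const fn = new AsyncFunction("DOCS", "console", code);
    const result = await fn(DOCS, fakeConsole);
    return { ok: true, result, logs };
  } catch (err) {
    // Covers both syntax errors (thrown by the constructor) and runtime errors.
    return { ok: false, error: `${err.name}: ${err.message}`, logs };
  }
}

const localMockDocs = {
  list: async () => [
    { id: "doc-1", file_name: "3M_2018_10K.pdf", total_pages: 160, status: "complete" },
  ],
};

runLocally(
  "const docs = await DOCS.list(); console.log('got', docs.length); return docs[0].total_pages;",
  localMockDocs
).then((out) => console.log(out));
```

Because the string is executed as-is, the same text can be sent in the `code` field without editing.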

### Calling from the browser

The `/v1/sandbox/run` endpoint supports CORS (`Access-Control-Allow-Origin: *`) for both modes, so browser calls work. However:

<Warning>
  Never pass `apiKey` from browser-side JavaScript — it exposes your credentials to anyone
  who inspects network traffic. For authenticated mode, proxy through your own backend.
  The `docIds` mode is safe to call directly from the client since it requires no credentials.
</Warning>
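
A backend proxy for authenticated mode can be small. In the sketch below, `toUpstreamRequest` is a hypothetical helper: it takes only `code` from the browser payload and adds the API key from server-side configuration, so anything else the client sends (including an `apiKey` or `docIds` field) is dropped.

```javascript
// Builds the fetch options for the upstream call to /v1/sandbox/run.
function toUpstreamRequest(clientBody, serverApiKey) {
  if (!serverApiKey) throw new Error("server API key is not configured");
  if (typeof clientBody?.code !== "string") throw new Error("code must be a string");
  return {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    // Only `code` is forwarded; the key never reaches the browser.
    body: JSON.stringify({ code: clientBody.code, apiKey: serverApiKey }),
  };
}

const upstream = toUpstreamRequest(
  { code: "return 1;", apiKey: "client-supplied-value-is-ignored" },
  "okra_server_side_key" // placeholder; load the real key from server config
);
console.log(upstream.body);
```

The server route would then call `fetch("https://api.okrapdf.com/v1/sandbox/run", upstream)` and relay the JSON response to the browser.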

***

## Security model

Sandbox Transforms use [Cloudflare Dynamic Workers](https://developers.cloudflare.com/dynamic-workers/) with [`@cloudflare/codemode`](https://developers.cloudflare.com/dynamic-workers/getting-started/) for secure RPC dispatch.

* **No ambient globals.** Beyond the standard built-ins, the only binding the sandbox can access is `DOCS`, which routes calls back to the parent worker via RPC.
* **`globalOutbound: null`.** All `fetch()` and `connect()` calls are blocked.
* **Read-only RPC.** `DOCS` exposes only read methods. Write operations don't exist.
* **Doc scoping.** In public mode, `docIds` are checked on every call — you can't enumerate or guess other documents.
* **Fresh isolate.** Each `run` call gets a new V8 isolate. No state persists between executions.
* **No self-loop.** `DOCS` reads from Durable Objects and R2 directly via RPC — no HTTP fetch back to the same worker.

***

## Pricing

Sandbox Transforms are available on all OkraPDF paid plans. You do not need a separate Cloudflare account — OkraPDF manages the infrastructure.

Included executions and CPU limits vary by plan — see the [pricing page](/pricing) for your plan's allowances. Overage rates:

| Dimension  | Overage rate     |
| ---------- | ---------------- |
| Executions | \$0.06 per 1,000 |
| CPU time   | \$0.02 per 1M ms |

During the current open beta (started March 2026), **execution fees are waived** — you only pay for CPU time that exceeds your plan's included allowance.
