# Output Schema

> Step-by-step: register a profile, materialize extraction data, and read it publicly from R2.

## Overview

This walkthrough shows the full output schema lifecycle using `curl`, in five steps: register the recipe (an output profile), write the result, read it back from the document's Durable Object (DO), inspect the audit trail, and read it publicly from R2.

<Info>
  All write operations, and the authenticated reads in Steps 3 and 4, require the `x-document-agent-secret` header. Public R2 reads require no authentication.
</Info>
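
Because every authenticated call in this walkthrough passes `$SECRET`, a small guard can catch an unset variable before any request is sent. This is a minimal sketch; `require_secret` is a hypothetical helper name, not part of the OkraPDF API.

```bash
# Hypothetical helper: fail early if SECRET is empty or unset, so that
# later curl calls never go out with an empty x-document-agent-secret header.
require_secret() {
  if [ -z "${SECRET:-}" ]; then
    echo "SECRET is not set; export it before running the authenticated steps" >&2
    return 1
  fi
}

# Usage: stop a script early when the secret is missing.
# require_secret || exit 1
```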

## Step 1: Register an Output Profile

Define the extraction recipe: what to extract (`schema`), how to extract it (`prompt`), and which model to use (`model`).

```bash theme={null}
curl -X PUT "https://api.okrapdf.com/document/{doc_id}/output-profile/invoice" \
  -H "x-document-agent-secret: $SECRET" \
  -H "Content-Type: application/json" \
  -d '{
    "schema": {
      "type": "object",
      "properties": {
        "vendor": { "type": "string" },
        "total": { "type": "number" },
        "date": { "type": "string", "format": "date" },
        "line_items": {
          "type": "array",
          "items": {
            "type": "object",
            "properties": {
              "description": { "type": "string" },
              "quantity": { "type": "number" },
              "amount": { "type": "number" }
            }
          }
        }
      }
    },
    "prompt": "Extract invoice fields including vendor, date, total, and line items.",
    "model": "claude-sonnet-4-5-20250929"
  }'
```

```json Response theme={null}
{ "ok": true }
```

The profile is stored in the document's Durable Object SQLite database. It is only the recipe; registering it does not run an extraction.

## Step 2: Materialize the Output

After your SDK or agent runs the extraction against the LLM, write the validated result and audit trail.

```bash theme={null}
curl -X PUT "https://api.okrapdf.com/document/{doc_id}/output/invoice" \
  -H "x-document-agent-secret: $SECRET" \
  -H "Content-Type: application/json" \
  -d '{
    "data": {
      "vendor": "Acme Corp",
      "total": 1234.56,
      "date": "2026-01-15",
      "line_items": [
        { "description": "Widget", "quantity": 10, "amount": 1234.56 }
      ]
    },
    "audit": {
      "model": "claude-sonnet-4-5-20250929",
      "prompt": "Extract invoice fields including vendor, date, total, and line items.",
      "raw_response": "{\"vendor\":\"Acme Corp\",\"total\":1234.56,\"date\":\"2026-01-15\",\"line_items\":[{\"description\":\"Widget\",\"quantity\":10,\"amount\":1234.56}]}"
    }
  }'
```

```json Response theme={null}
{ "ok": true }
```

This does two things:

1. **UPSERT into DO SQLite** — source of truth with full audit trail
2. **PUT to R2** — `public/{doc_id}/o_invoice.json` with data only (no audit in public blob)
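
The request body above can be assembled from the pieces your client already has. The following is a minimal sketch in plain shell, assuming the raw model response is single-line JSON that has already been validated against the schema. `json_escape` and `build_output_body` are hypothetical helper names; a dedicated JSON tool is safer for multi-line responses or ones that contain control characters.

```bash
# Hypothetical helper: escape backslashes and double quotes so that text
# can be embedded inside a JSON string. Does NOT handle newlines or other
# control characters (assumption: the input is single-line).
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# build_output_body MODEL PROMPT RAW_RESPONSE
# Emits {"data": ..., "audit": {...}} in the shape used in Step 2.
# Assumption: RAW_RESPONSE is already the validated JSON object, so it is
# reused verbatim as "data" and escaped as a string for "raw_response".
build_output_body() {
  model=$1 prompt=$2 raw=$3
  printf '{"data":%s,"audit":{"model":"%s","prompt":"%s","raw_response":"%s"}}' \
    "$raw" \
    "$(json_escape "$model")" \
    "$(json_escape "$prompt")" \
    "$(json_escape "$raw")"
}

# Usage (not run here): send the assembled body to the Step 2 endpoint.
# curl -X PUT "https://api.okrapdf.com/document/{doc_id}/output/invoice" \
#   -H "x-document-agent-secret: $SECRET" \
#   -H "Content-Type: application/json" \
#   -d "$(build_output_body "$MODEL" "$PROMPT" "$RAW_RESPONSE")"
```

Keeping `data` and `audit.raw_response` separate mirrors the split described above: only `data` reaches the public blob, while the escaped raw response stays in the audit trail.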

## Step 3: Read via Durable Object

This is an authenticated read from the DO's SQLite database. It returns the validated data only, without the audit fields.

```bash theme={null}
curl "https://api.okrapdf.com/document/{doc_id}/output/invoice" \
  -H "x-document-agent-secret: $SECRET"
```

```json Response theme={null}
{
  "vendor": "Acme Corp",
  "total": 1234.56,
  "date": "2026-01-15",
  "line_items": [
    { "description": "Widget", "quantity": 10, "amount": 1234.56 }
  ]
}
```

## Step 4: Inspect the Audit Trail

See exactly what produced the output: which model, what prompt, the raw LLM response before parsing.

```bash theme={null}
curl "https://api.okrapdf.com/document/{doc_id}/output/invoice/audit" \
  -H "x-document-agent-secret: $SECRET"
```

```json Response theme={null}
{
  "model": "claude-sonnet-4-5-20250929",
  "prompt": "Extract invoice fields including vendor, date, total, and line items.",
  "raw_response": "{\"vendor\":\"Acme Corp\",\"total\":1234.56,...}",
  "created_at": 1772173501590
}
```

<Warning>
  The audit trail is only available via the authenticated DO path. It is never exposed in the public R2 blob.
</Warning>

## Step 5: Public R2 Read

This is the key benefit: the read needs no API key and does not wake the Durable Object. The JSON is served straight from R2 with cache headers.

```bash theme={null}
curl "https://api.okrapdf.com/v1/documents/{doc_id}/o_invoice/data.json"
```

```json Response theme={null}
{
  "vendor": "Acme Corp",
  "total": 1234.56,
  "date": "2026-01-15",
  "line_items": [
    { "description": "Widget", "quantity": 10, "amount": 1234.56 }
  ]
}
```

Response headers:

* `Cache-Control: public, max-age=3600`
* `Access-Control-Allow-Origin: *`
* `Content-Type: application/json`

Embed this URL in dashboards, spreadsheets, or downstream pipelines. It's a static JSON file.
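
Since the response carries `Cache-Control: public, max-age=3600`, HTTP caches may reuse a copy for up to an hour, which matters if a pipeline polls this URL. Here is a minimal sketch for reading that value from a header. `max_age_seconds` is a hypothetical helper, and the `curl` line in the comment is illustrative only.

```bash
# Hypothetical helper: print the max-age value (in seconds) from a
# Cache-Control header value, or print nothing if the directive is absent.
max_age_seconds() {
  printf '%s\n' "$1" | sed -n 's/.*max-age=\([0-9][0-9]*\).*/\1/p'
}

# With the header value listed above:
max_age_seconds "public, max-age=3600"   # prints 3600

# Against a live response (not run here), the headers could be dumped with:
# curl -s -D - -o /dev/null \
#   "https://api.okrapdf.com/v1/documents/{doc_id}/o_invoice/data.json"
```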

## Combining with Transforms

The `o_` prefix works alongside the existing `t_` transform prefix:

```bash theme={null}
# Extract via LlamaParse provider, read invoice output
curl "https://api.okrapdf.com/v1/documents/{doc_id}/t_llamaparse/o_invoice/data.json"
```
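
The two public URL shapes shown so far differ only in the optional `t_` segment. Below is a minimal sketch of a builder for both, assuming the path layout is exactly as in the examples above. `public_output_url` is a hypothetical helper name and `doc123` is a placeholder document ID.

```bash
# public_output_url DOC_ID PROFILE [TRANSFORM]
# Hypothetical helper: build the public read URL for an output profile,
# optionally scoped to a transform. Prefixes (t_, o_) follow the examples
# in this page; nothing here contacts the API.
public_output_url() {
  doc_id=$1 profile=$2 transform=${3:-}
  base="https://api.okrapdf.com/v1/documents/$doc_id"
  if [ -n "$transform" ]; then
    printf '%s/t_%s/o_%s/data.json\n' "$base" "$transform" "$profile"
  else
    printf '%s/o_%s/data.json\n' "$base" "$profile"
  fi
}

# Usage: print both URL shapes for a placeholder document ID.
public_output_url doc123 invoice
public_output_url doc123 invoice llamaparse
```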

## Multiple Output Schemas

A single document can have many output schemas. Each is independent.

```bash theme={null}
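# Register a profile for each extraction
PUT /document/{id}/output-profile/invoice
PUT /document/{id}/output-profile/compliance
PUT /document/{id}/output-profile/summary

# Materialize each output
PUT /document/{id}/output/invoice
PUT /document/{id}/output/compliance
PUT /document/{id}/output/summary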

# Read each independently
GET /v1/documents/{id}/o_invoice/data.json
GET /v1/documents/{id}/o_compliance/data.json
GET /v1/documents/{id}/o_summary/data.json
```

## Upsert Behavior

Both profile registration and output materialization use upsert semantics. Re-running an extraction with updated data overwrites the previous result:

```bash theme={null}
# First extraction
PUT /document/{id}/output/invoice  →  { "vendor": "Acme Corp", "total": 1000 }

# Re-extraction with corrected data
PUT /document/{id}/output/invoice  →  { "vendor": "Acme Corp", "total": 1234.56 }

# GET returns the latest
GET /document/{id}/output/invoice  →  { "vendor": "Acme Corp", "total": 1234.56 }
```

The R2 blob is overwritten as well, so public reads serve the latest materialization once any copy cached under `Cache-Control: public, max-age=3600` has expired.
