
# Output Schema

> Define extraction recipes, materialize outputs with audit trails, and serve the results publicly without waking the Durable Object.

## What is an Output Schema?

An output schema is a self-contained, reproducible extraction recipe attached to a document. It bundles three things:

| Component  | Purpose                                 |
| ---------- | --------------------------------------- |
| **Schema** | The shape of the output (JSON Schema)   |
| **Prompt** | Extraction instructions sent to the LLM |
| **Model**  | Which LLM runs the extraction           |

Once defined, the SDK extracts data from the document, validates it against the schema, and **materializes** the result — storing it permanently alongside a full audit trail.
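
To make the validation step concrete, here is a minimal sketch of a validator for the JSON Schema subset that the examples on this page use (`type`, `properties`, `items`, `enum`). The `Schema` type and the `validate` function are hypothetical names written for illustration, not the SDK's validator; a production pipeline would more likely rely on a full JSON Schema implementation.

```ts
// Hypothetical, minimal validator for the schema subset used on this page.
type Schema = {
  type?: 'object' | 'array' | 'string' | 'number' | 'boolean';
  properties?: Record<string, Schema>;
  items?: Schema;
  enum?: unknown[];
  format?: string; // treated as an annotation and not checked here
};

// Returns a list of human-readable errors; an empty list means the value passed.
function validate(value: unknown, schema: Schema, path = '$'): string[] {
  const errors: string[] = [];
  if (schema.enum && !schema.enum.includes(value)) {
    errors.push(`${path}: not one of ${JSON.stringify(schema.enum)}`);
  }
  switch (schema.type) {
    case 'object': {
      if (typeof value !== 'object' || value === null || Array.isArray(value)) {
        return [...errors, `${path}: expected object`];
      }
      const record = value as Record<string, unknown>;
      for (const [key, sub] of Object.entries(schema.properties ?? {})) {
        // A listed property is only checked when it is present.
        if (key in record) errors.push(...validate(record[key], sub, `${path}.${key}`));
      }
      break;
    }
    case 'array': {
      if (!Array.isArray(value)) return [...errors, `${path}: expected array`];
      if (schema.items) {
        value.forEach((item, i) => errors.push(...validate(item, schema.items!, `${path}[${i}]`)));
      }
      break;
    }
    case 'string':
    case 'number':
    case 'boolean':
      if (typeof value !== schema.type) errors.push(`${path}: expected ${schema.type}`);
      break;
  }
  return errors;
}
```

Because none of the schemas on this page declare `required`, a missing field passes this check; only fields that are present with the wrong type are reported.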

***

## Why Output Schemas?

**Reproducibility.** Every output records exactly what produced it: which model, what prompt, the raw LLM response before parsing. You can always trace back from a result to its source.

**Zero-cost reads.** Materialized outputs are written to R2 on creation. Public reads are served directly from R2, so the Durable Object never wakes and a read incurs no Durable Object compute cost.

**Composability.** A single document can have many output schemas: `invoice`, `receipt`, `contract_terms`, `compliance_flags`. Each is an independent extraction with its own recipe.

***

## How It Works

```
SDK extracts data
      │
      ▼
  ┌─────────────────────────┐
  │   Durable Object        │
  │   ┌───────────────────┐ │
  │   │ output_profiles   │ │  ← recipe (schema + prompt + model)
  │   │ materialized_data │ │  ← result + audit trail
  │   └───────────────────┘ │
  │           │             │
  │     write to R2         │
  └─────────────────────────┘
              │
              ▼
  ┌─────────────────────────┐
  │   R2 (data only)        │  ← public reads, no DO wake
  │   /o_invoice/data.json  │
  └─────────────────────────┘
```
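
The flow in the diagram can be mimicked in memory. The sketch below is an illustration under stated assumptions: `DocumentStore`, `publicBucket`, and `materialize` are hypothetical names, plain `Map`s stand in for the Durable Object tables and for R2, and schema validation is left out for brevity.

```ts
// Hypothetical in-memory stand-ins for the stores in the diagram.
interface OutputProfile { schema: object; prompt: string; model: string }
interface AuditRecord { model: string; prompt: string; raw_response: string; created_at: string }

// Plays the role of the Durable Object: recipe plus result and audit trail.
class DocumentStore {
  outputProfiles = new Map<string, OutputProfile>();
  materializedData = new Map<string, { data: unknown; audit: AuditRecord }>();
}

// Plays the role of R2: holds the data only.
const publicBucket = new Map<string, string>();

function materialize(
  store: DocumentStore,
  name: string,
  profile: OutputProfile,
  rawResponse: string,
  now: Date = new Date(),
): string {
  const data: unknown = JSON.parse(rawResponse); // throws on malformed LLM output
  store.outputProfiles.set(name, profile);
  store.materializedData.set(name, {
    data,
    audit: {
      model: profile.model,
      prompt: profile.prompt,
      raw_response: rawResponse,
      created_at: now.toISOString(),
    },
  });
  const key = `/o_${name}/data.json`;
  publicBucket.set(key, JSON.stringify(data)); // no prompt, no raw response
  return key;
}
```

The point of the split is that the public copy contains nothing but the parsed data, while the prompt and raw response stay with the audit record.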

***

## Use Cases

### Invoice Processing

Extract vendor, total, line items, and dates from uploaded invoices. Attach the output schema once, and every invoice in the collection gets the same extraction recipe applied.

```ts
const profile = {
  schema: {
    type: 'object',
    properties: {
      vendor: { type: 'string' },
      invoice_number: { type: 'string' },
      date: { type: 'string', format: 'date' },
      total: { type: 'number' },
      currency: { type: 'string' },
      line_items: {
        type: 'array',
        items: {
          type: 'object',
          properties: {
            description: { type: 'string' },
            quantity: { type: 'number' },
            unit_price: { type: 'number' },
            amount: { type: 'number' },
          },
        },
      },
    },
  },
  prompt: 'Extract all invoice fields. For line items, include description, quantity, unit price, and line total.',
  model: 'claude-sonnet-4-5-20250929',
};
```

After materialization, any system can read the structured invoice data at:

```
GET /v1/documents/{id}/o_invoice/data.json
```

No API key needed. No server wake. Just JSON.
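
A consumer of that endpoint might look like the sketch below. The base URL `https://api.example.com`, the `Invoice` type, and the `invoiceUrl` and `readInvoice` helpers are assumptions for illustration; only the path shape comes from this page. Every field is optional because the schema above declares no `required` properties.

```ts
// Hypothetical client-side types mirroring the invoice schema above.
interface LineItem {
  description?: string;
  quantity?: number;
  unit_price?: number;
  amount?: number;
}

interface Invoice {
  vendor?: string;
  invoice_number?: string;
  date?: string;
  total?: number;
  currency?: string;
  line_items?: LineItem[];
}

// Build the public read URL for a document's invoice output.
function invoiceUrl(baseUrl: string, documentId: string): string {
  return `${baseUrl}/v1/documents/${encodeURIComponent(documentId)}/o_invoice/data.json`;
}

// Fetch the materialized invoice. No Authorization header is sent.
async function readInvoice(baseUrl: string, documentId: string): Promise<Invoice> {
  const res = await fetch(invoiceUrl(baseUrl, documentId));
  if (!res.ok) throw new Error(`read failed with status ${res.status}`);
  return (await res.json()) as Invoice;
}
```

The cast to `Invoice` is a convenience, not a guarantee; callers that need stronger assurances can re-validate the payload on their side.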

### Compliance Screening

Flag regulatory risks in financial filings. The schema defines the flags, the prompt says what to look for, and the model does the analysis.

```ts
const profile = {
  schema: {
    type: 'object',
    properties: {
      risk_level: { type: 'string', enum: ['low', 'medium', 'high', 'critical'] },
      flags: {
        type: 'array',
        items: {
          type: 'object',
          properties: {
            category: { type: 'string' },
            description: { type: 'string' },
            page: { type: 'number' },
            severity: { type: 'string' },
          },
        },
      },
      summary: { type: 'string' },
    },
  },
  prompt: 'Analyze this filing for regulatory compliance risks. Flag material weaknesses, related party transactions, going concern language, and restatement disclosures.',
  model: 'claude-sonnet-4-5-20250929',
};
```

### Contract Term Extraction

Pull key terms from legal documents for deal review dashboards.

```ts
const profile = {
  schema: {
    type: 'object',
    properties: {
      parties: { type: 'array', items: { type: 'string' } },
      effective_date: { type: 'string', format: 'date' },
      termination_date: { type: 'string', format: 'date' },
      governing_law: { type: 'string' },
      payment_terms: { type: 'string' },
      auto_renewal: { type: 'boolean' },
      non_compete_months: { type: 'number' },
      liability_cap: { type: 'string' },
    },
  },
  prompt: 'Extract key contract terms including parties, dates, governing law, payment terms, renewal clauses, non-compete duration, and liability caps.',
  model: 'claude-sonnet-4-5-20250929',
};
```

### Financial Filing Extraction

Pull key metrics from 10-K filings for benchmarking and analysis dashboards.

```ts
const profile = {
  schema: {
    type: 'object',
    properties: {
      company: { type: 'string' },
      fiscal_year_ended: { type: 'string' },
      income_statement: {
        type: 'object',
        properties: {
          revenue: { type: 'string' },
          cost_of_revenue: { type: 'string' },
          gross_profit: { type: 'string' },
          operating_income: { type: 'string' },
          net_income: { type: 'string' },
          eps_basic: { type: 'string' },
          eps_diluted: { type: 'string' },
        },
      },
      balance_sheet: {
        type: 'object',
        properties: {
          total_assets: { type: 'string' },
          total_liabilities: { type: 'string' },
          total_stockholders_equity: { type: 'string' },
          cash_and_equivalents: { type: 'string' },
        },
      },
      cash_flow: {
        type: 'object',
        properties: {
          operating_cash_flow: { type: 'string' },
          capital_expenditures: { type: 'string' },
          free_cash_flow: { type: 'string' },
        },
      },
    },
  },
  prompt: 'Extract key financial details from this 10-K filing including income statement, balance sheet, and cash flow metrics.',
  model: 'kimi-k2p5',
};
```

### Resume Parsing

Structure candidate data from uploaded resumes for ATS integrations.

```ts
const profile = {
  schema: {
    type: 'object',
    properties: {
      name: { type: 'string' },
      email: { type: 'string' },
      phone: { type: 'string' },
      skills: { type: 'array', items: { type: 'string' } },
      experience: {
        type: 'array',
        items: {
          type: 'object',
          properties: {
            company: { type: 'string' },
            title: { type: 'string' },
            start_date: { type: 'string' },
            end_date: { type: 'string' },
          },
        },
      },
      education: {
        type: 'array',
        items: {
          type: 'object',
          properties: {
            institution: { type: 'string' },
            degree: { type: 'string' },
            year: { type: 'number' },
          },
        },
      },
    },
  },
  prompt: 'Extract structured candidate information from this resume.',
  model: 'claude-sonnet-4-5-20250929',
};
```

***

## Audit Trail

Every materialized output stores a full audit record alongside the data:

| Field          | Description                            |
| -------------- | -------------------------------------- |
| `model`        | The model that ran the extraction      |
| `prompt`       | The exact prompt that was sent         |
| `raw_response` | The raw LLM output before JSON parsing |
| `created_at`   | Timestamp of materialization           |

Access the audit trail at:

```
GET /document/{id}/output/{name}/audit
```

The audit endpoint requires authentication and is never exposed publicly; the public R2 path serves only the validated data.
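
As a rough sketch of how a client could call the audit endpoint and check the response shape: the `AuditRecord` fields come from the table above, while the base URL, the Bearer authorization scheme, the string type of `created_at`, and the helper names `auditRequest` and `isAuditRecord` are all assumptions made for illustration.

```ts
// Shape of the audit record, taken from the field table above.
interface AuditRecord {
  model: string;
  prompt: string;
  raw_response: string;
  created_at: string; // assumed to be a string timestamp; the format is not specified here
}

// Build the request for the authenticated audit endpoint.
// The Bearer scheme is an assumption, not a documented requirement.
function auditRequest(baseUrl: string, documentId: string, name: string, apiKey: string) {
  return {
    url: `${baseUrl}/document/${encodeURIComponent(documentId)}/output/${encodeURIComponent(name)}/audit`,
    headers: { Authorization: `Bearer ${apiKey}` },
  };
}

// Runtime check that a parsed response carries the four documented fields as strings.
function isAuditRecord(value: unknown): value is AuditRecord {
  if (typeof value !== 'object' || value === null) return false;
  const record = value as Record<string, unknown>;
  return ['model', 'prompt', 'raw_response', 'created_at'].every(
    (field) => typeof record[field] === 'string',
  );
}
```

The returned `url` and `headers` can be passed to `fetch`, and the parsed body narrowed with `isAuditRecord` before it is used.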

***

## Public URL Pattern

Materialized outputs are available at a predictable, cacheable URL:

```
GET /v1/documents/{id}/o_{name}/data.json
```

The `o_` prefix tells the worker to read from R2 directly. The Durable Object never wakes.

Combine with the `t_` transform prefix for provider-specific extractions:

```
GET /v1/documents/{id}/t_llamaparse/o_invoice/data.json
```

Response headers include `Cache-Control: public, max-age=3600` and `Access-Control-Allow-Origin: *` for easy embedding.
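
One way to picture the routing is a small path parser that recognizes the optional `t_` segment and the `o_` segment. This is a hypothetical sketch, not the worker's actual code: `OutputRoute`, `parseOutputPath`, and `PUBLIC_HEADERS` are illustrative names, and only the path shape and header values come from this page.

```ts
// Parsed form of a public output path (hypothetical type).
interface OutputRoute {
  documentId: string;
  transform?: string; // present only when a t_ segment is in the path
  output: string;
}

// /v1/documents/{id}/[t_{provider}/]o_{name}/data.json
const OUTPUT_ROUTE = /^\/v1\/documents\/([^/]+)\/(?:t_([^/]+)\/)?o_([^/]+)\/data\.json$/;

// Returns the parsed route, or null when the path is not a public output path.
function parseOutputPath(pathname: string): OutputRoute | null {
  const match = OUTPUT_ROUTE.exec(pathname);
  if (!match) return null;
  const [, documentId, transform, output] = match;
  return transform ? { documentId, transform, output } : { documentId, output };
}

// Response headers described above.
const PUBLIC_HEADERS: Record<string, string> = {
  'Cache-Control': 'public, max-age=3600',
  'Access-Control-Allow-Origin': '*',
};
```

A path that parses successfully can be answered from storage with `PUBLIC_HEADERS`; anything else falls through to other routes.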

The `t_` and `o_` URL segments are inspired by [Cloudinary](https://cloudinary.com)'s URL-as-API pattern — encode transforms in the path so results are cacheable, embeddable, and readable without an SDK.
