# Structured Extraction

> Extract typed data from PDFs using `session.prompt` with a Zod schema or JSON Schema.

## Overview

Pass a `schema` option to `session.prompt(...)` to extract typed, structured data from a document. The schema can be a Zod schema or a plain JSON Schema object.

## Basic example

```ts
import { createOkra } from '@okrapdf/runtime';
import { z } from 'zod';

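// Create a client; the API key is read from the environment.
const okra = createOkra({ apiKey: process.env.OKRA_API_KEY });
// Attach a session to an existing document by its ID.
const session = okra.sessions.from('ocr_doc_id');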

const InvoiceSchema = z.object({
  vendor: z.string(),
  invoiceNumber: z.string(),
  date: z.string(),
  total: z.number(),
  lineItems: z.array(z.object({
    description: z.string(),
    quantity: z.number().optional(),
    amount: z.number(),
  })),
});

const { data, meta } = await session.prompt(
  'Extract all invoice fields including line items',
  { schema: InvoiceSchema },
);

// Optional chaining guards against `data` or `meta` being absent.
console.log(data?.vendor, data?.total, meta?.confidence);
```

## JSON Schema example

```ts
// Reuses `session` from the basic example; the schema is a plain JSON Schema object.
const result = await session.prompt('Extract invoice fields', {
  schema: {
    type: 'object',
    properties: {
      vendor: { type: 'string' },
      total: { type: 'number' },
    },
    required: ['vendor', 'total'],
  },
});
```

## Multi-document pattern

To extract from many documents, attach a session to each one and run the prompts concurrently with `Promise.all`:

```ts
// Reuses `okra` and `InvoiceSchema` from the basic example.
const sessions = ['ocr_a', 'ocr_b', 'ocr_c'].map((id) => okra.sessions.from(id));

const results = await Promise.all(
  sessions.map((s) => s.prompt('Extract invoice fields', { schema: InvoiceSchema })),
);
```
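`Promise.all` rejects as soon as any one promise rejects, so a single failing document discards the results of the others. The sketch below shows one way to keep per-document outcomes instead. The `extractAll` helper and the `Outcome` type are hypothetical and not part of `@okrapdf/runtime`; `run` stands in for a call such as `(id) => okra.sessions.from(id).prompt(...)`.

```ts
// Hypothetical helper, not part of @okrapdf/runtime.
type Outcome<T> =
  | { id: string; ok: true; value: T }
  | { id: string; ok: false; error: unknown };

async function extractAll<T>(
  ids: string[],
  run: (id: string) => Promise<T>,
): Promise<Outcome<T>[]> {
  return Promise.all(
    ids.map(async (id): Promise<Outcome<T>> => {
      try {
        // Each document is awaited inside its own try/catch,
        // so one failure never rejects the whole batch.
        return { id, ok: true, value: await run(id) };
      } catch (error) {
        return { id, ok: false, error };
      }
    }),
  );
}
```

Callers can then filter on `ok` to separate successful extractions from documents that need another attempt.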

## curl example

Use the OpenAI-compatible `/chat/completions` endpoint with `response_format`:

```bash
curl -X POST https://api.okrapdf.com/v1/documents/doc-abc123/chat/completions \
  -H "Authorization: Bearer $OKRA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [{"role": "user", "content": "Extract revenue and net income"}],
    "response_format": {
      "type": "json_schema",
      "json_schema": {
        "name": "financials",
        "schema": {
          "type": "object",
          "properties": {
            "revenue": {"type": "string"},
            "net_income": {"type": "string"}
          }
        }
      }
    }
  }'
```

The response follows the standard OpenAI format, with the extracted JSON in `choices[0].message.content`.
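As a rough illustration of reading that field from TypeScript, the sketch below parses the message content into an object. It assumes `content` is a JSON-encoded string, as in OpenAI structured outputs; the `ChatCompletion` interface covers only the fields used here, and `parseExtraction` is a hypothetical helper name.

```ts
// Minimal shape of the OpenAI-style response body; only the fields used here.
interface ChatCompletion {
  choices: { message: { content: string | null } }[];
}

// Hypothetical helper: pull the extracted object out of the response body.
function parseExtraction<T>(body: ChatCompletion): T {
  const content = body.choices[0]?.message.content;
  if (!content) {
    throw new Error('Response has no message content');
  }
  // Assumption: content is a JSON string that matches the requested schema.
  return JSON.parse(content) as T;
}
```

The cast to `T` is unchecked, so validating the parsed value against the same schema is a sensible extra step if the result feeds typed code.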

## MCP tool call

When using the MCP server, the agent calls the `extract_data` tool directly with arguments such as:

```json
{
  "document_id": "doc-abc123",
  "prompt": "Extract revenue and net income from this 10-K",
  "json_schema": {
    "type": "object",
    "properties": {
      "revenue": { "type": "string" },
      "net_income": { "type": "string" }
    }
  }
}
```

## Error handling

```ts
import { StructuredOutputError } from '@okrapdf/runtime';

try {
  await session.prompt('Extract invoice fields', { schema: InvoiceSchema });
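} catch (err) {
  if (err instanceof StructuredOutputError) {
    // `code` is one of the structured output error codes listed below.
    console.error(err.code, err.message, err.details);
  } else {
    throw err; // Not an extraction error; let it propagate.
  }
}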
```

### Structured output error codes

| Code                       | Status | Meaning                                                                     |
| -------------------------- | ------ | --------------------------------------------------------------------------- |
| `SCHEMA_VALIDATION_FAILED` | 422    | Output didn't match your schema. Check field types and required fields.     |
| `EXTRACTION_BLOCKED`       | 422    | Document has no usable data (for example, no pages or parsing failed).      |
| `TIMEOUT`                  | 504    | Extraction exceeded time limit. Try a simpler schema or smaller page range. |
| `DOCUMENT_NOT_FOUND`       | 404    | Document ID doesn't exist or hasn't been uploaded yet.                      |
