Overview
Pass a schema (Zod or JSON Schema) to session.prompt(..., { schema }) to extract typed, structured data from a document.
Basic example
import { createOkra } from '@okrapdf/runtime';
import { z } from 'zod';

const okra = createOkra({ apiKey: process.env.OKRA_API_KEY });
// Attach a session to an existing document by its ID.
const session = okra.sessions.from('ocr_doc_id');

// Zod schema describing the fields to extract.
const InvoiceSchema = z.object({
  vendor: z.string(),
  invoiceNumber: z.string(),
  date: z.string(),
  total: z.number(),
  lineItems: z.array(z.object({
    description: z.string(),
    quantity: z.number().optional(),
    amount: z.number(),
  })),
});

// Passing the schema requests structured output instead of free text.
const { data, meta } = await session.prompt(
  'Extract all invoice fields including line items',
  { schema: InvoiceSchema },
);

console.log(data?.vendor, data?.total, meta?.confidence);
JSON Schema example
const result = await session.prompt('Extract invoice fields', {
  schema: {
    type: 'object',
    properties: {
      vendor: { type: 'string' },
      total: { type: 'number' },
    },
    required: ['vendor', 'total'],
  },
});
Multi-document pattern
To run extraction across many documents, create one session per document and run the prompts concurrently with Promise.all:
const sessions = ['ocr_a', 'ocr_b', 'ocr_c'].map((id) => okra.sessions.from(id));
const results = await Promise.all(
  sessions.map((s) => s.prompt('Extract invoice fields', { schema: InvoiceSchema })),
);
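Promise.all rejects as soon as any one prompt fails, which discards the results from documents that succeeded. The sketch below uses the standard Promise.allSettled to keep per-document outcomes instead. The names extractAll and BatchResult are hypothetical and not part of the SDK; the extract callback stands in for something like (id) => okra.sessions.from(id).prompt(...).

```typescript
// Hypothetical helper (not part of @okrapdf/runtime): runs one extraction per
// document ID and keeps successes and failures separate.
type BatchResult<T> = {
  succeeded: { id: string; value: T }[];
  failed: { id: string; reason: unknown }[];
};

async function extractAll<T>(
  ids: string[],
  extract: (id: string) => Promise<T>,
): Promise<BatchResult<T>> {
  // allSettled never rejects; each entry reports its own status.
  const settled = await Promise.allSettled(ids.map((id) => extract(id)));
  const result: BatchResult<T> = { succeeded: [], failed: [] };
  settled.forEach((outcome, i) => {
    if (outcome.status === 'fulfilled') {
      result.succeeded.push({ id: ids[i], value: outcome.value });
    } else {
      result.failed.push({ id: ids[i], reason: outcome.reason });
    }
  });
  return result;
}
```

Entries in failed can then be retried or logged individually without re-running the documents that already succeeded.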
curl example
Use the OpenAI-compatible /chat/completions endpoint with response_format:
curl -X POST https://api.okrapdf.com/v1/documents/doc-abc123/chat/completions \
  -H "Authorization: Bearer $OKRA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [{"role": "user", "content": "Extract revenue and net income"}],
    "response_format": {
      "type": "json_schema",
      "json_schema": {
        "name": "financials",
        "schema": {
          "type": "object",
          "properties": {
            "revenue": {"type": "string"},
            "net_income": {"type": "string"}
          }
        }
      }
    }
  }'
The response follows the standard OpenAI format: the extracted JSON is returned in choices[0].message.content.
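Reading the extracted object out of that response might look like the sketch below. It assumes content arrives as a JSON-encoded string, as it does in the OpenAI format; the ChatCompletion interface and the parseStructuredContent helper are hypothetical names that cover only the fields used here.

```typescript
// Minimal shape of an OpenAI-style chat completion; only the fields used here.
interface ChatCompletion {
  choices: { message: { content: string | null } }[];
}

// Hypothetical helper: pulls the structured payload out of the response.
// Assumption: `content` is a JSON-encoded string.
function parseStructuredContent<T>(response: ChatCompletion): T {
  const content = response.choices[0]?.message.content;
  if (typeof content !== 'string') {
    throw new Error('Response has no message content to parse');
  }
  // JSON.parse throws a SyntaxError if the content is not valid JSON.
  return JSON.parse(content) as T;
}
```

The type parameter is only a compile-time cast, so validate the parsed object against your schema if you need runtime guarantees.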
When using the MCP server, the agent calls extract_data directly:
{
  "document_id": "doc-abc123",
  "prompt": "Extract revenue and net income from this 10-K",
  "json_schema": {
    "type": "object",
    "properties": {
      "revenue": { "type": "string" },
      "net_income": { "type": "string" }
    }
  }
}
Error handling
import { StructuredOutputError } from '@okrapdf/runtime';

try {
  await session.prompt('Extract invoice fields', { schema: InvoiceSchema });
} catch (err) {
  if (err instanceof StructuredOutputError) {
    console.error(err.code, err.message, err.details);
  }
}
Structured output error codes
| Code | Status | Meaning |
|---|---|---|
| SCHEMA_VALIDATION_FAILED | 422 | Output didn’t match your schema. Check field types and required fields. |
| EXTRACTION_BLOCKED | 422 | Document has no usable data (no pages, parsing failed). |
| TIMEOUT | 504 | Extraction exceeded time limit. Try a simpler schema or smaller page range. |
| DOCUMENT_NOT_FOUND | 404 | Document ID doesn’t exist or hasn’t been uploaded yet. |
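One way to act on these codes is to map each to a next step. The sketch below copies the four codes from the table above; the StructuredOutputCode type and the suggestAction function are hypothetical names, and it assumes err.code carries one of these strings exactly.

```typescript
// Codes copied from the table above. The type and function are hypothetical,
// not exports of @okrapdf/runtime.
type StructuredOutputCode =
  | 'SCHEMA_VALIDATION_FAILED'
  | 'EXTRACTION_BLOCKED'
  | 'TIMEOUT'
  | 'DOCUMENT_NOT_FOUND';

// Maps an error code to the remediation the table describes.
function suggestAction(code: StructuredOutputCode): string {
  switch (code) {
    case 'SCHEMA_VALIDATION_FAILED':
      return 'Check field types and required fields in the schema.';
    case 'EXTRACTION_BLOCKED':
      return 'Check that the document has pages and parsed successfully.';
    case 'TIMEOUT':
      return 'Retry with a simpler schema or a smaller page range.';
    case 'DOCUMENT_NOT_FOUND':
      return 'Verify the document ID and that the upload has finished.';
    default: {
      // Compile-time exhaustiveness check: fails to type-check if a code is added
      // to the union without a matching case.
      const unreachable: never = code;
      throw new Error(`Unhandled code: ${String(unreachable)}`);
    }
  }
}
```

Inside a catch block this could be called as suggestAction(err.code) once err has been narrowed to StructuredOutputError.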