
Why

Most coding agents follow the same loop:
  1. define tools
  2. model picks tools
  3. execute tools
  4. return tool results
  5. repeat

OkraPDF fits this loop directly: expose each processed document as a tool by mapping a tool name to that document's session, then answer each tool call with session.prompt(...).
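
The five steps above can be sketched as a small driver function. This is a minimal illustration, not the API of any SDK: the `ToolCall`, `ModelTurn`, `ToolResult`, `Model`, and `ToolExecutor` types and the `runAgentLoop` name are all hypothetical, and the model is passed in as a plain function so the loop can be exercised without a network call.

```typescript
// Hypothetical shapes for illustration; real SDKs define their own types.
type ToolCall = { id: string; name: string; input: { question: string } };
type ModelTurn = { text: string; toolCalls: ToolCall[] };
type ToolResult = { id: string; content: string };

// Step 1 (define tools) is assumed to happen inside the model wrapper.
type Model = (history: ToolResult[]) => Promise<ModelTurn>;
type ToolExecutor = (name: string, question: string) => Promise<string>;

async function runAgentLoop(
  model: Model,
  executeTool: ToolExecutor,
  maxTurns = 8,
): Promise<string> {
  const history: ToolResult[] = [];
  for (let turn = 0; turn < maxTurns; turn++) {
    // Step 2: the model either picks tools or produces a final answer.
    const res = await model(history);
    if (res.toolCalls.length === 0) return res.text;

    // Step 3: execute every requested tool in parallel.
    const results = await Promise.all(
      res.toolCalls.map(async (tc) => ({
        id: tc.id,
        content: await executeTool(tc.name, tc.input.question),
      })),
    );

    // Step 4: return the tool results to the model on the next turn.
    history.push(...results);
    // Step 5: repeat.
  }
  throw new Error(`No final answer after ${maxTurns} turns`);
}
```

The `maxTurns` cap is a design choice in this sketch: it keeps a model that never stops requesting tools from looping indefinitely.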

Core pattern

import { createOkra } from '@okrapdf/runtime';

const okra = createOkra({ apiKey: process.env.OKRA_API_KEY! });

const docs = [
  { id: 'ocr-abc123', label: 'NVIDIA 10-K' },
  { id: 'ocr-def456', label: 'AMD 10-K' },
  { id: 'ocr-ghi789', label: 'Intel 10-K' },
];

const sessions = Object.fromEntries(
  docs.map((d, i) => [`query_doc_${i}`, okra.sessions.from(d.id)]),
);

async function executeDocTool(name: string, question: string): Promise<string> {
  const session = sessions[name];
  if (!session) throw new Error(`Unknown doc tool: ${name}`);
  const { answer } = await session.prompt(question);
  return answer;
}
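
The model also needs a tool definition for every key in `sessions`, which the snippet above does not show. A minimal sketch, assuming the tool shape used by Anthropic's Messages API (`name`, `description`, `input_schema`); the `buildDocTools` helper and the description wording are hypothetical:

```typescript
// Shape of the `docs` entries used in the core pattern.
type Doc = { id: string; label: string };

// Tool definition with a single required string parameter, `question`.
type ToolDef = {
  name: string;
  description: string;
  input_schema: {
    type: 'object';
    properties: { question: { type: 'string'; description: string } };
    required: ['question'];
  };
};

// Hypothetical helper: one tool per document, named to match the
// `query_doc_${i}` keys of the `sessions` map.
function buildDocTools(docs: Doc[]): ToolDef[] {
  return docs.map(
    (d, i): ToolDef => ({
      name: `query_doc_${i}`,
      description: `Ask a question about the document "${d.label}".`,
      input_schema: {
        type: 'object',
        properties: {
          question: {
            type: 'string',
            description: 'Natural-language question about the document',
          },
        },
        required: ['question'],
      },
    }),
  );
}
```

Putting the label in the description is what lets the model choose the right document. For OpenAI chat completions, the same schema would go under `{ type: 'function', function: { name, description, parameters } }`.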

Claude tool loop (excerpt)

const toolCalls = res.content.filter((b) => b.type === 'tool_use');

const results = await Promise.all(
  toolCalls.map(async (tc) => ({
    type: 'tool_result' as const,
    tool_use_id: tc.id,
    content: await executeDocTool(
      tc.name,
      (tc.input as { question: string }).question,
    ),
  })),
);
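
In the excerpt above, a single rejected `executeDocTool` call rejects the whole `Promise.all`. One option is to report the failure back to the model instead, using the `is_error` flag on a `tool_result` block. A minimal sketch; the `toToolResult` wrapper and its local types are hypothetical, and the executor is injected so the snippet stands alone:

```typescript
// Result block sent back to the model; `is_error` marks a failed call.
type ToolResultBlock = {
  type: 'tool_result';
  tool_use_id: string;
  content: string;
  is_error?: boolean;
};

type ToolUse = { id: string; name: string; input: unknown };
type Executor = (name: string, question: string) => Promise<string>;

// Hypothetical wrapper: never rejects, so Promise.all over many calls
// cannot be aborted by one failing document.
async function toToolResult(
  tc: ToolUse,
  execute: Executor,
): Promise<ToolResultBlock> {
  try {
    const { question } = tc.input as { question: string };
    const content = await execute(tc.name, question);
    return { type: 'tool_result', tool_use_id: tc.id, content };
  } catch (err) {
    // Surface the failure to the model so it can retry or pick another tool.
    const message = err instanceof Error ? err.message : String(err);
    return {
      type: 'tool_result',
      tool_use_id: tc.id,
      content: message,
      is_error: true,
    };
  }
}
```

With this wrapper, the mapping becomes `toolCalls.map((tc) => toToolResult(tc, executeDocTool))`.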

OpenAI tool loop (excerpt)

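for (const tc of msg.tool_calls ?? []) {
  // Arguments arrive as a JSON string and must be parsed before use.
  const input = JSON.parse(tc.function.arguments) as { question: string };
  // Reuse executeDocTool so an unknown tool name fails with a clear error
  // instead of a TypeError on an undefined session.
  const answer = await executeDocTool(tc.function.name, input.question);
  messages.push({
    role: 'tool',
    tool_call_id: tc.id,
    content: answer,
  });
}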
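
Because the arguments are model-generated JSON, `JSON.parse` can throw and the `question` field can be missing or have the wrong type. A minimal sketch of a defensive parser; `parseQuestion` is a hypothetical helper, not part of either SDK:

```typescript
// Hypothetical helper: returns the question, or null if the arguments
// are not valid JSON or lack a non-empty string `question`.
function parseQuestion(rawArguments: string): string | null {
  try {
    const parsed: unknown = JSON.parse(rawArguments);
    if (typeof parsed !== 'object' || parsed === null) return null;
    const question = (parsed as Record<string, unknown>).question;
    return typeof question === 'string' && question.trim() !== ''
      ? question
      : null;
  } catch {
    return null; // malformed JSON
  }
}
```

When the helper returns `null`, one reasonable choice is to push a tool message that describes the problem so the model can correct its call, rather than aborting the loop.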

Multi-document fan-out

If your orchestration logic already knows which documents to query, you can run the prompts in parallel:

const prompts = [
  okra.sessions.from('ocr_a').prompt('What was revenue and YoY growth?'),
  okra.sessions.from('ocr_b').prompt('What was revenue and YoY growth?'),
  okra.sessions.from('ocr_c').prompt('What was revenue and YoY growth?'),
];

const results = await Promise.all(prompts);
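
`Promise.all` rejects as soon as any one prompt fails, which discards the answers that did succeed. If partial results are acceptable, `Promise.allSettled` keeps them. A minimal sketch; `fanOut`, `Ask`, and `FanOutResult` are hypothetical names, and `ask` stands in for `(id, q) => okra.sessions.from(id).prompt(q)`:

```typescript
type DocAnswer = { answer: string };
type Ask = (docId: string, question: string) => Promise<DocAnswer>;

type FanOutResult =
  | { docId: string; ok: true; answer: string }
  | { docId: string; ok: false; error: string };

// Ask every document the same question and keep per-document outcomes.
async function fanOut(
  docIds: string[],
  question: string,
  ask: Ask,
): Promise<FanOutResult[]> {
  const settled = await Promise.allSettled(
    docIds.map((id) => ask(id, question)),
  );
  // allSettled preserves input order, so index i maps back to docIds[i].
  return settled.map((s, i): FanOutResult =>
    s.status === 'fulfilled'
      ? { docId: docIds[i], ok: true, answer: s.value.answer }
      : {
          docId: docIds[i],
          ok: false,
          error: s.reason instanceof Error ? s.reason.message : String(s.reason),
        },
  );
}
```

Keeping the `docId` next to each outcome makes it possible to retry only the failed documents.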

When to use this pattern

  • You want custom prompts and full agent control in your app runtime.
  • You want to combine document tools with non-document tools (web search, calculators, DB lookups).
  • You want transparent orchestration instead of a managed server-side multi-doc agent.
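
Combining document tools with non-document tools can be as simple as one registry keyed by tool name. A minimal sketch under stated assumptions: `buildRegistry`, `dispatch`, and the toy `add` tool are hypothetical, and each document tool is passed in as a plain function such as `(q) => executeDocTool('query_doc_0', q)`:

```typescript
type ToolHandler = (input: Record<string, unknown>) => Promise<string>;
type DocTool = (question: string) => Promise<string>;

// Hypothetical registry mixing document tools with a non-document tool.
function buildRegistry(
  docTools: Record<string, DocTool>,
): Map<string, ToolHandler> {
  const registry = new Map<string, ToolHandler>();
  for (const [name, ask] of Object.entries(docTools)) {
    registry.set(name, (input) => ask(String(input.question ?? '')));
  }
  // Toy calculator standing in for web search, DB lookups, and so on.
  registry.set('add', async (input) =>
    String(Number(input.a) + Number(input.b)),
  );
  return registry;
}

// Route a tool call to its handler, whatever kind of tool it is.
async function dispatch(
  registry: Map<string, ToolHandler>,
  name: string,
  input: Record<string, unknown>,
): Promise<string> {
  const handler = registry.get(name);
  if (!handler) throw new Error(`Unknown tool: ${name}`);
  return handler(input);
}
```

The tool loop then calls `dispatch` for every tool call and does not need to know which tools are backed by documents.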