
Overview

Every OkraPDF document can have an Eval Agent attached. When enabled, it evaluates each chat completion asynchronously — checking for hallucinated facts, compliance violations, or custom policy rules you define. The eval never blocks the response. Results appear in the document event log within seconds.
User question → Completion runs → Response sent immediately
                                        │
                                        ▼ (async, via queue)
                                   EvalAgent evaluates
                                        │
                                        ▼
                                   Alerts logged

Enable eval on a document

curl -X PUT https://api.okrapdf.com/document/$DOC_ID/config/eval \
  -H "Authorization: Bearer $OKRA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "enabled": true,
    "instructions": "Flag any response that cites numbers, dates, or facts not found in the document."
  }'
Response:
{
  "document_id": "doc-abc123",
  "spec_version": 1,
  "eval": {
    "enabled": true,
    "scope": "document",
    "instructions": "Flag any response that cites numbers, dates, or facts not found in the document.",
    "maxRecentTurns": 5
  }
}
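If you are scripting the config update from Python instead of curl, the request body can be built with a small helper. This is a sketch, not part of any OkraPDF SDK; send the resulting body to PUT /document/:id/config/eval with whatever HTTP client you already use.

```python
import json

def eval_config(instructions, enabled=True, **overrides):
    """Build the JSON body for PUT /document/:id/config/eval.

    `overrides` covers the optional fields documented below
    (scope, maxRecentTurns, model).
    """
    body = {"enabled": enabled, "instructions": instructions}
    body.update(overrides)
    return json.dumps(body)
```

For example, `eval_config("Flag hallucinated figures.", maxRecentTurns=10)` produces a body that enables eval with a wider context window.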

How it works

  1. You chat normally. POST /document/:id/chat/completions returns instantly; no latency is added.
  2. Three hooks fire asynchronously via the document’s internal queue:
    • turn.before — evaluates the user query before the LLM runs
    • tool.execute.after — evaluates tool call results
    • turn.after — evaluates the final response against the document
  3. EvalAgent judges using a fast LLM (Haiku or Kimi-K2.5) with your instructions as the evaluation criteria.
  4. Results logged to the document event log — viewable via API or the info page.
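The hook sequence above can be sketched in miniature. This is illustrative Python only, not the actual server-side implementation; the real queue and judge run inside OkraPDF's infrastructure:

```python
def process(events, judge):
    """Drain queued eval events through a judge function.

    Each event is a (hook, payload) pair, where hook is one of
    "turn.before", "tool.execute.after", or "turn.after".
    A judge failure is recorded, never raised (fail-open).
    """
    results = []
    for hook, payload in events:
        try:
            results.append((hook, judge(payload)))
        except Exception as exc:
            results.append((hook, f"error logged: {exc}"))
    return results
```

The key property mirrored here is fail-open: a slow or failing judge produces a log entry, never an exception that could affect the chat path.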

Check eval results

curl "https://api.okrapdf.com/document/$DOC_ID/events?limit=10" \
  -H "Authorization: Bearer $OKRA_API_KEY"
Example entries:
[
  {
    "event": "log",
    "detail": {
      "message": "[EvalAgent] turn.after completed: 2 action(s)"
    }
  },
  {
    "event": "log",
    "detail": {
      "message": "[EvalAgent] info: Response correctly reports that revenue data is not in the document. No hallucinated figures detected."
    }
  }
]
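Because eval results are plain log events, pulling just the EvalAgent lines out of a fetched event list is a short filter. A minimal sketch, assuming the events response has already been parsed into a Python list:

```python
def eval_messages(events):
    """Extract EvalAgent log messages from a document event list."""
    return [
        e["detail"]["message"]
        for e in events
        if e.get("event") == "log"
        and e.get("detail", {}).get("message", "").startswith("[EvalAgent]")
    ]
```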

Guardrail examples

Hallucination detection (financial documents)

curl -X PUT .../config/eval -d '{
  "enabled": true,
  "instructions": "Flag any response that cites dollar amounts, percentages, or financial metrics not explicitly stated in the document. Be strict — estimates or inferred values must be flagged."
}'

Source accuracy (legal/compliance)

curl -X PUT .../config/eval -d '{
  "enabled": true,
  "instructions": "Verify that every claim in the response has a direct source in the document. Flag any response that paraphrases in a way that changes the meaning. Flag missing citations or page references."
}'

PII leakage prevention

curl -X PUT .../config/eval -d '{
  "enabled": true,
  "instructions": "Flag if the response contains Social Security numbers, account numbers, or personal addresses from the document. These should be redacted, not exposed in chat responses."
}'

Scope enforcement (narrow the agent)

curl -X PUT .../config/eval -d '{
  "enabled": true,
  "instructions": "This document is an employee handbook. Flag any response that answers questions outside the scope of HR policies, benefits, and workplace procedures. The agent should decline off-topic questions."
}'

Tone and brand voice

curl -X PUT .../config/eval -d '{
  "enabled": true,
  "instructions": "Flag responses that use casual language, slang, or first-person voice. All responses should be professional and written in third-person."
}'

Configuration options

Field            Type                   Default      Description
enabled          boolean                false        Turn eval on/off
scope            "document" | "user"    "document"   Eval context scope: per-document or per-user turn history
instructions     string                              Natural language evaluation criteria
model            object                 auto         Override the eval model (see below)
maxRecentTurns   number                 5            How many recent turns to include as context
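The defaults above can be expressed as a plain dict that a partial config is merged over. This is a client-side sketch for display or validation; the server applies the same defaults on its own:

```python
# Defaults from the configuration table (instructions has no default).
EVAL_DEFAULTS = {
    "enabled": False,
    "scope": "document",
    "maxRecentTurns": 5,
}

def with_defaults(config):
    """Merge a partial eval config over the documented defaults."""
    return {**EVAL_DEFAULTS, **config}
```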

Custom eval model

By default, EvalAgent uses Claude Haiku (if ANTHROPIC_API_KEY is set) or Kimi-K2.5 via OpenRouter. Override with:
curl -X PUT .../config/eval -d '{
  "enabled": true,
  "instructions": "...",
  "model": {
    "provider": "anthropic",
    "model": "claude-haiku-4-5-20251001"
  }
}'
Or use OpenRouter for any model:
curl -X PUT .../config/eval -d '{
  "enabled": true,
  "instructions": "...",
  "model": {
    "provider": "openrouter",
    "model": "google/gemini-2.5-flash"
  }
}'

Disable eval

curl -X PUT .../config/eval -d '{"enabled": false}'

Architecture

EvalAgent is a separate Durable Object that runs independently from the document’s completion handler.
  • No latency impact — eval events are written to the document’s internal queue during completion, then processed asynchronously in a separate DO wake.
  • Durable — queued eval events survive DO hibernation. If the eval LLM is slow, events retry with exponential backoff.
  • Scoped — each document (or user, if scope: "user") gets its own EvalAgent instance with its own turn history.
  • Fail-open — if the eval LLM errors or times out, the completion is unaffected. Errors are logged, never surfaced to the user.
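The retry-with-backoff and fail-open behavior described above can be sketched as follows. The attempt count and delay schedule here are illustrative, not OkraPDF's actual values:

```python
import time

def retry_eval(run, attempts=4, base_delay=0.5):
    """Retry a flaky eval call with exponential backoff.

    Returns the eval result, or None if every attempt failed:
    the caller logs the failure and moves on (fail-open), so the
    chat completion is never affected.
    """
    for i in range(attempts):
        try:
            return run()
        except Exception:
            if i == attempts - 1:
                return None  # fail-open: give up quietly after the last attempt
            time.sleep(base_delay * 2 ** i)  # 0.5s, 1s, 2s, ...
```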

See also

  • Chat — document chat completions
  • Output Schema — structured extraction with validation