
Overview

Every OkraPDF document can have an Eval Agent attached. When enabled, it evaluates each chat completion asynchronously — checking for hallucinated facts, compliance violations, or custom policy rules you define. The eval never blocks the response. Results appear in the document event log within seconds.
User question → Completion runs → Response sent immediately
                                        │
                                        ▼ (async, via queue)
                                   EvalAgent evaluates
                                        │
                                        ▼
                                   Alerts logged

Enable eval on a document

curl -X PUT https://api.okrapdf.com/document/$DOC_ID/config/eval \
  -H "Authorization: Bearer $OKRA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "enabled": true,
    "instructions": "Flag any response that cites numbers, dates, or facts not found in the document."
  }'
Response:
{
  "document_id": "doc-abc123",
  "spec_version": 1,
  "eval": {
    "enabled": true,
    "scope": "document",
    "instructions": "Flag any response that cites numbers, dates, or facts not found in the document.",
    "maxRecentTurns": 5
  }
}
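If you are scripting the config update from Python instead of curl, the request body can be built with a small helper. This is a sketch, not part of any OkraPDF SDK; send the resulting body to PUT /document/:id/config/eval with whatever HTTP client you already use.

```python
import json

def eval_config(instructions, enabled=True, **overrides):
    """Build the JSON body for PUT /document/:id/config/eval.

    `overrides` covers the optional fields documented below
    (scope, maxRecentTurns, model).
    """
    body = {"enabled": enabled, "instructions": instructions}
    body.update(overrides)
    return json.dumps(body)
```

For example, `eval_config("Flag hallucinated figures.", maxRecentTurns=10)` produces a body that enables eval with a wider context window.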

How it works

  1. You chat normally. POST /document/:id/chat/completions returns instantly; no latency is added.
  2. Three hooks fire asynchronously via the document’s internal queue:
    • turn.before — evaluates the user query before the LLM runs
    • tool.execute.after — evaluates tool call results
    • turn.after — evaluates the final response against the document
  3. EvalAgent judges using a fast LLM (Haiku or Kimi-K2.5) with your instructions as the evaluation criteria.
  4. Results logged to the document event log — viewable via API or the info page.
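The hook sequence above can be sketched in miniature. This is illustrative Python only, not the actual server-side implementation; the real queue and judge run inside OkraPDF's infrastructure:

```python
def process(events, judge):
    """Drain queued eval events through a judge function.

    Each event is a (hook, payload) pair, where hook is one of
    "turn.before", "tool.execute.after", or "turn.after".
    A judge failure is recorded, never raised (fail-open).
    """
    results = []
    for hook, payload in events:
        try:
            results.append((hook, judge(payload)))
        except Exception as exc:
            results.append((hook, f"error logged: {exc}"))
    return results
```

The key property mirrored here is fail-open: a slow or failing judge produces a log entry, never an exception that could affect the chat path.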

Check eval results

curl "https://api.okrapdf.com/document/$DOC_ID/events?limit=10" \
  -H "Authorization: Bearer $OKRA_API_KEY"
Example entries:
[
  {
    "event": "log",
    "detail": {
      "message": "[EvalAgent] turn.after completed: 2 action(s)"
    }
  },
  {
    "event": "log",
    "detail": {
      "message": "[EvalAgent] info: Response correctly reports that revenue data is not in the document. No hallucinated figures detected."
    }
  }
]
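Because eval results are plain log events, pulling just the EvalAgent lines out of a fetched event list is a short filter. A minimal sketch, assuming the events response has already been parsed into a Python list:

```python
def eval_messages(events):
    """Extract EvalAgent log messages from a document event list."""
    return [
        e["detail"]["message"]
        for e in events
        if e.get("event") == "log"
        and e.get("detail", {}).get("message", "").startswith("[EvalAgent]")
    ]
```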

Guardrail examples

Hallucination detection (financial documents)

curl -X PUT .../config/eval -d '{
  "enabled": true,
  "instructions": "Flag any response that cites dollar amounts, percentages, or financial metrics not explicitly stated in the document. Be strict — estimates or inferred values must be flagged."
}'

Source accuracy (legal/compliance)

curl -X PUT .../config/eval -d '{
  "enabled": true,
  "instructions": "Verify that every claim in the response has a direct source in the document. Flag any response that paraphrases in a way that changes the meaning. Flag missing citations or page references."
}'

PII leakage prevention

curl -X PUT .../config/eval -d '{
  "enabled": true,
  "instructions": "Flag if the response contains Social Security numbers, account numbers, or personal addresses from the document. These should be redacted, not exposed in chat responses."
}'

Scope enforcement (narrow the agent)

curl -X PUT .../config/eval -d '{
  "enabled": true,
  "instructions": "This document is an employee handbook. Flag any response that answers questions outside the scope of HR policies, benefits, and workplace procedures. The agent should decline off-topic questions."
}'

Tone and brand voice

curl -X PUT .../config/eval -d '{
  "enabled": true,
  "instructions": "Flag responses that use casual language, slang, or first-person voice. All responses should be professional and written in third-person."
}'

Configuration options

Field            Type                   Default      Description
enabled          boolean                false        Turn eval on/off
scope            "document" | "user"    "document"   Eval context scope: per-document or per-user turn history
instructions     string                              Natural language evaluation criteria
model            object                 auto         Override the eval model (see below)
maxRecentTurns   number                 5            How many recent turns to include as context
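The defaults above can be expressed as a plain dict that a partial config is merged over. This is a client-side sketch for display or validation; the server applies the same defaults on its own:

```python
# Defaults from the configuration table (instructions has no default).
EVAL_DEFAULTS = {
    "enabled": False,
    "scope": "document",
    "maxRecentTurns": 5,
}

def with_defaults(config):
    """Merge a partial eval config over the documented defaults."""
    return {**EVAL_DEFAULTS, **config}
```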

Custom eval model

By default, EvalAgent uses Claude Haiku (if ANTHROPIC_API_KEY is set) or Kimi-K2.5 via OpenRouter. Override with:
curl -X PUT .../config/eval -d '{
  "enabled": true,
  "instructions": "...",
  "model": {
    "provider": "anthropic",
    "model": "claude-haiku-4-5-20251001"
  }
}'
Or use OpenRouter for any model:
curl -X PUT .../config/eval -d '{
  "enabled": true,
  "instructions": "...",
  "model": {
    "provider": "openrouter",
    "model": "google/gemini-2.5-flash"
  }
}'

Disable eval

curl -X PUT .../config/eval -d '{"enabled": false}'

Architecture

EvalAgent is a separate Durable Object that runs independently from the document’s completion handler.
  • No latency impact — eval events are written to the document’s internal queue during completion, then processed asynchronously in a separate DO wake.
  • Durable — queued eval events survive DO hibernation. If the eval LLM is slow, events retry with exponential backoff.
  • Scoped — each document (or user, if scope: "user") gets its own EvalAgent instance with its own turn history.
  • Fail-open — if the eval LLM errors or times out, the completion is unaffected. Errors are logged, never surfaced to the user.
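The retry-with-backoff and fail-open behavior described above can be sketched as follows. The attempt count and delay schedule here are illustrative, not OkraPDF's actual values:

```python
import time

def retry_eval(run, attempts=4, base_delay=0.5):
    """Retry a flaky eval call with exponential backoff.

    Returns the eval result, or None if every attempt failed:
    the caller logs the failure and moves on (fail-open), so the
    chat completion is never affected.
    """
    for i in range(attempts):
        try:
            return run()
        except Exception:
            if i == attempts - 1:
                return None  # fail-open: give up quietly after the last attempt
            time.sleep(base_delay * 2 ** i)  # 0.5s, 1s, 2s, ...
```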

See also

  • Chat — document chat completions
  • Output Schema — structured extraction with validation