Overview
Combine okra commands with shell scripting for batch workflows: upload a folder of PDFs, wait for processing, then export results.
Upload a directory of PDFs
for pdf in ./reports/*.pdf; do
  okra jobs create "$pdf" --wait -o json >/dev/null
done
With --wait, each job finishes before the next one is submitted. Drop --wait to submit every job without blocking, then poll for completion later.
Parallel upload, then wait
# Submit all jobs (returns immediately)
JOBS=()
for pdf in ./reports/*.pdf; do
  JOB_ID=$(okra jobs create "$pdf" -o json | jq -r '.job_id')
  JOBS+=("$JOB_ID")
  echo "Submitted: $JOB_ID ($pdf)"
done
# Wait for all to complete
for id in "${JOBS[@]}"; do
  okra jobs wait "$id" -o json >/dev/null
  echo "Done: $id"
done
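The wait loop above prints "Done" even when a job did not succeed. A minimal sketch of a variant that records failures follows. It assumes that `okra jobs wait` exits non-zero when a job fails or times out, which this page does not confirm; `wait_all` and `FAILED` are hypothetical names chosen for illustration.

```shell
# Wait on every job ID given as an argument and remember the ones that fail.
# Assumption: `okra jobs wait` exits non-zero when a job fails or times out.
wait_all() {
  FAILED=""
  for id in "$@"; do
    if okra jobs wait "$id" -o json >/dev/null; then
      echo "Done: $id"
    else
      echo "Failed: $id" >&2
      # Append the ID, space-separated, without a leading space.
      FAILED="${FAILED:+$FAILED }$id"
    fi
  done
  # Return success only when no job failed.
  [ -z "$FAILED" ]
}
```

It could be called as `wait_all "${JOBS[@]}" || echo "Failed jobs: $FAILED"`, so a batch script can exit non-zero or resubmit only the failed IDs.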
Export all completed jobs to markdown
# Create the output directory, then export each completed job
mkdir -p results
okra jobs list --jq '.[] | select(.status=="completed") | .job_id' \
  | tr -d '"' \
  | while read -r id; do
      okra jobs results "$id" -o markdown > "results/${id}.md"
      echo "Exported: $id"
    done
# Export the tables from each completed job as CSV
for id in $(okra jobs list --jq '.[] | select(.status=="completed") | .job_id' | tr -d '"'); do
  okra elements export "$id" -t tables -f csv -d "tables/${id}" -o json >/dev/null && echo "Tables: $id"
done
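When an export loop is re-run after a partial failure, it can help to skip jobs that already have output. The sketch below is one way to do that, not a built-in okra feature: `export_markdown` is a hypothetical helper, and it assumes `okra jobs results` exits non-zero on failure.

```shell
# Read job IDs from stdin, one per line, and export each to <outdir>/<id>.md.
# IDs that already have a non-empty export are skipped, so re-runs are cheap.
# Assumption: `okra jobs results` exits non-zero when the export fails.
export_markdown() {
  outdir=$1
  mkdir -p "$outdir"
  while read -r id; do
    [ -n "$id" ] || continue
    if [ -s "$outdir/$id.md" ]; then
      echo "Skipped: $id"
      continue
    fi
    # </dev/null keeps the command from consuming the ID list on stdin.
    if okra jobs results "$id" -o markdown > "$outdir/$id.md" </dev/null; then
      echo "Exported: $id"
    else
      rm -f "$outdir/$id.md"   # do not leave a partial file behind
      echo "Failed: $id" >&2
    fi
  done
}
```

The list of completed job IDs from the commands above can be piped straight into it, for example `... | tr -d '"' | export_markdown results`.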
Run analysis across all documents in a collection
# Get all job IDs from a collection
COLLECTION_ID="col-abc123"
JOBS=$(okra collections show "$COLLECTION_ID" -o json | jq -r '.documents | map(.job_id) | join(",")')
# Run multi-doc task
okra task "$JOBS" -m "summarize key financial metrics across all documents" --quiet > analysis.txt
Webhook-driven pipeline
For production workflows, use webhooks instead of polling:
# Submit with webhook callback
okra jobs create report.pdf --webhook "https://your-server.com/api/ocr-complete"
# Your webhook receives:
# POST /api/ocr-complete
# { "job_id": "ocr-xxx", "status": "completed", "total_pages": 42 }
CI/CD integration
# GitHub Actions example
- name: Extract PDF tables
  run: |
    JOB_JSON=$(okra jobs create ./quarterly-report.pdf --wait -o json)
    JOB_ID=$(echo "$JOB_JSON" | jq -r '.job_id')
    okra jobs results "$JOB_ID" -o json > results.json
- name: Run multi-doc analysis
  run: |
    JOBS=$(okra jobs list --jq '[.[] | select(.status=="completed")] | .[0:10] | map(.job_id) | join(",")' | tr -d '"')
    okra task "$JOBS" --quiet -m "identify compliance issues" > compliance-report.txt
Rate limiting
The API enforces per-user rate limits. For large batches, add a small delay:
for pdf in ./reports/*.pdf; do
  okra jobs create "$pdf" -o json >/dev/null &
  sleep 1 # 1 second between submissions
done
wait # Wait for the background submissions to return (not for the jobs to finish processing)
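A fixed delay does not help once a submission has already been rejected. A common complement is to retry with exponential backoff. The sketch below is a generic shell helper, not part of okra: `retry` is a hypothetical name, and it assumes the CLI exits non-zero when a request is rate limited.

```shell
# Retry a command with exponential backoff.
# Usage: retry <max_attempts> <initial_delay_seconds> <command> [args...]
# Assumption: the command exits non-zero when it should be retried.
retry() {
  max=$1
  delay=$2
  shift 2
  attempt=1
  until "$@"; do
    if [ "$attempt" -ge "$max" ]; then
      echo "Giving up after $attempt attempts: $*" >&2
      return 1
    fi
    echo "Attempt $attempt failed; retrying in ${delay}s" >&2
    sleep "$delay"
    attempt=$((attempt + 1))
    delay=$((delay * 2))   # double the wait after each failure
  done
}
```

For example, `retry 5 2 okra jobs create "$pdf" -o json` would try up to five times, sleeping 2, 4, 8, and 16 seconds between attempts. The helper retries on any failure, so a permanently invalid file would also be retried until the attempt limit is reached.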