Overview
Use POST /v1/documents/ingest when parsing already happened in your own pipeline.
You send vendor output (unstructured, llamaparse, or canonical) and OkraPDF handles normalization, hydration, lifecycle processing, and document endpoints.
Request
Supported connector IDs
| vendor value | Expected shape |
|---|---|
| unstructured | array of Unstructured elements (type, metadata.page_number) |
| llamaparse | object with pages[].items[] entries |
| canonical | object with canonical pages[].blocks[] |
If vendor is omitted, OkraPDF tries to auto-detect the connector from the payload shape.
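The shapes in the table can be checked on the client before sending, which avoids relying on auto-detection. The following is a minimal sketch, assuming only the shape hints listed above; `guess_vendor` is a hypothetical helper and is not OkraPDF's actual detection logic.

```python
from typing import Any, Optional


def guess_vendor(payload: Any) -> Optional[str]:
    """Guess a connector ID from the payload shape; None if nothing matches."""
    # unstructured: a non-empty list of element dicts, each carrying "type".
    if isinstance(payload, list):
        if payload and all(isinstance(el, dict) and "type" in el for el in payload):
            return "unstructured"
        return None

    if isinstance(payload, dict) and isinstance(payload.get("pages"), list):
        pages = [p for p in payload["pages"] if isinstance(p, dict)]
        # canonical pages carry "blocks"; llamaparse pages carry "items".
        if pages and all("blocks" in p for p in pages):
            return "canonical"
        if pages and all("items" in p for p in pages):
            return "llamaparse"

    return None
```

If the helper returns None, it is probably safer to fix the payload or set vendor explicitly than to send it and wait for a 422.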
Response model
The endpoint returns 202 Accepted and starts lifecycle processing.
What happens after ingest
- Vendor payload is normalized to Okra’s canonical parse shape.
- Parsed nodes are hydrated into the document graph.
- Lifecycle jobs run (snapshot/materialization/projection workflow).
- Standard document surfaces become available (pages, chat/completion, output profiles, URL builder).
Failure modes
- Unknown payload shape without vendor: 422 with supported connector list.
- Invalid payload for chosen connector: 422 normalization error.
- Workflow startup failure: 500 with error payload.
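A caller might map these status codes to a small outcome type before deciding what to do next. This is a sketch only: `IngestOutcome` and `classify_ingest_status` are hypothetical names, and treating a 500 as retryable is an assumed client policy, not documented behavior.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IngestOutcome:
    accepted: bool
    retryable: bool
    reason: str


def classify_ingest_status(status: int) -> IngestOutcome:
    """Map an ingest HTTP status code to a client-side outcome."""
    if status == 202:
        return IngestOutcome(True, False, "accepted; lifecycle processing started")
    if status == 422:
        # Unknown shape or normalization error: resending the same body will not help.
        return IngestOutcome(False, False, "payload rejected; set vendor or fix the payload")
    if status == 500:
        # Assumption: a workflow startup failure may be transient, so allow a retry.
        return IngestOutcome(False, True, "workflow startup failed")
    return IngestOutcome(False, False, f"unexpected status {status}")
```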
Replace mode
Pass "mode": "replace" to supersede existing nodes on affected pages before hydrating new ones.
Existing nodes get status = 'superseded'; they remain in the graph for audit but are excluded
from completions.
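The replace semantics can be illustrated with a toy in-memory model. It is not OkraPDF's implementation: `Node`, `ingest_replace`, and `completion_context` are hypothetical, and the "active" status name is an assumption (the text above only names 'superseded').

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Node:
    page: int
    text: str
    status: str = "active"  # assumed default; only 'superseded' is documented


def ingest_replace(graph: List[Node], incoming: List[Node]) -> None:
    """Supersede existing nodes on pages touched by `incoming`, then add the new nodes."""
    affected_pages = {node.page for node in incoming}
    for node in graph:
        if node.page in affected_pages:
            node.status = "superseded"  # kept in the graph for audit
    graph.extend(incoming)


def completion_context(graph: List[Node]) -> List[str]:
    """Return only the text that completions would see."""
    return [node.text for node in graph if node.status != "superseded"]
```

In this model, replacing page 1 of a two-page document leaves three nodes in the graph, but only the untouched page 2 node and the new page 1 node feed completions.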
Example: LlamaParse → Ingest → Chat
A complete walkthrough: parse a PDF with LlamaParse, ingest the result, and query it.
What surfaces are available after ingest
| Surface | Available | Notes |
|---|---|---|
| Chat completions | Yes | Full document context from ingested nodes |
| Structured output (/generate) | Yes | Works on ingested nodes like any document |
| Status | Yes | Phase, page count, node count |
| Branch | Yes | Zero-copy fork of ingested document |
| Page images (pg_N.png) | No | Requires original PDF binary (use pdfUrl to enable) |
| Download (/download) | No | Requires original PDF binary |
| Full markdown (full.md) | Yes | Materialized from R2 snapshot |
Pass pdfUrl in the ingest request to enable page images and downloads.
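One way to assemble such a request with the Python standard library is sketched below. The vendor, mode, and pdfUrl fields come from this page; the "payload" field name, the Bearer authorization header, and the base URL are assumptions for illustration and should be checked against the actual API reference.

```python
import json
import urllib.request
from typing import Any, Optional


def build_ingest_request(
    base_url: str,
    api_key: str,
    payload: Any,
    vendor: Optional[str] = None,
    mode: Optional[str] = None,
    pdf_url: Optional[str] = None,
) -> urllib.request.Request:
    """Build (but do not send) a POST /v1/documents/ingest request."""
    body = {"payload": payload}  # hypothetical field name for the vendor output
    if vendor is not None:
        body["vendor"] = vendor
    if mode is not None:
        body["mode"] = mode  # e.g. "replace"
    if pdf_url is not None:
        body["pdfUrl"] = pdf_url  # enables page images and downloads
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/documents/ingest",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
        },
        method="POST",
    )
```

The returned object can be passed to `urllib.request.urlopen`; building it separately keeps the body easy to inspect before anything is sent.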
When to use this endpoint
Use the Ingest API when you:
- already run extraction with external vendors,
- want OkraPDF delivery + policy + output layers,
- need a stable doc-... lifecycle without re-running OCR in Okra.
Related Docs
Branch + Replace
Fork a doc, replace bad OCR, compare completions.
Output Schema
Materialize reproducible structured outputs from ingested documents.
URL Builder
Build immutable URLs for pages, tables, and artifacts.