# Ingest API

> Send pre-parsed vendor results and let OkraPDF hydrate, index, and serve the document lifecycle.

## Overview

Use `POST /v1/documents/ingest` when parsing has already happened in your own pipeline.

You send vendor output (`unstructured`, `llamaparse`, or `canonical`) and OkraPDF handles normalization, hydration, lifecycle processing, and document endpoints.

## Request

```bash
curl -X POST https://api.okrapdf.com/v1/documents/ingest \
  -H "Authorization: Bearer $OKRA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "vendor": "unstructured",
    "data": [
      {
        "type": "NarrativeText",
        "text": "Invoice total due is $12,480",
        "metadata": { "page_number": 1 }
      }
    ],
    "pdfUrl": "https://example.com/invoice.pdf"
  }'
```

## Supported connector IDs

| `vendor` value | Expected shape                                                  |
| -------------- | --------------------------------------------------------------- |
| `unstructured` | array of Unstructured elements (`type`, `metadata.page_number`) |
| `llamaparse`   | object with `pages[].items[]` entries                           |
| `canonical`    | object with canonical `pages[].blocks[]`                        |

If `vendor` is omitted, OkraPDF tries to auto-detect the connector from the payload shape.
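
If you would rather set `vendor` explicitly than rely on auto-detection, a client-side pre-check along the lines of the table above might look like this. It is a sketch only: `guess_vendor` is a hypothetical helper, it assumes `jq` is installed, and it does not reflect how OkraPDF's own detection works.

```bash
# Hypothetical client-side check: guess the connector ID from the payload's shape.
# Mirrors the table above; this is NOT OkraPDF's actual detection logic.
# $1 = path to a JSON file holding the value you would send as "data".
guess_vendor() {
  jq -r '
    if type == "array" then "unstructured"
    elif type == "object" and (.pages | type) == "array" then
      (.pages[0] // {}) as $first
      | if ($first | has("items")) then "llamaparse"
        elif ($first | has("blocks")) then "canonical"
        else "unknown" end
    else "unknown" end
  ' "$1"
}

# Usage: guess_vendor result.json
```

Anything reported as `unknown` is a payload worth inspecting before you send it.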

## Response model

The endpoint returns `202 Accepted` and starts lifecycle processing.

```json
{
  "documentId": "doc-...",
  "phase": "ingesting",
  "status": "processing",
  "vendor": "unstructured",
  "pageCount": 12,
  "workflowId": "...",
  "urls": {
    "self": "https://api.okrapdf.com/document/doc-...",
    "status": "https://api.okrapdf.com/document/doc-.../status",
    "pages": "https://api.okrapdf.com/document/doc-.../pages",
    "publish": "https://api.okrapdf.com/document/doc-.../publish"
  }
}
```

## What happens after ingest

1. Vendor payload is normalized to OkraPDF's canonical parse shape.
2. Parsed nodes are hydrated into the document graph.
3. Lifecycle jobs run (snapshot/materialization/projection workflow).
4. Standard document surfaces become available (`pages`, chat/completion, output profiles, URL builder).

## Failure modes

* Unknown payload shape without `vendor`: `422` with supported connector list.
* Invalid payload for chosen connector: `422` normalization error.
* Workflow startup failure: `500` with error payload.

No silent drops: payloads are validated before lifecycle continues.
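
The status codes above map naturally onto a small client-side dispatcher. The sketch below is one hedged way to do that in shell; `ingest_next_step` is a hypothetical helper name, and the fallback branch for other codes is an assumption, since this page only documents `202`, `422`, and `500`.

```bash
# Hypothetical helper: turn an ingest HTTP status code into a next step.
# 202, 422, and 500 come from this page; the catch-all branch is an assumption.
ingest_next_step() {
  case "$1" in
    202) echo "accepted: poll the status URL" ;;
    422) echo "rejected: fix the payload or set vendor explicitly" ;;
    500) echo "server error: workflow failed to start" ;;
    *)   echo "unexpected: inspect the response body" ;;
  esac
}

# Usage (assumes payload.json holds the request body):
#   code=$(curl -s -o response.json -w '%{http_code}' \
#     -X POST https://api.okrapdf.com/v1/documents/ingest \
#     -H "Authorization: Bearer $OKRA_API_KEY" \
#     -H "Content-Type: application/json" \
#     --data-binary @payload.json)
#   ingest_next_step "$code"
```

Capturing the status code separately from the body keeps the error payload in `response.json` available for inspection.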

## Replace mode

Pass `"mode": "replace"` to supersede existing nodes on affected pages before hydrating new ones.
Existing nodes get `status = 'superseded'`: they remain in the graph for audit but are excluded
from completions.

```bash
curl -X POST https://api.okrapdf.com/document/$DOC_ID/ingest \
  -H "Authorization: Bearer $OKRA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "vendor": "canonical",
    "mode": "replace",
    "data": { "pages": [{ "pageNumber": 3, "blocks": [...] }] }
  }'
```

Combine with [branching](/cookbook/branch-and-replace) to correct extraction errors
without touching the original document.
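
When the corrected blocks live in a file, the replace payload can be assembled with `jq` rather than written inline. The following is a sketch under assumptions: `build_replace_payload` and `blocks.json` are hypothetical names, and the file is assumed to hold a JSON array of canonical blocks whose fields are not specified on this page.

```bash
# Hypothetical helper: build a replace-mode payload for a single page.
# $1 = page number, $2 = file containing a JSON array of canonical blocks.
build_replace_payload() {
  jq -n --argjson page "$1" --slurpfile blocks "$2" '
    {
      vendor: "canonical",
      mode: "replace",
      data: { pages: [ { pageNumber: $page, blocks: $blocks[0] } ] }
    }'
}

# Usage (blocks.json is a hypothetical file name):
#   build_replace_payload 3 blocks.json \
#     | curl -X POST "https://api.okrapdf.com/document/$DOC_ID/ingest" \
#         -H "Authorization: Bearer $OKRA_API_KEY" \
#         -H "Content-Type: application/json" \
#         --data-binary @-
```

Piping the payload through `--data-binary @-` avoids shell quoting problems when block text contains quotes or dollar signs.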

## Example: LlamaParse → Ingest → Chat

A complete walkthrough: parse a PDF with LlamaParse, ingest the result, and query it.

```bash
# 1. Parse with LlamaParse
JOB=$(curl -s -X POST 'https://api.cloud.llamaindex.ai/api/parsing/upload' \
  -H "Authorization: Bearer $LLAMAPARSE_API_KEY" \
  -F 'file=@report.pdf' | jq -r '.id')

# 2. Wait for LlamaParse to finish
while [ "$(curl -s https://api.cloud.llamaindex.ai/api/parsing/job/$JOB \
  -H "Authorization: Bearer $LLAMAPARSE_API_KEY" | jq -r '.status')" != "SUCCESS" ]; do
  sleep 3
done

# 3. Fetch JSON result
curl -s "https://api.cloud.llamaindex.ai/api/parsing/job/$JOB/result/json" \
  -H "Authorization: Bearer $LLAMAPARSE_API_KEY" > result.json

# 4. Ingest into OkraPDF (stream the payload so large results avoid argument-length limits)
DOC_ID=$(jq -c '{vendor: "llamaparse", data: .}' result.json \
  | curl -s -X POST https://api.okrapdf.com/v1/documents/ingest \
      -H "Authorization: Bearer $OKRA_API_KEY" \
      -H "Content-Type: application/json" \
      --data-binary @- \
  | jq -r '.documentId')

echo "Document: $DOC_ID"

# 5. Wait for lifecycle
while [ "$(curl -s https://api.okrapdf.com/document/$DOC_ID/status \
  -H "Authorization: Bearer $OKRA_API_KEY" | jq -r '.phase')" != "complete" ]; do
  sleep 2
done

# 6. Chat with the document
curl -s -X POST "https://api.okrapdf.com/document/$DOC_ID/chat/completions" \
  -H "Authorization: Bearer $OKRA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Summarize this document"}]}' \
  | jq -r '.choices[0].message.content'
```
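
The two `while` loops in the walkthrough poll indefinitely, which is fine for a demo but can hang if a job never reaches the expected state. A minimal sketch of a bounded alternative follows; `poll_until` and `okra_phase` are hypothetical helper names, and the attempt count and delay in the usage line are arbitrary choices rather than documented limits.

```bash
# poll_until EXPECTED MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until its stdout equals EXPECTED.
# Returns 0 on a match, 1 once MAX_ATTEMPTS tries have been used.
poll_until() {
  local expected=$1 max_attempts=$2 delay=$3
  shift 3
  local attempt=1 current=""
  while :; do
    # A failed command counts as "no match yet" instead of aborting the loop.
    current=$("$@") || current=""
    [ "$current" = "$expected" ] && return 0
    [ "$attempt" -ge "$max_attempts" ] && break
    attempt=$((attempt + 1))
    sleep "$delay"
  done
  echo "gave up after $max_attempts attempts (last value: $current)" >&2
  return 1
}

# Hypothetical wrapper around the status call from step 5.
okra_phase() {
  curl -s "https://api.okrapdf.com/document/$1/status" \
    -H "Authorization: Bearer $OKRA_API_KEY" | jq -r '.phase'
}

# Usage: poll_until complete 60 2 okra_phase "$DOC_ID" || exit 1
```

Because the command to run is passed as arguments, the same helper could wrap the LlamaParse job check in step 2.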

### What surfaces are available after ingest

| Surface                         | Available | Notes                                                 |
| ------------------------------- | --------- | ----------------------------------------------------- |
| Chat completions                | Yes       | Full document context from ingested nodes             |
| Structured output (`/generate`) | Yes       | Works on ingested nodes like any document             |
| Status                          | Yes       | Phase, page count, node count                         |
| Branch                          | Yes       | Zero-copy fork of ingested document                   |
| Page images (`pg_N.png`)        | No        | Requires original PDF binary (use `pdfUrl` to enable) |
| Download (`/download`)          | No        | Requires original PDF binary (use `pdfUrl` to enable) |
| Full markdown (`full.md`)       | Yes       | Materialized from R2 snapshot                         |

Pass `pdfUrl` in the ingest request to enable page images and downloads:

```bash
curl -X POST https://api.okrapdf.com/v1/documents/ingest \
  -H "Authorization: Bearer $OKRA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "vendor": "llamaparse",
    "data": { ... },
    "pdfUrl": "https://example.com/report.pdf"
  }'
```

## When to use this endpoint

Use the Ingest API when you:

* already run extraction with external vendors,
* want OkraPDF's delivery, policy, and output layers,
* need a stable `doc-...` lifecycle without re-running OCR in OkraPDF.

## Related Docs

<CardGroup cols={2}>
  <Card title="Branch + Replace" icon="code-branch" href="/cookbook/branch-and-replace">
    Fork a doc, replace bad OCR, compare completions.
  </Card>

  <Card title="Output Schema" icon="table" href="/features/output-schema">
    Materialize reproducible structured outputs from ingested documents.
  </Card>

  <Card title="URL Builder" icon="link" href="/cookbook/url-builder">
    Build immutable URLs for pages, tables, and artifacts.
  </Card>
</CardGroup>
