# Document DO Lifecycle HTTP API Design

> Document-per-DO HTTP shape with explicit runs and completion endpoints, grounded in current agent-session schema.

## TL;DR

Keep the public API document-first:

* `documentId` = durable DO identity
* `runId` = one processing execution (alias for today's `workflowId`)

The "room" concept is only a lifecycle metaphor, not a public resource name.

## Current truth (from agent-session today)

What exists now:

1. The `meta` table stores document-level state such as `phase` and `active_workflow_id`.
2. `document_log` is append-only (`seq` autoincrement), used for workflow and audit events.
3. Upload responses already return `workflowId`.
4. There is **no first-class `runs` table** yet.
5. `/document/:id/status` is document-scoped (not run-scoped).

This design is additive and does not assume a runs table already exists.

## Lifecycle model

1. **Document**: durable workspace entity for one PDF.
2. **Run**: one extraction/reparse execution for that document.
3. **Event stream**: append-only document log (`document_log.seq`) for cursoring.

## Proposed HTTP API

### Document endpoints

1. `POST /v1/documents`
2. `GET /v1/documents/:documentId`
3. `GET /v1/documents/:documentId/events?after=<seq>&limit=<n>`
4. `POST /v1/documents/:documentId/share-links`
5. `POST /v1/documents/:documentId/publish`

### Run endpoints

1. `GET /v1/documents/:documentId/runs`
2. `GET /v1/documents/:documentId/runs/:runId`
3. `POST /v1/documents/:documentId/runs` (start upload/reparse run)
4. `POST /v1/documents/:documentId/runs/:runId/cancel`

### Completion endpoints

1. `POST /v1/documents/:documentId/responses`
2. `POST /v1/documents/:documentId/responses:stream`
3. `POST /v1/shares/:shareId/responses` (redaction/permission constrained)

### Existing route compatibility

Keep `/document/:id/*` as compatibility routes; internally map:

* `/document/:id/status` -> `/v1/documents/:id`
* `/document/:id/completion` -> `/v1/documents/:id/responses`
* `/document/:id/reparse` -> `POST /v1/documents/:id/runs`
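
The mapping above could be sketched as a small lookup, shown here only as an illustration. `mapLegacyRoute`, `V1Target`, and the regular expression are hypothetical names and choices, not part of the existing codebase; the methods on the `/v1` side are taken from the route lists in this document.

```typescript
// Hypothetical helper: resolves a legacy /document/:id/<action> path to its /v1 target.
// Only the three mappings listed above are covered; anything else returns null.
type V1Target = { method: "GET" | "POST"; path: string };

const LEGACY_ACTIONS = new Map<string, (id: string) => V1Target>([
  ["status", (id) => ({ method: "GET", path: `/v1/documents/${id}` })],
  ["completion", (id) => ({ method: "POST", path: `/v1/documents/${id}/responses` })],
  ["reparse", (id) => ({ method: "POST", path: `/v1/documents/${id}/runs` })],
]);

export function mapLegacyRoute(pathname: string): V1Target | null {
  // Match exactly two segments after /document, with an optional trailing slash.
  const match = /^\/document\/([^/]+)\/([^/]+)\/?$/.exec(pathname);
  if (!match) return null;
  const id = match[1] ?? "";
  const action = match[2] ?? "";
  const build = LEGACY_ACTIONS.get(action);
  return build ? build(id) : null;
}
```

Keeping the table in one place makes it straightforward to assert alias parity in contract tests.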

## HTTP diff (current -> proposed)

```diff
# Lifecycle
- POST /document/:id/upload
+ POST /v1/documents/:id/runs

- GET /document/:id/status
+ GET /v1/documents/:id

+ GET /v1/documents/:id/runs
+ GET /v1/documents/:id/runs/:runId
+ GET /v1/documents/:id/events?after=<seq>&limit=<n>

# Completion
- POST /document/:id/completion
+ POST /v1/documents/:id/responses
+ POST /v1/documents/:id/responses:stream
+ POST /v1/shares/:shareId/responses
```

## Is this a lot of change?

Not really. This is mostly additive:

1. Keep old `/document/:id/*` routes as aliases.
2. Add explicit `/runs` and `/events` resources.
3. Rename the completion endpoint to `/responses` to match common agent/client conventions.
4. Return `runId` alongside `workflowId` during transition.
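
A minimal sketch of the transition step in item 4, assuming the only confirmed field on today's upload response is `workflowId`. `withRunId` is a hypothetical helper name; it copies `workflowId` into `runId` so old and new clients can read the same payload.

```typescript
// Hypothetical helper: adds `runId` as an alias of `workflowId` without dropping any field.
export function withRunId<T extends { workflowId: string }>(body: T): T & { runId: string } {
  return { ...body, runId: body.workflowId };
}
```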

## Response shapes

### Document

```json
{
  "documentId": "ocr_...",
  "phase": "complete",
  "activeRunId": "lifecycle-ocr_...-1700000000000",
  "updatedAt": 1700000000000
}
```

### Run

```json
{
  "runId": "lifecycle-ocr_...-1700000000000",
  "documentId": "ocr_...",
  "phase": "complete",
  "startedAt": 1700000000000,
  "updatedAt": 1700000012345,
  "completedAt": 1700000012345,
  "error": null
}
```

Clients can derive the "last successful run" by sorting the runs list and taking the most recent completed run without an error, so the server does not need a dedicated field.
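
One way a client might derive that value is sketched below. It assumes a run counts as successful when `phase` is `"complete"`, `error` is `null`, and `completedAt` is set; `RunRecord` and `lastSuccessfulRun` are illustrative names that mirror the Run shape above.

```typescript
// Mirrors the Run response shape above; the type name is illustrative.
interface RunRecord {
  runId: string;
  documentId: string;
  phase: string;
  startedAt: number;
  updatedAt: number;
  completedAt: number | null;
  error: string | null;
}

// Returns the most recently completed successful run, or undefined if there is none.
// filter() copies the array, so the caller's list is not reordered by sort().
export function lastSuccessfulRun(runs: RunRecord[]): RunRecord | undefined {
  return runs
    .filter((r) => r.phase === "complete" && r.error === null && r.completedAt !== null)
    .sort((a, b) => (b.completedAt ?? 0) - (a.completedAt ?? 0))[0];
}
```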

### Response

```json
{
  "documentId": "ocr_...",
  "runIdUsed": "lifecycle-ocr_...-1700000000000",
  "answer": "Total revenue is ...",
  "citations": [],
  "costUsd": 0.0012
}
```

## Data model plan

### Phase 1 (no schema migration)

Build `/runs` from:

1. `meta.active_workflow_id` (active run)
2. `document_log` workflow events (`workflow_complete`, `workflow_error`)
3. upload/reparse lifecycle response metadata where available

This yields best-effort historical coverage: a past run appears only if it left a workflow event in `document_log` or lifecycle metadata.
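
The Phase 1 derivation could look roughly like the fold below. The event fields `workflowId`, `ts`, and `error`, and the phase strings `"running"` and `"error"`, are assumptions for illustration; this document only confirms `seq` and the event types `workflow_complete` and `workflow_error`.

```typescript
// Assumed log row shape; only `seq` and the two event types come from this document.
interface LogEvent { seq: number; type: string; workflowId: string; ts: number; error?: string }
interface DerivedRun { runId: string; phase: string; completedAt: number | null; error: string | null }

export function deriveRuns(activeWorkflowId: string | null, events: LogEvent[]): DerivedRun[] {
  const runs = new Map<string, DerivedRun>();
  // Replay in seq order so a later event for the same workflow overwrites an earlier one.
  for (const e of [...events].sort((a, b) => a.seq - b.seq)) {
    if (e.type === "workflow_complete") {
      runs.set(e.workflowId, { runId: e.workflowId, phase: "complete", completedAt: e.ts, error: null });
    } else if (e.type === "workflow_error") {
      runs.set(e.workflowId, { runId: e.workflowId, phase: "error", completedAt: e.ts, error: e.error ?? "unknown error" });
    }
  }
  // The active run from `meta.active_workflow_id` has no terminal event yet.
  if (activeWorkflowId !== null && !runs.has(activeWorkflowId)) {
    runs.set(activeWorkflowId, { runId: activeWorkflowId, phase: "running", completedAt: null, error: null });
  }
  return Array.from(runs.values());
}
```

Runs that never logged a terminal event and are no longer active are invisible to this fold, which is the gap Phase 2 closes.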

### Phase 2 (recommended)

Add first-class `runs` table:

```sql
CREATE TABLE runs (
  run_id TEXT PRIMARY KEY,
  document_id TEXT NOT NULL,
  phase TEXT NOT NULL,
  started_at INTEGER NOT NULL,
  updated_at INTEGER NOT NULL,
  completed_at INTEGER,
  error TEXT
);
CREATE INDEX idx_runs_document ON runs(document_id, started_at DESC);
```

On each workflow lifecycle hook (`start`, `progress`, `complete`, `error`), upsert the matching row in `runs`.
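
A sketch of what that upsert might look like against the table above, using SQLite's `ON CONFLICT ... DO UPDATE` clause. `HookEvent`, `UPSERT_RUN_SQL`, and `upsertBindings` are hypothetical names, and mapping `start`/`progress` to a `"running"` phase is an assumption. The statement and bindings can be passed to whatever SQLite executor the DO uses.

```typescript
type Hook = "start" | "progress" | "complete" | "error";

// Assumed hook payload; field names are illustrative.
interface HookEvent { runId: string; documentId: string; hook: Hook; at: number; error?: string }

// started_at is absent from the UPDATE clause, so it keeps the value from the first insert.
export const UPSERT_RUN_SQL = `
INSERT INTO runs (run_id, document_id, phase, started_at, updated_at, completed_at, error)
VALUES (?, ?, ?, ?, ?, ?, ?)
ON CONFLICT(run_id) DO UPDATE SET
  phase = excluded.phase,
  updated_at = excluded.updated_at,
  completed_at = excluded.completed_at,
  error = excluded.error`;

// Builds positional bindings in the same order as the column list above.
export function upsertBindings(e: HookEvent): (string | number | null)[] {
  const terminal = e.hook === "complete" || e.hook === "error";
  const phase = terminal ? e.hook : "running"; // assumption: non-terminal hooks mean "running"
  const error = e.hook === "error" ? (e.error ?? "unknown error") : null;
  return [e.runId, e.documentId, phase, e.at, e.at, terminal ? e.at : null, error];
}
```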

## Cursoring model (PartyKit-style lifecycle)

Use `document_log.seq` as cursor:

1. client stores `lastSeq`
2. requests `GET /v1/documents/:documentId/events?after=<lastSeq>`
3. receives ordered events
4. updates cursor

This mirrors durable-entity event streaming patterns used in collaborative systems.
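
The four steps above can be sketched with two pure functions, one for each side of the exchange. This assumes `after` is exclusive (only events with `seq` greater than the cursor are returned); `LogEntry`, `eventsAfter`, and `advanceCursor` are illustrative names, and the in-memory array stands in for `document_log`.

```typescript
// Stand-in for a document_log row; only `seq` is confirmed by this document.
interface LogEntry { seq: number; type: string }

// Server side: up to `limit` entries with seq strictly greater than `after`, in seq order.
export function eventsAfter(log: LogEntry[], after: number, limit: number): LogEntry[] {
  return log
    .filter((e) => e.seq > after)
    .sort((a, b) => a.seq - b.seq)
    .slice(0, limit);
}

// Client side: move the cursor to the highest seq received; an empty page leaves it unchanged.
export function advanceCursor(lastSeq: number, page: LogEntry[]): number {
  return page.reduce((max, e) => Math.max(max, e.seq), lastSeq);
}
```

Because the cursor only moves forward and the log is append-only, a client that retries the same request after a failure receives the same page again rather than skipping events.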

## Implementation task list

1. Add `/v1/documents/:documentId/runs` and `/v1/documents/:documentId/runs/:runId`.
2. Add `/v1/documents/:documentId/events` with `after` cursor over `document_log.seq`.
3. Add `/v1/documents/:documentId/responses` and `/v1/documents/:documentId/responses:stream`.
4. Add compatibility mappings from existing `/document` routes.
5. Add contract tests: document fetch, runs list, events cursor, completion alias parity.
6. Add schema migration for the `runs` table (Phase 2).
7. Update SDK docs/examples to prefer document-first terminology.
