# Document + Run Model

> Durable document identity with explicit processing runs, terminology, and implementation task list.

## Why this exists

We treat a document as a durable, workspace-like entity.

Confusion arises when an upload is read as returning a long-lived "job object". In reality:

* `documentId` is the durable identity.
* `workflowId` identifies a single processing run: the execution triggered by one upload or reparse.

This page defines the canonical model and the plan for following through on it.

For the durable-entity lifecycle framing and concrete endpoint design, see [Document DO Lifecycle HTTP API Design](/developers/room-http-api-design).

## Canonical terminology

1. **Document**: Permanent identity (`documentId`) and storage scope.
2. **Run**: One processing execution attached to a document (`workflowId` / future `runId`).
3. **Session**: SDK handle bound to one document (`okra.sessions.from` / `okra.sessions.create`).

## Current contract (today)

`POST /document/:id/upload` returns document-scoped metadata:

* `documentId`
* `phase`
* `status`
* `workflowId` (current run)
* `urls`

`GET /document/:id/status` is document-scoped and reports current/latest state.
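
As a rough illustration of the upload contract above, the sketch below types the response and keeps the durable identity apart from the run identity. The page lists the field names only, so the field types, the `splitIdentities` helper, and all sample values are assumptions made for illustration.

```typescript
// Assumed field types: the contract above names the fields but not their types.
interface UploadResponse {
  documentId: string;
  phase: string;
  status: string;
  workflowId: string; // identifies the current run, not the document
  urls: Record<string, string>; // assumption: a map of named URLs
}

// Hypothetical helper: separate the durable identity from the run identity.
function splitIdentities(res: UploadResponse): { documentId: string; currentRun: string } {
  return { documentId: res.documentId, currentRun: res.workflowId };
}

// Made-up payloads: a first upload, then a reparse of the same document.
const firstUpload: UploadResponse = {
  documentId: "doc_123",
  phase: "parsing",
  status: "processing",
  workflowId: "wf_a",
  urls: {},
};
const reparseUpload: UploadResponse = { ...firstUpload, workflowId: "wf_b" };

const firstIds = splitIdentities(firstUpload);
const reparseIds = splitIdentities(reparseUpload);

console.log(firstIds.documentId === reparseIds.documentId); // true: same document
console.log(firstIds.currentRun === reparseIds.currentRun); // false: different runs
```

Client code that stores `workflowId` as if it were the document's key would lose track of the document after the first reparse, which is the confusion this page aims to remove.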

## Target contract (design)

Document remains primary. Runs become explicit sub-resources.

### Document endpoints

* `GET /document/:id/status`
* `POST /document/:id/completion`
* `POST /document/:id/share-link`
* `POST /document/:id/publish`

### Run endpoints (additive)

* `GET /document/:id/runs`
* `GET /document/:id/runs/:runId`
* `POST /document/:id/runs/:runId/cancel` (or a `/cancel` route that targets the active run)
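
The run routes above nest under the document, which a client can mirror with small path builders. This is a minimal sketch: the function names are hypothetical, and only the route templates come from the list above.

```typescript
// Hypothetical helpers; only the route templates are taken from this page.
const enc = encodeURIComponent;

// GET /document/:id/runs
function runsPath(documentId: string): string {
  return `/document/${enc(documentId)}/runs`;
}

// GET /document/:id/runs/:runId
function runPath(documentId: string, runId: string): string {
  return `${runsPath(documentId)}/${enc(runId)}`;
}

// POST /document/:id/runs/:runId/cancel
function cancelRunPath(documentId: string, runId: string): string {
  return `${runPath(documentId, runId)}/cancel`;
}

console.log(runsPath("doc_123")); // /document/doc_123/runs
console.log(cancelRunPath("doc_123", "run_9")); // /document/doc_123/runs/run_9/cancel
```

Building every run path from the document path keeps the document as the primary resource, so a run is never addressed without the document that owns it.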

### Run record shape (minimal)

* `runId`
* `documentId`
* `phase`
* `startedAt`
* `updatedAt`
* `completedAt` (nullable)
* `error` (nullable)

No derived rollups are required in `/status`. Clients can derive "last successful run" from `/runs`.
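
Deriving "last successful run" on the client could look like the sketch below. It rests on several assumptions not stated on this page: timestamps are ISO 8601 strings, `error` is a message string, and a run counts as successful when `completedAt` is set and `error` is null. The `phase` values in the sample data are made up.

```typescript
// Field names follow the minimal run record above; the types are assumptions.
interface RunRecord {
  runId: string;
  documentId: string;
  phase: string;
  startedAt: string; // assumption: ISO 8601 timestamp
  updatedAt: string;
  completedAt: string | null;
  error: string | null; // assumption: error is a message string
}

// Assumption: "successful" means completed with no error recorded.
function lastSuccessfulRun(runs: RunRecord[]): RunRecord | undefined {
  return runs
    .filter((r) => r.completedAt !== null && r.error === null)
    .sort(
      (x, y) => Date.parse(y.completedAt as string) - Date.parse(x.completedAt as string),
    )[0];
}

// Made-up run history, as a `GET /document/:id/runs` response might list it.
const sampleRuns: RunRecord[] = [
  { runId: "run_1", documentId: "doc_123", phase: "done",
    startedAt: "2024-01-01T00:00:00Z", updatedAt: "2024-01-01T00:05:00Z",
    completedAt: "2024-01-01T00:05:00Z", error: null },
  { runId: "run_2", documentId: "doc_123", phase: "failed",
    startedAt: "2024-01-02T00:00:00Z", updatedAt: "2024-01-02T00:05:00Z",
    completedAt: "2024-01-02T00:05:00Z", error: "parse failed" },
  { runId: "run_3", documentId: "doc_123", phase: "parsing",
    startedAt: "2024-01-03T00:00:00Z", updatedAt: "2024-01-03T00:01:00Z",
    completedAt: null, error: null },
];

console.log(lastSuccessfulRun(sampleRuns)?.runId); // run_1
```

Here `run_2` is skipped because it carries an error and `run_3` because it has not completed, so the result is `run_1` even though it is the oldest run.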

## Change magnitude (current -> target)

This is a small-to-medium additive change:

1. Keep existing `/document/:id/*` routes working.
2. Add explicit `/runs` and `/events` resources for lifecycle visibility.
3. Introduce `/responses` as the name for completion, keeping `/completion` as an alias during migration.
4. Keep `workflowId` while adding `runId` for naming convergence.
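
While both identifiers coexist, a client may want to read whichever one a response carries. The sketch below assumes that `runId` and `workflowId` refer to the same run when both are present, which this page does not state; the `resolveRunId` helper is hypothetical.

```typescript
// Hypothetical response fragment during migration: either field may be present.
interface RunIdentityFields {
  runId?: string;
  workflowId?: string;
}

// Prefer the converged name and fall back to the current one.
function resolveRunId(fields: RunIdentityFields): string | undefined {
  return fields.runId ?? fields.workflowId;
}

console.log(resolveRunId({ runId: "run_7", workflowId: "wf_a" })); // run_7
console.log(resolveRunId({ workflowId: "wf_a" })); // wf_a
```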

Endpoint diff details are documented in [Document DO Lifecycle HTTP API Design](/developers/room-http-api-design).

## Design rules

1. Never change `documentId` across reparse/retry.
2. Every upload/reparse creates a new run record.
3. Completion/share/publish are document-scoped by default.
4. Deterministic export should support run/version pinning.
5. Keep backward compatibility with additive fields first, then deprecate.
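
Rules 1 and 2 can be pictured as an append-only ledger keyed by a fixed `documentId`. The class below is an in-memory sketch for illustration only, not the service's storage design; `DocumentLedger`, `LedgerRun`, and the sequential `run_N` ids are hypothetical.

```typescript
type RunTrigger = "upload" | "reparse";

interface LedgerRun {
  runId: string;
  trigger: RunTrigger;
  startedAt: string;
}

// In-memory illustration: one ledger per document, runs are only ever appended.
class DocumentLedger {
  readonly documentId: string; // rule 1: fixed for the lifetime of the document
  private readonly runs: LedgerRun[] = [];
  private nextSeq = 1;

  constructor(documentId: string) {
    this.documentId = documentId;
  }

  // Rule 2: every upload or reparse creates a new run record.
  startRun(trigger: RunTrigger, now: Date = new Date()): LedgerRun {
    const run: LedgerRun = {
      runId: `run_${this.nextSeq++}`, // hypothetical id scheme
      trigger,
      startedAt: now.toISOString(),
    };
    this.runs.push(run);
    return run;
  }

  history(): readonly LedgerRun[] {
    return this.runs;
  }
}

const ledger = new DocumentLedger("doc_123");
const uploadRun = ledger.startRun("upload", new Date("2024-01-01T00:00:00Z"));
const reparseRun = ledger.startRun("reparse", new Date("2024-01-02T00:00:00Z"));

console.log(ledger.documentId); // doc_123
console.log(uploadRun.runId, reparseRun.runId); // run_1 run_2
console.log(ledger.history().length); // 2
```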

## Conflicts to erase

1. Avoid "job object" phrasing in SDK docs.
2. Standardize on `documentId` in examples.
3. Mark `documents[].jobId` in the deploy payload as a legacy alias for `documentId`.
4. Use "run/workflow" language for execution state.

## Implementation task list

### P0: Terminology + docs (immediate)

1. Update SDK docs to define document vs run explicitly.
2. Update cookbook upload examples to capture `workflowId`.
3. Mark multi-doc deploy `documents[].jobId` as legacy naming in docs.
4. Add this page to SDK navigation.

### P1: API shape hardening (additive)

1. Persist run ledger per document.
2. Expose `GET /document/:id/runs` and `GET /document/:id/runs/:runId`.
3. Keep `/status` document-scoped; do not add derived run rollups.
4. Include `runId`/`workflowId` consistently in lifecycle responses.

### P2: SDK improvements

1. Add typed run metadata on `session.status()`.
2. Add optional run-aware helpers for diagnostics (without making runs primary).
3. Keep session-first ergonomics unchanged for default usage.

### P3: Naming convergence

1. Accept both `documents[].jobId` and `documents[].documentId` on deploy.
2. Prefer `documentId` in responses/docs.
3. Mark `jobId` request field deprecated with sunset date.
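
Accepting both names on deploy could be handled with a small normalization step, sketched below. The `normalizeDeployDocument` helper and its error messages are hypothetical, and rejecting entries whose `documentId` and `jobId` disagree is an assumption rather than documented behavior.

```typescript
// One entry of `documents[]` in the deploy payload; either name may be supplied.
interface DeployDocumentInput {
  documentId?: string;
  jobId?: string; // legacy alias
}

interface NormalizedDeployDocument {
  documentId: string;
  usedLegacyAlias: boolean; // lets the server count deprecated usage
}

function normalizeDeployDocument(input: DeployDocumentInput): NormalizedDeployDocument {
  // Assumption: conflicting values are rejected rather than silently resolved.
  if (
    input.documentId !== undefined &&
    input.jobId !== undefined &&
    input.documentId !== input.jobId
  ) {
    throw new Error("documentId and jobId disagree");
  }
  const documentId = input.documentId ?? input.jobId;
  if (documentId === undefined) {
    throw new Error("documents[] entry needs documentId (or legacy jobId)");
  }
  return { documentId, usedLegacyAlias: input.documentId === undefined };
}

console.log(normalizeDeployDocument({ jobId: "doc_123" }));
// { documentId: 'doc_123', usedLegacyAlias: true }
console.log(normalizeDeployDocument({ documentId: "doc_123" }));
// { documentId: 'doc_123', usedLegacyAlias: false }
```

Tracking `usedLegacyAlias` gives the deprecation in step 3 something to measure before the sunset date is enforced.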

### P4: Validation + rollout

1. Add contract tests for upload/status/run fields.
2. Add migration tests for old clients (no breakage).
3. Add telemetry dashboards: run success rate, retries, phase duration.
4. Publish migration note in changelog and SDK docs.

## Definition of done

1. Public docs no longer imply "job object" as primary identity.
2. API exposes first-class run history.
3. SDK remains small/session-first while still enabling run diagnostics.
4. The deploy payload accepts the modern name (`documentId`) and still handles the legacy `jobId` alias.
