# URL Builder

> Cloudinary-style URLs for every document property — zero API calls

## Overview

The `doc()` URL builder generates immutable, CDN-cacheable URLs for every document property. URLs are constructed at **build time** with zero API calls — use them directly in `<img>`, `<a>`, or any HTML attribute.

Think of it as Cloudinary, but for documents instead of images.

## Quick start

```typescript theme={null}
import { doc } from '@okrapdf/runtime';

const d = doc('ocr-7fK3x');

d.url()                          // full document
d.pg[1].png()                    // page 1 as PNG
d.pg[1].md()                     // page 1 as markdown
d.pg[1].json()                   // page 1 structured JSON
d.pages[1].image.url()           // page 1 image (legacy)
d.pages[1].markdown.url()        // page 1 markdown (legacy)
d.entities.tables.url()          // all tables
d.entities.tables[0].url()       // first table (JSON)
d.entities.tables[0].url({ format: 'csv' })   // first table (CSV)
d.entities.tables[0].url({ format: 'html' })  // first table (HTML)
d.entities.figures.url()         // all figures
d.url({ format: 'json', include: ['tables', 'text'] })  // filtered export
```

`doc()` accepts canonical IDs (`ocr-*`) and short public aliases (`d_*`) when available.

## Supported formats

Every page is a URL. The extension declares the format.

| Format                | URL                                              | Status    |
| --------------------- | ------------------------------------------------ | --------- |
| **PNG**               | `/v1/documents/:id/pages/1/image.png`            | Supported |
| **Markdown**          | `/v1/documents/:id/pages/1/document_a3f8b2.md`   | Supported |
| **JSON** (structured) | `/v1/documents/:id/pages/1/document_a3f8b2.json` | Supported |

### Not yet supported

These formats follow the same URL grammar but are **not served by the API today**. The URL builder may generate them, but the server will return 404.

| Format             | URL                       | Notes                                             |
| ------------------ | ------------------------- | ------------------------------------------------- |
| **JPEG**           | `/pages/1/image.jpg`      | No transcoding yet — we store PNGs only           |
| **WebP**           | `/pages/1/image.webp`     | Same — requires server-side format conversion     |
| **XLSX**           | `/entities/tables/0.xlsx` | Tables export as CSV/JSON/HTML today              |
| **PDF** (per-page) | `/pages/1/document.pdf`   | Full PDF download exists, per-page slice does not |
| **SVG**            | `/pages/1/image.svg`      | Would require vector rendering pipeline           |

<Note>
  Format negotiation (serving WebP when the browser supports it) is planned but not implemented. For now, always request `.png` for images and `.md` for text.
</Note>
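Because the builder may emit URLs the server does not serve yet, a small guard can catch unsupported extensions before they reach your markup. The sketch below is illustrative only: `SERVED_EXTENSIONS`, `extensionOf`, and `isServedToday` are hypothetical names, not part of `@okrapdf/runtime`, and the extension list is copied from the tables above (PNG, Markdown, and JSON, plus CSV and HTML for tables).

```typescript
// Hypothetical helper, not part of @okrapdf/runtime.
// Extensions the tables above list as served today.
const SERVED_EXTENSIONS = new Set(['png', 'md', 'json', 'csv', 'html']);

// Returns the lowercase extension of the last path segment, or null if none.
function extensionOf(url: string): string | null {
  const end = url.search(/[?#]/);
  const path = end === -1 ? url : url.slice(0, end);
  const lastSegment = path.slice(path.lastIndexOf('/') + 1);
  const dot = lastSegment.lastIndexOf('.');
  return dot === -1 ? null : lastSegment.slice(dot + 1).toLowerCase();
}

// True when the URL ends in an extension the API is documented to serve.
function isServedToday(url: string): boolean {
  const ext = extensionOf(url);
  return ext !== null && SERVED_EXTENSIONS.has(ext);
}
```

A check like this is most useful in tests or a lint step, where a stray `.webp` or `.xlsx` URL would otherwise only show up as a 404 in production.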

## All generated URLs

For a document `ocr-7fK3x`, the builder generates:

| Property        | URL                                                                        |
| --------------- | -------------------------------------------------------------------------- |
| Document        | `/v1/documents/ocr-7fK3x/document_a3f8b2.json`                             |
| Page 1 image    | `/v1/documents/ocr-7fK3x/pages/1/image.png`                                |
| Page 1 markdown | `/v1/documents/ocr-7fK3x/pages/1/document_a3f8b2.md`                       |
| Page 1 (flat)   | `/v1/documents/ocr-7fK3x/pg_1.png`                                         |
| All tables      | `/v1/documents/ocr-7fK3x/entities/tables/document_a3f8b2.json`             |
| Table 0 (JSON)  | `/v1/documents/ocr-7fK3x/entities/tables/0/document_a3f8b2.json`           |
| Table 0 (CSV)   | `/v1/documents/ocr-7fK3x/entities/tables/0/document_a3f8b2.csv?format=csv` |
| All figures     | `/v1/documents/ocr-7fK3x/entities/figures/document_a3f8b2.json`            |

<Tip>
  There is no dedicated `/thumbnail` endpoint. A thumbnail is just page 1: `d.pg[1].png()`.
</Tip>

## Use in React / JSX

```tsx theme={null}
import { doc } from '@okrapdf/runtime';

function DocumentCard({ docId }: { docId: string }) {
  const d = doc(docId);
  return (
    <article>
      <img src={d.pg[1].png()} alt="Document cover" />
      <h2>Report</h2>
      <ul>
        <li><a href={d.pages[1].image.url()}>View page 1</a></li>
        <li><a href={d.entities.tables[0].url({ format: 'csv' })}>Download table (CSV)</a></li>
        <li><a href={d.url({ format: 'json' })}>Full JSON export</a></li>
      </ul>
    </article>
  );
}
```

## Use as og:image

```tsx theme={null}
import { doc } from '@okrapdf/runtime';

export function generateMetadata({ params }: { params: { docId: string } }) {
  const d = doc(params.docId);
  return {
    openGraph: {
      images: [d.pg[1].png()],
    },
  };
}
```

## PDF lead magnet landing page

Upload a whitepaper, get a landing page with zero backend work:

```typescript theme={null}
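import { doc } from '@okrapdf/runtime';

// Assumes `okra` is an already-initialized OkraPDF client.
const session = await okra.sessions.create('whitepaper.pdf', { wait: true });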
await session.publish();

const d = doc(session.id);

// Everything you need for a lead magnet page:
const ogImage = d.pg[1].png();                  // social preview
const heroPreview = d.pages[1].image.url();      // hero section
const tableDownload = d.entities.tables[0].url({ format: 'csv' }); // CTA
```

## Provider transformations

Like Cloudinary's `/w_300/image.jpg`, OkraPDF uses `/t_{provider}/` to select the extraction backend. Same document, different output:

```typescript theme={null}
import { doc } from '@okrapdf/runtime';

// Default extraction
const d = doc('ocr-7fK3x');
d.pages[1].markdown.url()
// → /v1/documents/ocr-7fK3x/pages/1/markdown

// LlamaParse extraction
const llama = doc('ocr-7fK3x', { provider: 'llamaparse' });
llama.pages[1].markdown.url()
// → /v1/documents/ocr-7fK3x/t_llamaparse/pages/1/markdown

// Compare side-by-side
doc('ocr-7fK3x', { provider: 'googleocr' }).pages[1].url()
doc('ocr-7fK3x', { provider: 'docling' }).pages[1].url()
doc('ocr-7fK3x', { provider: 'unstructured' }).pages[1].url()
```

The provider segment is **router-only**: the router strips it before forwarding the request to the document and uses it only to decide which extraction backend to read from.
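One way to picture the router step is as a function that pulls a `t_{provider}` segment out of the path before forwarding. The following is a minimal sketch under stated assumptions, not OkraPDF's router code: `splitProvider` and `RoutedPath` are hypothetical names, and the function assumes the provider segment, when present, sits directly after the document ID as in the URLs above.

```typescript
// Hypothetical model of the routing step; not OkraPDF's actual router.
interface RoutedPath {
  provider: string | null; // null means default extraction
  path: string;            // path with the t_ segment removed
}

function splitProvider(path: string): RoutedPath {
  // '/v1/documents/:id/t_x/...' splits into ['', 'v1', 'documents', id, 't_x', ...]
  const segments = path.split('/');
  const candidate = segments[4] ?? '';
  if (!candidate.startsWith('t_') || candidate.length <= 2) {
    return { provider: null, path };
  }
  segments.splice(4, 1); // drop the provider segment
  return { provider: candidate.slice(2), path: segments.join('/') };
}
```

Under these assumptions, both the default and the LlamaParse URL resolve to the same document path; only the `provider` value differs.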

## Default image placeholders

When a page image isn't rendered yet, the API returns 404. Add a `/d_{type}/` segment to serve an inline SVG placeholder instead — the browser shows it immediately and retries on next load (5s cache).

```typescript theme={null}
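// Base URL from the builder (returns 404 until the page image is rendered)
d.pages[0].image.url()

// Same path with a shimmer placeholder segment (animated loading effect)
// → /v1/documents/ocr-7fK3x/d_shimmer/pages/0/image

// Same path with a solid color placeholder segment
// → /v1/documents/ocr-7fK3x/d_color:e2e8f0/pages/0/image
```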
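The snippet above shows the target URL shape but not a builder option that produces it. Until one is documented, the segment can be spliced into a generated URL by hand. `withPlaceholder` below is a hypothetical helper, not part of `@okrapdf/runtime`. It assumes the `d_` segment goes directly after the document ID, or after a `t_{provider}` segment when one is present, which matches the example URLs in this section.

```typescript
// Hypothetical helper, not part of @okrapdf/runtime.
type Placeholder = 'shimmer' | `color:${string}`;

function withPlaceholder(url: string, placeholder: Placeholder): string {
  // Group 1: everything up to "/v1/documents/<id>" plus an optional "/t_<provider>".
  // Group 2: the rest of the path.
  const pattern = /^(.*?\/v1\/documents\/[^/]+(?:\/t_[^/]+)?)(\/.*)$/;
  const match = pattern.exec(url);
  if (match === null) {
    throw new Error(`Not a document URL: ${url}`);
  }
  return `${match[1]}/d_${placeholder}${match[2]}`;
}
```

For example, `withPlaceholder(d.pages[0].image.url(), 'shimmer')` would insert `/d_shimmer/` without touching the rest of the path.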

| Type            | Description                                           |
| --------------- | ----------------------------------------------------- |
| `d_shimmer`     | Animated gradient shimmer (like skeleton loading)     |
| `d_color:{hex}` | Solid color rectangle, e.g. `d_color:3b82f6` for blue |

**Behavior:**

* R2 hit (image exists) → real PNG, `Cache-Control: immutable`
* R2 miss (not ready) → SVG placeholder, `Cache-Control: max-age=5`, `X-Okra-Placeholder: true` header
* No `d_` segment → 404 on miss (current behavior)
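On the client, the three cases above can be told apart from the status code and the `X-Okra-Placeholder` header. The sketch below is one way to do that; `classifyImageResponse`, `HeaderReader`, and `ImageState` are hypothetical names. `HeaderReader` is a minimal interface that the standard `Headers` object from `fetch` satisfies.

```typescript
// Minimal view of a header collection; the Fetch API's Headers object fits it.
interface HeaderReader {
  get(name: string): string | null;
}

type ImageState = 'ready' | 'placeholder' | 'unavailable';

// Hypothetical helper: maps a response to one of the documented cases.
function classifyImageResponse(status: number, headers: HeaderReader): ImageState {
  // Covers the 404-on-miss case and any other non-success status.
  if (status < 200 || status >= 300) return 'unavailable';
  // The placeholder response is marked with this header.
  if (headers.get('X-Okra-Placeholder') === 'true') return 'placeholder';
  return 'ready';
}
```

With `fetch`, pass `response.status` and `response.headers`. A `'placeholder'` result suggests retrying once the 5-second cache window has passed.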

Placeholders work with all other URL features, including `t_` providers and artifact slugs:

```
/v1/documents/ocr-7fK3x/t_llamaparse/d_shimmer/pages/1/image/report_a3f8b2.png
```

Use it directly in `<img>` tags. The shimmer renders while the real image is still processing:

```html theme={null}
<img src="https://api.okrapdf.com/v1/documents/ocr-7fK3x/d_shimmer/pages/0/image/doc.png" alt="Page 1" />
```

## Artifact slugs

When `fileName` is provided, URLs get cache-friendly, human-readable suffixes:

```typescript theme={null}
const d = doc('ocr-7fK3x', { fileName: 'quarterly-report.pdf' });

d.url()
// → /v1/documents/ocr-7fK3x/quarterly-report_a3f8b2.json

d.pages[1].image.url()
// → /v1/documents/ocr-7fK3x/pages/1/image/quarterly-report_a3f8b2.png

d.entities.tables[0].url({ format: 'csv' })
// → /v1/documents/ocr-7fK3x/entities/tables/0/quarterly-report_a3f8b2.csv?format=csv
```

The suffix is a 6-character FNV-1a hash of the document ID. You can also provide a custom suffix:

```typescript theme={null}
const d = doc('ocr-7fK3x', { fileName: 'report.pdf', suffix: 'xyz789' });
```

Artifact slugs are cosmetic: the router strips them before forwarding. They make URLs MIME-explicit and shareable.
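For readers unfamiliar with FNV-1a, the sketch below shows the standard 32-bit variant and one plausible way to shorten it to six characters. The hash itself follows the published algorithm. How the builder encodes and truncates the result is not specified here, so `shortSuffix` (hex, first six characters) is an assumption and may not reproduce the suffixes the library emits.

```typescript
// Standard 32-bit FNV-1a. Assumes ASCII input, as in IDs like 'ocr-7fK3x'.
function fnv1a32(input: string): number {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i) & 0xff;       // XOR in the next byte
    hash = Math.imul(hash, 0x01000193) >>> 0; // multiply by the FNV prime, keep 32 bits
  }
  return hash >>> 0;
}

// Hypothetical truncation: first six hex characters of the zero-padded hash.
function shortSuffix(documentId: string): string {
  return fnv1a32(documentId).toString(16).padStart(8, '0').slice(0, 6);
}
```

Because the hash depends only on the document ID, the same document always gets the same suffix, which keeps slugged URLs stable across builds.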

## Browser usage

Use the URL builder in browser code without Node.js:

```html theme={null}
<script type="module">
  import { doc } from 'https://cdn.jsdelivr.net/npm/@okrapdf/runtime/dist/browser.js';

  const d = doc('ocr-7fK3x');
  document.querySelector('#cover').src = d.pg[1].png();
  document.querySelector('#csv-link').href = d.entities.tables[0].url({ format: 'csv' });
</script>
```

Or use the global export:

```html theme={null}
<script src="https://cdn.jsdelivr.net/npm/@okrapdf/runtime/dist/browser.js"></script>
<script>
  const d = window.OkraRuntime.doc('ocr-7fK3x');
  console.log(d.pg[1].md());
</script>
```

The browser bundle exports only `doc()` — no Node.js dependencies. For data fetching in the browser, use the generated URLs with `fetch()` directly.
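As a rough illustration of the `fetch()` approach, the helper below loads the text behind a generated URL and fails loudly on a non-2xx status. `loadPageMarkdown` and `TextFetcher` are hypothetical names. The fetcher is passed in so the function can be exercised without a network; in a browser you would pass the global `fetch`.

```typescript
// Minimal shape of what this helper needs from a fetch-like function.
type TextFetcher = (url: string) => Promise<{
  ok: boolean;
  status: number;
  text(): Promise<string>;
}>;

// Hypothetical helper: fetches a generated URL and returns its body as text.
async function loadPageMarkdown(url: string, fetcher: TextFetcher): Promise<string> {
  const response = await fetcher(url);
  if (!response.ok) {
    // Private documents return 404 without an API key, so surface the status.
    throw new Error(`Request failed with status ${response.status}`);
  }
  return response.text();
}
```

In the browser this would be called as `loadPageMarkdown(d.pg[1].md(), fetch)`, assuming the document has been published.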

## Custom base URL

For self-hosted deployments:

```typescript theme={null}
const d = doc('ocr-7fK3x', 'https://docs.mycompany.com');
d.pg[1].png(); // https://docs.mycompany.com/v1/documents/ocr-7fK3x/pg_1.png
```

## Private vs public URLs

URLs always resolve to the same path. Access depends on whether the document is published:

* **Private (default)**: URLs return 404 without an API key
* **Published**: URLs are publicly accessible, CDN-cacheable

```typescript theme={null}
// Make URLs public
const session = okra.sessions.from(docId);
await session.publish();
// Now all doc(docId).*.url() paths return content without auth
```

### App reader URL (unchanged)

If you're linking to Okra's hosted reader UI, keep using:

`https://app.okrapdf.com/ocr/{canonicalId}/reader`
