URL Builder - OkraPDF

Overview

The doc() URL builder generates immutable, CDN-cacheable URLs for every document property. URLs are constructed at build time with zero API calls — use them directly in <img>, <a>, or any HTML attribute. Think Cloudinary for images, but for documents.

Quick start

import { doc } from '@okrapdf/runtime';

const d = doc('ocr-7fK3x');

d.url()                          // full document
d.pg[1].png()                    // page 1 as PNG
d.pg[1].md()                     // page 1 as markdown
d.pg[1].json()                   // page 1 structured JSON
d.pages[1].image.url()           // page 1 image (legacy)
d.pages[1].markdown.url()        // page 1 markdown (legacy)
d.entities.tables.url()          // all tables
d.entities.tables[0].url()       // first table (JSON)
d.entities.tables[0].url({ format: 'csv' })   // first table (CSV)
d.entities.tables[0].url({ format: 'html' })  // first table (HTML)
d.entities.figures.url()         // all figures
d.url({ format: 'json', include: ['tables', 'text'] })  // filtered export

doc() accepts canonical IDs (ocr-*) and short public aliases (d_*) when available.

Supported formats

Every page is a URL. The extension declares the format.

Format	URL	Status
PNG	`/v1/documents/:id/pages/1/image.png`	Supported
Markdown	`/v1/documents/:id/pages/1/document_a3f8b2.md`	Supported
JSON (structured)	`/v1/documents/:id/pages/1/document_a3f8b2.json`	Supported

Not yet supported

These formats follow the same URL grammar but are not served by the API today. The URL builder may generate them, but the server will return 404.

Format	URL	Notes
JPEG	`/pages/1/image.jpg`	No transcoding yet — we store PNGs only
WebP	`/pages/1/image.webp`	Same — requires server-side format conversion
XLSX	`/entities/tables/0.xlsx`	Tables export as CSV/JSON/HTML today
PDF (per-page)	`/pages/1/document.pdf`	Full PDF download exists, per-page slice does not
SVG	`/pages/1/image.svg`	Would require vector rendering pipeline

Format negotiation (serving WebP when the browser supports it) is planned but not implemented. For now, always request .png for images and .md for text.
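As a rough guard against requesting a format the server does not serve yet, the two tables above can be encoded in a small lookup. This is a minimal sketch, not part of @okrapdf/runtime: `formatStatus`, `SERVED`, and `NOT_SERVED` are hypothetical names, and the extension lists only mirror what this page states.

```typescript
// Hypothetical helper, not part of @okrapdf/runtime.
type FormatStatus = 'served' | 'not-served' | 'unknown';

// Extensions this page lists as supported (pages and table exports).
const SERVED = ['png', 'md', 'json', 'csv', 'html'];
// Extensions this page lists as not yet supported.
// 'pdf' is left out on purpose: the full-document PDF exists, only the
// per-page slice does not, so an extension check alone cannot decide it.
const NOT_SERVED = ['jpg', 'webp', 'xlsx', 'svg'];

function formatStatus(url: string): FormatStatus {
  // Drop any query string or fragment, e.g. "?format=csv".
  const path = url.split(/[?#]/)[0] ?? url;
  const lastSegment = path.slice(path.lastIndexOf('/') + 1);
  const dot = lastSegment.lastIndexOf('.');
  if (dot === -1) return 'unknown'; // extensionless path, nothing to check
  const ext = lastSegment.slice(dot + 1).toLowerCase();
  if (SERVED.indexOf(ext) !== -1) return 'served';
  if (NOT_SERVED.indexOf(ext) !== -1) return 'not-served';
  return 'unknown';
}
```

A check like this could run in a unit test or a lint step so that a `.webp` or `.jpg` URL never reaches production markup.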

All generated URLs

For a document ocr-7fK3x, the builder generates:

Property	URL
Document	`/v1/documents/ocr-7fK3x/document_a3f8b2.json`
Page 1 image	`/v1/documents/ocr-7fK3x/pages/1/image.png`
Page 1 markdown	`/v1/documents/ocr-7fK3x/pages/1/document_a3f8b2.md`
Page 1 (flat)	`/v1/documents/ocr-7fK3x/pg_1.png`
All Tables	`/v1/documents/ocr-7fK3x/entities/tables/document_a3f8b2.json`
Table 0 (JSON)	`/v1/documents/ocr-7fK3x/entities/tables/0/document_a3f8b2.json`
Table 0 (CSV)	`/v1/documents/ocr-7fK3x/entities/tables/0/document_a3f8b2.csv?format=csv`
All Figures	`/v1/documents/ocr-7fK3x/entities/figures/document_a3f8b2.json`

There is no dedicated /thumbnail endpoint. A thumbnail is just page 1: d.pg[1].png().

Use in React / JSX

import { doc } from '@okrapdf/runtime';

function DocumentCard({ docId }: { docId: string }) {
  const d = doc(docId);
  return (
    <article>
      <img src={d.pg[1].png()} alt="Document cover" />
      <h2>Report</h2>
      <ul>
        <li><a href={d.pages[1].image.url()}>View page 1</a></li>
        <li><a href={d.entities.tables[0].url({ format: 'csv' })}>Download table (CSV)</a></li>
        <li><a href={d.url({ format: 'json' })}>Full JSON export</a></li>
      </ul>
    </article>
  );
}

Use as og:image

import { doc } from '@okrapdf/runtime';

export function generateMetadata({ params }) {
  const d = doc(params.docId);
  return {
    openGraph: {
      images: [d.pg[1].png()],
    },
  };
}

PDF lead magnet landing page

Upload a whitepaper, get a landing page with zero backend work:

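import { doc } from '@okrapdf/runtime';

// `okra` is an already-initialized OkraPDF client (setup not shown here).
const session = await okra.sessions.create('whitepaper.pdf', { wait: true });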
await session.publish();

const d = doc(session.id);

// Everything you need for a lead magnet page:
const ogImage = d.pg[1].png();                  // social preview
const heroPreview = d.pages[1].image.url();      // hero section
const tableDownload = d.entities.tables[0].url({ format: 'csv' }); // CTA

Provider transformations

Like Cloudinary’s /w_300/image.jpg, OkraPDF uses /t_{provider}/ to select the extraction backend. Same document, different output:

import { doc } from '@okrapdf/runtime';

// Default extraction
const d = doc('ocr-7fK3x');
d.pages[1].markdown.url()
// → /v1/documents/ocr-7fK3x/pages/1/markdown

// LlamaParse extraction
const llama = doc('ocr-7fK3x', { provider: 'llamaparse' });
llama.pages[1].markdown.url()
// → /v1/documents/ocr-7fK3x/t_llamaparse/pages/1/markdown

// Compare side-by-side
doc('ocr-7fK3x', { provider: 'googleocr' }).pages[1].url()
doc('ocr-7fK3x', { provider: 'docling' }).pages[1].url()
doc('ocr-7fK3x', { provider: 'unstructured' }).pages[1].url()

The provider segment is router-only: the router strips it before forwarding the request to the document. It tells the system which extraction backend to read from.
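To make the "router-only" idea concrete, here is a minimal sketch of how a `t_{provider}` segment could be separated from the rest of the path. It is not OkraPDF's actual router: `splitProvider` and `RoutedPath` are hypothetical names, and the sketch assumes the provider segment always sits directly after the document ID, as in the examples above.

```typescript
// Hypothetical illustration, not the real OkraPDF router.
interface RoutedPath {
  provider: string | null; // e.g. 'llamaparse', or null for the default backend
  forwardedPath: string;   // path with the t_ segment removed
}

function splitProvider(path: string): RoutedPath {
  // '/v1/documents/:id/t_x/...' splits into ['', 'v1', 'documents', id, 't_x', ...]
  const segments = path.split('/');
  const candidate = segments[4];
  const isDocumentPath = segments[1] === 'v1' && segments[2] === 'documents';
  if (
    isDocumentPath &&
    candidate !== undefined &&
    candidate.length > 2 &&
    candidate.slice(0, 2) === 't_'
  ) {
    segments.splice(4, 1); // drop the provider segment
    return { provider: candidate.slice(2), forwardedPath: segments.join('/') };
  }
  return { provider: null, forwardedPath: path };
}
```

Only index 4 is inspected, so a document ID that happens to start with `t_` is never mistaken for a provider.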

Default image placeholders

When a page image isn’t rendered yet, the API returns 404. Add a /d_{type}/ segment to serve an inline SVG placeholder instead — the browser shows it immediately and retries on next load (5s cache).

// Shimmer placeholder (animated loading effect)
d.pages[0].image.url()
// with d_ → /v1/documents/ocr-7fK3x/d_shimmer/pages/0/image

// Solid color placeholder
// → /v1/documents/ocr-7fK3x/d_color:e2e8f0/pages/0/image

Type	Description
`d_shimmer`	Animated gradient shimmer (like skeleton loading)
`d_color:{hex}`	Solid color rectangle, e.g. `d_color:3b82f6` for blue
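If you build paths by hand, the `d_` segment can be spliced in with plain string handling. The sketch below is an assumption-laden illustration: `withPlaceholder` and the `Placeholder` type are hypothetical and not part of @okrapdf/runtime, the helper only accepts relative `/v1/documents/:id/...` paths, and it places `d_` after a `t_` provider segment because that is the order used in the combined example on this page.

```typescript
// Hypothetical helper, not part of @okrapdf/runtime.
type Placeholder = 'shimmer' | { color: string }; // color is a hex string such as 'e2e8f0'

function withPlaceholder(path: string, placeholder: Placeholder): string {
  const segment =
    placeholder === 'shimmer' ? 'd_shimmer' : 'd_color:' + placeholder.color;
  // Expected shape: ['', 'v1', 'documents', id, ...rest]
  const parts = path.split('/');
  if (parts[1] !== 'v1' || parts[2] !== 'documents' || parts.length < 5) {
    throw new Error('Not a /v1/documents/:id/... path: ' + path);
  }
  // Keep an existing t_{provider} segment in front of the d_ segment.
  const next = parts[4];
  const insertAt = next !== undefined && next.slice(0, 2) === 't_' ? 5 : 4;
  parts.splice(insertAt, 0, segment);
  return parts.join('/');
}
```

For example, `withPlaceholder('/v1/documents/ocr-7fK3x/pages/0/image', 'shimmer')` produces the shimmer URL shown above.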

Behavior:

R2 hit (image exists) → real PNG, Cache-Control: immutable
R2 miss (not ready) → SVG placeholder, Cache-Control: max-age=5, X-Okra-Placeholder: true header
No d_ segment → 404 on miss (current behavior)
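On the client side, the `X-Okra-Placeholder` header is what distinguishes a placeholder from the real image. The following sketch classifies a response accordingly. `imageState` and `ResponseLike` are hypothetical names; the structural type is chosen so that a standard `fetch()` `Response` can be passed in directly, and the status code of a placeholder response is not assumed because this page does not state it.

```typescript
// Minimal structural type: satisfied by a fetch() Response and by test doubles.
interface ResponseLike {
  status: number;
  headers: { get(name: string): string | null };
}

type ImageState = 'ready' | 'placeholder' | 'missing' | 'error';

// Hypothetical helper, not part of @okrapdf/runtime.
function imageState(res: ResponseLike): ImageState {
  // Check the header first, whatever the status code is.
  if (res.headers.get('X-Okra-Placeholder') === 'true') return 'placeholder';
  if (res.status === 404) return 'missing'; // no d_ segment and image not rendered
  if (res.status >= 200 && res.status < 300) return 'ready';
  return 'error';
}
```

A caller could poll while the state is `'placeholder'` and swap in the final image once it becomes `'ready'`; the 5-second cache lifetime on placeholders suggests a retry interval of at least that long.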

Placeholders work with all other URL features. Combine them with t_ providers, artifact slugs, and so on:

/v1/documents/ocr-7fK3x/t_llamaparse/d_shimmer/pages/1/image/report_a3f8b2.png

Use in <img> tags directly — shimmer renders while the real image processes:

<img src="https://api.okrapdf.com/v1/documents/ocr-7fK3x/d_shimmer/pages/0/image/doc.png" alt="Page 1" />

Artifact slugs

When fileName is provided, URLs get cache-friendly, human-readable suffixes:

const d = doc('ocr-7fK3x', { fileName: 'quarterly-report.pdf' });

d.url()
// → /v1/documents/ocr-7fK3x/quarterly-report_a3f8b2.json

d.pages[1].image.url()
// → /v1/documents/ocr-7fK3x/pages/1/image/quarterly-report_a3f8b2.png

d.entities.tables[0].url({ format: 'csv' })
// → /v1/documents/ocr-7fK3x/entities/tables/0/quarterly-report_a3f8b2.csv?format=csv

The suffix is a 6-character FNV-1a hash of the document ID. You can also provide a custom suffix:

const d = doc('ocr-7fK3x', { fileName: 'report.pdf', suffix: 'xyz789' });

Artifact slugs are cosmetic: the router strips them before forwarding. They make URLs MIME-explicit and shareable.
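For readers unfamiliar with FNV-1a, the sketch below shows the standard 32-bit variant and one way to turn it into a 6-character suffix. The hash function itself follows the published FNV-1a algorithm; the encoding step is an assumption. This page does not say whether the library uses the 32-bit variant, hex output, or the leading six characters, so `suffixFor` may not reproduce the exact suffixes shown above.

```typescript
// Standard 32-bit FNV-1a. Assumes ASCII input (document IDs such as 'ocr-7fK3x');
// non-ASCII characters would need to be encoded to bytes first.
function fnv1a32(input: string): number {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i) & 0xff;
    hash = Math.imul(hash, 0x01000193); // multiply by the FNV prime, keeping 32 bits
  }
  return hash >>> 0; // convert to an unsigned 32-bit integer
}

// Hypothetical encoding: zero-padded hex, first six characters.
// The real library may derive its suffix differently.
function suffixFor(documentId: string): string {
  const hex = ('00000000' + fnv1a32(documentId).toString(16)).slice(-8);
  return hex.slice(0, 6);
}
```

Because the suffix depends only on the document ID, the same document always maps to the same URL, which is what keeps these URLs cacheable.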

Browser usage

Use the URL builder in browser code without Node.js:

<script type="module">
  import { doc } from 'https://cdn.jsdelivr.net/npm/@okrapdf/runtime/dist/browser.js';

  const d = doc('ocr-7fK3x');
  document.querySelector('#cover').src = d.pg[1].png();
  document.querySelector('#csv-link').href = d.entities.tables[0].url({ format: 'csv' });
</script>

Or use the global export:

<script src="https://cdn.jsdelivr.net/npm/@okrapdf/runtime/dist/browser.js"></script>
<script>
  const d = window.OkraRuntime.doc('ocr-7fK3x');
  console.log(d.pg[1].md());
</script>

The browser bundle exports only doc() — no Node.js dependencies. For data fetching in the browser, use the generated URLs with fetch() directly.

Custom base URL

For self-hosted deployments:

const d = doc('ocr-7fK3x', 'https://docs.mycompany.com');
d.pg[1].png(); // https://docs.mycompany.com/v1/documents/ocr-7fK3x/pg_1.png

Private vs public URLs

URLs always resolve to the same path. Access depends on whether the document is published:

Private (default): URLs return 404 without an API key
Published: URLs are publicly accessible, CDN-cacheable

// Make URLs public
const session = okra.sessions.from(docId);
await session.publish();
// Now all doc(docId).*.url() paths return content without auth

App reader URL (unchanged)

If you’re linking to Okra’s hosted reader UI, keep using: https://app.okrapdf.com/ocr/{canonicalId}/reader
