Skip to main content

Overview

The doc() URL builder generates immutable, CDN-cacheable URLs for every document property. URLs are constructed at build time with zero API calls — use them directly in <img>, <a>, or any HTML attribute. Think Cloudinary for images, but for documents.

Quick start

import { doc } from '@okrapdf/runtime';

const d = doc('ocr-7fK3x');

d.url()                          // full document
d.pg[1].png()                    // page 1 as PNG
d.pg[1].md()                     // page 1 as markdown
d.pg[1].json()                   // page 1 structured JSON
d.pages[1].image.url()           // page 1 image (legacy)
d.pages[1].markdown.url()        // page 1 markdown (legacy)
d.entities.tables.url()          // all tables
d.entities.tables[0].url()       // first table (JSON)
d.entities.tables[0].url({ format: 'csv' })   // first table (CSV)
d.entities.tables[0].url({ format: 'html' })  // first table (HTML)
d.entities.figures.url()         // all figures
d.url({ format: 'json', include: ['tables', 'text'] })  // filtered export
doc() accepts canonical IDs (ocr-*) and short public aliases (d_*) when available.

Supported formats

Every page is a URL. The extension declares the format.
FormatURLStatus
PNG/v1/documents/:id/pages/1/image.pngSupported
Markdown/v1/documents/:id/pages/1/document_a3f8b2.mdSupported
JSON (structured)/v1/documents/:id/pages/1/document_a3f8b2.jsonSupported

Not yet supported

These formats follow the same URL grammar but are not served by the API today. The URL builder may generate them, but the server will return 404.
FormatURLNotes
JPEG/pages/1/image.jpgNo transcoding yet — we store PNGs only
WebP/pages/1/image.webpSame — requires server-side format conversion
XLSX/entities/tables/0.xlsxTables export as CSV/JSON/HTML today
PDF (per-page)/pages/1/document.pdfFull PDF download exists, per-page slice does not
SVG/pages/1/image.svgWould require vector rendering pipeline
Format negotiation (serve WebP when browser supports it) is planned but not implemented. For now, always request .png for images and .md for text.

All generated URLs

For a document ocr-7fK3x, the builder generates:
PropertyURL
Document/v1/documents/ocr-7fK3x/document_a3f8b2.json
Page 1 image/v1/documents/ocr-7fK3x/pages/1/image.png
Page 1 markdown/v1/documents/ocr-7fK3x/pages/1/document_a3f8b2.md
Page 1 (flat)/v1/documents/ocr-7fK3x/pg_1.png
All Tables/v1/documents/ocr-7fK3x/entities/tables/document_a3f8b2.json
Table 0 (JSON)/v1/documents/ocr-7fK3x/entities/tables/0/document_a3f8b2.json
Table 0 (CSV)/v1/documents/ocr-7fK3x/entities/tables/0/document_a3f8b2.csv?format=csv
All Figures/v1/documents/ocr-7fK3x/entities/figures/document_a3f8b2.json
There is no dedicated /thumbnail endpoint. A thumbnail is just page 1: d.pg[1].png().

Use in React / JSX

import { doc } from '@okrapdf/runtime';

function DocumentCard({ docId }: { docId: string }) {
  const d = doc(docId);
  return (
    <article>
      <img src={d.pg[1].png()} alt="Document cover" />
      <h2>Report</h2>
      <ul>
        <li><a href={d.pages[1].image.url()}>View page 1</a></li>
        <li><a href={d.entities.tables[0].url({ format: 'csv' })}>Download table (CSV)</a></li>
        <li><a href={d.url({ format: 'json' })}>Full JSON export</a></li>
      </ul>
    </article>
  );
}

Use as og:image

import { doc } from '@okrapdf/runtime';

export function generateMetadata({ params }) {
  const d = doc(params.docId);
  return {
    openGraph: {
      images: [d.pg[1].png()],
    },
  };
}

PDF lead magnet landing page

Upload a whitepaper, get a landing page with zero backend work:
const session = await okra.sessions.create('whitepaper.pdf', { wait: true });
await session.publish();

const d = doc(session.id);

// Everything you need for a lead magnet page:
const ogImage = d.pg[1].png();                  // social preview
const heroPreview = d.pages[1].image.url();      // hero section
const tableDownload = d.entities.tables[0].url({ format: 'csv' }); // CTA

Provider transformations

Like Cloudinary’s /w_300/image.jpg, OkraPDF uses /t_{provider}/ to select the extraction backend. Same document, different output:
import { doc } from '@okrapdf/runtime';

// Default extraction
const d = doc('ocr-7fK3x');
d.pages[1].markdown.url()
// → /v1/documents/ocr-7fK3x/pages/1/markdown

// LlamaParse extraction
const llama = doc('ocr-7fK3x', { provider: 'llamaparse' });
llama.pages[1].markdown.url()
// → /v1/documents/ocr-7fK3x/t_llamaparse/pages/1/markdown

// Compare side-by-side
doc('ocr-7fK3x', { provider: 'googleocr' }).pages[1].url()
doc('ocr-7fK3x', { provider: 'docling' }).pages[1].url()
doc('ocr-7fK3x', { provider: 'unstructured' }).pages[1].url()
The provider segment is router-only — stripped before forwarding to the document. It tells the system which extraction backend to read from.

Default image placeholders

When a page image isn’t rendered yet, the API returns 404. Add a /d_{type}/ segment to serve an inline SVG placeholder instead — the browser shows it immediately and retries on next load (5s cache).
// Shimmer placeholder (animated loading effect)
d.pages[0].image.url()
// with d_ → /v1/documents/ocr-7fK3x/d_shimmer/pages/0/image

// Solid color placeholder
// → /v1/documents/ocr-7fK3x/d_color:e2e8f0/pages/0/image
TypeDescription
d_shimmerAnimated gradient shimmer (like skeleton loading)
d_color:{hex}Solid color rectangle, e.g. d_color:3b82f6 for blue
Behavior:
  • R2 hit (image exists) → real PNG, Cache-Control: immutable
  • R2 miss (not ready) → SVG placeholder, Cache-Control: max-age=5, X-Okra-Placeholder: true header
  • No d_ segment → 404 on miss (current behavior)
Works with all URL features — combine with t_ providers, artifact slugs, etc:
/v1/documents/ocr-7fK3x/t_llamaparse/d_shimmer/pages/1/image/report_a3f8b2.png
Use in <img> tags directly — shimmer renders while the real image processes:
<img src="https://api.okrapdf.com/v1/documents/ocr-7fK3x/d_shimmer/pages/0/image/doc.png" alt="Page 1" />

Artifact slugs

When fileName is provided, URLs get cache-friendly, human-readable suffixes:
const d = doc('ocr-7fK3x', { fileName: 'quarterly-report.pdf' });

d.url()
// → /v1/documents/ocr-7fK3x/quarterly-report_a3f8b2.json

d.pages[1].image.url()
// → /v1/documents/ocr-7fK3x/pages/1/image/quarterly-report_a3f8b2.png

d.entities.tables[0].url({ format: 'csv' })
// → /v1/documents/ocr-7fK3x/entities/tables/0/quarterly-report_a3f8b2.csv?format=csv
The suffix is a 6-character FNV-1a hash of the document ID. You can also provide a custom suffix:
const d = doc('ocr-7fK3x', { fileName: 'report.pdf', suffix: 'xyz789' });
Artifact slugs are cosmetic — the router strips them before forwarding. They make URLs mime-explicit and shareable.

Browser usage

Use the URL builder in browser code without Node.js:
<script type="module">
  import { doc } from 'https://cdn.jsdelivr.net/npm/@okrapdf/runtime/dist/browser.js';

  const d = doc('ocr-7fK3x');
  document.querySelector('#cover').src = d.pg[1].png();
  document.querySelector('#csv-link').href = d.entities.tables[0].url({ format: 'csv' });
</script>
Or use the global export:
<script src="https://cdn.jsdelivr.net/npm/@okrapdf/runtime/dist/browser.js"></script>
<script>
  const d = window.OkraRuntime.doc('ocr-7fK3x');
  console.log(d.pg[1].md());
</script>
The browser bundle exports only doc() — no Node.js dependencies. For data fetching in the browser, use the generated URLs with fetch() directly.

Custom base URL

For self-hosted deployments:
const d = doc('ocr-7fK3x', 'https://docs.mycompany.com');
d.pg[1].png(); // https://docs.mycompany.com/v1/documents/ocr-7fK3x/pg_1.png

Private vs public URLs

URLs always resolve to the same path. Access depends on whether the document is published:
  • Private (default): URLs return 404 without API key
  • Published: URLs are publicly accessible, CDN-cacheable
// Make URLs public
const session = okra.sessions.from(docId);
await session.publish();
// Now all doc(docId).*.url() paths return content without auth

App reader URL (unchanged)

If you’re linking to Okra’s hosted reader UI, keep using: https://app.okrapdf.com/ocr/{canonicalId}/reader