Overview
The doc() URL builder generates immutable, CDN-cacheable URLs for every document property. URLs are constructed at build time with zero API calls — use them directly in <img>, <a>, or any HTML attribute.
Think Cloudinary for images, but for documents.
Quick start
import { doc } from '@okrapdf/runtime';
const d = doc('ocr-7fK3x');
d.url() // full document
d.pg[1].png() // page 1 as PNG
d.pg[1].md() // page 1 as markdown
d.pg[1].json() // page 1 structured JSON
d.pages[1].image.url() // page 1 image (legacy)
d.pages[1].markdown.url() // page 1 markdown (legacy)
d.entities.tables.url() // all tables
d.entities.tables[0].url() // first table (JSON)
d.entities.tables[0].url({ format: 'csv' }) // first table (CSV)
d.entities.tables[0].url({ format: 'html' }) // first table (HTML)
d.entities.figures.url() // all figures
d.url({ format: 'json', include: ['tables', 'text'] }) // filtered export
doc() accepts canonical IDs (ocr-*) and short public aliases (d_*) when available.
Every page is a URL. The extension declares the format.
| Format | URL | Status |
|---|
| PNG | /v1/documents/:id/pages/1/image.png | Supported |
| Markdown | /v1/documents/:id/pages/1/document_a3f8b2.md | Supported |
| JSON (structured) | /v1/documents/:id/pages/1/document_a3f8b2.json | Supported |
Not yet supported
These formats follow the same URL grammar but are not served by the API today. The URL builder may generate them, but the server will return 404.
| Format | URL | Notes |
|---|
| JPEG | /pages/1/image.jpg | No transcoding yet — we store PNGs only |
| WebP | /pages/1/image.webp | Same — requires server-side format conversion |
| XLSX | /entities/tables/0.xlsx | Tables export as CSV/JSON/HTML today |
| PDF (per-page) | /pages/1/document.pdf | Full PDF download exists, per-page slice does not |
| SVG | /pages/1/image.svg | Would require vector rendering pipeline |
Format negotiation (serve WebP when browser supports it) is planned but not implemented. For now, always request .png for images and .md for text.
All generated URLs
For a document ocr-7fK3x, the builder generates:
| Property | URL |
|---|
| Document | /v1/documents/ocr-7fK3x/document_a3f8b2.json |
| Page 1 image | /v1/documents/ocr-7fK3x/pages/1/image.png |
| Page 1 markdown | /v1/documents/ocr-7fK3x/pages/1/document_a3f8b2.md |
| Page 1 (flat) | /v1/documents/ocr-7fK3x/pg_1.png |
| All Tables | /v1/documents/ocr-7fK3x/entities/tables/document_a3f8b2.json |
| Table 0 (JSON) | /v1/documents/ocr-7fK3x/entities/tables/0/document_a3f8b2.json |
| Table 0 (CSV) | /v1/documents/ocr-7fK3x/entities/tables/0/document_a3f8b2.csv?format=csv |
| All Figures | /v1/documents/ocr-7fK3x/entities/figures/document_a3f8b2.json |
There is no dedicated /thumbnail endpoint. A thumbnail is just page 1: d.pg[1].png().
Use in React / JSX
import { doc } from '@okrapdf/runtime';
function DocumentCard({ docId }: { docId: string }) {
const d = doc(docId);
return (
<article>
<img src={d.pg[1].png()} alt="Document cover" />
<h2>Report</h2>
<ul>
<li><a href={d.pages[1].image.url()}>View page 1</a></li>
<li><a href={d.entities.tables[0].url({ format: 'csv' })}>Download table (CSV)</a></li>
<li><a href={d.url({ format: 'json' })}>Full JSON export</a></li>
</ul>
</article>
);
}
Use as og:image
import { doc } from '@okrapdf/runtime';
export function generateMetadata({ params }) {
const d = doc(params.docId);
return {
openGraph: {
images: [d.pg[1].png()],
},
};
}
PDF lead magnet landing page
Upload a whitepaper, get a landing page with zero backend work:
const session = await okra.sessions.create('whitepaper.pdf', { wait: true });
await session.publish();
const d = doc(session.id);
// Everything you need for a lead magnet page:
const ogImage = d.pg[1].png(); // social preview
const heroPreview = d.pages[1].image.url(); // hero section
const tableDownload = d.entities.tables[0].url({ format: 'csv' }); // CTA
Like Cloudinary’s /w_300/image.jpg, OkraPDF uses /t_{provider}/ to select the extraction backend. Same document, different output:
import { doc } from '@okrapdf/runtime';
// Default extraction
const d = doc('ocr-7fK3x');
d.pages[1].markdown.url()
// → /v1/documents/ocr-7fK3x/pages/1/markdown
// LlamaParse extraction
const llama = doc('ocr-7fK3x', { provider: 'llamaparse' });
llama.pages[1].markdown.url()
// → /v1/documents/ocr-7fK3x/t_llamaparse/pages/1/markdown
// Compare side-by-side
doc('ocr-7fK3x', { provider: 'googleocr' }).pages[1].url()
doc('ocr-7fK3x', { provider: 'docling' }).pages[1].url()
doc('ocr-7fK3x', { provider: 'unstructured' }).pages[1].url()
The provider segment is router-only — stripped before forwarding to the document. It tells the system which extraction backend to read from.
Default image placeholders
When a page image isn’t rendered yet, the API returns 404. Add a /d_{type}/ segment to serve an inline SVG placeholder instead — the browser shows it immediately and retries on next load (5s cache).
// Shimmer placeholder (animated loading effect)
d.pages[0].image.url()
// with d_ → /v1/documents/ocr-7fK3x/d_shimmer/pages/0/image
// Solid color placeholder
// → /v1/documents/ocr-7fK3x/d_color:e2e8f0/pages/0/image
| Type | Description |
|---|
d_shimmer | Animated gradient shimmer (like skeleton loading) |
d_color:{hex} | Solid color rectangle, e.g. d_color:3b82f6 for blue |
Behavior:
- R2 hit (image exists) → real PNG,
Cache-Control: immutable
- R2 miss (not ready) → SVG placeholder,
Cache-Control: max-age=5, X-Okra-Placeholder: true header
- No
d_ segment → 404 on miss (current behavior)
Works with all URL features — combine with t_ providers, artifact slugs, etc:
/v1/documents/ocr-7fK3x/t_llamaparse/d_shimmer/pages/1/image/report_a3f8b2.png
Use in <img> tags directly — shimmer renders while the real image processes:
<img src="https://api.okrapdf.com/v1/documents/ocr-7fK3x/d_shimmer/pages/0/image/doc.png" alt="Page 1" />
Artifact slugs
When fileName is provided, URLs get cache-friendly, human-readable suffixes:
const d = doc('ocr-7fK3x', { fileName: 'quarterly-report.pdf' });
d.url()
// → /v1/documents/ocr-7fK3x/quarterly-report_a3f8b2.json
d.pages[1].image.url()
// → /v1/documents/ocr-7fK3x/pages/1/image/quarterly-report_a3f8b2.png
d.entities.tables[0].url({ format: 'csv' })
// → /v1/documents/ocr-7fK3x/entities/tables/0/quarterly-report_a3f8b2.csv?format=csv
The suffix is a 6-character FNV-1a hash of the document ID. You can also provide a custom suffix:
const d = doc('ocr-7fK3x', { fileName: 'report.pdf', suffix: 'xyz789' });
Artifact slugs are cosmetic — the router strips them before forwarding. They make URLs mime-explicit and shareable.
Browser usage
Use the URL builder in browser code without Node.js:
<script type="module">
import { doc } from 'https://cdn.jsdelivr.net/npm/@okrapdf/runtime/dist/browser.js';
const d = doc('ocr-7fK3x');
document.querySelector('#cover').src = d.pg[1].png();
document.querySelector('#csv-link').href = d.entities.tables[0].url({ format: 'csv' });
</script>
Or use the global export:
<script src="https://cdn.jsdelivr.net/npm/@okrapdf/runtime/dist/browser.js"></script>
<script>
const d = window.OkraRuntime.doc('ocr-7fK3x');
console.log(d.pg[1].md());
</script>
The browser bundle exports only doc() — no Node.js dependencies. For data fetching in the browser, use the generated URLs with fetch() directly.
Custom base URL
For self-hosted deployments:
const d = doc('ocr-7fK3x', 'https://docs.mycompany.com');
d.pg[1].png(); // https://docs.mycompany.com/v1/documents/ocr-7fK3x/pg_1.png
Private vs public URLs
URLs always resolve to the same path. Access depends on whether the document is published:
- Private (default): URLs return 404 without API key
- Published: URLs are publicly accessible, CDN-cacheable
// Make URLs public
const session = okra.sessions.from(docId);
await session.publish();
// Now all doc(docId).*.url() paths return content without auth
App reader URL (unchanged)
If you’re linking to Okra’s hosted reader UI, keep using:
https://app.okrapdf.com/ocr/{canonicalId}/reader