Same document, three URLs
OkraPDF’s redaction engine runs server-side at the edge. PII is removed before the response leaves the Worker — it never reaches the browser.
/s/{admin-token}/fw9.md → full text
/s/{viewer-token}/fw9.md → SSN: ***-**-****, [EMAIL], [PHONE]
/s/{public-token}/fw9.md → allowlisted sections only
Each URL is an HMAC-signed capability token. No API keys, no sessions, no cookies. The token IS the auth.
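To make "the token is the auth" concrete, here is a minimal sketch of how an HMAC-signed capability token can be issued and verified without a session store. The token layout (`payload.signature`), the `Capability` shape, and the helper names `signToken` and `verifyToken` are assumptions for illustration; OkraPDF's actual token format is not described on this page. The sketch uses Node's `node:crypto` for brevity, whereas a Worker would more likely use Web Crypto.

```typescript
import { createHmac, timingSafeEqual } from 'node:crypto';

// Hypothetical token layout: base64url(JSON payload) + "." + base64url(HMAC-SHA256(payload)).
// This illustrates the general technique, not OkraPDF's real format.
type Role = 'admin' | 'viewer' | 'public';

interface Capability {
  docId: string;
  role: Role;
}

function signToken(cap: Capability, secret: string): string {
  const payload = Buffer.from(JSON.stringify(cap)).toString('base64url');
  const sig = createHmac('sha256', secret).update(payload).digest('base64url');
  return `${payload}.${sig}`;
}

function verifyToken(token: string, secret: string): Capability | null {
  const [payload, sig] = token.split('.');
  if (!payload || !sig) return null;
  // Recompute the signature over the payload exactly as received.
  const expected = createHmac('sha256', secret).update(payload).digest();
  const given = Buffer.from(sig, 'base64url');
  // timingSafeEqual throws on length mismatch, so check lengths first.
  if (given.length !== expected.length || !timingSafeEqual(given, expected)) return null;
  return JSON.parse(Buffer.from(payload, 'base64url').toString('utf8')) as Capability;
}
```

Because the role travels inside the signed payload, a viewer cannot upgrade to admin by editing the URL: any change to the payload invalidates the signature.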
How it works
- Parse your PDF with any vendor (LlamaParse, Docling, Unstructured, Azure Doc Intel)
- Deploy with @okrapdf/edge-kit: pass pages + redaction config
- Get back 3 URLs: admin, viewer, public
import { deploy } from '@okrapdf/edge-kit';
const pii = {
  preset: 'hipaa',
  patterns: ['SSN', 'EMAIL', 'PHONE_US', 'TAX_ID_US'],
  includeNames: true,
  includeAddresses: true,
};
const result = await deploy({
  pages, // from any PDF parser
  meta: { title: 'W-9', filename: 'fw9.pdf' },
  redact: {
    pii,
    publicFieldAllowlist: ['Form W-9', 'General Instructions'],
  },
  apiKey: process.env.OKRA_API_KEY!,
});
result.urls.admin;  // full text
result.urls.viewer; // PII redacted
result.urls.public; // allowlist + redacted
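The `publicFieldAllowlist` option suggests that the public URL serves only named sections. How OkraPDF matches section names is not specified here, so the following is only a sketch of one plausible approach: split the markdown on headings and keep sections whose heading text appears in the allowlist. The helper `filterSections` is hypothetical and not part of `@okrapdf/edge-kit`.

```typescript
// Hypothetical helper: keep only markdown sections whose heading is allowlisted.
// Assumption: sections are delimited by ATX headings (#, ##, ...) and matching
// is case-insensitive on the full heading text.
function filterSections(markdown: string, allowlist: string[]): string {
  const allowed = new Set(allowlist.map((s) => s.trim().toLowerCase()));
  const kept: string[] = [];
  let keep = false; // anything before the first heading is dropped
  for (const line of markdown.split('\n')) {
    const m = /^#{1,6}\s+(.*)$/.exec(line);
    if (m) keep = allowed.has(m[1].trim().toLowerCase());
    if (keep) kept.push(line);
  }
  return kept.join('\n');
}
```

Dropping everything by default and keeping only named sections is the safer direction for a public view: a section that nobody thought to list stays hidden.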
PII detection with OpenRedaction
Pass a pii config object and the SDK uses OpenRedaction under the hood, which provides compliance presets, name/address detection, context-aware matching, and 400+ pattern types out of the box. If you omit the pii field, the SDK falls back to OpenRedaction defaults (all patterns enabled).
const pii = {
  preset: 'hipaa', // or 'gdpr', 'ccpa'
  patterns: ['SSN', 'EMAIL', 'PHONE_US', 'TAX_ID_US'],
  includeNames: true,
  includeAddresses: true,
};
Custom patterns
For domain-specific patterns, pass customPatterns — raw regex alongside presets:
const pii = {
  preset: 'hipaa',
  customPatterns: [
    { type: 'DEAL_VALUE', regex: /\$[\d,]+\.\d{2}/g, priority: 10, placeholder: '[AMOUNT_{n}]', severity: 'high' },
    { type: 'INTERNAL_REF', regex: /REF-[A-Z]{3}-\d{4}/g, priority: 5, placeholder: '[REF_{n}]', severity: 'medium' },
  ],
};
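To show what these two patterns would do to a line of text, here is a small standalone sketch. It does not call OpenRedaction; `applyCustomPatterns` is a hypothetical helper, and two behaviours are assumptions: higher `priority` runs first, and `{n}` is a per-pattern counter starting at 1.

```typescript
interface CustomPattern {
  type: string;
  regex: RegExp;
  priority: number;
  placeholder: string;
  severity: 'low' | 'medium' | 'high';
}

// Hypothetical helper, not part of @okrapdf/edge-kit or OpenRedaction.
function applyCustomPatterns(text: string, patterns: CustomPattern[]): string {
  // Assumption: higher priority is applied first.
  const ordered = [...patterns].sort((a, b) => b.priority - a.priority);
  let out = text;
  for (const p of ordered) {
    let n = 0; // assumption: {n} counts matches per pattern, starting at 1
    // Force the global flag so every occurrence is replaced.
    const flags = p.regex.flags.includes('g') ? p.regex.flags : p.regex.flags + 'g';
    const re = new RegExp(p.regex.source, flags);
    out = out.replace(re, () => p.placeholder.replace('{n}', String(++n)));
  }
  return out;
}

const sample = 'Deal REF-ABC-1234 closed at $1,250,000.00; see REF-XYZ-0042.';
const redacted = applyCustomPatterns(sample, [
  { type: 'DEAL_VALUE', regex: /\$[\d,]+\.\d{2}/g, priority: 10, placeholder: '[AMOUNT_{n}]', severity: 'high' },
  { type: 'INTERNAL_REF', regex: /REF-[A-Z]{3}-\d{4}/g, priority: 5, placeholder: '[REF_{n}]', severity: 'medium' },
]);
// redacted === 'Deal [REF_1] closed at [AMOUNT_1]; see [REF_2].'
```

Numbered placeholders keep distinct values distinguishable after redaction, so a reader can still tell that two references differ without seeing either one.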
Config-per-document
Each document gets its own redaction config. No global settings to manage.
// Tax forms: HIPAA preset, names + addresses
await deploy({
  pages: w9Pages,
  redact: {
    pii: { preset: 'hipaa', includeNames: true, includeAddresses: true },
    publicFieldAllowlist: ['Form W-9', 'Part I'],
  },
  apiKey,
});
// Contracts: custom patterns for deal values
await deploy({
  pages: contractPages,
  redact: {
    pii: {
      customPatterns: [
        { type: 'DEAL_VALUE', regex: /\$[\d,]+\.\d{2}/g, priority: 10, placeholder: '[AMOUNT_{n}]', severity: 'high' },
      ],
    },
    publicFieldAllowlist: ['Terms', 'Parties'],
  },
  apiKey,
});
Vendor-agnostic
The PageInput format works with any parser:
interface PageInput {
  pageNum: number;
  text: string;
  items?: Array<{ text: string; bbox?: { x: number; y: number; w: number; h: number } }>;
}
No vendor lock-in. Parse with LlamaParse today, switch to Docling tomorrow — redaction works the same.
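Switching parsers then comes down to writing a small adapter. The sketch below maps a made-up parser output (`VendorPage` and `VendorBlock`, which do not correspond to any real vendor's schema) onto `PageInput`. It assumes `pageNum` is 1-based and that the vendor reports boxes as corner coordinates; check both against your parser and the SDK before relying on them.

```typescript
interface PageInput {
  pageNum: number;
  text: string;
  items?: Array<{ text: string; bbox?: { x: number; y: number; w: number; h: number } }>;
}

// Hypothetical vendor output: zero-based page index, boxes as [x0, y0, x1, y1].
interface VendorBlock {
  content: string;
  box?: [number, number, number, number];
}
interface VendorPage {
  index: number;
  blocks: VendorBlock[];
}

function toPageInput(pages: VendorPage[]): PageInput[] {
  return pages.map((p) => ({
    pageNum: p.index + 1, // assumption: PageInput pages are 1-based
    text: p.blocks.map((b) => b.content).join('\n'),
    items: p.blocks.map((b) => ({
      text: b.content,
      // Convert corner coordinates to origin + width/height when a box exists.
      ...(b.box
        ? { bbox: { x: b.box[0], y: b.box[1], w: b.box[2] - b.box[0], h: b.box[3] - b.box[1] } }
        : {}),
    })),
  }));
}
```

Keeping the adapter as the only vendor-aware code means the redaction config and `deploy` call stay untouched when the parser changes.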
Every access path, not just URLs
Redaction isn’t just for static markdown URLs. The same lens applies to:
- Completions endpoint: agent tool results are redacted before the LLM sees them. The model can’t leak PII it never received.
- Agent SQL queries: query_sql results pass through the lens. A SELECT * FROM nodes returns [REDACTED] for PII fields.
- Text search: search results are filtered through the active role. Searching for a raw SSN against a viewer token returns zero matches.
See the Redact & Deploy cookbook for implementation details.
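The idea of one lens shared by every access path can be sketched as a single function that each path calls before returning data. Everything below is illustrative: `lens`, `lensRows`, and `search` are hypothetical names, only an SSN pattern is handled, only two roles are modelled, and the real service may use different masks per path (for example `[REDACTED]` in SQL results).

```typescript
type Role = 'admin' | 'viewer';

// Illustrative pattern only; a real detector covers far more than SSNs.
const SSN = /\b\d{3}-\d{2}-\d{4}\b/g;

// The single lens: admins see raw text, everyone else sees masked text.
function lens(role: Role, text: string): string {
  return role === 'admin' ? text : text.replace(SSN, '***-**-****');
}

// SQL-style path: every string column passes through the same lens.
function lensRows(role: Role, rows: Array<Record<string, unknown>>): Array<Record<string, unknown>> {
  return rows.map((row) => {
    const out: Record<string, unknown> = {};
    for (const [k, v] of Object.entries(row)) {
      out[k] = typeof v === 'string' ? lens(role, v) : v;
    }
    return out;
  });
}

// Search path: match against the lensed text, so a raw SSN query
// finds nothing unless the role is allowed to see it.
function search(role: Role, pages: string[], query: string): number[] {
  return pages
    .map((text, i) => (lens(role, text).includes(query) ? i : -1))
    .filter((i) => i >= 0);
}
```

Searching the lensed text rather than the raw text matters: otherwise a viewer could confirm a guessed SSN just by observing whether a query returns a hit.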
Architecture
- Redaction runs on Cloudflare Workers: sub-5ms, no cold starts
- Pages stored in R2: zero egress fees
- HMAC-signed tokens: no database lookup needed to verify
- Markdown output: Content-Type: text/markdown, no HTML rendering overhead
See the Redact & Deploy cookbook for a full working example.