Overview
Deploy a document with a redaction config and get back three URLs that serve the same underlying document at different permission levels. Redaction happens server-side, so redacted PII never reaches the browser.
```text
view.okrapdf.com/s/{admin-token}/fw9.md  → full text, all PII visible
view.okrapdf.com/s/{viewer-token}/fw9.md → PII replaced with [EMAIL], [PHONE], ***-**-****
view.okrapdf.com/s/{public-token}/fw9.md → only allowlisted sections, PII redacted
```
Each URL embeds an HMAC-signed token that encodes the document ID and role. The filename (fw9.md) is decorative; the token alone carries the authorization.
Install
```bash
npm install @okrapdf/edge-kit
```
End-to-end example
Parse a PDF with LlamaParse, then deploy with redaction:
```typescript
import { LlamaCloud } from '@llamaindex/llama-cloud';
import { deploy } from '@okrapdf/edge-kit';
import type { PageInput } from '@okrapdf/edge-kit';

// 1. Parse PDF via LlamaParse (or any vendor)
const client = new LlamaCloud({ apiKey: process.env.LLAMAINDEX_API_KEY });
const parseResult = await client.parsing.parse({
  source_url: 'https://www.irs.gov/pub/irs-pdf/fw9.pdf',
  tier: 'cost_effective',
  version: 'latest',
  expand: ['items', 'markdown'],
}, { verbose: true });

// 2. Convert vendor output → vendor-agnostic PageInput
const pages: PageInput[] = [];
for (const page of parseResult.markdown?.pages ?? []) {
  if (!('markdown' in page)) continue;
  pages.push({ pageNum: page.page_number, text: page.markdown });
}

// 3. Configure PII detection
const pii = {
  preset: 'hipaa',
  patterns: ['SSN', 'EMAIL', 'PHONE_US', 'TAX_ID_US'],
  includeNames: true,
  includeAddresses: true,
};

// 4. Deploy with redaction config → get 3 URLs
const result = await deploy({
  pages,
  meta: { title: 'IRS W-9 (Rev. 3-2024)', filename: 'fw9.pdf' },
  redact: {
    pii,
    publicFieldAllowlist: ['Form W-9', 'Part I', 'Part II', 'General Instructions'],
  },
  apiKey: process.env.OKRA_API_KEY!,
});

console.log(result.urls.admin);  // full text
console.log(result.urls.viewer); // PII redacted
console.log(result.urls.public); // allowlist only
console.log(result.stats);       // { totalMatches: 5, pagesAffected: 2, byRule: { SSN: 1, EMAIL: 2, PHONE_US: 2 } }
```
What gets redacted
The pii config is handled by OpenRedaction, which provides compliance presets, name and address detection, and 400+ pattern types. Pick a preset, list specific patterns, or combine the two:
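```typescript
// Alternatives: pass any one of these as redact.pii
// (distinct names, since redeclaring `const pii` in one scope is a SyntaxError)

// Preset-based (HIPAA, GDPR, CCPA)
const piiByPreset = { preset: 'hipaa', includeNames: true };

// Pattern-based
const piiByPattern = { patterns: ['SSN', 'EMAIL', 'PHONE_US', 'TAX_ID_US'] };

// Combined
const piiCombined = { preset: 'hipaa', patterns: ['TAX_ID_US'], includeAddresses: true };
```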
If you omit the pii field, OpenRedaction's defaults apply (all patterns enabled).
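To make the viewer-role placeholders concrete, here is a minimal local sketch of pattern-based replacement. The regexes, the `Rule` type, and the `redactText` helper are hypothetical simplifications, not OpenRedaction's actual detectors; they only reproduce the placeholder style from the Overview ([EMAIL], [PHONE], ***-**-****) and return per-rule counts loosely shaped like `result.stats.byRule`.

```typescript
// Hypothetical, simplified patterns for illustration only; OpenRedaction's
// real detectors are far more thorough than these three regexes.
type Rule = { name: string; regex: RegExp; placeholder: string };

const RULES: Rule[] = [
  // US SSN in the common 123-45-6789 form
  { name: "SSN", regex: /\b\d{3}-\d{2}-\d{4}\b/g, placeholder: "***-**-****" },
  // Rough email shape: local@domain.tld
  { name: "EMAIL", regex: /[\w.+-]+@[\w-]+(?:\.[\w-]+)+/g, placeholder: "[EMAIL]" },
  // US phone numbers such as 555-123-4567 or (555) 123-4567
  { name: "PHONE_US", regex: /(?:\(\d{3}\)\s?|\b\d{3}[-.\s])\d{3}[-.\s]\d{4}\b/g, placeholder: "[PHONE]" },
];

// Replace every match of every rule, counting matches per rule name.
function redactText(text: string): { text: string; byRule: Record<string, number> } {
  const byRule: Record<string, number> = {};
  let out = text;
  for (const rule of RULES) {
    out = out.replace(rule.regex, () => {
      byRule[rule.name] = (byRule[rule.name] ?? 0) + 1;
      return rule.placeholder;
    });
  }
  return { text: out, byRule };
}

const sample = "SSN: 123-45-6789\nEmail: john@acme.com\nPhone: (555) 123-4567";
console.log(redactText(sample));
```

Rules run in array order, so a later rule never sees text an earlier rule already replaced.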
Three roles
| Role | What they see | Use case |
|---|---|---|
| admin | Full text, all PII visible | Internal review, compliance team |
| viewer | PII replaced with placeholders | External auditors, partners |
| public | Only allowlisted sections, PII redacted | Public-facing links, embedding |
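The three roles can be thought of as three views over the same pages. The sketch below assumes one plausible allowlist rule: split the markdown at headings and keep a section for the public role only if its heading equals an allowlist entry or starts with that entry followed by a space (so "Part I" does not also admit "Part II"). `renderForRole`, `isAllowlisted`, and the `redact` callback are hypothetical; edge-kit's real section matching may differ.

```typescript
type Role = "admin" | "viewer" | "public";
type Page = { pageNum: number; text: string };
type Section = { heading: string; body: string };

// Split markdown into sections that each start at a heading line (# to ######).
function splitSections(markdown: string): Section[] {
  const sections: Section[] = [];
  for (const line of markdown.split("\n")) {
    const m = /^#{1,6}\s+(.*)$/.exec(line);
    if (m || sections.length === 0) {
      // New section; any text before the first heading gets an empty heading.
      sections.push({ heading: m ? m[1].trim() : "", body: line });
    } else {
      sections[sections.length - 1].body += "\n" + line;
    }
  }
  return sections;
}

// Assumed rule: exact heading match, or heading that starts with "<entry> ".
function isAllowlisted(heading: string, allowlist: string[]): boolean {
  return allowlist.some((e) => heading === e || heading.startsWith(e + " "));
}

// Hypothetical: the text a given role would receive.
function renderForRole(pages: Page[], role: Role, allowlist: string[], redact: (t: string) => string): string {
  const full = pages.map((p) => p.text).join("\n\n");
  if (role === "admin") return full;          // unmodified text
  if (role === "viewer") return redact(full); // placeholders, every section
  const kept = splitSections(full).filter((s) => isAllowlisted(s.heading, allowlist));
  return redact(kept.map((s) => s.body).join("\n")); // allowlisted sections only, then redacted
}

const maskSsn = (t: string) => t.replace(/\b\d{3}-\d{2}-\d{4}\b/g, "***-**-****");
const demo: Page[] = [
  { pageNum: 1, text: "# Form W-9\nName: John Doe\n# Part I\nSSN: 123-45-6789" },
  { pageNum: 2, text: "# Signature\nSigned by John Doe" },
];
console.log(renderForRole(demo, "public", ["Form W-9", "Part I"], maskSsn));
```

Filtering happens before redaction, so the public view is strictly a subset of the viewer view.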
Custom patterns
For domain-specific patterns, pass customPatterns with raw regex alongside presets:
```typescript
const result = await deploy({
  pages,
  apiKey: process.env.OKRA_API_KEY!,
  redact: {
    pii: {
      preset: 'hipaa',
      customPatterns: [
        { type: 'ACCOUNT_NUM', regex: /ACC-\d{8}/g, priority: 10, placeholder: '[ACCOUNT_{n}]', severity: 'high' },
        { type: 'INTERNAL_REF', regex: /REF-[A-Z]{3}-\d{4}/g, priority: 5, placeholder: '[REF_{n}]', severity: 'medium' },
      ],
    },
    publicFieldAllowlist: ['Summary', 'Terms'],
  },
});
```
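This page does not spell out how `priority` and the `{n}` token are resolved. The sketch below assumes one plausible reading, useful for previewing custom patterns locally before deploying: patterns run in descending priority, and `{n}` becomes a per-type counter where a repeated raw value reuses its earlier number. `applyCustomPatterns` is a hypothetical helper, not part of edge-kit.

```typescript
type CustomPattern = {
  type: string;
  regex: RegExp;       // needs the /g flag so every occurrence is replaced
  priority: number;
  placeholder: string; // may contain "{n}"
  severity: string;
};

// Assumption: higher priority runs first; {n} numbers distinct values per type.
function applyCustomPatterns(text: string, patterns: CustomPattern[]): string {
  const ordered = [...patterns].sort((a, b) => b.priority - a.priority);
  let out = text;
  for (const p of ordered) {
    const seen = new Map<string, number>(); // raw value -> its number for this type
    out = out.replace(p.regex, (match) => {
      if (!seen.has(match)) seen.set(match, seen.size + 1);
      return p.placeholder.replace("{n}", String(seen.get(match)));
    });
  }
  return out;
}

const preview = applyCustomPatterns(
  "Wire ACC-12345678 (ref REF-ABC-0042), then ACC-87654321 and ACC-12345678 again.",
  [
    { type: "ACCOUNT_NUM", regex: /ACC-\d{8}/g, priority: 10, placeholder: "[ACCOUNT_{n}]", severity: "high" },
    { type: "INTERNAL_REF", regex: /REF-[A-Z]{3}-\d{4}/g, priority: 5, placeholder: "[REF_{n}]", severity: "medium" },
  ],
);
console.log(preview);
```

Reusing a number for a repeated value keeps redacted text readable ("[ACCOUNT_1] ... [ACCOUNT_1] again") without exposing the value itself.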
URL anatomy
```text
view.okrapdf.com / s / {token} / {filename}.md
                   │      │           │
                   │      │           └─ decorative (human-readable, not used for lookup)
                   │      └─ HMAC-signed: base64(docId:role).signature
                   └─ "shared/governed" route prefix
```
The token is verified server-side with HMAC-SHA256. Tampering with the role or document ID invalidates the signature.
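For intuition about why tampering fails, here is a minimal sketch that signs and verifies a token in the `base64(docId:role).signature` shape above, using Node's built-in `crypto`. The base64url alphabet, the secret handling, and the helper names `signToken`/`verifyToken` are assumptions; OkraPDF's actual token encoding and key management are not documented here.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

type Role = "admin" | "viewer" | "public";
const ROLES: readonly string[] = ["admin", "viewer", "public"];

// Token = base64url(docId:role) + "." + base64url(HMAC-SHA256(payload)).
function signToken(docId: string, role: Role, secret: string): string {
  const payload = Buffer.from(`${docId}:${role}`, "utf8").toString("base64url");
  const sig = createHmac("sha256", secret).update(payload).digest("base64url");
  return `${payload}.${sig}`;
}

// Returns the claims, or null if the token is malformed or the signature is wrong.
function verifyToken(token: string, secret: string): { docId: string; role: Role } | null {
  const parts = token.split(".");
  if (parts.length !== 2) return null;
  const [payload, sig] = parts;
  const expected = createHmac("sha256", secret).update(payload).digest();
  const given = Buffer.from(sig, "base64url");
  // timingSafeEqual throws on a length mismatch, so compare lengths first.
  if (given.length !== expected.length || !timingSafeEqual(given, expected)) return null;
  const decoded = Buffer.from(payload, "base64url").toString("utf8");
  const idx = decoded.lastIndexOf(":");
  const role = decoded.slice(idx + 1);
  if (idx < 0 || !ROLES.includes(role)) return null;
  return { docId: decoded.slice(0, idx), role: role as Role };
}

const secret = "demo-secret"; // hypothetical; load a real key from the environment
const viewer = signToken("fw9-a3f8b2", "viewer", secret);
console.log(verifyToken(viewer, secret)); // { docId: 'fw9-a3f8b2', role: 'viewer' }

// Forge an admin token: swap the payload, keep the viewer signature.
const [, viewerSig] = viewer.split(".");
const forged = `${Buffer.from("fw9-a3f8b2:admin").toString("base64url")}.${viewerSig}`;
console.log(verifyToken(forged, secret)); // null
```

Because the signature covers the encoded payload, changing either the document ID or the role produces a payload whose HMAC no longer matches.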
URLs return Content-Type: text/markdown; charset=utf-8. The response is the document’s markdown with redaction already applied — no client-side processing needed.
```bash
curl https://view.okrapdf.com/s/{viewer-token}/fw9.md
```

```markdown
# Form W-9
# Request for Taxpayer Identification Number
Name: John Doe
SSN: ***-**-****
Email: [EMAIL]
Phone: [PHONE]
```
Redaction applies everywhere
Static URLs are only one access path. The same redaction lens applies to completions, agent SQL queries, and text search, so unless the token's role is admin, the LLM never sees raw PII.
Completions endpoint
When a consumer hits the public /completion endpoint, the agent’s tool results are redacted before the LLM sees them:
```http
POST /v1/documents/fw9-a3f8b2/completion

{ "prompt": "Who filed this W-9?" }
```
The response contains only redacted content: the model cannot leak PII it never received.
Agent SQL queries
The DocumentAgent has a query_sql tool that runs SELECT queries against the document’s local SQLite. Redaction intercepts the tool result before it’s fed back to the LLM:
```typescript
// 1. The LLM decides to call query_sql.
//    OpenAI-style APIs return `arguments` as a JSON string; some return an object.
const tool = llmResponse.tool_calls[0];
const args = typeof tool.function.arguments === 'string'
  ? JSON.parse(tool.function.arguments)
  : tool.function.arguments;
const sqlQuery = args.query;

// 2. The Durable Object runs the query against the RAW data
const rawResult = this.state.storage.sql.exec(sqlQuery).toArray();
// => [{ name: "John Doe", ssn: "123-45-6789", email: "john@acme.com" }]

// 3. Apply the redaction lens (activeLens: the role from the caller's token)
const safeResult = rawResult.map(row => {
  const safe = { ...row };
  if (activeLens !== 'admin') {
    if (safe.ssn) safe.ssn = '[REDACTED]';
    if (safe.email) safe.email = '[EMAIL]';
    if (safe.phone) safe.phone = '[PHONE]';
  }
  return safe;
});

// 4. Return the SAFE result back to the LLM as a tool message
messages.push({
  role: 'tool',
  tool_call_id: tool.id,
  content: JSON.stringify(safeResult),
  // The LLM only sees: [{ name: "John Doe", ssn: "[REDACTED]", email: "[EMAIL]" }]
});
```
The LLM reasons over redacted data. It can still answer “who filed this?” (name wasn’t redacted) but cannot surface the SSN or email.
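Step 3 of the snippet can be pulled out into a small pure function so the lens is unit-testable on its own. `redactRows` and its column map are hypothetical, mirroring the columns in the snippet. One design note: matching on column names alone would miss an aliased column (`SELECT ssn AS x ...`), so this sketch also scans string values for an SSN-shaped pattern; treat that fallback as an illustration, not as OkraPDF's documented behavior.

```typescript
type Role = "admin" | "viewer" | "public";
type Row = Record<string, unknown>;

// Known sensitive columns -> placeholders (hypothetical, mirrors the snippet).
const COLUMN_PLACEHOLDERS = new Map<string, string>([
  ["ssn", "[REDACTED]"],
  ["email", "[EMAIL]"],
  ["phone", "[PHONE]"],
]);
// Value-level fallback: SSN-shaped strings in any column, e.g. an aliased one.
const SSN_VALUE = /\b\d{3}-\d{2}-\d{4}\b/g;

function redactRows(rows: Row[], role: Role): Row[] {
  if (role === "admin") return rows; // admin lens: raw rows pass through
  return rows.map((row) => {
    const safe: Row = {};
    for (const [col, value] of Object.entries(row)) {
      const placeholder = COLUMN_PLACEHOLDERS.get(col.toLowerCase());
      if (placeholder !== undefined && value) {
        safe[col] = placeholder; // known sensitive column
      } else if (typeof value === "string") {
        safe[col] = value.replace(SSN_VALUE, "[REDACTED]"); // catches aliases
      } else {
        safe[col] = value;
      }
    }
    return safe;
  });
}

const raw: Row[] = [{ name: "John Doe", ssn: "123-45-6789", email: "john@acme.com" }];
console.log(JSON.stringify(redactRows(raw, "viewer")));
```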
Text search
Full-text search results go through the same lens. A search for “123-45” against a viewer-role token returns zero matches — the redacted content doesn’t contain the raw pattern.
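The zero-match behavior follows from searching the lens output rather than the raw pages. A minimal sketch, assuming a hypothetical `searchPages` helper and a stand-in `lens` that masks SSNs for every non-admin role (a plain substring check replaces real full-text indexing):

```typescript
type Role = "admin" | "viewer" | "public";
type Page = { pageNum: number; text: string };

// Stand-in lens: mask SSNs for every non-admin role (hypothetical).
function lens(text: string, role: Role): string {
  return role === "admin" ? text : text.replace(/\b\d{3}-\d{2}-\d{4}\b/g, "***-**-****");
}

// Matching runs over the lens output, so redacted values cannot be hit.
function searchPages(pages: Page[], query: string, role: Role): number[] {
  return pages
    .filter((p) => lens(p.text, role).includes(query))
    .map((p) => p.pageNum);
}

const pages: Page[] = [{ pageNum: 1, text: "Name: John Doe\nSSN: 123-45-6789" }];
console.log(searchPages(pages, "123-45", "viewer")); // []
console.log(searchPages(pages, "123-45", "admin"));  // [ 1 ]
```

Searching raw text and redacting only the returned snippets would still leak information: a non-empty hit list alone confirms that a guessed SSN is in the document. Applying the lens before matching closes that side channel.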
Why this matters
Many redaction tools protect only static exports. OkraPDF redacts at the data access layer: every SELECT, every completion, and every search result passes through the lens. The blast radius of a leaked token is bounded by its role, not by which endpoint was called.
Want to parse PDFs locally so your document bytes never touch a third-party cloud? See the Local Extraction + Redaction (Docling) cookbook.