> ## Documentation Index
> Fetch the complete documentation index at: https://docs.okrapdf.com/llms.txt
> Use this file to discover all available pages before exploring further.

# Redaction

> Server-side PII redaction with role-based access. Same document, different views.

## Same document, three URLs

OkraPDF's redaction engine runs server-side at the edge. PII is removed before the response leaves the Worker — it never reaches the browser.

```
/s/{admin-token}/fw9.md   → full text
/s/{viewer-token}/fw9.md  → SSN: ***-**-****, [EMAIL], [PHONE]
/s/{public-token}/fw9.md  → allowlisted sections only
```

Each URL is an HMAC-signed capability token. No API keys, no sessions, no cookies. The token IS the auth.

## How it works

1. **Parse** your PDF with any vendor (LlamaParse, Docling, Unstructured, Azure Doc Intel)
2. **Deploy** with `@okrapdf/edge-kit` — pass pages + redaction config
3. **Get back 3 URLs** — admin, viewer, public

```typescript theme={null}
import { deploy } from '@okrapdf/edge-kit';

const pii = {
  preset: 'hipaa',
  patterns: ['SSN', 'EMAIL', 'PHONE_US', 'TAX_ID_US'],
  includeNames: true,
  includeAddresses: true,
};

const result = await deploy({
  pages,  // from any PDF parser
  meta: { title: 'W-9', filename: 'fw9.pdf' },
  redact: {
    pii,
    publicFieldAllowlist: ['Form W-9', 'General Instructions'],
  },
  apiKey: process.env.OKRA_API_KEY!,
});

result.urls.admin   // full text
result.urls.viewer  // PII redacted
result.urls.public  // allowlist + redacted
```

## PII detection with OpenRedaction

Pass a `pii` config object and the SDK uses [OpenRedaction](https://github.com/sam247/openredaction) under the hood — compliance presets, name/address detection, context-aware matching, and 400+ pattern types out of the box. No `pii` field uses OpenRedaction defaults (all patterns enabled).

```typescript theme={null}
const pii = {
  preset: 'hipaa',                                   // or 'gdpr', 'ccpa'
  patterns: ['SSN', 'EMAIL', 'PHONE_US', 'TAX_ID_US'],
  includeNames: true,
  includeAddresses: true,
};
```

## Custom patterns

For domain-specific patterns, pass `customPatterns` — raw regex alongside presets:

```typescript theme={null}
const pii = {
  preset: 'hipaa',
  customPatterns: [
    { type: 'DEAL_VALUE', regex: /\$[\d,]+\.\d{2}/g, priority: 10, placeholder: '[AMOUNT_{n}]', severity: 'high' },
    { type: 'INTERNAL_REF', regex: /REF-[A-Z]{3}-\d{4}/g, priority: 5, placeholder: '[REF_{n}]', severity: 'medium' },
  ],
};
```

## Config-per-document

Each document gets its own redaction config. No global settings to manage.

```typescript theme={null}
// Tax forms: HIPAA preset, names + addresses
await deploy({
  pages: w9Pages,
  redact: {
    pii: { preset: 'hipaa', includeNames: true, includeAddresses: true },
    publicFieldAllowlist: ['Form W-9', 'Part I'],
  },
  apiKey,
});

// Contracts: custom patterns for deal values
await deploy({
  pages: contractPages,
  redact: {
    pii: {
      customPatterns: [
        { type: 'DEAL_VALUE', regex: /\$[\d,]+\.\d{2}/g, priority: 10, placeholder: '[AMOUNT_{n}]', severity: 'high' },
      ],
    },
    publicFieldAllowlist: ['Terms', 'Parties'],
  },
  apiKey,
});
```

## Vendor-agnostic

The `PageInput` format works with any parser:

```typescript theme={null}
interface PageInput {
  pageNum: number;
  text: string;
  items?: Array<{ text: string; bbox?: { x: number; y: number; w: number; h: number } }>;
}
```

No vendor lock-in. Parse with LlamaParse today, switch to Docling tomorrow — redaction works the same.

## Every access path, not just URLs

Redaction isn't just for static markdown URLs. The same lens applies to:

* **Completions endpoint** — agent tool results are redacted before the LLM sees them. The model can't leak PII it never received.
* **Agent SQL queries** — `query_sql` results pass through the lens. A `SELECT * FROM nodes` returns `[REDACTED]` for PII fields.
* **Text search** — search results are filtered through the active role. Searching for a raw SSN against a viewer token returns zero matches.

See the [Redact & Deploy cookbook](/cookbook/redact-deploy#redaction-applies-everywhere) for implementation details.

## Architecture

* **Redaction runs on Cloudflare Workers** — sub-5ms, no cold starts
* **Pages stored in R2** — zero egress fees
* **HMAC-signed tokens** — no database lookup needed to verify
* **Markdown output** — `Content-Type: text/markdown`, no HTML rendering overhead

See the [Redact & Deploy cookbook](/cookbook/redact-deploy) for a full working example.
