# Redact & Deploy

> Parse a PDF, apply server-side redaction, and get 3 URLs with different permission levels.

## Overview

Deploy a document with a redaction config and get back 3 URLs: the same underlying document at different permission levels. Because redaction runs server-side, PII never reaches the browser for the `viewer` and `public` URLs.

```
view.okrapdf.com/s/{admin-token}/fw9.md   → full text, all PII visible
view.okrapdf.com/s/{viewer-token}/fw9.md  → PII replaced with [EMAIL], [PHONE], ***-**-****
view.okrapdf.com/s/{public-token}/fw9.md  → only allowlisted sections, PII redacted
```

Each URL carries an HMAC-signed token that encodes the document ID and role. The filename (`fw9.md`) is decorative; the token alone authorizes access.

## Install

```bash
npm install @okrapdf/edge-kit
```

## End-to-end example

Parse a PDF with LlamaParse, then deploy with redaction:

```typescript
import { LlamaCloud } from '@llamaindex/llama-cloud';
import { deploy } from '@okrapdf/edge-kit';
import type { PageInput } from '@okrapdf/edge-kit';

// 1. Parse PDF via LlamaParse (or any vendor)
const client = new LlamaCloud({ apiKey: process.env.LLAMAINDEX_API_KEY });
const parseResult = await client.parsing.parse({
  source_url: 'https://www.irs.gov/pub/irs-pdf/fw9.pdf',
  tier: 'cost_effective',
  version: 'latest',
  expand: ['items', 'markdown'],
}, { verbose: true });

// 2. Convert vendor output → vendor-agnostic PageInput
const pages: PageInput[] = [];
for (const page of parseResult.markdown?.pages ?? []) {
  if (!('markdown' in page)) continue;
  pages.push({ pageNum: page.page_number, text: page.markdown });
}

// 3. Configure PII detection
const pii = {
  preset: 'hipaa',
  patterns: ['SSN', 'EMAIL', 'PHONE_US', 'TAX_ID_US'],
  includeNames: true,
  includeAddresses: true,
};

// 4. Deploy with redaction config → get 3 URLs
const result = await deploy({
  pages,
  meta: { title: 'IRS W-9 (Rev. 3-2024)', filename: 'fw9.pdf' },
  redact: {
    pii,
    publicFieldAllowlist: ['Form W-9', 'Part I', 'Part II', 'General Instructions'],
  },
  apiKey: process.env.OKRA_API_KEY!,
});

console.log(result.urls.admin);   // full text
console.log(result.urls.viewer);  // PII redacted
console.log(result.urls.public);  // allowlist only
console.log(result.stats);        // { totalMatches: 5, pagesAffected: 2, byRule: { SSN: 1, EMAIL: 2, PHONE_US: 2 } }
```

## What gets redacted

The `pii` config uses [OpenRedaction](https://github.com/sam247/openredaction) — compliance presets, name/address detection, and 400+ pattern types. Pick a preset or list specific patterns:

```typescript
// Preset-based (HIPAA, GDPR, CCPA)
const presetPii = { preset: 'hipaa', includeNames: true };

// Pattern-based
const patternPii = { patterns: ['SSN', 'EMAIL', 'PHONE_US', 'TAX_ID_US'] };

// Combined
const combinedPii = { preset: 'hipaa', patterns: ['TAX_ID_US'], includeAddresses: true };
```

If you omit the `pii` field, OpenRedaction defaults apply (all patterns enabled).

## Three roles

| Role     | What they see                           | Use case                         |
| -------- | --------------------------------------- | -------------------------------- |
| `admin`  | Full text, all PII visible              | Internal review, compliance team |
| `viewer` | PII replaced with placeholders          | External auditors, partners      |
| `public` | Only allowlisted sections, PII redacted | Public-facing links, embedding   |
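
One way to picture the three roles is as a function from (document, role) to visible text. The sketch below is illustrative only: the `Section` shape, the `applyLens` and `redact` names, and the two regexes are assumptions standing in for OkraPDF's server-side redaction and OpenRedaction's detectors.

```typescript
type Role = 'admin' | 'viewer' | 'public';

// Hypothetical document shape: one entry per heading-delimited section
interface Section {
  heading: string;
  body: string;
}

// Stand-in detectors; the real service relies on OpenRedaction patterns
const SSN = /\b\d{3}-\d{2}-\d{4}\b/g;
const EMAIL = /\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b/g;

function redact(text: string): string {
  return text.replace(SSN, '***-**-****').replace(EMAIL, '[EMAIL]');
}

function applyLens(sections: Section[], role: Role, allowlist: string[] = []): Section[] {
  // admin: full text, nothing filtered or replaced
  if (role === 'admin') return sections;

  // public: keep only allowlisted sections; viewer: keep every section
  const visible =
    role === 'public'
      ? sections.filter((s) => allowlist.includes(s.heading))
      : sections;

  // viewer and public: PII replaced with placeholders
  return visible.map((s) => ({ heading: s.heading, body: redact(s.body) }));
}
```

Modeling `public` as "viewer plus a section filter" mirrors the table: both non-admin roles share the same redaction step and differ only in which sections survive.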

## Custom patterns

For domain-specific patterns, pass `customPatterns` with raw regex alongside presets:

```typescript
const result = await deploy({
  pages,
  apiKey: process.env.OKRA_API_KEY!,
  redact: {
    pii: {
      preset: 'hipaa',
      customPatterns: [
        { type: 'ACCOUNT_NUM', regex: /ACC-\d{8}/g, priority: 10, placeholder: '[ACCOUNT_{n}]', severity: 'high' },
        { type: 'INTERNAL_REF', regex: /REF-[A-Z]{3}-\d{4}/g, priority: 5, placeholder: '[REF_{n}]', severity: 'medium' },
      ],
    },
    publicFieldAllowlist: ['Summary', 'Terms'],
  },
});
```
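
Before deploying, it can help to sanity-check custom regexes against sample text locally. The helper below is a hypothetical convenience, not part of `@okrapdf/edge-kit` or OpenRedaction; it only uses the `type` and `regex` fields from the config above and the built-in `String.prototype.match`.

```typescript
// Hypothetical subset of a custom pattern: just the fields needed for a local check
interface PatternCheck {
  type: string;
  regex: RegExp; // must carry the `g` flag so match() returns every occurrence
}

function findMatches(text: string, patterns: PatternCheck[]): Record<string, string[]> {
  const found: Record<string, string[]> = {};
  for (const { type, regex } of patterns) {
    // With a global regex, match() returns all matches, or null when there are none
    found[type] = text.match(regex) ?? [];
  }
  return found;
}

const sample = 'Account ACC-12345678 (ref REF-ABC-0042); ACC-123 is too short.';
console.log(
  findMatches(sample, [
    { type: 'ACCOUNT_NUM', regex: /ACC-\d{8}/g },
    { type: 'INTERNAL_REF', regex: /REF-[A-Z]{3}-\d{4}/g },
  ]),
);
// → { ACCOUNT_NUM: ['ACC-12345678'], INTERNAL_REF: ['REF-ABC-0042'] }
```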

## URL anatomy

```
view.okrapdf.com / s / {token} / {filename}.md
                   │    │          │
                   │    │          └─ decorative (human-readable, not used for lookup)
                   │    └─ HMAC-signed: base64(docId:role).signature
                   └─ "shared/governed" route prefix
```

The token is verified server-side with HMAC-SHA256. Tampering with the role or document ID invalidates the signature.
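
The `base64(docId:role).signature` layout can be sketched with Node's built-in `crypto` module. This is a minimal illustration, not OkraPDF's implementation: the secret handling, the base64url encoding of both parts, and the `signToken` / `verifyToken` names are all assumptions.

```typescript
import { createHmac, timingSafeEqual } from 'node:crypto';

const ROLES = ['admin', 'viewer', 'public'] as const;
type Role = (typeof ROLES)[number];

function isRole(value: string): value is Role {
  return (ROLES as readonly string[]).includes(value);
}

// Hypothetical helper: builds base64url(docId:role).signature
function signToken(docId: string, role: Role, secret: string): string {
  const payload = Buffer.from(`${docId}:${role}`, 'utf8').toString('base64url');
  const signature = createHmac('sha256', secret).update(payload).digest('base64url');
  return `${payload}.${signature}`;
}

// Hypothetical helper: returns the decoded claims, or null if verification fails
function verifyToken(token: string, secret: string): { docId: string; role: Role } | null {
  const parts = token.split('.');
  if (parts.length !== 2) return null;
  const [payload, signature] = parts;

  // Recompute the HMAC over the payload and compare in constant time
  const expected = createHmac('sha256', secret).update(payload).digest();
  const actual = Buffer.from(signature, 'base64url');
  // timingSafeEqual throws on length mismatch, so compare lengths first
  if (actual.length !== expected.length || !timingSafeEqual(actual, expected)) return null;

  // Split on the last colon so document IDs may themselves contain colons
  const decoded = Buffer.from(payload, 'base64url').toString('utf8');
  const sep = decoded.lastIndexOf(':');
  if (sep < 0) return null;
  const docId = decoded.slice(0, sep);
  const role = decoded.slice(sep + 1);
  return isRole(role) ? { docId, role } : null;
}
```

Swapping the payload of a viewer token for one that says `admin` changes the bytes the HMAC was computed over, so the recomputed signature no longer matches and `verifyToken` returns `null`.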

## Response format

URLs return `Content-Type: text/markdown; charset=utf-8`. The response is the document's markdown with redaction already applied — no client-side processing needed.

```bash
curl https://view.okrapdf.com/s/{viewer-token}/fw9.md

# Form W-9
# Request for Taxpayer Identification Number
Name: John Doe
SSN: ***-**-****
Email: [EMAIL]
Phone: [PHONE]
```
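
The same request can be made from code with a plain `fetch`. The sketch below assumes a runtime with the standard Fetch API (Node 18+ or a browser); the `fetchRedactedMarkdown` name and the injectable `fetchImpl` parameter are illustrative and not part of `@okrapdf/edge-kit`.

```typescript
// Hypothetical helper: downloads a governed URL and returns its markdown body
async function fetchRedactedMarkdown(
  url: string,
  fetchImpl: typeof fetch = fetch, // injectable so the helper can be tested offline
): Promise<string> {
  const res = await fetchImpl(url);
  if (!res.ok) {
    throw new Error(`Request failed with status ${res.status}`);
  }

  // The docs above state the response is text/markdown; fail loudly otherwise
  const contentType = res.headers.get('content-type') ?? '';
  if (!contentType.startsWith('text/markdown')) {
    throw new Error(`Unexpected content type: ${contentType}`);
  }

  // Redaction was already applied server-side, so the body is used as-is
  return res.text();
}
```

A caller would pass one of the URLs returned by `deploy`, for example `await fetchRedactedMarkdown(result.urls.viewer)`.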

## Redaction applies everywhere

Static URLs are just the beginning. The same redaction lens applies to **every access path** — completions, agent SQL queries, and text search. The LLM never sees raw PII.

### Completions endpoint

When a consumer hits the public `/completion` endpoint, the agent's tool results are redacted before the LLM sees them:

```
POST /v1/documents/fw9-a3f8b2/completion
{ "prompt": "Who filed this W-9?" }
```

The response contains only redacted content. The model cannot leak PII that it never received.

### Agent SQL queries

The DocumentAgent has a `query_sql` tool that runs SELECT queries against the document's local SQLite. Redaction intercepts the tool result before it's fed back to the LLM:

```typescript
// 1. The LLM decides to call query_sql
// (assumes OpenAI-style tool calls, where `arguments` is a JSON string)
const toolCall = llmResponse.tool_calls[0];
const { query: sqlQuery } = JSON.parse(toolCall.function.arguments);

// 2. The DO runs the query against the RAW data
const rawResult = await this.state.storage.sql(sqlQuery);
// => [{ name: "John Doe", ssn: "123-45-6789", email: "john@acme.com" }]

// 3. Apply the redaction lens to the tool result
const safeResult = rawResult.map(row => {
  const safe = { ...row };
  if (activeLens !== 'admin') {
    if (safe.ssn) safe.ssn = '[REDACTED]';
    if (safe.email) safe.email = '[EMAIL]';
    if (safe.phone) safe.phone = '[PHONE]';
  }
  return safe;
});

// 4. Return the SAFE result back to the LLM as a tool message
messages.push({
  role: 'tool',
  tool_call_id: toolCall.id, // must match the call being answered
  content: JSON.stringify(safeResult),
  // The LLM only sees: [{ name: "John Doe", ssn: "[REDACTED]", email: "[EMAIL]" }]
});
```

The LLM reasons over redacted data. It can still answer "who filed this?" (name wasn't redacted) but cannot surface the SSN or email.

### Text search

Full-text search results go through the same lens. A search for "123-45" against a viewer-role token returns zero matches — the redacted content doesn't contain the raw pattern.
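
The zero-match behavior follows from searching the redacted text rather than the raw text. A toy illustration, assuming a single SSN regex in place of the real detectors and a naive substring search in place of the real full-text index (the `searchPages` name is hypothetical):

```typescript
type Role = 'admin' | 'viewer' | 'public';

// Stand-in for the real PII detectors
const SSN_PATTERN = /\b\d{3}-\d{2}-\d{4}\b/g;

// Hypothetical search: redact first (for non-admin roles), then match
function searchPages(pages: string[], query: string, role: Role): number[] {
  const hits: number[] = [];
  pages.forEach((text, index) => {
    const visible = role === 'admin' ? text : text.replace(SSN_PATTERN, '***-**-****');
    if (visible.includes(query)) hits.push(index + 1); // 1-based page numbers
  });
  return hits;
}

const samplePages = ['Name: John Doe', 'SSN: 123-45-6789'];
console.log(searchPages(samplePages, '123-45', 'admin'));  // → [2]
console.log(searchPages(samplePages, '123-45', 'viewer')); // → []
```

Because the query runs against what the role is allowed to see, a search hit cannot confirm the presence of a value the caller could not read directly.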

### Why this matters

Most redaction tools only protect static exports. OkraPDF redacts at the **data access layer** — every SELECT, every completion, every search result passes through the lens. The blast radius of a leaked token is bounded by its role, not by which endpoint was called.

## Local extraction with Docling

Want to parse PDFs locally so your document bytes never touch a third-party cloud? See the [Local Extraction + Redaction (Docling)](/cookbook/local-redact-docling) cookbook.
