# Beyond RAG

> Why retrieval-augmented generation was a 2023 solution—and what comes next

## RAG Was a Workaround

Retrieval-Augmented Generation solved a real problem: LLMs have limited context windows, so we chunk documents, embed them, and retrieve relevant pieces at query time.

But RAG has fundamental limitations:

<AccordionGroup>
  <Accordion title="Chunking destroys structure" icon="scissors">
    Tables split across chunks. Context gets lost. The embedding of row 47 doesn't know about the header.
  </Accordion>

  <Accordion title="Retrieval is brittle" icon="magnifying-glass">
    Semantic similarity isn't the same as relevance. The most similar chunk isn't always the most useful one.
  </Accordion>

  <Accordion title="No reasoning across chunks" icon="brain">
    A standard RAG pipeline retrieves once, then generates. It can't iteratively explore a document the way a human would.
  </Accordion>

  <Accordion title="Complex infrastructure" icon="server">
    Vector databases, embedding models, retrieval pipelines, re-ranking—all to approximate what "reading" should be.
  </Accordion>
</AccordionGroup>
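To make the chunking problem concrete, here is a minimal sketch of a table losing its header. The table contents and the 10-lines-per-chunk splitter are made up for illustration; real pipelines usually split by tokens or characters, but the effect is the same.

```python
# Hypothetical extracted table, flattened to text lines the way a naive pipeline sees it.
header = "Quarter | Revenue | Margin"
rows = [header] + [
    f"Q{i % 4 + 1} {2000 + i // 4} | {100 + i} | {20 + i % 5}%" for i in range(60)
]

def chunk_lines(lines: list[str], per_chunk: int) -> list[str]:
    """Group consecutive lines into chunks of at most `per_chunk` lines."""
    return [
        "\n".join(lines[i:i + per_chunk])
        for i in range(0, len(lines), per_chunk)
    ]

chunks = chunk_lines(rows, 10)

# Find the chunk that holds row 47 and check whether the header came with it.
row_47 = rows[47]
chunk_of_row_47 = next(c for c in chunks if row_47 in c)
header_travels_with_row = header in chunk_of_row_47
chunks_with_header = [c for c in chunks if header in c]
```

Only the first chunk carries the header, so whatever gets embedded for row 47 is three bare values with no column names attached.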

## The New Approach: Let Agents Read

For many document tasks, modern AI agents don't need a retrieval pipeline. They need **files they can actually read**.

Give an agent:

* A filesystem with your documents
* Tools to search and navigate
* A code interpreter

And it can do what RAG tried to do—but better, because it can reason about what to read next.
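As a rough sketch of what "tools to search and navigate" can look like, the three functions below cover listing, searching, and reading a line range. The names `list_files`, `grep`, and `read_lines` are hypothetical and are not part of any OkraPDF API; an agent framework would expose functions like these as tools and let the model decide which to call next.

```python
import re
import tempfile
from pathlib import Path

def list_files(root: Path) -> list[str]:
    """Return relative paths of all files under root, sorted."""
    return sorted(str(p.relative_to(root)) for p in root.rglob("*") if p.is_file())

def grep(root: Path, pattern: str) -> list[tuple[str, int, str]]:
    """Return (relative path, 1-based line number, line) for each matching line."""
    rx = re.compile(pattern, re.IGNORECASE)
    hits = []
    for rel in list_files(root):
        lines = (root / rel).read_text(encoding="utf-8").splitlines()
        for n, line in enumerate(lines, start=1):
            if rx.search(line):
                hits.append((rel, n, line))
    return hits

def read_lines(root: Path, rel: str, start: int, end: int) -> str:
    """Return lines start..end (1-based, inclusive) of one file."""
    lines = (root / rel).read_text(encoding="utf-8").splitlines()
    return "\n".join(lines[start - 1:end])

# Demo on a throwaway directory holding one made-up document.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "report.md").write_text(
        "# Report\n\nRevenue grew 12%.\nCosts were flat.\n", encoding="utf-8"
    )
    hits = grep(root, r"revenue")                    # search first
    context = read_lines(root, "report.md", 3, 4)    # then read around the hit
```

The search result tells the agent where to look, and the follow-up read pulls in surrounding lines. That second step is the part a single-shot retrieval pipeline does not have.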

## The Catch: Format Matters

Agents read plaintext natively. But most of your documents aren't plaintext, and binary formats need conversion before an agent can work with them reliably:

| Format                 | Agent Can Read? |
| ---------------------- | --------------- |
| `.md`, `.txt`, `.json` | Yes             |
| `.py`, `.tsx`, `.sql`  | Yes             |
| `.pdf`                 | **No**          |
| `.docx`, `.xlsx`       | **No**          |
| Scanned documents      | **No**          |

This is where OkraPDF comes in.

## OkraPDF: The Bridge

We convert documents from formats agents can't read into formats they can:

```
PDF (opaque binary)
    ↓ OkraPDF
Structured text + tables + figures (agent-ready)
```

But we don't stop at parsing. We give agents **tools to work with the data**:

<CardGroup cols={2}>
  <Card title="Parse" icon="file-import">
    OCR, table extraction, figure detection. Your PDF becomes structured data.
  </Card>

  <Card title="Search" icon="search">
    Semantic search across extracted entities. Find the right table in seconds.
  </Card>

  <Card title="Chat" icon="message">
    Ask questions. The agent has full context, not retrieved chunks.
  </Card>

  <Card title="Query" icon="terminal">
    The agent can run SQL queries against your document's structured data instead of only retrieving chunks.
  </Card>
</CardGroup>
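To illustrate the difference between querying and retrieving, here is a minimal sketch using Python's built-in `sqlite3` module. The `income_statement` table, its columns, and its values are invented for the example; the actual schema OkraPDF produces for a document is not shown here and may look different.

```python
import sqlite3

# Stand-in for a per-document database; the schema and numbers are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE income_statement (quarter TEXT, revenue REAL, cost REAL)")
conn.executemany(
    "INSERT INTO income_statement VALUES (?, ?, ?)",
    [("Q1", 120.0, 80.0), ("Q2", 135.0, 90.0), ("Q3", 150.0, 95.0)],
)

# A question like "which quarter had the best margin?" becomes a computation,
# not a similarity search over text chunks.
best_quarter, best_margin = conn.execute(
    "SELECT quarter, (revenue - cost) / revenue AS margin "
    "FROM income_statement ORDER BY margin DESC LIMIT 1"
).fetchone()

total_revenue = conn.execute(
    "SELECT SUM(revenue) FROM income_statement"
).fetchone()[0]

conn.close()
```

Neither answer appears verbatim anywhere in the source table, so no retrieved chunk could contain it. The agent has to compute it.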

## Side-by-Side Comparison

| Capability            | RAG Pipeline                 | OkraPDF                   |
| --------------------- | ---------------------------- | ------------------------- |
| Setup time            | Days to weeks                | Minutes                   |
| Infrastructure        | Vector DB + embeddings + API | None (hosted)             |
| Table handling        | Often split across chunks    | Structure preserved       |
| Multi-step reasoning  | Limited                      | Full agent loop           |
| SQL query access      | No                           | Yes (per-document SQLite) |
| Accuracy verification | Trust the retrieval          | Side-by-side review       |

## When to Use What

**Use RAG when:**

* You have millions of documents
* Latency is critical (sub-second)
* Queries are simple lookups

**Use OkraPDF when:**

* Document structure matters (tables, figures)
* You need to verify extraction accuracy
* Queries require reasoning across the document
* You want agents to compute on data, not just retrieve it

## Try It

<Steps>
  <Step title="Upload a PDF">
    Any document—financial report, research paper, invoice
  </Step>

  <Step title="See the extraction">
    Tables, figures, text—all preserved and searchable
  </Step>

  <Step title="Ask a question">
    Document Chat gives you answers grounded in actual data
  </Step>
</Steps>

<Card title="Get Started Free" icon="rocket" href="https://app.okrapdf.com/sign-up" horizontal>
  50 pages free. No credit card required.
</Card>
