> ## Documentation Index
> Fetch the complete documentation index at: https://docs.okrapdf.com/llms.txt
> Use this file to discover all available pages before exploring further.

# Extract

> Upload and extract text, tables, and figures from PDFs.

Upload, process, and extract from a PDF in one command.

```bash theme={null}
# From a local file
okra extract ./report.pdf

# From a URL
okra extract https://example.com/report.pdf

# Text only
okra extract ./report.pdf --text-only

# Tables only
okra extract ./report.pdf --tables-only

# JSON output (for piping)
okra extract ./report.pdf -o json

# Use an extraction template
okra extract ./report.pdf --template invoice

# Write agentic workspace to a directory
okra extract ./report.pdf -d ./workspace

# Include page images and figure crops
okra extract ./report.pdf -d ./workspace --images
```

## Flags

| Flag                      | Description                                                |
| ------------------------- | ---------------------------------------------------------- |
| `-o, --output <format>`   | Output format: `table`, `json`, `markdown`                 |
| `-d, --output-dir <path>` | Write agentic workspace to directory                       |
| `--images`                | Include page images and figure crops in workspace          |
| `--scale <n>`             | Image scale factor (1–4)                                   |
| `-t, --template <name>`   | Template: `invoice`, `receipt`, `financial-statement`      |
| `--processor <name>`      | Processor hint: `docai`, `gemini`, `qwen`, `llamaparse`    |
| `--ocr <engine>`          | OCR engine: `docai`, `tesseract`, `textract`, `azure-read` |
| `--vlm <model>`           | VLM model (e.g. `google/gemini-2.5-flash-preview-09-2025`) |
| `--tables-only`           | Only return extracted tables                               |
| `--text-only`             | Only return extracted text                                 |
| `--timeout <seconds>`     | Job timeout (default: 600)                                 |
| `--wait-for <stage>`      | Wait for pipeline stage: `ocr`, `entities`, `index`        |
| `-p, --prompt <task>`     | Add task context for piping to agent CLIs                  |
| `-q, --quiet`             | Minimal output (ideal for piping)                          |
