# Rate Limits

> API rate limits for OkraPDF.

## Limits

| Context         | Requests/min | Expensive ops/min |
| --------------- | ------------ | ----------------- |
| Authenticated   | 600          | 30                |
| Unauthenticated | 60           | 5                 |

"Expensive ops" include `POST /v1/documents` (uploads), `POST /v1/documents/{id}/structured-output`, and `POST /v1/documents/{id}/chat/completions`. All other endpoints count as standard requests.
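When budgeting requests on the client side, it can help to know which limit a call counts against. The sketch below assumes the three endpoints named above are the complete list of expensive operations and that `{id}` is a single path segment; `is_expensive` is a hypothetical helper, not part of any OkraPDF SDK.

```python
import re

# Patterns for the endpoints listed above as expensive operations.
# Assumption: {id} is a single path segment with no slashes.
EXPENSIVE_PATTERNS = [
    re.compile(r"^/v1/documents/?$"),
    re.compile(r"^/v1/documents/[^/]+/structured-output/?$"),
    re.compile(r"^/v1/documents/[^/]+/chat/completions/?$"),
]

def is_expensive(method: str, path: str) -> bool:
    """Return True if the request would count as an expensive op."""
    # All three expensive endpoints are POST requests.
    if method.upper() != "POST":
        return False
    return any(pattern.match(path) for pattern in EXPENSIVE_PATTERNS)
```

A client could use this to keep two separate counters, one for standard requests and one for expensive ops.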

## Rate limit headers

Every API response includes rate limit information:

| Header                  | Description                       |
| ----------------------- | --------------------------------- |
| `X-RateLimit-Limit`     | Max requests per window           |
| `X-RateLimit-Remaining` | Requests remaining in the window  |
| `X-RateLimit-Reset`     | Unix timestamp when window resets |
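One way to work with these headers is to parse them into a small structure once per response. This is a minimal sketch: `RateLimitStatus` and `parse_rate_limit` are hypothetical names, and the code assumes all three header values are integers, as the table above suggests.

```python
import time
from dataclasses import dataclass
from typing import Mapping, Optional

@dataclass
class RateLimitStatus:
    limit: int      # X-RateLimit-Limit
    remaining: int  # X-RateLimit-Remaining
    reset_at: int   # X-RateLimit-Reset (Unix timestamp)

    def seconds_until_reset(self, now: Optional[float] = None) -> float:
        """Seconds left in the current window, never negative."""
        now = time.time() if now is None else now
        return max(0.0, self.reset_at - now)

def parse_rate_limit(headers: Mapping[str, str]) -> Optional[RateLimitStatus]:
    """Return the parsed status, or None if a header is missing or malformed."""
    try:
        return RateLimitStatus(
            limit=int(headers["X-RateLimit-Limit"]),
            remaining=int(headers["X-RateLimit-Remaining"]),
            reset_at=int(headers["X-RateLimit-Reset"]),
        )
    except (KeyError, ValueError):
        return None
```

The lookup is case-sensitive when `headers` is a plain `dict`. HTTP client libraries such as `requests` expose case-insensitive header mappings, which avoids that concern.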

## When rate limited

When you exceed a limit, the API returns a `429` (Too Many Requests) response with this body:

```json
{
  "error": {
    "code": "RATE_LIMITED",
    "message": "Rate limit exceeded. Please try again later."
  }
}
```

The `Retry-After` header tells you how many seconds to wait.
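Reading that header defensively might look like the sketch below. It assumes the value is a number of seconds, as described above; the `retry_after_seconds` helper and its `default` fallback are illustrative choices, not documented API behavior.

```python
from typing import Mapping

def retry_after_seconds(headers: Mapping[str, str], default: float = 1.0) -> float:
    """Return how long to wait before retrying, based on Retry-After."""
    raw = headers.get("Retry-After")
    if raw is None:
        # Header missing: fall back to a caller-chosen default.
        return default
    try:
        # Clamp negative values to zero so the result is safe to pass to sleep().
        return max(0.0, float(raw))
    except ValueError:
        # Value was not a number of seconds: use the default instead.
        return default
```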

## Best practices

1. **Check headers** before hitting limits -- monitor `X-RateLimit-Remaining`
2. **Use exponential backoff** on 429 responses
3. **Batch small PDFs** -- prefer one large job over many small ones
4. **Cache results** -- store extraction results locally after fetching

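```python
import time

def respect_rate_limit(response):
    """Pause until the window resets if the request budget is nearly spent."""
    remaining = int(response.headers.get("X-RateLimit-Remaining", 1))
    if remaining <= 1:
        # X-RateLimit-Reset is a Unix timestamp; sleep for the time left until then.
        # A missing header defaults to 0, which yields no wait.
        reset = int(response.headers.get("X-RateLimit-Reset", 0))
        wait = max(0, reset - time.time())
        time.sleep(wait)
```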
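Exponential backoff on `429` responses (practice 2) can be sketched as follows. To stay independent of any HTTP library, the example takes a hypothetical `send` callable that returns a `(status_code, headers, body)` tuple. The retry count, delays, and 10% jitter are illustrative defaults, not values recommended by OkraPDF.

```python
import random
import time
from typing import Callable, Tuple

# Hypothetical response shape: (status_code, headers, body).
Response = Tuple[int, dict, object]

def call_with_backoff(
    send: Callable[[], Response],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Call send(), retrying on 429 with exponentially growing delays."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        response = send()
        status, headers, _body = response
        # Return on any non-429 status, or when the attempts are used up.
        if status != 429 or attempt == max_attempts - 1:
            return response
        # Delays grow as base, 2*base, 4*base, ... capped at max_delay.
        delay = min(max_delay, base_delay * 2 ** attempt)
        # Wait longer if the server's Retry-After asks for it.
        try:
            delay = max(delay, float(headers.get("Retry-After", 0)))
        except ValueError:
            pass
        # Add up to 10% random jitter so concurrent clients do not retry in lockstep.
        sleep(delay + random.uniform(0, delay * 0.1))
    raise AssertionError("unreachable")
```

With `requests`, `send` could be a lambda that performs the call and returns `(r.status_code, r.headers, r)`. The injectable `sleep` parameter exists so the retry logic can be tested without real delays.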
