The problem
Talking to a PDF normally means stitching together a pipeline:
- Download the file (handle redirects, auth, rate limits)
- Parse it (OCR, layout detection, table extraction)
- Chunk and embed the content
- Build a system prompt within token limits
- Send to an LLM and manage multi-turn state
The shortcut
The Resolve endpoint collapses this into a single POST:
- Detect the PDF: rewrites /abs/ to /pdf/ for arXiv automatically
- Download and parse with OCR + layout analysis
- Deduplicate: same URL from the same tenant reuses the existing document
- Run the completion against the parsed content
- Return an OpenAI-compatible response
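The arXiv rewrite in the first step can be pictured with a small sketch. This is an illustration only: the helper name `to_pdf_url` is hypothetical, and the server's actual rewrite rules are not documented here. The sketch assumes the rewrite simply swaps a leading /abs/ path segment for /pdf/ on arxiv.org hosts and leaves every other URL alone.

```python
from urllib.parse import urlsplit, urlunsplit


def to_pdf_url(url: str) -> str:
    """Hypothetical helper: map an arXiv abstract URL to its PDF URL.

    Assumption: only the leading /abs/ path segment changes; any URL
    that is not an arXiv abstract page is returned untouched.
    """
    parts = urlsplit(url)
    host = parts.netloc.lower()
    is_arxiv = host == "arxiv.org" or host.endswith(".arxiv.org")
    if is_arxiv and parts.path.startswith("/abs/"):
        pdf_path = "/pdf/" + parts.path[len("/abs/"):]
        return urlunsplit(parts._replace(path=pdf_path))
    return url
```

Because the endpoint performs this step itself, a client should not need such a helper; it is shown only to make the behavior concrete.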
How wait_ms works
The wait_ms parameter controls how long the server waits for ingestion before responding:
| Value | Behavior |
|---|---|
| 0 | Return immediately: 202 if still processing, 200 if already indexed |
| 3000 | Wait up to 3 seconds, then 200 or 202 |
| 30000 | Wait up to 30 seconds (good default for most PDFs) |
| Omitted | Uses server default (30s) |
| > 120000 | Clamped to 120s max |
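The table can be read as a small normalization rule. The sketch below mirrors it in Python so a client can predict the wait the server will actually apply. The function and constant names are hypothetical, and the table does not say how negative values are treated, so the sketch does not model them.

```python
from typing import Optional

DEFAULT_WAIT_MS = 30_000  # applied when wait_ms is omitted (per the table)
MAX_WAIT_MS = 120_000     # larger values are clamped to this (per the table)


def effective_wait_ms(wait_ms: Optional[int] = None) -> int:
    """Hypothetical client-side mirror of the documented wait_ms rules."""
    if wait_ms is None:
        return DEFAULT_WAIT_MS
    # 0 passes through unchanged and means "return immediately".
    return min(wait_ms, MAX_WAIT_MS)
```

A client might use this to size its own HTTP timeout, for example by adding a few seconds of headroom on top of `effective_wait_ms(wait_ms)`.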
Handling 202 (still processing)
If the document hasn’t finished parsing, you get a 202.
Poll the status_url until the phase is complete, then retry your original request.
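The poll-then-retry flow might look like the sketch below. The HTTP calls are passed in as callables so the control flow stays independent of any particular client library. Several details are assumptions, not documented facts: that the 202 body carries a `status_url` field, that the status payload exposes a `phase` field whose finished value is `"complete"`, and the polling interval and attempt limit.

```python
import time
from typing import Any, Callable, Dict, Tuple

Response = Tuple[int, Dict[str, Any]]  # (HTTP status, decoded JSON body)


def resolve_with_polling(
    send_request: Callable[[], Response],
    fetch_status: Callable[[str], Dict[str, Any]],
    poll_interval_s: float = 2.0,
    max_polls: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Send the request; on 202, poll the status URL, then retry once."""
    status, body = send_request()
    if status == 200:
        return body
    if status != 202:
        raise RuntimeError(f"unexpected status {status}: {body}")

    status_url = body["status_url"]  # assumed field name in the 202 body
    for _ in range(max_polls):
        # Assumed payload shape: {"phase": "..."} with "complete" when done.
        if fetch_status(status_url).get("phase") == "complete":
            break
        sleep(poll_interval_s)
    else:
        raise TimeoutError("document still processing after max_polls checks")

    status, body = send_request()  # retry the original request
    if status != 200:
        raise RuntimeError(f"retry returned status {status}: {body}")
    return body
```

In practice `send_request` would wrap the POST to the Resolve endpoint and `fetch_status` a GET on the status URL, both with the API key attached.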
Streaming
Swap the endpoint to get a streaming response.
Multi-turn follow-ups
Include prior messages for follow-up questions; the same source URL reuses the document.
Source types
The Resolve endpoint supports three source types:
URL source
Supported domains: arxiv.org, hkexnews.hk (more coming).
Filing source
Public source
Using with the OpenAI SDK
The response shape is OpenAI-compatible, so you can use it as a drop-in with any OpenAI SDK consumer.
Error handling
| Status | Code | Meaning |
|---|---|---|
| 400 | INVALID_SOURCE | Missing or malformed source object |
| 401 | (none) | Missing or invalid API key |
| 404 | FILING_NOT_READY | Filing not found or not indexed |
| 404 | PUBLIC_SOURCE_NOT_FOUND | Public source missing or disabled |
| 422 | UNSUPPORTED_SOURCE_URL | URL domain not in allowlist |
| 502 | URL_INGEST_START_FAILED | Failed to start document ingestion |
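One way to use this table on the client is to turn it into a lookup that produces readable exceptions. The sketch below is a minimal example: `ResolveError` and `raise_for_error` are hypothetical names, and since the location of the error code in the response body is not documented here, the caller is assumed to have extracted it already.

```python
from typing import Optional

# (status, code) pairs and meanings copied from the table above.
ERROR_MEANINGS = {
    (400, "INVALID_SOURCE"): "Missing or malformed source object",
    (401, None): "Missing or invalid API key",
    (404, "FILING_NOT_READY"): "Filing not found or not indexed",
    (404, "PUBLIC_SOURCE_NOT_FOUND"): "Public source missing or disabled",
    (422, "UNSUPPORTED_SOURCE_URL"): "URL domain not in allowlist",
    (502, "URL_INGEST_START_FAILED"): "Failed to start document ingestion",
}


class ResolveError(Exception):
    """Hypothetical exception carrying the HTTP status and error code."""

    def __init__(self, status: int, code: Optional[str] = None) -> None:
        self.status = status
        self.code = code
        meaning = ERROR_MEANINGS.get((status, code), "Unrecognized error")
        super().__init__(f"{status} {code or 'no code'}: {meaning}")


def raise_for_error(status: int, code: Optional[str] = None) -> None:
    """Raise ResolveError for 4xx/5xx; 200 and 202 pass through."""
    if status >= 400:
        raise ResolveError(status, code)
```

Keeping 202 out of the error path matters here, because a 202 signals that ingestion is still running, not that the request failed.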