Rate Limits and Quotas
The API enforces usage limits to protect reliability and prevent runaway cost.
What to expect
- 402 when quota or credits are insufficient.
- 429 when request or concurrency limits are exceeded.
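The two codes call for different client reactions. The minimal Python sketch below shows one way a client might branch on them. The `Action` names and the choice to treat every other non-2xx status as a plain failure are illustrative assumptions, not part of the API.

```python
from enum import Enum


class Action(Enum):
    """Client reactions to a response (illustrative categories, not API values)."""
    OK = "ok"
    RETRY_WITH_BACKOFF = "retry_with_backoff"  # 429: request or concurrency limit hit
    PAUSE_FOR_QUOTA = "pause_for_quota"        # 402: quota or credits exhausted
    FAIL = "fail"                              # anything this sketch does not handle


def classify_status(status: int) -> Action:
    """Map an HTTP status code to the action a client should take."""
    if 200 <= status < 300:
        return Action.OK
    if status == 429:
        return Action.RETRY_WITH_BACKOFF
    if status == 402:
        # Retrying will not help until the customer's quota is restored.
        return Action.PAUSE_FOR_QUOTA
    return Action.FAIL
```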
Recommended client behavior
- Enforce your own per-tenant limits before calling the API.
- Use queues to smooth spikes.
- Retry 429 responses with exponential backoff and jitter.
- Use idempotency keys to prevent duplicate work.
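A minimal Python sketch combining the retry and idempotency recommendations, assuming a hypothetical `send` callable that issues one request with the given idempotency key and returns the HTTP status code. How the key is actually transmitted to the API is not covered here, and the delay defaults are illustrative. The sketch uses "full jitter": each wait is drawn uniformly between zero and an exponentially growing cap.

```python
import random
import time
import uuid
from typing import Callable, Optional, Tuple


def call_with_retry(
    send: Callable[[str], int],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 30.0,
    idempotency_key: Optional[str] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, int]:
    """Retry `send` on 429 with exponential backoff and full jitter.

    Returns (final_status, attempts_made). Any non-429 status, including 402,
    is returned immediately without retrying.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    # Generate the key once so every attempt refers to the same logical request.
    key = idempotency_key or str(uuid.uuid4())
    attempt = 0
    while True:
        attempt += 1
        status = send(key)
        if status != 429 or attempt >= max_attempts:
            return status, attempt
        # Full jitter: wait a random time in [0, cap], where the cap doubles
        # with each attempt and is bounded by max_delay.
        cap = min(max_delay, base_delay * 2 ** (attempt - 1))
        sleep(random.uniform(0, cap))
```

Injecting `sleep` lets tests substitute a no-op, and returning the attempt count makes it easy to log how often requests hit 429.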
Capacity planning checklist
- Track pages per document before upload.
- Track active job count per customer.
- Alert on sustained 402/429 responses.
- Use circuit breakers to protect against spikes in upstream systems.
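A minimal single-threaded circuit breaker in Python, as one possible shape for the last checklist item. Which outcomes count as failures (timeouts, upstream errors, sustained 429s) is a policy choice left to the caller, and the threshold and cool-down values are illustrative assumptions.

```python
import time
from typing import Callable, Optional


class CircuitBreaker:
    """Consecutive-failure circuit breaker (single-threaded sketch).

    Closed: requests flow. After `failure_threshold` consecutive failures it
    opens and rejects requests. Once `reset_timeout` seconds have passed it
    lets trial requests through (half-open): a success closes it, a failure
    reopens it and restarts the cool-down.
    """

    def __init__(
        self,
        failure_threshold: int = 5,
        reset_timeout: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    def allow_request(self) -> bool:
        if self.opened_at is None:
            return True  # closed
        # Open until the cool-down elapses, then half-open.
        return self.clock() - self.opened_at >= self.reset_timeout

    def record_success(self) -> None:
        self.failures = 0
        self.opened_at = None  # close the breaker

    def record_failure(self) -> None:
        self.failures += 1
        if self.failures >= self.failure_threshold:
            # Open (or reopen after a failed half-open trial).
            self.opened_at = self.clock()
```

Before each API call, check `allow_request()`; when it returns `False`, defer the work to your queue instead of sending it.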
Billing and quota signals
Treat 402 quota_exceeded as an actionable product signal:
- Prompt the customer to buy credits or upgrade their plan.
- Pause automatic retries.
- Resume once quota is restored.
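The pause-and-resume behavior can be sketched as a per-customer gate in Python. This assumes one `QuotaGate` instance per customer account; `on_quota_restored` is a hypothetical hook you would call from whatever tells you quota is back (for example, a billing event or an admin action), which the API documentation above does not specify.

```python
import threading
from typing import Optional


class QuotaGate:
    """Pauses a customer's outbound work after 402 quota_exceeded and
    resumes it once quota is restored."""

    def __init__(self) -> None:
        self._open = threading.Event()
        self._open.set()  # start with quota available

    def on_quota_exceeded(self) -> None:
        # Stop automatic retries and new submissions; prompt the customer instead.
        self._open.clear()

    def on_quota_restored(self) -> None:
        # Hypothetical hook, e.g. triggered after a credit purchase or upgrade.
        self._open.set()

    def wait_until_open(self, timeout: Optional[float] = None) -> bool:
        """Block until quota is available; returns False if `timeout` expires."""
        return self._open.wait(timeout)

    @property
    def paused(self) -> bool:
        return not self._open.is_set()
```

Workers call `wait_until_open()` before sending each job, so a 402 halts that customer's traffic without spinning in retry loops.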