Rate limits and quotas

Throughput controls and production client behavior.

The API enforces usage limits to protect reliability and prevent runaway cost.

What to expect

  • 402 when quota/credits are insufficient
  • 429 when request or concurrency limits are exceeded

Recommended client behavior

  • Enforce your own per-tenant limits before calling the API.
  • Use queues to smooth spikes.
  • Retry 429 responses with exponential backoff and jitter.
  • Use idempotency keys to prevent duplicate work.
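
The retry and idempotency recommendations above can be combined in one small wrapper. The following is a minimal sketch, not a reference client: the `send` callable, the `Idempotency-Key` header name, and the default `base`, `cap`, and `max_attempts` values are all assumptions chosen for illustration, so check them against the API's actual contract.

```python
import random
import time
import uuid
from typing import Callable, Dict, Tuple

# Hypothetical transport: takes request headers, returns (status_code, body).
Send = Callable[[Dict[str, str]], Tuple[int, str]]


def backoff_delay(attempt: int, base: float = 0.5, cap: float = 30.0) -> float:
    """Full jitter: a uniform random delay in [0, min(cap, base * 2**attempt)]."""
    return random.uniform(0.0, min(cap, base * (2 ** attempt)))


def call_with_retry(
    send: Send,
    max_attempts: int = 5,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    # One key per logical request, reused on every retry so the server can
    # deduplicate the work. The header name is an assumption.
    headers = {"Idempotency-Key": str(uuid.uuid4())}
    for attempt in range(max_attempts):
        status, body = send(headers)
        if status != 429:
            # Success, 402, and other errors are returned to the caller;
            # only 429 is retried here.
            return status, body
        if attempt < max_attempts - 1:
            sleep(backoff_delay(attempt))
    return status, body  # still 429 after the final attempt
```

The jitter spreads retries from many clients across the backoff window so they do not all return at the same instant. Passing `sleep` as a parameter keeps the wrapper easy to test without real delays.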

Capacity planning checklist

  • Track pages per document before upload.
  • Track active job count per customer.
  • Alert on sustained 402/429 responses.
  • Add circuit breakers when upstream systems spike.
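
As one possible reading of the circuit-breaker item, the sketch below stops sending requests after a run of consecutive 429 responses and allows a trial request once a cooldown has passed. The class name, the thresholds, and the choice to trip only on 429 are assumptions for illustration; a production breaker may need different failure criteria.

```python
import time
from typing import Callable, Optional


class CircuitBreaker:
    """Opens after `threshold` consecutive 429s; allows a trial call after `cooldown` seconds."""

    def __init__(
        self,
        threshold: int = 5,
        cooldown: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.threshold = threshold
        self.cooldown = cooldown
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None  # None means the breaker is closed

    def allow(self) -> bool:
        """Return True if a request may be sent right now."""
        if self.opened_at is None:
            return True
        # Half-open: permit a trial call once the cooldown has elapsed.
        return self.clock() - self.opened_at >= self.cooldown

    def record(self, status: int) -> None:
        """Update breaker state from the status code of a completed request."""
        if status == 429:
            self.failures += 1
            if self.failures >= self.threshold:
                self.opened_at = self.clock()  # open, or re-open after a failed trial
        else:
            self.failures = 0
            self.opened_at = None
```

A caller would check `allow()` before each request and pass the response status to `record()` afterwards; work that is refused can stay in the queue until the breaker closes.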

Billing and quota signals

Treat 402 quota_exceeded as an actionable product signal:

  • Prompt the customer to buy credits or upgrade their plan.
  • Pause automatic retries, since they cannot succeed until quota is restored.
  • Resume once quota is restored.
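
The pause-and-resume behavior can be sketched as a small per-tenant gate. This is an illustration under assumptions: the `QuotaGate` class, the action strings, and the `on_quota_restored` hook are hypothetical, and how your system learns that quota has been restored (for example, after a purchase completes) depends on your billing integration.

```python
from typing import Set


class QuotaGate:
    """Tracks which tenants are paused after a 402 quota_exceeded response."""

    def __init__(self) -> None:
        self._paused: Set[str] = set()

    def on_response(self, tenant_id: str, status: int) -> str:
        """Map a response status to the next client action for this tenant."""
        if status == 402:
            # Retrying cannot help until quota is restored, so pause the tenant.
            self._paused.add(tenant_id)
            return "prompt_upgrade"
        if status == 429:
            return "retry_with_backoff"
        return "proceed"

    def may_send(self, tenant_id: str) -> bool:
        """Return False while the tenant is paused for insufficient quota."""
        return tenant_id not in self._paused

    def on_quota_restored(self, tenant_id: str) -> None:
        """Hypothetical hook: call when credits are purchased or the plan is upgraded."""
        self._paused.discard(tenant_id)
```

Keeping 402 handling separate from 429 handling prevents a backoff loop from repeatedly hitting the API for a tenant that has no remaining quota.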