# Rate Limits and Quotas

The API enforces usage limits to protect service reliability and prevent runaway costs.

## What to expect

- `402` when quota/credits are insufficient
- `429` when request or concurrency limits are exceeded

## Recommended client behavior

- Enforce your own per-tenant limits before calling the API.
- Use queues to smooth spikes.
- Retry `429` responses with exponential backoff and jitter.
- Use idempotency keys to prevent duplicate work.
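
The retry and idempotency advice above can be sketched as follows. This is a minimal illustration, not the API's official client: the `send` callable, the way the idempotency key is passed to the API, and the `base`, `cap`, and `max_attempts` defaults are all assumptions chosen for the example. It uses "full jitter", where each delay is drawn uniformly between zero and an exponentially growing ceiling.

```python
import random
import time
import uuid
from typing import Callable, Optional, Tuple

# Hypothetical transport: send(idempotency_key) -> (status_code, body).
# How the key reaches the API (header, field) is not specified here.
Send = Callable[[str], Tuple[int, str]]


def backoff_delay(attempt: int, base: float = 0.5, cap: float = 30.0,
                  rng: Optional[random.Random] = None) -> float:
    """Full-jitter delay: uniform in [0, min(cap, base * 2**attempt)]."""
    ceiling = min(cap, base * (2 ** attempt))
    uniform = rng.uniform if rng is not None else random.uniform
    return uniform(0.0, ceiling)


def call_with_retry(send: Send, max_attempts: int = 5,
                    sleep: Callable[[float], None] = time.sleep) -> Tuple[int, str]:
    """Retry only on 429, reusing one idempotency key for every attempt."""
    # One key per logical request, so a retried call is not treated as new work.
    key = str(uuid.uuid4())
    status, body = send(key)
    for attempt in range(max_attempts - 1):
        if status != 429:
            break  # success, or an error that backoff will not fix (e.g. 402)
        sleep(backoff_delay(attempt))
        status, body = send(key)
    return status, body
```

Passing `sleep` as a parameter keeps the function easy to test, and generating the key once outside the loop is what lets the server recognize a retry as the same request.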

## Capacity planning checklist

- Track pages per document before upload.
- Track active job count per customer.
- Alert on sustained `402`/`429` responses.
- Add circuit breakers so that spikes from upstream systems do not flood the API.
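
One way to approach the circuit breaker item is sketched below. It is a simplified illustration under stated assumptions: the `CircuitBreaker` class, its thresholds, and the injectable `clock` are hypothetical names for this example, and what counts as a "failure" (for instance a `429`) is left to the caller.

```python
import time
from typing import Callable, Optional


class CircuitBreaker:
    """Opens after consecutive failures; allows trial calls after a cooldown."""

    def __init__(self, failure_threshold: int = 5, cooldown_seconds: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.cooldown_seconds = cooldown_seconds
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    def allow_request(self) -> bool:
        """Return True if a call to the API should be attempted now."""
        if self.opened_at is None:
            return True  # closed: traffic flows normally
        # Open: block until the cooldown has elapsed, then permit trial calls.
        # Simplification: this does not limit how many trial calls get through.
        return self.clock() - self.opened_at >= self.cooldown_seconds

    def record_success(self) -> None:
        """A successful call closes the breaker and clears the failure count."""
        self.failures = 0
        self.opened_at = None

    def record_failure(self) -> None:
        """Count a failure; open (or re-open) once the threshold is reached."""
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.opened_at = self.clock()
```

A caller would check `allow_request()` before each API call and report the outcome with `record_success()` or `record_failure()`. Work arriving while the breaker is open can stay in a queue rather than being dropped.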

## Billing and quota signals

Treat `402 quota_exceeded` as an actionable product signal:

- Prompt the customer to buy credits or upgrade their plan.
- Pause automatic retries, since they will keep failing until quota is restored.
- Resume once quota is restored.
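
The pause-and-resume flow above could be tracked per customer with a small state object. This is a sketch only: `QuotaGate` and its method names are hypothetical, and how your system learns that quota has been restored (a billing event, a manual action, a successful probe) is an assumption left outside the example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class QuotaGate:
    """Tracks whether automatic retries are paused for one customer."""
    paused: bool = False
    deferred_jobs: List[str] = field(default_factory=list)

    def on_quota_exceeded(self, job_id: str) -> None:
        """Call when a request returns 402: pause retries and park the job."""
        self.paused = True
        self.deferred_jobs.append(job_id)

    def may_retry(self) -> bool:
        """Automatic retries are allowed only while quota is available."""
        return not self.paused

    def on_quota_restored(self) -> List[str]:
        """Resume: clear the pause and hand back parked jobs in arrival order."""
        self.paused = False
        jobs, self.deferred_jobs = self.deferred_jobs, []
        return jobs
```

Parking the job identifiers instead of discarding them lets the client resubmit the work once the customer has topped up, without asking them to upload again.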
