Quotas & rate limiting
Quotas
Every (tenant, capability) pair has a monthly token budget. A request that
would exceed the budget is rejected with 402 Payment Required and this body:
```json
{
  "error": {
    "code": "quota_exceeded",
    "capability": "gpt-4o",
    "limit_tokens": 10000000,
    "used_tokens": 10000000,
    "reset_at": "2026-05-01T00:00:00Z"
  }
}
```

Top up by adding credit in the portal under Admin → Billing.
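
A caller might turn this response into something actionable along the following lines. This is a minimal sketch, not part of any official SDK: the function name `parse_quota_error` is hypothetical, and it assumes the body always has the shape shown above.

```python
import json
from datetime import datetime, timezone


def parse_quota_error(status: int, body: str):
    """Return (capability, remaining_tokens, reset_at) for a quota_exceeded
    402 response, or None for any other response.

    Hypothetical helper: assumes the error body shape documented above.
    """
    if status != 402:
        return None
    err = json.loads(body).get("error", {})
    if err.get("code") != "quota_exceeded":
        return None
    # Older Python versions reject a trailing "Z" in fromisoformat,
    # so normalize it to an explicit UTC offset first.
    reset_at = datetime.fromisoformat(err["reset_at"].replace("Z", "+00:00"))
    remaining = max(0, err["limit_tokens"] - err["used_tokens"])
    return err["capability"], remaining, reset_at


def seconds_until_reset(reset_at: datetime, now: datetime) -> float:
    """Seconds until the budget resets, never negative."""
    return max(0.0, (reset_at - now).total_seconds())
```

A caller could use `seconds_until_reset(reset_at, datetime.now(timezone.utc))` to decide whether to queue the work until the reset or to prompt an administrator to top up.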
Rate limits
Two scales, both sliding-window:
| Scale | Default per (tenant, capability) | Header |
|---|---|---|
| 60s | 60 req/min | RateLimit-Limit-Minute |
| 3600s | 1000 req/hr | RateLimit-Limit-Hour |
Public endpoints (/api/v1/receipts/verify, /.well-known/*) have their
own per-IP limits (30 req/min for verify, 120 req/min for discovery).
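
To stay under both windows, a caller can throttle on its own side before sending. The sketch below is a client-side sliding-window log that uses the default limits from the table. The gateway's actual algorithm is not specified here, so treat the class `SlidingWindowLimiter` and its behavior as assumptions, not a mirror of the server.

```python
from collections import deque


class SlidingWindowLimiter:
    """Client-side sliding-window log over several windows at once.

    Hypothetical sketch: limits default to the documented 60 req/min and
    1000 req/hr, given as (window_seconds, max_requests) pairs.
    """

    def __init__(self, limits=((60.0, 60), (3600.0, 1000))):
        self.limits = tuple(limits)
        self.stamps = deque()  # timestamps of accepted requests, oldest first

    def allow(self, now: float) -> bool:
        """Record and permit a request at time `now` if every window has room."""
        longest = max(window for window, _ in self.limits)
        # Drop timestamps that have left even the longest window.
        while self.stamps and self.stamps[0] <= now - longest:
            self.stamps.popleft()
        for window, cap in self.limits:
            recent = sum(1 for t in self.stamps if t > now - window)
            if recent >= cap:
                return False  # rejected requests are not recorded
        self.stamps.append(now)
        return True
```

Pass `time.monotonic()` as `now` in real use. Taking the clock as an argument keeps the limiter deterministic and easy to test.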
Idempotency
Send Idempotency-Key: <uuid> with any POST request. The gateway caches
the response for 24h keyed by (tenant, idempotency_key). Retries with
the same key replay the same response — including the same receipt — and
do not consume quota.
This is the recommended pattern for any production caller, not least because it lets the SDK retry safely during failover.
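
The key point is that the key is generated once per logical request and reused on every retry. The sketch below illustrates this with a hypothetical `send` callable standing in for whatever HTTP client or SDK call actually performs the POST. The name `post_with_retries` and the choice to retry only on `ConnectionError` are assumptions made for illustration.

```python
import uuid
from typing import Callable, Dict, Tuple

# Hypothetical transport: takes (headers, payload), returns (status, body).
Send = Callable[[Dict[str, str], bytes], Tuple[int, bytes]]


def post_with_retries(send: Send, payload: bytes, attempts: int = 3) -> Tuple[int, bytes]:
    """POST with one Idempotency-Key shared by all attempts."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    # Generate the key once, outside the loop, so every retry replays
    # the same cached response instead of creating a new request.
    headers = {"Idempotency-Key": str(uuid.uuid4())}
    last_error = None
    for _ in range(attempts):
        try:
            return send(headers, payload)
        except ConnectionError as exc:
            last_error = exc  # transient failure: retry with the same key
    raise last_error
```

Generating a fresh key inside the loop would defeat the purpose: each retry would look like a new request to the gateway and could consume quota again.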