Quotas & rate limiting

Quotas

Every (tenant, capability) pair has a monthly token budget. Requests that would exceed the budget are rejected with 402 Payment Required and the following error body:

{
  "error": {
    "code": "quota_exceeded",
    "capability": "gpt-4o",
    "limit_tokens": 10000000,
    "used_tokens": 10000000,
    "reset_at": "2026-05-01T00:00:00Z"
  }
}

Top up by adding credit in the portal under Admin → Billing.
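
A caller usually wants to tell a quota rejection apart from other failures and to know when the budget resets. The following is a minimal sketch that assumes the error body has exactly the fields shown above; the function names parse_quota_error and seconds_until_reset are illustrative and not part of any SDK.

```python
import json
from datetime import datetime, timezone


def parse_quota_error(status, body):
    """Return quota details for a 402 quota_exceeded response, else None."""
    if status != 402:
        return None
    try:
        payload = json.loads(body)
    except ValueError:
        return None
    error = payload.get("error") if isinstance(payload, dict) else None
    if not isinstance(error, dict) or error.get("code") != "quota_exceeded":
        return None
    # datetime.fromisoformat() only accepts a trailing "Z" from Python 3.11 on,
    # so normalise it to an explicit UTC offset first.
    reset_at = datetime.fromisoformat(error["reset_at"].replace("Z", "+00:00"))
    return {
        "capability": error["capability"],
        "remaining_tokens": max(error["limit_tokens"] - error["used_tokens"], 0),
        "reset_at": reset_at,
    }


def seconds_until_reset(details, now=None):
    """Seconds from `now` (default: current UTC time) until the budget resets."""
    now = now or datetime.now(timezone.utc)
    return max((details["reset_at"] - now).total_seconds(), 0.0)
```

Retrying a quota_exceeded response before reset_at is unlikely to succeed unless credit has been added in the meantime, so a caller may prefer to surface the error rather than back off and retry.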

Rate limits

Rate limits are enforced at two scales, both as sliding windows:

Scale    Default per (tenant, capability)    Header
60s      60 req/min                          RateLimit-Limit-Minute
3600s    1000 req/hr                         RateLimit-Limit-Hour

Public endpoints (/api/v1/receipts/verify, /.well-known/*) have their own per-IP limits (30 req/min for verify, 120 req/min for discovery).
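
To stay under both windows, a caller can throttle itself before sending. The sketch below is a client-side sliding-window log that uses the default limits of 60 req/min and 1000 req/hr. It is an assumption about how a client might pace requests, not a description of the gateway's internal algorithm, and the class names are hypothetical.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most `limit` requests in any span of `window_s` seconds."""

    def __init__(self, limit, window_s, clock=time.monotonic):
        self.limit = limit
        self.window_s = window_s
        self.clock = clock
        self._sent = deque()  # timestamps of requests still inside the window

    def _evict(self, now):
        # Drop timestamps that have slid out of the window.
        while self._sent and now - self._sent[0] >= self.window_s:
            self._sent.popleft()

    def wait_time(self):
        """Seconds to wait before one more request fits in the window."""
        now = self.clock()
        self._evict(now)
        if len(self._sent) < self.limit:
            return 0.0
        return self._sent[0] + self.window_s - now

    def record(self):
        """Note that a request was just sent."""
        self._sent.append(self.clock())


class TwoScaleLimiter:
    """Combine the per-minute and per-hour windows; the stricter one wins."""

    def __init__(self, per_minute=60, per_hour=1000, clock=time.monotonic):
        self.windows = [
            SlidingWindowLimiter(per_minute, 60, clock),
            SlidingWindowLimiter(per_hour, 3600, clock),
        ]

    def wait_time(self):
        return max(w.wait_time() for w in self.windows)

    def record(self):
        for w in self.windows:
            w.record()
```

A typical use is to sleep for limiter.wait_time() seconds, send the request, then call limiter.record(). If a tenant has non-default limits, the values reported in the RateLimit-Limit-Minute and RateLimit-Limit-Hour headers can be passed as per_minute and per_hour.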

Idempotency

Send an Idempotency-Key: <uuid> header with any POST request. The gateway caches the response for 24h, keyed by (tenant, idempotency_key). Retries with the same key replay the same response, including the same receipt, and do not consume quota.

This is the recommended pattern for any production caller, not least because it makes it safe for the SDK to retry a request during failover.
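
The key point for a retrying caller is that the key is generated once per logical request and reused on every attempt. The sketch below illustrates this with a hypothetical send(url, payload, headers) transport callable that returns (status, body) and raises ConnectionError on network failure. Treating 5xx responses as retryable is an assumption, and the example omits backoff for brevity.

```python
import uuid


def post_with_retries(send, url, payload, max_attempts=3):
    """POST `payload`, retrying transient failures under one Idempotency-Key.

    `send` is a hypothetical transport callable, not a real SDK function.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    # Generate the key once, outside the loop: every retry must reuse it so
    # the gateway can replay the cached response instead of redoing the work.
    headers = {"Idempotency-Key": str(uuid.uuid4())}
    last_error = None
    for _ in range(max_attempts):
        try:
            status, body = send(url, payload, headers)
        except ConnectionError as exc:
            last_error = exc
            continue
        if status < 500:
            # Success or a client error: retrying would not change the outcome.
            return status, body
        last_error = RuntimeError(f"server error {status}")
    raise last_error
```

Because the cache is keyed by (tenant, idempotency_key) for 24h, a new logical request needs a fresh key; reusing an old key would replay the earlier response.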