Routing & failover
CloakAPI routes each request through a small chain of decisions:
- Capability lookup: does the tenant’s API key allow the requested model? If not, return 403.
- BYOK preference: if the tenant has connected their own provider credentials for this model, use those. The upstream provider key is never stored in plaintext; it is decrypted in memory only for the duration of one request.
- Arbitrage: if multiple providers can serve the same model class, pick the cheapest one currently passing health checks.
- Failover: if the primary provider returns 5xx or 429, or exceeds its budget, the request transparently retries on the next provider. The receipt records the chain.
- Streaming bridge: for requests with stream: true, the gateway proxies the SSE stream while computing the receipt over the final accumulated content.
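The first three decisions can be sketched as a single ordering function. This is only an illustration of the rules listed above, not the gateway's actual implementation: the `Provider` fields, the `cost_per_1k_tokens` pricing unit, the `Forbidden` exception, and the choice to skip unhealthy BYOK credentials are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class Provider:
    name: str
    cost_per_1k_tokens: float  # hypothetical pricing field used for arbitrage
    healthy: bool = True       # result of the most recent health check
    byok: bool = False         # True if backed by the tenant's own credentials


class Forbidden(Exception):
    """Stands in for an HTTP 403 response."""


def order_providers(model: str, allowed_models: Set[str],
                    providers: List[Provider]) -> List[Provider]:
    """Return the providers to try for one request, primary first."""
    # Capability lookup: the API key must allow the requested model.
    if model not in allowed_models:
        raise Forbidden(f"API key does not allow model {model!r}")

    # Assumption: unhealthy providers are skipped, including BYOK ones.
    healthy = [p for p in providers if p.healthy]

    # BYOK preference: tenant-owned credentials come first.
    byok = [p for p in healthy if p.byok]

    # Arbitrage: the remaining healthy providers, cheapest first.
    shared = sorted((p for p in healthy if not p.byok),
                    key=lambda p: p.cost_per_1k_tokens)
    return byok + shared
```

The returned list doubles as the failover order: the first entry is the primary provider, and each later entry is the next candidate if the one before it fails.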
Failover guarantees
- Idempotent ops (chat completions, embeddings) retry on up to 3 providers with linear backoff.
- Non-idempotent ops (image generation, billable side-effects) fail over only on connect-level errors, never on 5xx after the first byte was received.
- Receipt continuity: even on failover, the chain seq increments by exactly 1.
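A minimal sketch of these guarantees, under stated assumptions: `UpstreamError`, `may_fail_over`, `call_with_failover`, and the `backoff_step` value are hypothetical names and numbers chosen for illustration, a status of `None` stands for a connect-level failure, and the budget-exceeded case is left out for brevity.

```python
import time
from typing import Any, Callable, List, Optional, Tuple

MAX_PROVIDERS = 3  # from the text: idempotent ops retry on up to 3 providers


class UpstreamError(Exception):
    def __init__(self, status: Optional[int] = None):
        super().__init__(f"upstream error (status={status})")
        self.status = status  # None means the connection itself failed


def may_fail_over(err: UpstreamError, idempotent: bool) -> bool:
    """Decide whether a failed attempt may move on to the next provider."""
    if err.status is None:
        return True  # connect-level errors are always safe to retry
    if not idempotent:
        return False  # a response was received, so never retry side-effects
    return err.status == 429 or 500 <= err.status <= 599


def call_with_failover(providers: List[str], send: Callable[[str], Any],
                       idempotent: bool, seq: int, backoff_step: float = 0.5,
                       sleep: Callable[[float], None] = time.sleep
                       ) -> Tuple[Any, List[str], int]:
    """Try providers in order; return (result, chain, next_seq)."""
    chain: List[str] = []
    last_err: Optional[UpstreamError] = None
    for attempt, provider in enumerate(providers[:MAX_PROVIDERS]):
        if attempt:
            sleep(backoff_step * attempt)  # linear backoff: step, 2*step, ...
        chain.append(provider)  # the receipt records every provider tried
        try:
            result = send(provider)
        except UpstreamError as err:
            last_err = err
            if not may_fail_over(err, idempotent):
                raise
            continue
        # Receipt continuity: one increment, however many attempts were made.
        return result, chain, seq + 1
    if last_err is not None:
        raise last_err
    raise RuntimeError("no providers available")
```

Passing `sleep` as a parameter keeps the backoff observable in tests without real delays.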
Provider list
GET /api/v1/providers/routing returns the live routing table for your
tenant. Sample:
    {
      "models": {
        "gpt-4o": [
          {"provider": "byok-openai", "weight": 1.0, "healthy": true},
          {"provider": "azure-openai-eu-west", "weight": 0.5, "healthy": true},
          {"provider": "openai-direct", "weight": 0.1, "healthy": true}
        ]
      }
    }
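As a small illustration of consuming this response, the sketch below parses the sample and lists the healthy providers for one model in the order returned. The helper name `healthy_providers` is hypothetical, and because the meaning of `weight` is not specified here, the example does not sort or sample by it.

```python
import json
from typing import Any, Dict, List

# The sample response shown above, embedded so the example is self-contained.
SAMPLE = """
{
  "models": {
    "gpt-4o": [
      {"provider": "byok-openai", "weight": 1.0, "healthy": true},
      {"provider": "azure-openai-eu-west", "weight": 0.5, "healthy": true},
      {"provider": "openai-direct", "weight": 0.1, "healthy": true}
    ]
  }
}
"""


def healthy_providers(routing_table: Dict[str, Any], model: str) -> List[str]:
    """Return the names of healthy providers for a model, in listed order."""
    entries = routing_table.get("models", {}).get(model, [])
    return [e["provider"] for e in entries if e.get("healthy")]


table = json.loads(SAMPLE)
print(healthy_providers(table, "gpt-4o"))
```

A model that is missing from the table yields an empty list rather than an error, which keeps the caller's handling simple.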