Routing & failover

CloakAPI routes each request through a small chain of decisions:

  1. Capability lookup — does the tenant’s API key allow the requested model? If not, return 403.
  2. BYOK preference — if the tenant has connected their own provider credentials for this model, use those. The upstream provider key is decrypted in memory only for the duration of a single request; we never otherwise see it in plaintext.
  3. Arbitrage — if multiple providers can serve the same model class, pick the cheapest one currently passing health checks.
  4. Failover — if the primary provider returns a 5xx or 429, or exceeds its budget, the request is transparently retried on the next provider. The receipt records the full chain of providers tried.
  5. Streaming bridge — for stream: true, the gateway proxies the SSE stream while computing the receipt over the final accumulated content.
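
As a rough illustration of steps 1 through 4, the sketch below orders candidate providers for one request. It is not CloakAPI's implementation: the Provider class, the cost_per_1k_tokens field, and plan_route are hypothetical names, and it assumes that an unhealthy BYOK provider is skipped like any other.

```python
from dataclasses import dataclass


@dataclass
class Provider:
    name: str
    cost_per_1k_tokens: float  # hypothetical pricing field, used for arbitrage
    healthy: bool
    byok: bool = False         # True if the tenant supplied their own credentials


def plan_route(allowed_models, model, providers):
    """Return providers in the order they would be tried for `model`."""
    # Step 1: capability lookup. A gateway would map this error to HTTP 403.
    if model not in allowed_models:
        raise PermissionError(f"403: API key is not allowed to use {model}")

    # Only providers currently passing health checks are candidates.
    candidates = [p for p in providers if p.healthy]

    # Step 2: BYOK providers sort first (False < True, so `not p.byok` leads).
    # Step 3: among the rest, the cheapest provider comes first.
    # Step 4: the remaining order doubles as the failover order.
    return sorted(candidates, key=lambda p: (not p.byok, p.cost_per_1k_tokens))
```

The first element of the returned list plays the role of the primary provider; the rest are the fallbacks that step 4 would walk through.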

Failover guarantees

  • Idempotent ops (chat completions, embeddings) are retried across up to 3 providers with linear backoff.
  • Non-idempotent ops (image generation, billable side-effects) fail over only on connect-level errors, never on a 5xx after the first byte has been received.
  • Receipt continuity — even on failover, the chain seq increments by exactly 1.
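
The guarantees above can be sketched as a small retry loop. This is a minimal sketch under stated assumptions, not the gateway's actual code: call_with_failover, the send callback, RETRYABLE, and backoff_step are hypothetical, connect-level errors are modeled as ConnectionError, budget checks are left out, and the 3-provider cap is assumed to apply to non-idempotent ops as well.

```python
import time

# Statuses that trigger failover for idempotent ops: 429 and any 5xx.
RETRYABLE = {429} | set(range(500, 600))


def call_with_failover(providers, send, idempotent=True,
                       max_providers=3, backoff_step=0.5, sleep=time.sleep):
    """Try providers in order; return (status, body, chain).

    `send(provider)` is a hypothetical callback that returns (status, body)
    or raises ConnectionError when the connection cannot be established.
    """
    chain = []  # one entry per provider tried, as the receipt would record
    for attempt, provider in enumerate(providers[:max_providers]):
        if attempt:
            # Linear backoff: backoff_step, 2 * backoff_step, ...
            sleep(backoff_step * attempt)
        try:
            status, body = send(provider)
        except ConnectionError:
            # Connect-level failure: safe to fail over for every op type.
            chain.append({"provider": provider, "status": "connect_error"})
            continue
        chain.append({"provider": provider, "status": status})
        # Only idempotent ops fail over on 429/5xx; a non-idempotent op
        # returns the upstream response as-is to avoid duplicate side-effects.
        if idempotent and status in RETRYABLE:
            continue
        return status, body, chain
    raise RuntimeError(f"all providers failed: {chain}")
```

However many providers end up in chain, the request still yields a single receipt, which is consistent with the seq advancing by exactly 1.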

Provider list

GET /api/v1/providers/routing returns the live routing table for your tenant. Sample:

{
  "models": {
    "gpt-4o": [
      {"provider": "byok-openai", "weight": 1.0, "healthy": true},
      {"provider": "azure-openai-eu-west", "weight": 0.5, "healthy": true},
      {"provider": "openai-direct", "weight": 0.1, "healthy": true}
    ]
  }
}
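
A client might read this table roughly as follows. The sketch uses only the Python standard library; the base URL, the Bearer authorization header, and both function names are assumptions for illustration, since this page does not specify the authentication scheme. The meaning of weight is not documented here either, so the helper keeps the listed order and filters on healthy only.

```python
import json
import urllib.request


def fetch_routing_table(base_url, api_key):
    """GET the live routing table. The auth header format is an assumption."""
    request = urllib.request.Request(
        f"{base_url}/api/v1/providers/routing",
        headers={"Authorization": f"Bearer {api_key}"},  # assumed scheme
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def healthy_providers(table, model):
    """Return the names of healthy providers for `model`, in listed order."""
    entries = table["models"].get(model, [])
    return [entry["provider"] for entry in entries if entry["healthy"]]
```

Applied to the sample response, healthy_providers(table, "gpt-4o") would list all three providers, since each is marked healthy.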