AI Gateway
Why an AI Gateway is the control plane of a production AI platform.
Problem
If product teams call different models directly, you do not have a platform. You have distributed liability.
Symptoms
- No shared model aliases.
- No common token accounting.
- No policy enforcement point.
- Provider failover is implemented differently in every product.
Mental model
AI Gateway is not a proxy to a model. It is the control plane for contracts, routing, quotas, cost, policy, observability and lifecycle.
Architecture
Core responsibilities: auth, tenants, quotas, rate limits, request normalization, provider abstraction, fallbacks, retries, model aliases, token accounting, cost attribution and policy enforcement.
In a mature platform, the gateway routes not only provider or model, but execution type. Simple requests can go to direct_small, complex single-shot tasks to direct_large, and requests with actions, external state or multi-step planning to agentic.
01
Scenario
Scenario and success criterion.
02
Route policy
lane, SLO, budget, fallback.
03
Pool / provider
MaaS, self-hosted pool or batch.
04
Trace
route_lane, pool_id, queue_time, cost.
Metrics
Gateway coverage, route success rate, fallback rate, quota rejections, cost attribution completeness, cache hit by route, latency by model alias and policy violation rate.
Trade-offs
A gateway adds a hop and a platform dependency. In return it creates a single place to improve routing, safety, cost and observability for every product.
Anti-patterns
- SDK wrappers without routing state.
- Model names exposed directly to product code.
- Every request goes to the agentic runtime by default.
- Router chooses
model_name, but not execution path. - Route metadata is inserted into the prompt prefix and breaks prefix cache.
- Retries that hide quality or cost failures.
- Quotas enforced only after the provider invoice arrives.
Checklist
- ✓Product code uses model aliases, not raw provider model names.
- ✓Every request carries scenario_id and owner metadata.
- ✓Gateway records prompt version, model, provider, tokens, cost and latency.
- ✓Fallbacks have stop criteria and are visible in traces.
- ✓Policy enforcement happens before tool execution or external side effects.
Example
Instead of gpt-5-mini in product code, use crm.summary.fast. The gateway can route it to a managed provider, self-hosted model or fallback while keeping telemetry and quality ownership stable.
Decision template
For each route define: alias, use case, route_policy, canary_share, fallback_condition, pool_id, queue_priority, latency_class, allowed providers, quota, token budget, safety policy, trace fields and owner.