AI Gateway

Problem

If product teams call different models directly, you do not have a platform. You have distributed liability.

Symptoms

No shared model aliases.
No common token accounting.
No policy enforcement point.
Provider failover is implemented differently in every product.

Mental model

AI Gateway is not a proxy to a model. It is the control plane for contracts, routing, quotas, cost, policy, observability and lifecycle.

Architecture

Core responsibilities: auth, tenants, quotas, rate limits, request normalization, provider abstraction, fallbacks, retries, model aliases, token accounting, cost attribution and policy enforcement.

In a mature platform, the gateway chooses not only the provider or model but also the execution type. Simple requests can go to direct_small, complex single-shot tasks to direct_large, and requests that involve actions, external state or multi-step planning to agentic.
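
A minimal sketch of lane selection, assuming the gateway already has a few boolean signals about the request. The RequestProfile fields and the choose_lane function are hypothetical names used only for illustration; only the lane names come from the text above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RequestProfile:
    # Hypothetical signals; a real gateway would derive these from the
    # scenario definition or an upstream classifier.
    needs_tools: bool             # request may trigger actions
    touches_external_state: bool  # request reads or writes external state
    multi_step: bool              # request needs multi-step planning
    is_complex: bool              # complex but still single-shot


def choose_lane(profile: RequestProfile) -> str:
    """Return the execution lane for a request."""
    # Anything with actions, external state or planning goes to the agentic runtime.
    if profile.needs_tools or profile.touches_external_state or profile.multi_step:
        return "agentic"
    # Complex single-shot work goes to the larger direct lane.
    if profile.is_complex:
        return "direct_large"
    # Everything else stays on the cheapest lane.
    return "direct_small"
```

The cheapest lane is the default here, so a request reaches agentic only when a signal explicitly asks for it.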

Route policy

Scenario: scenario and success criterion.
Route policy: lane, SLO, budget, fallback.
Pool / provider: MaaS, self-hosted pool or batch.
Trace: route_lane, pool_id, queue_time, cost.

Metrics

Gateway coverage, route success rate, fallback rate, quota rejections, cost attribution completeness, cache hit rate by route, latency by model alias and policy violation rate.
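
Several of these metrics can be computed directly from per-request trace records. The sketch below is one possible way to do it; the TraceRecord shape and the choice of denominators (served requests for success, fallback and cost completeness, all requests for quota rejections) are assumptions, not definitions taken from this page.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class TraceRecord:
    # Hypothetical per-request record emitted by the gateway.
    route_lane: str
    succeeded: bool
    used_fallback: bool
    quota_rejected: bool
    cost: Optional[float]  # None when cost could not be attributed


def route_metrics(records: List[TraceRecord]) -> Dict[str, float]:
    """Aggregate a few gateway metrics from trace records."""
    total = len(records)
    if total == 0:
        return {}
    # Requests rejected by quota never reached a provider.
    served = [r for r in records if not r.quota_rejected]
    metrics = {"quota_rejection_rate": (total - len(served)) / total}
    if served:
        n = len(served)
        metrics["route_success_rate"] = sum(r.succeeded for r in served) / n
        metrics["fallback_rate"] = sum(r.used_fallback for r in served) / n
        metrics["cost_attribution_completeness"] = (
            sum(r.cost is not None for r in served) / n
        )
    return metrics
```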

Trade-offs

A gateway adds a hop and a platform dependency. In return it creates a single place to improve routing, safety, cost and observability for every product.

Anti-patterns

SDK wrappers without routing state.
Model names exposed directly to product code.
Every request goes to the agentic runtime by default.
Router chooses model_name, but not execution path.
Route metadata is inserted into the prompt prefix and breaks prefix caching.
Retries that hide quality or cost failures.
Quotas enforced only after the provider invoice arrives.
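
To illustrate the alternative to the last anti-pattern, here is a minimal sketch of a quota that is checked before the provider call and reconciled afterwards. TokenBudget, QuotaExceeded and the reserve/settle split are hypothetical names. The sketch is single-process and not thread-safe; a real gateway would likely need a shared, atomic counter.

```python
class QuotaExceeded(Exception):
    """Raised when a request would push a tenant over its token budget."""


class TokenBudget:
    def __init__(self, limit_tokens: int) -> None:
        self.limit_tokens = limit_tokens
        self.used_tokens = 0

    def reserve(self, estimated_tokens: int) -> None:
        """Reject the request up front if the estimate does not fit."""
        if self.used_tokens + estimated_tokens > self.limit_tokens:
            raise QuotaExceeded(
                f"need {estimated_tokens}, "
                f"only {self.limit_tokens - self.used_tokens} left"
            )
        self.used_tokens += estimated_tokens

    def settle(self, estimated_tokens: int, actual_tokens: int) -> None:
        """Replace the reservation with the usage the provider reported."""
        self.used_tokens += actual_tokens - estimated_tokens
```

A caller would reserve before sending the request, then settle once the provider reports actual token usage, so the rejection happens at request time rather than at invoice time.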

Checklist

✓ Product code uses model aliases, not raw provider model names.
✓ Every request carries scenario_id and owner metadata.
✓ Gateway records prompt version, model, provider, tokens, cost and latency.
✓ Fallbacks have stop criteria and are visible in traces.
✓ Policy enforcement happens before tool execution or external side effects.
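
The fallback item in this checklist could look roughly like the sketch below: a fallback loop with two stop criteria (an attempt cap and a non-retryable error class) that records every attempt in a trace. The exception names, the provider callables and the trace layout are all assumptions made for illustration.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple


class ProviderUnavailable(Exception):
    """Transient provider failure; trying the next provider is allowed."""


class PolicyViolation(Exception):
    """Request was refused by policy; falling back would only hide it."""


def call_with_fallback(
    providers: List[Tuple[str, Callable[[str], str]]],  # (pool_id, call) pairs
    prompt: str,
    max_attempts: int = 2,
) -> Tuple[Optional[str], Dict[str, Any]]:
    trace: Dict[str, Any] = {"attempts": [], "fallback_used": False, "stop_reason": None}
    for index, (pool_id, call) in enumerate(providers):
        if index >= max_attempts:  # stop criterion 1: attempt cap
            trace["stop_reason"] = "max_attempts"
            break
        try:
            text = call(prompt)
        except PolicyViolation as exc:  # stop criterion 2: non-retryable error
            trace["attempts"].append({"pool_id": pool_id, "ok": False, "error": str(exc)})
            trace["stop_reason"] = "non_retryable"
            break
        except ProviderUnavailable as exc:
            trace["attempts"].append({"pool_id": pool_id, "ok": False, "error": str(exc)})
            continue
        trace["attempts"].append({"pool_id": pool_id, "ok": True})
        trace["fallback_used"] = index > 0
        trace["stop_reason"] = "success"
        return text, trace
    if trace["stop_reason"] is None:
        trace["stop_reason"] = "providers_exhausted"
    return None, trace
```

Because the trace is returned alongside the result, a successful response served by a fallback still shows the failed primary attempt.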

Example

Instead of hard-coding gpt-5-mini in product code, use the alias crm.summary.fast. The gateway can route it to a managed provider, a self-hosted model or a fallback while keeping telemetry and quality ownership stable.
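
A minimal sketch of alias resolution under these assumptions: the ALIAS_TABLE layout, the provider labels and the second model name are hypothetical, and only crm.summary.fast and gpt-5-mini come from the example above.

```python
from typing import Dict, FrozenSet, List

# Ordered targets per alias: first entry is preferred, later ones are fallbacks.
ALIAS_TABLE: Dict[str, List[Dict[str, str]]] = {
    "crm.summary.fast": [
        {"provider": "managed", "model": "gpt-5-mini"},
        # Hypothetical self-hosted target used when the managed one is out.
        {"provider": "self_hosted", "model": "example-small-model"},
    ],
}


def resolve_alias(alias: str, unavailable: FrozenSet[str] = frozenset()) -> Dict[str, str]:
    """Return the first target for the alias whose provider is available."""
    targets = ALIAS_TABLE[alias]  # raises KeyError for unknown aliases
    for target in targets:
        if target["provider"] not in unavailable:
            return target
    raise LookupError(f"no available target for alias {alias!r}")
```

Product code only ever passes the alias string, so the table can change without a product release.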

Decision template

For each route define: alias, use case, route_policy, canary_share, fallback_condition, pool_id, queue_priority, latency_class, allowed providers, quota, token budget, safety policy, trace fields and owner.
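
One way to keep this template honest is to capture it as a typed record with a few basic checks. The sketch below mirrors the fields listed above; the field types, the units implied by them and the validation rules are assumptions.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RouteDefinition:
    alias: str
    use_case: str
    route_policy: str
    canary_share: float        # assumed to be a fraction between 0 and 1
    fallback_condition: str
    pool_id: str
    queue_priority: int
    latency_class: str
    allowed_providers: List[str]
    quota: int                 # unit is an assumption, e.g. requests per minute
    token_budget: int
    safety_policy: str
    trace_fields: List[str]
    owner: str

    def validate(self) -> List[str]:
        """Return a list of problems; an empty list means the route is usable."""
        problems = []
        if not 0.0 <= self.canary_share <= 1.0:
            problems.append("canary_share must be between 0 and 1")
        if not self.owner:
            problems.append("owner is required")
        if not self.allowed_providers:
            problems.append("at least one allowed provider is required")
        return problems
```

Running validate when a route is registered would surface a missing owner or an invalid canary share before any traffic reaches the route.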
