Sergei Notevskii

AI Gateway

Why an AI Gateway is the control plane of a production AI platform.

Applied
v0.1
Updated May 23, 2026
AI Platform Leads
Staff Engineers
Backend Engineers
ai-gateway
routing
quotas
model-lifecycle

Problem

If product teams call different models directly, you do not have a platform. You have distributed liability.

Symptoms

  • No shared model aliases.
  • No common token accounting.
  • No policy enforcement point.
  • Provider failover is implemented differently in every product.

Mental model

An AI Gateway is not just a proxy to a model. It is the control plane for contracts, routing, quotas, cost, policy, observability and lifecycle.

Architecture

Core responsibilities: auth, tenants, quotas, rate limits, request normalization, provider abstraction, fallbacks, retries, model aliases, token accounting, cost attribution and policy enforcement.
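Quotas and rate limits only protect the platform if they are enforced before the provider is called. The following is a minimal sketch of that ordering. It assumes a hypothetical per-tenant daily token budget and a caller-supplied token estimate. The gateway reserves the estimate up front, rejects the request if the budget would be exceeded, and settles the reservation against the provider-reported usage afterwards.

```python
from dataclasses import dataclass


class QuotaExceeded(Exception):
    """Raised before any provider call when a tenant's budget would be exceeded."""


@dataclass
class TenantQuota:
    # Hypothetical per-tenant daily token budget; real limits would come from route config.
    daily_token_limit: int
    used_tokens: int = 0
    reserved_tokens: int = 0

    def reserve(self, estimated_tokens: int) -> None:
        """Reserve tokens up front so the quota is enforced before the provider call."""
        projected = self.used_tokens + self.reserved_tokens + estimated_tokens
        if projected > self.daily_token_limit:
            raise QuotaExceeded(
                f"projected {projected} tokens exceeds limit {self.daily_token_limit}"
            )
        self.reserved_tokens += estimated_tokens

    def settle(self, estimated_tokens: int, actual_tokens: int) -> None:
        """Replace the reservation with the provider-reported token count."""
        self.reserved_tokens -= estimated_tokens
        self.used_tokens += actual_tokens


quota = TenantQuota(daily_token_limit=10_000)
quota.reserve(4_000)          # accepted: 4,000 <= 10,000
quota.settle(4_000, 3_200)    # provider reported 3,200 tokens
try:
    quota.reserve(7_000)      # 3,200 + 7,000 > 10,000, rejected before the call
except QuotaExceeded as exc:
    print("rejected:", exc)
```

Reserving before the call is what prevents the "quotas enforced only after the provider invoice arrives" failure mode. Settling afterwards keeps accounting accurate when the estimate is off.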

In a mature platform, the gateway routes not only by provider or model but also by execution type. Simple requests can go to direct_small, complex single-shot tasks to direct_large, and requests that involve actions, external state or multi-step planning to agentic.
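Execution-type routing can be sketched as a small decision function. The lane names come from the text above. The `RequestFeatures` fields and the `large_threshold` value are hypothetical stand-ins for whatever signals an upstream classifier actually produces.

```python
from dataclasses import dataclass
from enum import Enum


class Lane(str, Enum):
    DIRECT_SMALL = "direct_small"
    DIRECT_LARGE = "direct_large"
    AGENTIC = "agentic"


@dataclass
class RequestFeatures:
    # Hypothetical features extracted from a normalized request.
    needs_tools: bool = False           # request asks for actions / tool calls
    needs_external_state: bool = False  # request reads or writes external systems
    multi_step: bool = False            # request needs planning across several steps
    complexity: float = 0.0             # 0..1 score from an assumed upstream classifier


def choose_lane(f: RequestFeatures, large_threshold: float = 0.6) -> Lane:
    """Pick the cheapest execution path that can satisfy the request."""
    # Agentic only when actions, external state or planning are actually required,
    # instead of sending every request to the agentic runtime by default.
    if f.needs_tools or f.needs_external_state or f.multi_step:
        return Lane.AGENTIC
    if f.complexity >= large_threshold:
        return Lane.DIRECT_LARGE
    return Lane.DIRECT_SMALL


print(choose_lane(RequestFeatures(complexity=0.2)).value)    # direct_small
print(choose_lane(RequestFeatures(complexity=0.8)).value)    # direct_large
print(choose_lane(RequestFeatures(needs_tools=True)).value)  # agentic
```

The checks are ordered from most to least expensive lane on purpose. A request that needs actions must not be downgraded to a single-shot call just because its text looks simple.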

Route policy

  01. Scenario: scenario and success criterion.
  02. Route policy: lane, SLO, budget, fallback.
  03. Pool / provider: MaaS, self-hosted pool or batch.
  04. Trace: route_lane, pool_id, queue_time, cost.
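The four steps can be tied together by having every routing decision emit a trace record with the fields listed in step 04. This is a sketch only. `RoutePolicy`, `TraceRecord`, `record_route` and the pool names are hypothetical, and the per-request budget check is one possible way to make cost overruns visible.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class RoutePolicy:
    # Hypothetical fields for steps 01-03: scenario, lane/SLO/budget/fallback, pool.
    scenario_id: str
    lane: str                      # e.g. "direct_small"
    slo_p95_ms: int                # latency objective for this lane
    budget_usd_per_request: float
    primary_pool: str
    fallback_pool: str


@dataclass
class TraceRecord:
    # Step 04: fields every routed request should leave behind.
    scenario_id: str
    route_lane: str
    pool_id: str
    queue_time_ms: float
    cost_usd: float
    over_budget: bool


def record_route(policy: RoutePolicy, pool_id: str, enqueued_at: float,
                 started_at: float, cost_usd: float) -> TraceRecord:
    """Build the trace for one routed request; timestamps are in seconds."""
    return TraceRecord(
        scenario_id=policy.scenario_id,
        route_lane=policy.lane,
        pool_id=pool_id,
        queue_time_ms=round((started_at - enqueued_at) * 1000, 1),
        cost_usd=cost_usd,
        over_budget=cost_usd > policy.budget_usd_per_request,
    )


policy = RoutePolicy("crm.summary", "direct_small", 800, 0.002, "maas-a", "self-hosted-b")
rec = record_route(policy, pool_id="maas-a", enqueued_at=100.0,
                   started_at=100.25, cost_usd=0.0011)
print(asdict(rec))
```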

Metrics

Gateway coverage, route success rate, fallback rate, quota rejections, cost attribution completeness, cache hit rate by route, latency by model alias and policy violation rate.
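Several of these metrics fall out directly from the gateway's own traces. Here is a minimal sketch, assuming a hypothetical trace shape. In this sketch a request counts toward cost attribution completeness only when both an owner and a cost were recorded.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Trace:
    # Hypothetical minimal trace shape; field names are illustrative.
    route: str
    ok: bool
    used_fallback: bool
    quota_rejected: bool
    owner: Optional[str]
    cost_usd: Optional[float]


def gateway_metrics(traces: list[Trace]) -> dict[str, float]:
    """Aggregate a few gateway health metrics over a batch of traces."""
    n = len(traces)
    if n == 0:
        return {}
    return {
        "route_success_rate": sum(t.ok for t in traces) / n,
        "fallback_rate": sum(t.used_fallback for t in traces) / n,
        "quota_rejections": float(sum(t.quota_rejected for t in traces)),
        # Attributed only if both owner and cost are present on the trace.
        "cost_attribution_completeness": sum(
            t.owner is not None and t.cost_usd is not None for t in traces
        ) / n,
    }


traces = [
    Trace("crm.summary.fast", True, False, False, "crm", 0.001),
    Trace("crm.summary.fast", True, True, False, "crm", 0.003),
    Trace("search.answer", False, False, True, "search", None),
    Trace("search.answer", True, False, False, None, 0.002),
]
print(gateway_metrics(traces))
```

Grouping the same computation by `route` or by model alias gives the per-route and per-alias variants listed above.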

Trade-offs

A gateway adds a hop and a platform dependency. In return it creates a single place to improve routing, safety, cost and observability for every product.

Anti-patterns

  • SDK wrappers without routing state.
  • Model names exposed directly to product code.
  • Every request goes to the agentic runtime by default.
  • The router chooses model_name but not the execution path.
  • Route metadata is inserted into the prompt prefix and breaks the prefix cache.
  • Retries that hide quality or cost failures.
  • Quotas enforced only after the provider invoice arrives.
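The prefix-cache anti-pattern can be shown with a simplified cache key. This sketch assumes a hypothetical `prefix_key` that hashes the first characters of the prompt. Real serving engines cache at the token-block level, but the effect is the same. Per-request routing metadata at the start of the prompt makes every prefix unique, while metadata carried out of band (for example as request headers) leaves the prefix stable.

```python
import hashlib

SYSTEM_PROMPT = "You are a CRM assistant. Summarize the account history concisely."


def prefix_key(prompt: str, prefix_chars: int = 32) -> str:
    # Simplified stand-in for a prefix cache key (real caches key on token blocks).
    return hashlib.sha256(prompt[:prefix_chars].encode()).hexdigest()[:12]


def build_bad(user_text: str, route_meta: dict) -> str:
    # Anti-pattern: per-request routing metadata placed before the shared prefix.
    header = ";".join(f"{k}={v}" for k, v in sorted(route_meta.items()))
    return f"[{header}]\n{SYSTEM_PROMPT}\n{user_text}"


def build_good(user_text: str, route_meta: dict) -> tuple[str, dict]:
    # Metadata travels separately (e.g. as headers); the prompt prefix stays identical.
    return f"{SYSTEM_PROMPT}\n{user_text}", dict(route_meta)


good_a, _ = build_good("Account A history...", {"request_id": "1", "pool_id": "maas-a"})
good_b, _ = build_good("Account B history...", {"request_id": "2", "pool_id": "maas-a"})
bad_a = build_bad("Account A history...", {"request_id": "1", "pool_id": "maas-a"})
bad_b = build_bad("Account B history...", {"request_id": "2", "pool_id": "maas-a"})
print(prefix_key(good_a) == prefix_key(good_b))  # True: shared prefix can be reused
print(prefix_key(bad_a) == prefix_key(bad_b))    # False: every request misses the cache
```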

Checklist

  • Product code uses model aliases, not raw provider model names.
  • Every request carries scenario_id and owner metadata.
  • Gateway records prompt version, model, provider, tokens, cost and latency.
  • Fallbacks have stop criteria and are visible in traces.
  • Policy enforcement happens before tool execution or external side effects.
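The fallback item in the checklist can be sketched as a loop with two explicit stop criteria: an attempt cap, and an immediate stop on errors that a fallback would only hide. Every attempt is also recorded for the trace. The exception classes, `call_with_fallback` and the target names are hypothetical, and which errors count as retryable is an assumption to adapt per provider.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


class RetryableError(Exception):
    """E.g. timeout or provider 5xx: safe to try the next target."""


class NonRetryableError(Exception):
    """E.g. policy violation or invalid request: a fallback would only hide it."""


@dataclass
class Attempt:
    target: str
    outcome: str


@dataclass
class CallResult:
    output: Optional[str]
    attempts: list[Attempt] = field(default_factory=list)  # goes into the trace


def call_with_fallback(targets: list[str], call: Callable[[str], str],
                       max_attempts: int = 2) -> CallResult:
    result = CallResult(output=None)
    for target in targets[:max_attempts]:     # stop criterion 1: attempt cap
        try:
            result.output = call(target)
            result.attempts.append(Attempt(target, "ok"))
            return result
        except RetryableError as exc:
            result.attempts.append(Attempt(target, f"retryable: {exc}"))
        except NonRetryableError as exc:      # stop criterion 2: do not mask it
            result.attempts.append(Attempt(target, f"stopped: {exc}"))
            return result
    return result


def flaky_call(target: str) -> str:
    if target == "maas-primary":
        raise RetryableError("timeout")
    return f"summary from {target}"


def blocked_call(target: str) -> str:
    raise NonRetryableError("policy violation")


res = call_with_fallback(["maas-primary", "self-hosted-pool", "batch"], flaky_call)
blocked = call_with_fallback(["maas-primary", "self-hosted-pool"], blocked_call)
```

The `attempts` list is what makes fallbacks visible in traces. Without it, a request that succeeded on the second target looks identical to one that succeeded on the first.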

Example

Instead of gpt-5-mini in product code, use crm.summary.fast. The gateway can route it to a managed provider, self-hosted model or fallback while keeping telemetry and quality ownership stable.
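A minimal sketch of alias resolution follows. Product code passes only `crm.summary.fast`, and the gateway maps it to a concrete target. The `AliasRoute` shape, the target strings and the hash-based canary split are assumptions for illustration, not real model identifiers or a prescribed mechanism.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AliasRoute:
    # Hypothetical route entry; target strings are placeholders, not real model IDs.
    primary: str
    canary: Optional[str] = None
    canary_share: float = 0.0   # fraction of traffic sent to the canary target


ALIASES = {
    "crm.summary.fast": AliasRoute(
        primary="managed-provider/small-model",
        canary="self-hosted/small-model",
        canary_share=0.1,
    ),
}


def resolve(alias: str, request_id: str) -> str:
    """Map a product-facing alias to a concrete target that product code never sees."""
    route = ALIASES[alias]
    if route.canary and route.canary_share > 0:
        # Deterministic bucketing: the same request_id always gets the same target.
        bucket = int(hashlib.sha256(request_id.encode()).hexdigest(), 16) % 10_000
        if bucket < route.canary_share * 10_000:
            return route.canary
    return route.primary


# Product code only knows the alias.
target = resolve("crm.summary.fast", request_id="req-42")
```

Moving the alias from a managed provider to a self-hosted model is then a change to `ALIASES`, not to product code. Telemetry keyed by the alias stays comparable across the switch.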

Decision template

For each route define: alias, use case, route_policy, canary_share, fallback_condition, pool_id, queue_priority, latency_class, allowed providers, quota, token budget, safety policy, trace fields and owner.
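The template can be kept as a typed record that is validated before a route goes live. This sketch uses snake_case versions of the template fields. The validation rules, the units of `quota` and `token_budget`, and all example values are assumptions.

```python
from dataclasses import dataclass


@dataclass
class RouteDecision:
    # snake_case versions of the decision template fields.
    alias: str
    use_case: str
    route_policy: str            # e.g. "direct_small", "direct_large", "agentic"
    canary_share: float
    fallback_condition: str      # representation as a rule name is an assumption
    pool_id: str
    queue_priority: int
    latency_class: str
    allowed_providers: list[str]
    quota: int                   # assumed unit: requests per minute
    token_budget: int            # assumed unit: max tokens per request
    safety_policy: str
    trace_fields: list[str]
    owner: str

    def validate(self) -> list[str]:
        """Return a list of problems; an empty list means the entry is usable."""
        problems = []
        if not 0.0 <= self.canary_share <= 1.0:
            problems.append("canary_share must be within [0, 1]")
        if not self.allowed_providers:
            problems.append("allowed_providers must not be empty")
        if not self.owner:
            problems.append("owner is required")
        if self.token_budget <= 0:
            problems.append("token_budget must be positive")
        # Require at least the trace fields from the route policy steps.
        missing = {"route_lane", "pool_id", "queue_time", "cost"} - set(self.trace_fields)
        if missing:
            problems.append(f"trace_fields missing: {sorted(missing)}")
        return problems


entry = RouteDecision(
    alias="crm.summary.fast", use_case="CRM account summary",
    route_policy="direct_small", canary_share=0.1,
    fallback_condition="timeout_or_5xx", pool_id="maas-a", queue_priority=2,
    latency_class="interactive", allowed_providers=["managed-provider", "self-hosted"],
    quota=600, token_budget=4_000, safety_policy="pii-redaction",
    trace_fields=["route_lane", "pool_id", "queue_time", "cost"], owner="crm-platform-team",
)
print(entry.validate())  # []
```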
