Manifesto

Thesis

Production AI is not a model.

A model is replaceable. A platform is compounding.

If product teams call different providers directly, you do not have an AI platform. You have distributed liability: cost, latency, quality, safety and incidents scattered across services.

After The Demo

The demo proves that a model can answer.

Production proves something else:

the answer is stable on real data;
latency fits the product workflow;
cost is visible before launch;
quality is measured before rollout;
fallback works before an incident;
ownership is clear before escalation.

This is where AI stops being a model call and becomes a platform problem.

The Model Is One Component

A new model is not just a new model_name.

It is a release with a quality gate, regression risk, fallback plan, cost profile and observability. Without that, the team is swapping one hope for another, not changing the platform.
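One way to make "a release, not a model_name" concrete is a small release record that is checked before rollout. The sketch below is illustrative only: the `ModelRelease` fields, the `release_blockers` helper and the 1% regression tolerance are assumptions, not part of any specific platform.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelRelease:
    """Hypothetical release record; every field name here is illustrative."""
    alias: str                   # stable name the product calls
    candidate_model: str         # new model under evaluation
    fallback_model: str          # model to route to if the candidate misbehaves
    eval_pass_rate: float        # share of eval cases the candidate passed (0..1)
    baseline_pass_rate: float    # same eval set, current production model
    cost_per_1k_requests: float  # measured cost profile of the candidate
    cost_budget_per_1k: float    # budget agreed with the product owner
    dashboards_ready: bool       # observability exists before rollout


def release_blockers(r: ModelRelease, max_regression: float = 0.01) -> list[str]:
    """Return the reasons a release must not ship; an empty list means the gate passes."""
    blockers = []
    if r.eval_pass_rate < r.baseline_pass_rate - max_regression:
        blockers.append("quality regression against baseline")
    if r.cost_per_1k_requests > r.cost_budget_per_1k:
        blockers.append("cost profile exceeds budget")
    if not r.fallback_model or r.fallback_model == r.candidate_model:
        blockers.append("no independent fallback model")
    if not r.dashboards_ready:
        blockers.append("observability missing")
    return blockers
```

The point of the shape is that the gate returns reasons rather than a boolean, so a blocked release says what is missing.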

What I Mean By Platform

A platform is the stable contract between product and model execution:

AI Gateway and a unified API layer;
aliases, routing, quotas and fallback;
inference runtime for LLM, STT, embeddings and rerankers;
prompt, prefix and KV caching;
evals and quality gate;
observability, cost and feedback;
guardrails, policy and audit;
ownership, runbooks and incident process.
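The alias, routing and fallback items above can be sketched in a few lines. This is a minimal in-process illustration, not a real gateway: the `Gateway` class, its method names and the `Provider` callable type are all hypothetical, and quotas, timeouts and telemetry are left out.

```python
from typing import Callable

# Hypothetical provider call: takes a prompt, returns text or raises on failure.
Provider = Callable[[str], str]


class Gateway:
    """Maps stable aliases to an ordered list of models and falls back on failure."""

    def __init__(self) -> None:
        self._routes: dict[str, list[str]] = {}
        self._providers: dict[str, Provider] = {}

    def register_provider(self, model: str, call: Provider) -> None:
        self._providers[model] = call

    def set_alias(self, alias: str, models: list[str]) -> None:
        # The first model is the primary; the rest are fallbacks, in order.
        self._routes[alias] = list(models)

    def complete(self, alias: str, prompt: str) -> tuple[str, str]:
        """Return (model_used, answer); try each model behind the alias in order."""
        errors = []
        for model in self._routes[alias]:
            try:
                return model, self._providers[model](prompt)
            except Exception as exc:  # a real gateway would narrow this
                errors.append(f"{model}: {exc}")
        raise RuntimeError(f"all models failed for alias {alias!r}: {errors}")
```

Product code only ever sees the alias, so replacing the model behind it is a call to `set_alias`, not a product rewrite.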

What This Handbook Is Not

It is not a monthly model ranking, a generic prompt engineering guide or agent magic.

It is about one engineering question: how to make an AI scenario measurable, operable, safe and economically legible in production.

Principles

An API key is not a platform.
A benchmark is not a quality process.
A cache is not enabled by one checkbox.
Context window is not working context. The ability to put one million tokens into a model does not mean those tokens are useful for quality, latency or cost.
Guardrails are policy, telemetry and ownership, not a library.
Cost per token is a weak metric. Cost per accepted outcome matters more.
An agent in production is a controlled loop with budget, policy, observability and fallback.
A platform team should not approve every prompt. It should provide contracts, tools and golden paths.
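The difference between cost per token and cost per accepted outcome is easiest to see with numbers. The sketch below uses made-up figures and a deliberately simple assumption: rejected answers are discarded and not retried. The function name and parameters are illustrative.

```python
def cost_per_accepted_outcome(
    requests: int, cost_per_request: float, acceptance_rate: float
) -> float:
    """Total spend divided by the number of answers the product actually accepted."""
    if requests <= 0:
        raise ValueError("requests must be positive")
    if not 0 < acceptance_rate <= 1:
        raise ValueError("acceptance_rate must be in (0, 1]")
    total_cost = requests * cost_per_request
    accepted = requests * acceptance_rate
    return total_cost / accepted


# Hypothetical numbers: the cheaper model costs less per request
# but half of its answers are rejected.
cheap = cost_per_accepted_outcome(1000, cost_per_request=0.002, acceptance_rate=0.5)
strong = cost_per_accepted_outcome(1000, cost_per_request=0.003, acceptance_rate=0.9)
print(f"cheap model:  {cheap:.4f} per accepted outcome")
print(f"strong model: {strong:.4f} per accepted outcome")
```

With these assumed figures the model that is 50% more expensive per request comes out cheaper per accepted outcome, about 0.0033 versus 0.0040.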

A mature AI platform treats context as a resource: choose what to load, where to place dynamic data, what to cache and when not to start the expensive path.

Minimum Check

✓ Model replacement is possible without product rewrites.
✓ Every production scenario has quality and cost telemetry.
✓ Every release has rollback or fallback.
✓ Guardrails and observability exist before rollout.
✓ The scenario has a product owner and a platform owner.
