Manifesto
Why production AI should be framed as a platform, not a model choice.
Thesis
Production AI is not a model.
A model is replaceable. A platform is compounding.
If product teams call different providers directly, you do not have an AI platform. You have distributed liability: cost, latency, quality, safety and incidents scattered across services.
After The Demo
The demo proves that a model can answer.
Production proves something else:
- the answer is stable on real data;
- latency fits the product workflow;
- cost is visible before launch;
- quality is measured before rollout;
- fallback works before an incident;
- ownership is clear before escalation.
This is where AI stops being a model call and becomes a platform problem.
The Model Is One Component
A new model is not just a new model_name.
It is a release with a quality gate, regression risk, a fallback plan, a cost profile and observability. Without them, the team is swapping one hope for another, not changing the platform.
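A minimal sketch of what "release with a quality gate" can mean in code. Everything here is an assumption for illustration: the `EvalReport` and `GatePolicy` names, the three metrics and the threshold values are hypothetical, and real gates depend on the scenario.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvalReport:
    """Aggregated result of running one model over a fixed eval set."""
    model_name: str
    pass_rate: float             # share of eval cases accepted, 0.0 to 1.0
    p95_latency_ms: float
    cost_per_request_usd: float


@dataclass(frozen=True)
class GatePolicy:
    """Hypothetical thresholds; tune them per scenario."""
    max_pass_rate_drop: float = 0.01
    max_p95_latency_ms: float = 2000.0
    max_cost_increase_ratio: float = 1.20


def gate_violations(baseline: EvalReport, candidate: EvalReport,
                    policy: GatePolicy) -> list[str]:
    """Return the reasons a candidate must not replace the baseline."""
    violations = []
    if baseline.pass_rate - candidate.pass_rate > policy.max_pass_rate_drop:
        violations.append("quality regression")
    if candidate.p95_latency_ms > policy.max_p95_latency_ms:
        violations.append("latency budget exceeded")
    cost_limit = baseline.cost_per_request_usd * policy.max_cost_increase_ratio
    if candidate.cost_per_request_usd > cost_limit:
        violations.append("cost budget exceeded")
    return violations


# Made-up numbers: the candidate is slightly better on quality but too slow.
baseline = EvalReport("model-a", pass_rate=0.92, p95_latency_ms=1400.0,
                      cost_per_request_usd=0.0040)
candidate = EvalReport("model-b", pass_rate=0.93, p95_latency_ms=2600.0,
                       cost_per_request_usd=0.0045)

violations = gate_violations(baseline, candidate, GatePolicy())
print(violations or "candidate may be promoted")
```

The gate compares the candidate to the current baseline, not to a public benchmark, so a new model_name is promoted only when it holds up on the scenario's own data.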
What I Mean By Platform
A platform is the stable contract between product and model execution:
- AI Gateway and a unified API layer;
- aliases, routing, quotas and fallback;
- inference runtime for LLMs, STT, embeddings and rerankers;
- prompt, prefix and KV caching;
- evals and quality gates;
- observability, cost and feedback;
- guardrails, policy and audit;
- ownership, runbooks and incident process.
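The "stable contract" part of this list can be sketched as an alias that product code calls, with routing and fallback hidden behind it. This is a toy, in-process sketch under stated assumptions: `Gateway`, `Route`, `BackendError`, the alias and the backend names are all hypothetical, and the backends are stub functions rather than real provider clients.

```python
from dataclasses import dataclass
from typing import Callable

# A backend is any callable that takes a prompt and returns text.
Backend = Callable[[str], str]


class BackendError(Exception):
    """Raised by a backend that cannot serve the request."""


@dataclass(frozen=True)
class Route:
    primary: str
    fallbacks: tuple[str, ...] = ()


class Gateway:
    """Resolves a product-facing alias to a concrete backend."""

    def __init__(self, backends: dict[str, Backend], aliases: dict[str, Route]):
        self._backends = backends
        self._aliases = aliases

    def complete(self, alias: str, prompt: str) -> tuple[str, str]:
        """Return (backend that served the request, response text)."""
        route = self._aliases[alias]
        errors = []
        for name in (route.primary, *route.fallbacks):
            try:
                return name, self._backends[name](prompt)
            except BackendError as exc:
                errors.append(f"{name}: {exc}")
        raise BackendError(f"all backends failed for {alias!r}: " + "; ".join(errors))


def flaky_backend(prompt: str) -> str:
    raise BackendError("timeout")  # simulates a provider incident


def stable_backend(prompt: str) -> str:
    return "echo: " + prompt


gateway = Gateway(
    backends={"provider-a/large": flaky_backend, "provider-b/medium": stable_backend},
    aliases={"support-summarizer": Route("provider-a/large", ("provider-b/medium",))},
)

served_by, text = gateway.complete("support-summarizer", "hello")
print(served_by, text)
```

Product code knows only the alias `support-summarizer`. Replacing the primary model or reordering fallbacks is a routing change, not a product rewrite. A real gateway would also need quotas, timeouts and telemetry, which this sketch leaves out.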
What This Handbook Is Not
It is not a monthly model ranking, a generic prompt engineering guide or agent magic.
It is about one engineering question: how to make an AI scenario measurable, operable, safe and economically legible in production.
Principles
- An API key is not a platform.
- A benchmark is not a quality process.
- Caching is not enabled by a single checkbox.
- Context window is not working context. The ability to put one million tokens into a model does not mean those tokens are useful for quality, latency or cost.
- Guardrails are policy, telemetry and ownership, not a library.
- Cost per token is a weak metric. Cost per accepted outcome matters more.
- An agent in production is a controlled loop with budget, policy, observability and fallback.
- A platform team should not approve every prompt. It should provide contracts, tools and golden paths.
A mature AI platform treats context as a resource: choose what to load, where to place dynamic data, what to cache and when not to start the expensive path.
Minimum Check
- ✓ Model replacement is possible without product rewrites.
- ✓ Every production scenario has quality and cost telemetry.
- ✓ Every release has rollback or fallback.
- ✓ Guardrails and observability exist before rollout.
- ✓ The scenario has a product owner and a platform owner.