Sergei Notevskii

Manifesto

Why production AI should be framed as a platform, not a model choice.

Foundation
v0.1
Updated May 23, 2026
AI Platform Leads
CTOs
Engineering Managers
manifesto
platform
ownership

Thesis

Production AI is not a model.

A model is replaceable. A platform is compounding.

If product teams call different providers directly, you do not have an AI platform. You have distributed liability: cost, latency, quality, safety and incidents scattered across services.

After The Demo

The demo proves that a model can answer.

Production proves something else:

  • the answer is stable on real data;
  • latency fits the product workflow;
  • cost is visible before launch;
  • quality is measured before rollout;
  • fallback works before an incident;
  • ownership is clear before escalation.

This is where AI stops being a model call and becomes a platform problem.

The Model Is One Component

A new model is not just a new model_name.

It is a release with a quality gate, regression risk, a fallback plan, a cost profile and observability. Without those, the team is swapping one hope for another, not changing the platform.
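One way to make "a model change is a release" concrete is a gate that a candidate must pass before it goes behind a production alias. The sketch below is illustrative only: the `ReleaseCandidate` fields, the `gate_failures` function and the thresholds are all assumptions, not part of any particular tool.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ReleaseCandidate:
    """Hypothetical description of a model change treated as a release."""
    alias: str                      # stable name that product code calls
    model_name: str                 # provider-specific model behind the alias
    eval_pass_rate: float           # share of eval cases passed, 0.0 to 1.0
    baseline_pass_rate: float       # same eval set, currently deployed model
    cost_per_request_usd: float     # measured on representative traffic
    fallback_model: Optional[str]   # where traffic goes if the candidate fails
    has_dashboards: bool            # latency, cost and quality telemetry wired up


def gate_failures(
    rc: ReleaseCandidate,
    max_regression: float = 0.01,   # assumed tolerance, tune per scenario
    max_cost_usd: float = 0.02,     # assumed budget, tune per scenario
) -> List[str]:
    """Return the reasons the candidate may not ship; empty list means go."""
    failures = []
    if rc.eval_pass_rate < rc.baseline_pass_rate - max_regression:
        failures.append("quality regression against baseline")
    if rc.cost_per_request_usd > max_cost_usd:
        failures.append("cost per request over budget")
    if rc.fallback_model is None:
        failures.append("no fallback model configured")
    if not rc.has_dashboards:
        failures.append("observability not in place")
    return failures


candidate = ReleaseCandidate(
    alias="summarize-default",
    model_name="vendor-x/model-2",      # placeholder name
    eval_pass_rate=0.93,
    baseline_pass_rate=0.92,
    cost_per_request_usd=0.01,
    fallback_model="vendor-x/model-1",  # placeholder name
    has_dashboards=True,
)
print(gate_failures(candidate) or "ready to roll out")
```

The point of returning reasons rather than a boolean is that a blocked release should tell the owning team what is missing.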

What I Mean By Platform

A platform is the stable contract between product and model execution:

  • AI Gateway and a unified API layer;
  • aliases, routing, quotas and fallback;
  • inference runtime for LLM, STT, embeddings and rerankers;
  • prompt, prefix and KV caching;
  • evals and quality gate;
  • observability, cost and feedback;
  • guardrails, policy and audit;
  • ownership, runbooks and incident process.
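The "aliases, routing and fallback" item can be sketched in a few lines. This is a minimal illustration, assuming a hypothetical in-memory `ROUTES` table and a `call_model` function supplied by the caller; a real gateway would add quotas, timeouts, telemetry and policy checks around the same shape.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical alias table: product code asks for an alias, never for a
# provider model directly. Targets are tried in order.
ROUTES: Dict[str, List[str]] = {
    "chat-default": ["provider-a/large", "provider-b/medium"],
}


class AllTargetsFailed(RuntimeError):
    """Raised when every target behind an alias has failed."""


def complete(
    alias: str,
    prompt: str,
    call_model: Callable[[str, str], str],
) -> Tuple[str, str]:
    """Try each target behind the alias; return (target_used, answer)."""
    errors = []
    for target in ROUTES[alias]:  # unknown alias raises KeyError
        try:
            return target, call_model(target, prompt)
        except Exception as exc:  # broad on purpose: any failure triggers fallback
            errors.append(f"{target}: {exc}")
    raise AllTargetsFailed("; ".join(errors))


def fake_call(target: str, prompt: str) -> str:
    """Stand-in for a provider client; the first provider is 'down'."""
    if target.startswith("provider-a/"):
        raise TimeoutError("upstream timeout")
    return f"ok: {prompt}"


print(complete("chat-default", "hi", fake_call))
```

Because product code only knows the alias, replacing the model behind `chat-default` is a change to the routing table, not to the product.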

What This Handbook Is Not

It is not a monthly model ranking, a generic prompt engineering guide or agent magic.

It is about one engineering question: how to make an AI scenario measurable, operable, safe and economically legible in production.

Principles

  • An API key is not a platform.
  • A benchmark is not a quality process.
  • Caching is not a single checkbox.
  • A context window is not working context. The ability to put one million tokens into a model does not mean those tokens are useful for quality, latency or cost.
  • Guardrails are policy, telemetry and ownership, not a library.
  • Cost per token is a weak metric. Cost per accepted outcome matters more.
  • An agent in production is a controlled loop with budget, policy, observability and fallback.
  • A platform team should not approve every prompt. It should provide contracts, tools and golden paths.
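The difference between cost per token and cost per accepted outcome is easiest to see with arithmetic. The sketch below uses made-up numbers and a hypothetical `cost_per_accepted_outcome` helper; "accepted" stands for whatever the scenario counts as success, such as a user keeping the answer or an eval passing.

```python
def cost_per_accepted_outcome(
    requests: int,
    cost_per_request_usd: float,
    acceptance_rate: float,
) -> float:
    """Total spend divided by the number of accepted outcomes."""
    accepted = requests * acceptance_rate
    if accepted <= 0:
        raise ValueError("no accepted outcomes, the metric is undefined")
    return (requests * cost_per_request_usd) / accepted


# Illustrative numbers only, not measurements.
cheap = cost_per_accepted_outcome(1000, 0.002, 0.40)   # cheaper per request
strong = cost_per_accepted_outcome(1000, 0.003, 0.80)  # pricier per request

print(f"cheap model:  ${cheap:.5f} per accepted outcome")
print(f"strong model: ${strong:.5f} per accepted outcome")
```

With these assumed figures, the model that costs 50% more per request comes out 25% cheaper per accepted outcome, because twice as many of its answers are usable.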

A mature AI platform treats context as a resource: choose what to load, where to place dynamic data, what to cache and when not to start the expensive path.
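Treating context as a resource can be sketched as a budgeted selection step in front of the model call. Everything here is an assumption for illustration: the `Chunk` type, the precomputed token counts, the relevance scores and the greedy selection rule.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Chunk:
    text: str
    tokens: int       # assumed to be precomputed with the model's tokenizer
    relevance: float  # assumed score from a retriever or reranker, 0.0 to 1.0


def build_prompt(
    system_prefix: str,
    chunks: List[Chunk],
    user_input: str,
    token_budget: int,
    min_relevance: float = 0.5,
) -> str:
    """Greedily keep the most relevant chunks that fit the context budget."""
    selected = []
    used = 0
    for chunk in sorted(chunks, key=lambda c: c.relevance, reverse=True):
        if chunk.relevance < min_relevance:
            break  # everything after this is even less relevant
        if used + chunk.tokens > token_budget:
            continue  # too large, but a smaller chunk may still fit
        selected.append(chunk)
        used += chunk.tokens
    # The static prefix goes first so that prefix caching, where the runtime
    # supports it, can reuse it. Per-request data goes last.
    parts = [system_prefix] + [c.text for c in selected] + [user_input]
    return "\n\n".join(parts)


chunks = [
    Chunk("refund policy", tokens=50, relevance=0.9),
    Chunk("full terms of service", tokens=80, relevance=0.8),
    Chunk("order status", tokens=30, relevance=0.7),
    Chunk("company history", tokens=10, relevance=0.2),
]
print(build_prompt("You are a support assistant.", chunks, "Where is my refund?", token_budget=100))
```

The budget here is deliberately much smaller than any real context window: the limit is what the scenario can afford in latency and cost, not what the model can accept.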

Minimum Check

  • Model replacement is possible without product rewrites.
  • Every production scenario has quality and cost telemetry.
  • Every release has a rollback or fallback path.
  • Guardrails and observability exist before rollout.
  • The scenario has a product owner and a platform owner.
