Start Here

Problem

The demo works. Then the real questions start: who owns quality, where cost is visible, why latency drifts, how a model change ships, and how product teams avoid building local AI stacks in every service.

This page helps you choose the first path. Do not read the handbook linearly.

Quick Choice

Need the big picture -> Production AI Platform Map.
Need a maturity vocabulary for executives -> AI Platform Maturity Model.
Choosing MaaS (Model as a Service), self-hosted or hybrid -> MaaS vs Self-hosted.
Moving from MaaS to self-hosted -> MaaS vs Self-hosted, Inference Runtime, AI Quality Gate, Inference Economics, Observability.
Cost or latency is drifting -> Inference Economics and Prefix Cache.
An agent burns tokens without visible reason -> Semantic Router, Prefix Cache, Inference Economics and Observability.
Quality moves after a model or prompt change -> AI Quality Gate.
Debugging is blind -> LLM Observability Checklist.
Ownership is unclear -> Ownership and Operating Model.

Thinking About Moving From MaaS To Self-hosted?

Do not start with model choice or GPU choice. Start with the scenario: what data is involved, which SLA matters, what the traffic shape looks like, how quality will be evaluated and who will own operations.

Use this first review path:

Scenario: what exactly is moving?
Data: can the input be de-identified and stay on MaaS?
SLA: is the workload real-time, long-context or batch?
Quality: do evals exist before the migration?
Economics: did you count production, staging, test, debug, on-call and reserve capacity?

If these questions have no answers, self-hosted is not a strategy yet. It is an expensive experiment.
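
The review path above can be sketched as a small readiness check. This is a minimal illustration, not a prescribed tool: the class name, field names and verdict strings are all hypothetical, and each answer is treated as free text that is either filled in or empty.

```python
from dataclasses import dataclass, fields


@dataclass
class MigrationReview:
    """Answers to the five first-review questions. All field names are hypothetical."""
    scenario: str = ""   # what exactly is moving
    data: str = ""       # can the input be de-identified and stay on MaaS
    sla: str = ""        # real-time, long-context or batch
    quality: str = ""    # do evals exist before the migration
    economics: str = ""  # production, staging, test, debug, on-call, reserve counted


def unanswered(review: MigrationReview) -> list[str]:
    """Return the names of the questions that still have no answer."""
    return [f.name for f in fields(review) if not getattr(review, f.name).strip()]


def verdict(review: MigrationReview) -> str:
    """Summarize whether the review is complete enough to discuss strategy."""
    missing = unanswered(review)
    if missing:
        return "expensive experiment: missing " + ", ".join(missing)
    return "ready for strategy review"


# Example: only the scenario and the SLA have been written down so far.
review = MigrationReview(scenario="support-ticket summarization", sla="batch")
print(verdict(review))
```

In this example the data, quality and economics questions are still open, so the check reports them as missing. A real review would judge the content of each answer, not only its presence.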

Before GPUs

If there are no evals and no cost baseline, start the self-hosted migration with a scenario document, not with GPU selection.

Mental model

A production AI platform is not one layer. It is the connection between product scenarios, gateway, routing, inference, cache, evals, observability, cost, guardrails and ownership.

Start with the current pain. If cost hurts, go to economics and cache. If quality hurts, go to evals. If product teams are fragmented, go to gateway and ownership.

Paths By Role

AI Platform Lead: map, Semantic Router, gateway, economics, prefix cache, observability, ownership.
Staff Engineer: inference runtime, prefix cache, context budget, tool stability, router evals.
CTO / Head of Engineering: maturity model, MaaS vs self-hosted, inference economics, operating model.
Product Engineer: start here, gateway, observability checklist, quality gate.

What To Read First

✓ If there is no shared map, start with Platform Map.
✓ If maturity language is missing, start with Maturity Model.
✓ If MaaS vs self-hosted is disputed, start with MaaS vs Self-hosted.
✓ If cache does not help, start with Prefix Cache.
✓ If model releases are risky, start with AI Quality Gate.

How To Apply It

Use each chapter as a review checklist. After reading, you should know:

which platform layer is involved;
which owner is missing;
which metrics matter;
what can break in production;
which next document or tool to open.

Example

A team says: "the model became too expensive." Do not start by swapping the model. Check route, retries, cached tokens, tool schema stability, accepted outcome rate and fallback events. That path usually finds the real cause faster.
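
The checks in this example can be sketched as one pass over request logs. This is a rough illustration under stated assumptions: the `RequestLog` schema and every field name are hypothetical, each retry is assumed to resend the full prompt, and tool schema stability is not modeled here.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class RequestLog:
    """One request record. Field names are hypothetical, not a real log schema."""
    route: str          # which route or model served the request
    attempts: int       # 1 means no retry
    prompt_tokens: int  # prompt size of a single attempt
    cached_tokens: int  # prompt tokens served from the prefix cache
    accepted: bool      # was the outcome accepted downstream
    fallback: bool      # did the request fall back to another route


def cost_signals(logs: list[RequestLog]) -> dict:
    """Summarize the signals to check before swapping the model."""
    if not logs:
        raise ValueError("need at least one request record")
    n = len(logs)
    prompt_total = sum(r.prompt_tokens for r in logs)
    tokens_by_route = Counter()
    for r in logs:
        # Assumption: every attempt resends the prompt, so retries are paid for too.
        tokens_by_route[r.route] += r.prompt_tokens * r.attempts
    return {
        "tokens_by_route": dict(tokens_by_route),
        "retry_rate": sum(r.attempts > 1 for r in logs) / n,
        "cached_token_ratio": sum(r.cached_tokens for r in logs) / prompt_total,
        "accepted_outcome_rate": sum(r.accepted for r in logs) / n,
        "fallback_rate": sum(r.fallback for r in logs) / n,
    }


# Made-up sample data for illustration only.
logs = [
    RequestLog("large", 3, 4000, 0, False, True),
    RequestLog("large", 1, 4000, 0, True, False),
    RequestLog("small", 1, 1000, 800, True, False),
    RequestLog("small", 1, 1000, 800, True, False),
]
signals = cost_signals(logs)
print(signals)
```

In this made-up sample, one retried request on the "large" route with no cached tokens accounts for most of the spend, which points at routing and retries before it points at the model price.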
