AI Platform Lead

Sergei Notevskii

I build production-grade AI platforms for LLMs, speech-to-text (STT), embeddings and agents, covering inference, evals, guardrails, observability, cost and ownership.

From API key to platform
01 API key / demo
02 AI Gateway
03 Routing / Inference / Cache
04 Evals / Observability / Cost
05 Guardrails / Ownership

Why this work exists

Public, sanitized field notes from production AI platform work.

Production AI platforms

LLM · STT · embeddings · agents

Self-hosted inference

vLLM · GPU capacity · routing

Quality systems

Evals · regression · feedback loops

Public field notes

Habr · Telegram · talks

After the demo

The demo works. Then production starts.

01

Latency spikes.

02

Token cost grows.

03

Prompts break.

04

Agents loop.

05

Evals are missing.

06

Nobody owns quality.

Platform layers

The handbook is organized by platform responsibility, not by hype cycle.

L01

Product use cases

Scenario intake, user value, risk profile, acceptance criteria.

L02

AI Gateway

Unified API layer for auth, quotas, routing, policy, cost attribution.

L03

Provider strategy

MaaS, OpenRouter-style research loops, self-hosted and hybrid decisions.

L04

Model routing

Aliases, fallback, canary and model versioning.

Open the full 12-layer map

Where I am useful

Architecture reviews, platform strategy, quality gates and inference economics.

AI Platform
Self-hosted inference
vLLM and GPU capacity
Model routing and fallback
Prefix cache economics
Evals and quality gates
LLM observability
Guardrails and ownership

Projects

The handbook is the flagship project. Tools and templates grow around it.

Writing

Public writing becomes chapters, checklists and tools inside the handbook.

Talks

Talks and conference material feed the strategy, inference and economics tracks.

Ways to work

Clear formats for talks, reviews and executive conversations.

About the author

Sergei Notevskii

I write the Production AI Platform Handbook, a practical field guide for teams turning AI demos into production platforms.

The materials are public and sanitized: internal details are removed, the production judgment is kept.

Start with the map

A model is replaceable. A platform is compounding.

The first release is intentionally small: map, maturity model, core platform layers and practical tools.

Open the map