Sergei Notevskii
I build production-grade AI platforms for LLM, STT, embeddings and agents: inference, evals, guardrails, observability, cost and ownership.
Why this work exists
Public, sanitized field notes from production AI platform work.
Production AI platforms
LLM · STT · embeddings · agents
Self-hosted inference
vLLM · GPU capacity · routing
Quality systems
Evals · regression · feedback loops
Public field notes
Habr · Telegram · talks
After the demo
The demo works. Then production starts.
Latency spikes.
Token cost grows.
Prompts break.
Agents loop.
Evals are missing.
Nobody owns quality.
Platform layers
The handbook is organized by platform responsibility, not by hype cycle.
Product use cases
Scenario intake, user value, risk profile, acceptance criteria.
AI Gateway
Unified API layer for auth, quotas, routing, policy, cost attribution.
Provider strategy
MaaS, OpenRouter-style research loops, self-hosted and hybrid decisions.
Model routing
Aliases, fallback, canary and model versioning.
Where I am useful
Architecture reviews, platform strategy, quality gates and inference economics.
Projects
The handbook is the flagship project. Tools and templates grow around it.
Production AI Platform Handbook
A platform responsibility map for teams moving from an API key and a demo to inference, routing, evals, cost and ownership.
Prefix Cache Auditor
A client-side diagnostic tool for unstable prefixes, dynamic fields, tool schema drift and cache-aware recommendations.
audit-prompt-caching
An open-source diagnostic package for prompt and prefix cache audits: stable layout, volatile fields and cache-aware recommendations.
AI Quality Gate Kit
A rollout readiness checklist for evals, regression, canary, feedback, fallback and production ownership.
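To make the "unstable prefix" idea behind the two cache audit tools concrete, here is a minimal sketch, not code from either tool. It assumes a character-level comparison as a rough stand-in for the token-level blocks that real prefix caches use. The function name `shared_prefix_ratio` and the sample prompts are hypothetical.

```python
def shared_prefix_ratio(first: str, second: str) -> float:
    """Fraction of the shorter prompt that both prompts share as a common prefix."""
    limit = min(len(first), len(second))
    if limit == 0:
        return 0.0
    matched = 0
    while matched < limit and first[matched] == second[matched]:
        matched += 1
    return matched / limit


# Hypothetical system text, used only for illustration.
SYSTEM = "You are a support assistant. Answer briefly."

# Volatile field first: the prompts diverge inside the timestamp,
# so almost nothing after it can be reused as a prefix.
unstable_a = "now=2025-01-01T10:00:00\n" + SYSTEM
unstable_b = "now=2025-01-01T10:00:05\n" + SYSTEM

# Volatile field last: the stable system text remains a shared prefix.
stable_a = SYSTEM + "\nnow=2025-01-01T10:00:00"
stable_b = SYSTEM + "\nnow=2025-01-01T10:00:05"

print(f"volatile field first: {shared_prefix_ratio(unstable_a, unstable_b):.2f}")
print(f"volatile field last:  {shared_prefix_ratio(stable_a, stable_b):.2f}")
```

The same comparison could in principle be run over consecutive requests from one client to spot dynamic fields that sit too early in the prompt.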
Writing
Public writing becomes chapters, checklists and tools inside the handbook.
Habr
Short prompt does not mean cheap prompt
Agent loops, tool list stability, allowed tools and cache-aware prompt design.
Habr
7 prefix cache anti-patterns
Timestamp drift, floating tool order, round-robin routing and KV-cache lifetime.
Habr
Effective cost with cache
Why model choice needs cache-aware economics, not only list prices.
Habr
Agent Skills are more than a prompt folder
How agent skills relate to tools, RAG, MCP and agent architecture.
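The cache-aware economics point from "Effective cost with cache" can be illustrated with a short worked example. This is a sketch under stated assumptions: a simple blend of list price and cached-token price weighted by cache hit rate. The function name, the prices and the hit rates below are all hypothetical and do not describe any real provider.

```python
def effective_input_price(list_price: float, cached_price: float, hit_rate: float) -> float:
    """Blend list and cached input prices by the share of tokens served from cache."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return hit_rate * cached_price + (1.0 - hit_rate) * list_price


# Hypothetical prices per million input tokens.
# Model A: higher list price, large discount on cached tokens.
model_a = effective_input_price(list_price=3.0, cached_price=0.3, hit_rate=0.8)
# Model B: lower list price, no cache discount.
model_b = effective_input_price(list_price=1.0, cached_price=1.0, hit_rate=0.8)

print(f"model A effective price: {model_a:.2f}")
print(f"model B effective price: {model_b:.2f}")
```

Under these assumed numbers, the model with the higher list price ends up cheaper per input token once the cache hit rate is taken into account, which is why comparing list prices alone can mislead.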
Talks
Talks and conference material feed the strategy, inference and economics tracks.
Own AI models or subscription API?
ROII Conference. A talk about choosing between MaaS, self-hosted and hybrid strategy for a product AI platform.
Engineering management
Ural Digital Weekend 2025. A software engineering management section. The link opens the recording at the relevant talk timestamp.
How to choose an AI model
Small talk podcast. A conversation on practical model choice, constraints, cost and product adoption.
Ways to work
Clear formats for talks, reviews and executive conversations.
Architecture review
Review gateway, routing, cache, evals, observability, cost and ownership before they harden into platform debt.
Executive workshop
Align MaaS vs self-hosted strategy, maturity, team responsibilities and the first platform roadmap.
Talk or podcast
A practical, non-hype conversation about production AI platform engineering.
Handbook collaboration
Turn public field notes, tools and templates into durable handbook artifacts.
About the author
Sergei Notevskii
I write the Production AI Platform Handbook, a practical field guide for teams turning AI demos into production platforms.
Central sentence
The materials are public and sanitized: no internal details, but grounded in production experience.
Read more
Start with the map
A model is replaceable. A platform is compounding.
The first release is intentionally small: map, maturity model, core platform layers and practical tools.