AI Platform Lead

Sergei Notevskii

I build AI platforms that work in production: LLM, STT, embeddings, agents, inference, evals, observability, cost and ownership. I write the Production AI Platform Handbook, a practical map of what starts after the demo.

From API key to platform

01 API key / demo

02 AI Gateway

03 Routing / Inference / Cache

04 Evals / Observability / Cost

05 Guardrails / Ownership

Practice

Field notes from production AI platform work: sanitized, practical and focused on engineering decisions.

AI platform

LLM · STT · embeddings · agents

Self-hosted inference

vLLM · GPU · routing · cache

Quality

Evals · regression · feedback loops

Economics

Scenario cost · prefix cache · tokens

Public artifacts

Habr · talks · open-source

After the demo

The demo works. Platform questions start next.

The hard part starts after the first successful model call: cost, quality, latency, ownership and operations.

Latency spikes.

Token cost grows.

Prompts break.

Agents get stuck in loops.

Evals are missing.

Nobody owns quality.

At that point, AI stops being a feature and becomes a platform.

Flagship project

Production AI Platform Handbook

A practical handbook for teams moving from an API key and a demo to a production AI platform. Inside: a 12-layer map, chapters, checklists, tools and templates.

12-layer map

From product scenario to owner, cost and operations.

Chapters

Gateway, inference, economics, cache, evals, observability and ownership.

Tools

Prefix Cache Auditor, LLM Cost Calculator and a quality checklist.

Templates

Scenario RFC, self-hosted migration, cost review and incidents.

Public work

Handbook pages, tools, articles and talks that make the platform practice reusable.

Production AI Platform Handbook

A platform responsibility map for teams moving from an API key and a demo to inference, routing, evals, cost and ownership.

Prefix Cache Auditor

A client-side diagnostic tool that flags unstable prefixes, dynamic fields and tool schema drift, and gives cache-aware recommendations.

audit-prompt-caching

An open-source diagnostic package for prompt and prefix cache audits: stable layout, volatile fields and cache-aware recommendations.

Writing

Habr articles and Telegram notes

Long-form Habr articles and short Telegram notes.

Talks

Talks and podcasts

Videos and podcasts about model choice, platform strategy and engineering work.

Where I am useful

Architecture review, platform strategy, talks and practical collaboration.

Architecture review

AI Gateway, routing, cache, inference, evals, observability, cost and ownership.

Strategy session

MaaS vs self-hosted, AI platform maturity, ownership boundaries and a first roadmap.

Talk or podcast

A practical conversation about production AI without hype: inference, evals, prefix cache, economics and guardrails.

Collaboration

Handbook, open-source tools, templates and joint public materials.

About the author

Sergei Notevskii

I am Sergei Notevskii, AI Platform Lead. I work across platform architecture, inference, quality systems, observability and AI economics. This site is the public layer of that practice: notes, tools, templates and handbook material without internal details.

A model is replaceable. A platform compounds.

Start with the map

A model is replaceable. A platform compounds.

The first release is intentionally small: map, maturity model, core platform layers and practical tools.
