Sergei Notevskii
Strategy

MaaS vs Self-hosted

A strategy decision for managed APIs, self-hosted inference and hybrid AI platforms.

Applied
v0.1
Updated May 23, 2026
AI Platform Leads
CTOs
Staff Engineers
maas
self-hosted
provider-strategy

Problem

MaaS vs self-hosted is usually framed as a matter of belief: a closed API or your own GPUs. In production it is a strategy decision, and each option has different failure modes.

Symptoms

  • The team wants cheaper tokens but has no capacity model.
  • Leadership wants data control but underestimates reliability work.
  • Product teams need model choice but lack a gateway contract.

Mental model

Use managed APIs for speed, breadth and research loops. Use self-hosted when control, data boundary, economics, latency or product strategy justify owning inference operations.

The main distinction: MaaS vs self-hosted is not a provider choice. It is an operating-model choice for a specific AI scenario.

Safe Scenario Migration

Do not start a migration by buying GPUs. First, the product team finds a model where the scenario works at all, through MaaS, OpenRouter or another external provider. Then the platform team stands up a rough self-hosted candidate, checks that quality is reproducible, builds latency and cost profiles, runs evals and only then starts a canary or a dedicated pool.

Migration playbook

  01. Discovery: MaaS, OpenRouter or another external provider.
  02. Draft self-hosted: rough launch without production promises.
  03. Baseline: quality, latency and cost of the current route.
  04. Evals: baseline comparison and stop criteria.
  05. Canary: traffic share, fallback and incident owner.

A good migration is more than swapping model_name. It proves that the scenario keeps quality, SLO and economics inside the new boundary.
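One way to make that proof concrete is a gate that compares the candidate route against the baseline before any canary traffic moves. The sketch below is illustrative only: `RouteProfile`, `canary_blockers` and the thresholds are hypothetical names and values, not part of any specific tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RouteProfile:
    """Measured profile of one route for one scenario (hypothetical fields)."""
    quality_pass_rate: float       # share of eval cases passing the quality gate, 0..1
    p95_latency_ms: float
    cost_per_accepted_usd: float


def canary_blockers(baseline: RouteProfile, candidate: RouteProfile,
                    max_quality_drop: float = 0.01,
                    p95_slo_ms: float = 2000.0) -> list[str]:
    """Return the reasons the candidate must not go to canary (empty list means go)."""
    blockers = []
    if candidate.quality_pass_rate < baseline.quality_pass_rate - max_quality_drop:
        blockers.append("quality")
    if candidate.p95_latency_ms > p95_slo_ms:
        blockers.append("latency")
    if candidate.cost_per_accepted_usd >= baseline.cost_per_accepted_usd:
        blockers.append("economics")
    return blockers


# Illustrative numbers: the candidate is faster and cheaper but loses too much quality.
maas = RouteProfile(quality_pass_rate=0.94, p95_latency_ms=1400.0, cost_per_accepted_usd=0.012)
self_hosted = RouteProfile(quality_pass_rate=0.90, p95_latency_ms=900.0, cost_per_accepted_usd=0.007)
print(canary_blockers(maas, self_hosted))  # ['quality']
```

Returning the list of blockers rather than a boolean keeps the stop criteria visible to whoever owns the migration.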

Architecture

| Option | Use when | Trade-off |
|---|---|---|
| MaaS | Fast iteration, broad model access, uncertain demand. | Less control over routing, cache internals and provider economics. |
| Self-hosted | Stable demand, data boundary, latency/cost control or custom serving needs. | You own GPU capacity, uptime, upgrades and incidents. |
| Hybrid | Production needs control but research still needs model breadth. | Requires gateway, routing policy and clear model lifecycle. |

Scenario-level Decision

The mistake is deciding "we are self-hosted now" or "we are MaaS now." The right level is a concrete AI scenario: data, SLA, volume, model quality, engineering cost and processing mode.

| Condition | Usually better |
|---|---|
| Spiky demand | MaaS |
| Frontier model required | MaaS |
| Data can be de-identified | MaaS or hybrid |
| Data cannot leave the boundary | Self-hosted or on-premise |
| Stable high-volume workload | Self-hosted |
| Task is not urgent | Batch or deferred processing |
| Model customization is required | Self-hosted |
| No evals and MLOps yet | Caution: it is too early for self-hosted |
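The same conditions can be kept as data so that a scenario review surfaces every applicable hint at once, including conflicting ones. This is a minimal sketch; the condition keys and the `hints_for` helper are hypothetical, and the table gives no priority order, so the code does not pick a winner.

```python
# Conditions and hints from the scenario-level decision table, as data.
# The snake_case keys are hypothetical identifiers chosen for this sketch.
USUALLY_BETTER = {
    "spiky_demand": "MaaS",
    "frontier_model_required": "MaaS",
    "data_can_be_deidentified": "MaaS or hybrid",
    "data_cannot_leave_boundary": "self-hosted or on-premise",
    "stable_high_volume": "self-hosted",
    "not_urgent": "batch or deferred processing",
    "customization_required": "self-hosted",
    "no_evals_or_mlops": "caution: too early for self-hosted",
}


def hints_for(conditions: set[str]) -> dict[str, str]:
    """Return the hint for each condition that applies to one scenario."""
    unknown = conditions - set(USUALLY_BETTER)
    if unknown:
        raise KeyError(f"unknown conditions: {sorted(unknown)}")
    return {name: USUALLY_BETTER[name] for name in sorted(conditions)}


# A scenario can match rows that pull in different directions.
for name, hint in hints_for({"spiky_demand", "data_cannot_leave_boundary"}).items():
    print(f"{name}: {hint}")
```

When the hints disagree, as in this example, the decision goes back to the scenario owner instead of being resolved by a default.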

Metrics

Compare cost per accepted outcome, latency distribution, availability, quality gate pass rate, cache hit rate, utilization, engineering load and incident risk.
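Cost per accepted outcome is the metric most likely to reverse a token-price comparison, so a worked example helps. A minimal sketch, assuming an "accepted outcome" is a response that passed the scenario's quality gate; the function name and all numbers are illustrative.

```python
def cost_per_accepted_outcome(total_cost_usd: float, requests: int,
                              acceptance_rate: float) -> float:
    """Total route cost divided by the number of outcomes that passed the quality gate."""
    accepted = requests * acceptance_rate
    if accepted <= 0:
        raise ValueError("no accepted outcomes; the metric is undefined")
    return total_cost_usd / accepted


# Illustrative numbers only: the route with the lower bill is worse per accepted outcome.
cheap_route = cost_per_accepted_outcome(total_cost_usd=600.0, requests=100_000, acceptance_rate=0.60)
pricey_route = cost_per_accepted_outcome(total_cost_usd=800.0, requests=100_000, acceptance_rate=0.95)
print(f"cheap route:  ${cheap_route:.4f} per accepted outcome")
print(f"pricey route: ${pricey_route:.4f} per accepted outcome")
```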

Trade-offs

Self-hosted can reduce marginal cost but increase fixed cost. MaaS can accelerate evaluation but hide cache and routing behavior. Hybrid can be the best option, but only if the gateway hides the complexity from product teams.
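The fixed-versus-marginal trade-off can be expressed as a break-even volume. A rough sketch under simplifying assumptions: one blended price per million tokens, a flat monthly fixed cost, and no difference in quality or cache behavior between routes. The function name and the numbers are hypothetical.

```python
from typing import Optional


def break_even_mtok(fixed_monthly_usd: float, maas_usd_per_mtok: float,
                    self_hosted_marginal_usd_per_mtok: float) -> Optional[float]:
    """Monthly volume (millions of tokens) above which self-hosted is cheaper.

    Returns None when self-hosted has no marginal saving, so no volume breaks even.
    """
    saving_per_mtok = maas_usd_per_mtok - self_hosted_marginal_usd_per_mtok
    if saving_per_mtok <= 0:
        return None
    return fixed_monthly_usd / saving_per_mtok


# Illustrative numbers: $20,000/month fixed cost, $2.50 vs $0.50 per million tokens.
print(break_even_mtok(20_000.0, 2.50, 0.50))  # 10000.0
```

Below the break-even volume the fixed cost dominates, which is why stable demand matters more than the headline token price.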

Moving between MaaS, self-hosted and hybrid changes more than token price. Cache semantics change too: TTL, write/read pricing, cache locality, routing affinity, eviction and the observability fields available. Cost the migration with a new expected hit rate, not only with a new model price.
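To see why the expected hit rate belongs in the cost model, consider the effective input price as a blend of cached and uncached prices. A simplified sketch: it ignores cache write pricing and output tokens, and the prices and hit rates are illustrative, not taken from any provider.

```python
def effective_input_price(price_per_mtok: float, cached_price_per_mtok: float,
                          hit_rate: float) -> float:
    """Blended input price per million tokens for a given cache hit rate (0..1)."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return hit_rate * cached_price_per_mtok + (1.0 - hit_rate) * price_per_mtok


# Current route: higher list price, but a warm cache with a 70% hit rate.
current = effective_input_price(3.00, 0.30, hit_rate=0.70)
# Target route: lower list price, but different affinity and TTL cut the hit rate to 20%.
target = effective_input_price(2.00, 0.20, hit_rate=0.20)
print(f"current: ${current:.2f}/Mtok, target: ${target:.2f}/Mtok")
```

With these numbers the route with the lower list price ends up more expensive per input token, purely because of the hit-rate change.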

Anti-patterns

  • Moving to GPUs because a spreadsheet says tokens are cheaper.
  • Counting only production GPUs and forgetting that staging, test and debug instances cost money while serving no user traffic.
  • Keeping every product team on direct provider integrations.
  • Running self-hosted models without model release notes, evals or rollback.

Checklist

  • Demand is predictable enough for capacity planning.
  • Quality is measured per scenario, not only by model benchmarks.
  • Gateway can route between providers and self-hosted aliases.
  • Fallback behavior is defined before migration.
  • Self-hosted cost includes production, staging, test/debug, canary/rollout capacity and peak reserve.
  • Cost comparison includes engineering, observability, evals and incidents.
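The capacity items in this checklist can be summed in a few lines to show how much of the GPU bill serves no user traffic. A minimal sketch; the pool sizes, the GPU-hour price and the 730-hour month are assumptions for illustration, and engineering, observability, eval and incident costs would be added on top.

```python
HOURS_PER_MONTH = 730  # average month, assuming pools run around the clock


def monthly_gpu_cost(gpus_by_pool: dict[str, int], gpu_hour_usd: float) -> float:
    """Monthly GPU cost across all pools, not only production."""
    return sum(gpus_by_pool.values()) * gpu_hour_usd * HOURS_PER_MONTH


# Hypothetical pool sizes for one self-hosted scenario.
pools = {
    "production": 8,
    "staging": 2,
    "test_debug": 1,
    "canary_rollout": 2,
    "peak_reserve": 2,
}
total = monthly_gpu_cost(pools, gpu_hour_usd=2.50)
production_only = monthly_gpu_cost({"production": pools["production"]}, gpu_hour_usd=2.50)
print(f"production only: ${production_only:,.0f}/month")
print(f"all pools:       ${total:,.0f}/month")
print(f"non-production share: {1 - production_only / total:.0%}")
```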

Example

A customer support summarization flow might stay on MaaS during research, move heavy stable traffic to self-hosted inference, and keep MaaS fallback for spikes or quality regressions. That is not a permanent migration; it is a routing policy.
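Expressed as code, such a routing policy might look like the sketch below. The stage names, the utilization threshold and the `choose_route` function are hypothetical; a real gateway would also handle timeouts, retries and per-tenant rules.

```python
def choose_route(stage: str, self_hosted_utilization: float,
                 quality_regression: bool, spike_threshold: float = 0.85) -> str:
    """Pick a route for one request of the summarization scenario."""
    if stage == "research":
        return "maas"                     # model choice is still open
    if quality_regression:
        return "maas"                     # fall back until evals pass again
    if self_hosted_utilization >= spike_threshold:
        return "maas"                     # absorb the spike outside the GPU pool
    return "self-hosted"                  # stable traffic stays on owned capacity


print(choose_route("research", 0.10, quality_regression=False))    # maas
print(choose_route("production", 0.40, quality_regression=False))  # self-hosted
print(choose_route("production", 0.92, quality_regression=False))  # maas
print(choose_route("production", 0.40, quality_regression=True))   # maas
```

Because the decision is a function of current conditions, the scenario can move between routes without anyone declaring a migration.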

Decision template

Document scenario, data boundary, demand shape, model candidates, cost model, expected cache hit rate, cache TTL, affinity strategy, prefix-aware routing, cache observability fields, capacity assumptions, quality gate, fallback and owner.
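The template can be kept as a typed record so that an incomplete decision is easy to spot in review. A minimal sketch; the class name, field types and the `missing_fields` helper are assumptions, while the field list follows the template.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class ScenarioDecision:
    """One decision record per AI scenario; every field starts unset."""
    scenario: Optional[str] = None
    data_boundary: Optional[str] = None
    demand_shape: Optional[str] = None
    model_candidates: Optional[list] = None
    cost_model: Optional[str] = None
    expected_cache_hit_rate: Optional[float] = None
    cache_ttl: Optional[str] = None
    affinity_strategy: Optional[str] = None
    prefix_aware_routing: Optional[bool] = None
    cache_observability_fields: Optional[list] = None
    capacity_assumptions: Optional[str] = None
    quality_gate: Optional[str] = None
    fallback: Optional[str] = None
    owner: Optional[str] = None


def missing_fields(decision: ScenarioDecision) -> list[str]:
    """Names of template fields that are still unset or empty strings."""
    return [f.name for f in fields(decision)
            if getattr(decision, f.name) is None or getattr(decision, f.name) == ""]


draft = ScenarioDecision(scenario="support summarization", fallback="MaaS", owner="platform team")
print(missing_fields(draft))
```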
