Architecture

MLOps for an operator: models in prod, not in Jupyter

Most operators have dozens of models in production with no discipline for retraining, monitoring, or rollback. MLOps is the operating loop that brings that discipline.


Why an operator needs a dedicated MLOps loop

A typical operator has 15-30 ML models in production: churn, propensity, fraud, credit risk, next-best-action, recommendation, anomaly detection. Most were trained 1-3 years ago by the data science team, then “deployed” via an unclear procedure, and now run in production with slowly degrading accuracy.

MLOps is the operating loop that turns models from an academic artefact into a managed production component with lifecycle, governance and observability.

Without MLOps:

  • Models are not retrained (nobody knows how).
  • Models are not monitored (nobody knows accuracy fell).
  • When bias is detected, the model cannot be rolled back quickly.
  • Each new model is again a hero project.

Structural elements

Feature store. Centralised feature storage with versioning. The same feature is available for training (offline) and inference (online), with a consistency guarantee.
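The consistency guarantee comes from having a single feature definition feed both paths. A minimal sketch of the idea (illustrative only, not a real feature store; the feature name avg_topup_30d and subscriber IDs are hypothetical):

```python
# Toy feature store: one computation path, two access patterns.
# Both the offline training set and the online cache are filled from
# the same function, so train and serve cannot silently diverge.

def avg_topup_30d(topups: list[float]) -> float:
    """Single feature definition shared by training and serving."""
    return sum(topups) / len(topups) if topups else 0.0

class FeatureStore:
    def __init__(self):
        self._online: dict[str, dict[str, float]] = {}

    def materialize(self, subscriber_id: str, topups: list[float]) -> None:
        # In a real system this row would also land in the offline store.
        self._online[subscriber_id] = {"avg_topup_30d": avg_topup_30d(topups)}

    def get_online(self, subscriber_id: str) -> dict[str, float]:
        return self._online[subscriber_id]

store = FeatureStore()
store.materialize("sub-42", [10.0, 20.0, 30.0])
print(store.get_online("sub-42"))  # {'avg_topup_30d': 20.0}
```

Production systems (Feast, Tecton, Vertex AI Feature Store) add versioning, point-in-time correctness, and low-latency serving, but the core contract is the same: one definition, two stores.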

Model registry. Model store with versions, metadata (training data, hyperparameters, metrics), status (development, staging, production, archived).
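The shape of a registry entry and its status lifecycle can be sketched in a few lines (a toy illustration; real registries such as MLflow's follow the same pattern, and the paths and metric values below are made up):

```python
from dataclasses import dataclass

STATUSES = ("development", "staging", "production", "archived")

@dataclass
class ModelVersion:
    name: str
    version: int
    metrics: dict        # e.g. {"auc": 0.82} from evaluation
    training_data: str   # pointer to the training snapshot (illustrative)
    status: str = "development"

class ModelRegistry:
    """Toy registry: versioned entries plus a status lifecycle."""
    def __init__(self):
        self._versions: dict[str, list[ModelVersion]] = {}

    def register(self, mv: ModelVersion) -> None:
        self._versions.setdefault(mv.name, []).append(mv)

    def promote(self, name: str, version: int, status: str) -> None:
        if status not in STATUSES:
            raise ValueError(status)
        for mv in self._versions[name]:
            # Keep a single production version per model name.
            if status == "production" and mv.status == "production":
                mv.status = "archived"
        target = next(mv for mv in self._versions[name] if mv.version == version)
        target.status = status

reg = ModelRegistry()
reg.register(ModelVersion("churn", 1, {"auc": 0.79}, "snapshot-2023-01"))
reg.register(ModelVersion("churn", 2, {"auc": 0.82}, "snapshot-2024-01"))
reg.promote("churn", 1, "production")
reg.promote("churn", 2, "production")  # v1 is archived automatically
```

The point is that “which model is in production” becomes a queryable fact rather than a file on someone's server.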

Training pipeline. Reproducible — data → preprocessing → training → evaluation → registry. Triggered by schedule or by event (data drift detected).
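The retraining loop can be sketched as plain chained stages with a quality gate before registration (the dataset and the "model" here are trivial stand-ins for real training code):

```python
# Hedged sketch of the pipeline: data -> preprocess -> train -> evaluate
# -> registry, where the model only reaches the registry if evaluation
# clears a threshold.

def load_data():
    return [(x, 2 * x) for x in range(1, 100)]  # placeholder dataset

def preprocess(rows):
    return [(x / 100, y / 100) for x, y in rows]

def train(rows):
    # "Model" is just the mean slope, a stand-in for real training.
    return sum(y / x for x, y in rows) / len(rows)

def evaluate(model, rows):
    errors = [abs(model * x - y) for x, y in rows]
    return sum(errors) / len(errors)

def run_pipeline(registry: list, max_error: float = 0.05) -> float:
    rows = preprocess(load_data())
    model = train(rows)
    error = evaluate(model, rows)
    if error <= max_error:  # quality gate before registration
        registry.append({"model": model, "error": error})
    return error
```

In practice each stage runs under an orchestrator (Airflow, Kubeflow, Vertex Pipelines) with pinned data snapshots and container images, which is what makes the run reproducible.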

Deployment. Standardised path from registry to production: shadow mode → canary → full traffic. With automatic rollback on performance regression.
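The rollback decision at the canary step reduces to a simple comparison against the incumbent. A sketch, where the tolerance value is an illustrative assumption, not a standard:

```python
import random

def route(candidate_share: float) -> str:
    """Send a fraction of traffic to the candidate during the canary."""
    return "candidate" if random.random() < candidate_share else "current"

def canary_check(current_err: float, candidate_err: float,
                 tolerance: float = 0.02) -> str:
    """Promote only if the candidate is not worse beyond tolerance."""
    if candidate_err > current_err + tolerance:
        return "rollback"
    return "promote"

print(canary_check(0.10, 0.18))  # rollback: regression beyond tolerance
print(canary_check(0.10, 0.11))  # promote: within tolerance
```

Shadow mode is the same comparison run on mirrored traffic before any real user sees the candidate's predictions.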

Monitoring. Live metrics: prediction distribution, feature distribution drift, model accuracy (when ground truth is available), business KPI (uplift, conversion).
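Feature distribution drift is commonly scored with the Population Stability Index (PSI) over binned values. A minimal sketch; the 0.2 alert threshold is a conventional rule of thumb, not a universal constant:

```python
import math

def _shares(values, lo, hi, bins):
    counts = [0] * bins
    width = (hi - lo) or 1.0
    for v in values:
        i = int((v - lo) / width * bins)
        counts[max(0, min(i, bins - 1))] += 1
    # Floor each share at a tiny value so the log below never sees zero.
    return [max(c / len(values), 1e-6) for c in counts]

def psi(expected: list[float], actual: list[float], bins: int = 10) -> float:
    """PSI between a baseline (training) and a live feature distribution."""
    lo, hi = min(expected), max(expected)
    e = _shares(expected, lo, hi, bins)
    a = _shares(actual, lo, hi, bins)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))

baseline = [i / 100 for i in range(100)]
drifted = [0.5 + i / 200 for i in range(100)]
print(psi(baseline, baseline) < 0.01)  # True: no drift on identical data
print(psi(baseline, drifted) > 0.2)    # True: shifted distribution alerts
```

Accuracy tracking works the same way but lags: ground truth for churn or fraud arrives weeks after the prediction, which is why distribution drift is the early-warning signal.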

Governance. Approval workflow for production deployment, audit trail (who deployed what), explainability layer for regulator-facing models.

Where it usually breaks

Train-serve skew. Features used in training are computed differently in production (different system, different logic). The model does not behave as it did in testing.
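A cheap defence is a parity check: recompute a sample of online features with the offline (training-time) logic and count disagreements. A sketch with a deliberately skewed pair of implementations (the ARPU feature and the fixed-window bug are invented for illustration):

```python
def offline_arpu(revenues: list[float]) -> float:
    """Training-time logic: average over days actually observed."""
    return sum(revenues) / len(revenues)

def online_arpu(revenues: list[float]) -> float:
    """Production logic: divides by a fixed 30-day window (the skew)."""
    return sum(revenues) / max(len(revenues), 30)

def parity_check(samples: list[list[float]], tol: float = 1e-9) -> int:
    """Count samples where training and serving logic disagree."""
    return sum(abs(offline_arpu(s) - online_arpu(s)) > tol for s in samples)

# The 10-day subscriber diverges; the 30-day one happens to match.
mismatches = parity_check([[10.0] * 10, [10.0] * 30])
print(mismatches)  # 1
```

Run on a daily sample of production traffic, a nonzero mismatch count surfaces the skew long before anyone notices the model misbehaving.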

Models live in Jupyter. A data scientist trained a model in a notebook, handed colleagues a pickle file, and nobody remembers how to reproduce it. A year later the model is stale and there is no way to update it.

Deployment goes through an IT ticket. Each model update takes weeks of approvals, so the team stops updating and models age.

No monitoring. The model has been in prod for a year, and nobody watches its accuracy. By the time anyone notices, accuracy has dropped 30%, and nobody knows since when.

No registry. The “production model” is a file on a server whose location only one person knows. When that person leaves, the model is lost.

Bias is not checked. The model declines credit offers to people from certain regions. A year later, the regulator complains.

Operating model

Owner — Head of ML / Head of Data Science with an infrastructure mandate. Not a separate data science team, not a separate DevOps — a function at the intersection.

Teams:

  • ML platform (feature store, registry, deployment infra)
  • Data science (models, experiments)
  • ML engineering (production-ready code, integration)
  • ML governance (approval, audit, fairness)

Routine — weekly review of production models: drift, accuracy, business KPI.

What is measured

Time from idea to production — how long the path is from hypothesis to a live model. Target — weeks, not months.

Number of models in monitored production. Models without monitoring do not count.

Drift incidents per quarter — how many times significant drift was detected, how fast the response was.

Model rollback time — how long it takes to roll back a degraded model.

Business uplift per model — the business effect this specific model brings.

How SamaraliSoft engages

MLOps Blueprint — 8-10 weeks. Inventory of existing models, current pain, design of target architecture (feature store, registry, deployment), governance, stack choice (open-source vs managed: Vertex AI, SageMaker, Databricks). Pilot — take one model and run it through the full lifecycle.


Ready to discuss your challenge?

Tell me what's not working or what needs to be built. First conversation — no obligations.
