MLOps for an operator: models in prod, not in Jupyter
Most operators have dozens of models in production with no discipline to refresh, monitor or roll back. MLOps is the operating loop for models.
Why an operator needs a dedicated MLOps loop
A typical operator has 15-30 ML models in production: churn, propensity, fraud, credit risk, next-best-action (NBA), recommendation, anomaly detection. Most were trained 1-3 years ago by the data science team, then “deployed” via an unclear procedure, and now run in production with slowly degrading accuracy.
MLOps is the operating loop that turns models from an academic artefact into a managed production component with lifecycle, governance and observability.
Without MLOps:
- Models are not retrained (nobody knows how).
- Models are not monitored (nobody knows accuracy fell).
- When bias is detected it cannot be quickly rolled back.
- Every new model becomes another one-off hero project.
Structural elements
Feature store. Centralised feature storage with versioning. The same feature is available for training (offline) and inference (online), with a consistency guarantee.
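The offline/online consistency guarantee is worth making concrete. A minimal sketch (all names hypothetical, not any specific feature-store product): the key idea is that both the training path and the serving path call the same transform definition, so the feature cannot silently diverge between them.

```python
from datetime import datetime

def days_since_recharge(last_recharge: datetime, as_of: datetime) -> int:
    """One shared transform definition, used by both serving paths."""
    return (as_of - last_recharge).days

class ToyFeatureStore:
    """Minimal offline/online parity: both paths call the same transform."""
    def __init__(self):
        self._online = {}  # subscriber_id -> precomputed feature dict

    def materialize(self, subscriber_id: str, last_recharge: datetime, as_of: datetime):
        # Batch job pushes the transform's output into the online store.
        self._online[subscriber_id] = {
            "days_since_recharge": days_since_recharge(last_recharge, as_of)
        }

    def get_online(self, subscriber_id: str) -> dict:
        # Low-latency read at inference time.
        return self._online[subscriber_id]

    @staticmethod
    def get_offline(last_recharge: datetime, as_of: datetime) -> dict:
        # Point-in-time computation when building a training set.
        return {"days_since_recharge": days_since_recharge(last_recharge, as_of)}
```

Real feature stores (Feast, Tecton, the managed equivalents) add versioning, point-in-time joins and TTLs, but the contract is the same: one definition, two access paths.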
Model registry. Model store with versions, metadata (training data, hyperparameters, metrics), status (development, staging, production, archived).
Training pipeline. Reproducible — data → preprocessing → training → evaluation → registry. Triggered by schedule or by event (data drift detected).
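What "reproducible" means in practice: every run records the config and a hash of the training data alongside the resulting model, so any production model can be traced back to exactly what produced it. A toy sketch (the threshold "model" and the `Registry` class are stand-ins, not a real learner or registry product):

```python
import hashlib
import json

class Registry:
    """Toy model registry keyed by auto-incrementing versions."""
    def __init__(self):
        self.models = {}

    def register(self, model, metadata):
        version = len(self.models) + 1
        self.models[version] = {"model": model, "meta": metadata, "status": "staging"}
        return version

def preprocess(rows):
    # Drop incomplete rows; split features and label.
    rows = [r for r in rows if r["usage"] is not None]
    X = [[r["usage"]] for r in rows]
    y = [r["churned"] for r in rows]
    return X, y

def train(X, y):
    # Trivial stand-in for a real learner: churn if usage below the mean.
    cutoff = sum(x[0] for x in X) / len(X)
    return {"usage_cutoff": cutoff}

def evaluate(model, X, y):
    preds = [1 if x[0] < model["usage_cutoff"] else 0 for x in X]
    return {"accuracy": sum(p == t for p, t in zip(preds, y)) / len(y)}

def run_pipeline(rows, registry, config):
    # The data hash makes every registered version traceable to its inputs.
    data_hash = hashlib.sha256(
        json.dumps(rows, sort_keys=True).encode()).hexdigest()[:12]
    X, y = preprocess(rows)
    model = train(X, y)
    metrics = evaluate(model, X, y)
    return registry.register(
        model, {"data_hash": data_hash, "config": config, "metrics": metrics})
```

The same structure transfers directly to orchestrators such as Airflow or Kubeflow: each function becomes a task, and the registry call is the only side effect.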
Deployment. Standardised path from registry to production: shadow mode → canary → full traffic. With automatic rollback on performance regression.
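The canary-with-rollback logic can be sketched in a few lines, assuming a deterministic traffic split and a `regression_check` callback (a hypothetical hook into the monitoring layer, not a real API):

```python
def canary_rollout(serve_old, serve_new, traffic, regression_check,
                   stages=(5, 25, 50, 100)):
    """Shift traffic to the new model in stages; auto-rollback on regression."""
    for pct in stages:
        for i, request in enumerate(traffic):
            # Deterministic bucket split: the first `pct`% of buckets hit the new model.
            handler = serve_new if i % 100 < pct else serve_old
            handler(request)
        if regression_check(pct):
            # Regression detected at this stage: all traffic returns to the old model.
            return "rolled_back"
    return "promoted"
```

Shadow mode is the degenerate case of the same loop: the new model receives a copy of every request but its responses are logged rather than returned.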
Monitoring. Live metrics: prediction distribution, feature distribution drift, model accuracy (when ground truth is available), business KPI (uplift, conversion).
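Feature-distribution drift is typically quantified with the Population Stability Index (PSI), which compares binned feature distributions between the training window and live traffic. A self-contained implementation:

```python
import math

def psi(expected_counts, actual_counts, eps=1e-6):
    """Population Stability Index over matching histogram bins.
    Common rule of thumb: < 0.1 stable, 0.1-0.25 moderate drift,
    > 0.25 significant drift (retrain candidate)."""
    e_total, a_total = sum(expected_counts), sum(actual_counts)
    score = 0.0
    for e, a in zip(expected_counts, actual_counts):
        # Floor the fractions so empty bins don't blow up the logarithm.
        e_frac = max(e / e_total, eps)
        a_frac = max(a / a_total, eps)
        score += (a_frac - e_frac) * math.log(a_frac / e_frac)
    return score
```

PSI is useful precisely because it needs no ground truth: it fires on input drift long before labelled outcomes arrive to confirm an accuracy drop.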
Governance. Approval workflow for production deployment, audit trail (who deployed what), explainability layer for regulator-facing models.
Where it usually breaks
Train-serve skew. Features used in training are computed differently in production (different system, different logic). The model does not behave as it did in testing.
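A cheap defence against train-serve skew is a scheduled parity check: for a sample of entities, compute each feature offline and fetch the online-served value, then report the mismatch rate per feature. A minimal sketch (input shapes are an assumption, values aligned per entity):

```python
def parity_report(offline_features, online_features, rel_tol=1e-6):
    """Compare offline (training-path) vs online (serving-path) feature values.
    Both arguments: dict of feature name -> list of values, aligned by entity.
    Returns the fraction of mismatching values per feature."""
    mismatches = {}
    for name, offline_vals in offline_features.items():
        online_vals = online_features.get(name, [])
        pairs = list(zip(offline_vals, online_vals))
        bad = sum(1 for o, s in pairs
                  if abs(o - s) > rel_tol * max(abs(o), abs(s), 1.0))
        # A feature missing entirely from the online store counts as fully broken.
        mismatches[name] = bad / len(pairs) if pairs else 1.0
    return mismatches
```

Any feature with a nonzero mismatch rate is a skew incident: the training and serving paths have diverged in logic, timing or source system.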
Models live in Jupyter. A data scientist trained a model in a notebook, handed colleagues a pickle file, and nobody remembers how to reproduce it. A year later the model is stale, with no way to update it.
Deployment goes through an IT ticket. Each model update — weeks of approvals. The team stops updating, models age.
No monitoring. The model has been in prod for a year and nobody watches its accuracy. By the time anyone notices, accuracy has dropped 30%, and nobody knows since when.
No registry. The “production model” is a file on a server whose location only one person knows. When that person leaves, the model is effectively lost.
Bias is not checked. The model declines credit offers to people from certain regions. A year later — regulator complaints.
Operating model
Owner — Head of ML / Head of Data Science with an infrastructure mandate. Not a separate data science team, not a separate DevOps — a function at the intersection.
Teams:
- ML platform (feature store, registry, deployment infra)
- Data science (models, experiments)
- ML engineering (production-ready code, integration)
- ML governance (approval, audit, fairness)
Routine — weekly review of production models: drift, accuracy, business KPI.
What is measured
Time from idea to production — how long the path is from hypothesis to a live model. Target — weeks, not months.
Number of models in monitored production. Without monitoring — does not count.
Drift incidents per quarter — how many times significant drift was detected, how fast the response was.
Model rollback time — how long it takes to roll back a degraded model.
Business uplift per model — the business effect this specific model brings.
How SamaraliSoft engages
MLOps Blueprint — 8-10 weeks. Inventory of existing models, current pain points, design of the target architecture (feature store, registry, deployment), governance, and stack selection (open source vs. managed: Vertex AI, SageMaker, Databricks). Pilot: take one model and run it through the full lifecycle.
Related
- /en/architecture/telecom-cdp-architecture/ — CDP as feature source
- /en/architecture/telecom-realtime-decisioning-architecture/ — decisioning as model consumer
- /en/insights/telecom-ai-governance/ — AI governance
- /en/architecture/telecom-data-platform/ — data platform