MLOps: Training & Experiments - Tracking and Model Registry
The Story of the Chef's Recipe Book 📖
Imagine you're a chef trying to create the perfect chocolate cake. Every time you bake, you change something: more sugar, less flour, different oven temperature. But here's the problem: after 50 tries, which recipe was actually the best? You can't remember!
This is exactly what happens in machine learning. Data scientists train hundreds of models. Without a system to track everything, they get lost.
Experiment tracking and a model registry are like your ultimate recipe book: they remember every single thing you tried, what worked, and where to find your best creations.
🧪 Experiment Tracking Basics
What Is It?
Think of experiment tracking like keeping a diary for your ML experiments.
Every time you train a model, you write down:
- What ingredients you used (data, features)
- What settings you chose (hyperparameters)
- How good the result was (metrics)
- Any notes about what happened
Without tracking: "I think the model from Tuesday was better… or was it Thursday?"
With tracking: "Run #47 on Tuesday had 94% accuracy using learning rate 0.001."
Simple Example
Experiment: Cat vs Dog Classifier
├── Run 1: accuracy=78%, lr=0.01
├── Run 2: accuracy=85%, lr=0.001  ← Better!
└── Run 3: accuracy=82%, lr=0.005
You instantly see Run 2 wins!
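For intuition, here is a tiny hand-rolled version of that recipe book in plain Python; the numbers mirror the runs above and are illustrative only:

# A minimal "recipe book": one dict per run
runs = [
    {"run": 1, "lr": 0.01,  "accuracy": 0.78},
    {"run": 2, "lr": 0.001, "accuracy": 0.85},
    {"run": 3, "lr": 0.005, "accuracy": 0.82},
]

# Pick the winner by accuracy
best = max(runs, key=lambda r: r["accuracy"])
print(f"Best run: #{best['run']} (accuracy={best['accuracy']:.0%}, lr={best['lr']})")

This is fine for three runs; at hundreds of runs across a team, you want a tracking platform to do the bookkeeping, which is exactly what the rest of this guide covers.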
Why It Matters
- Never lose work: every experiment is saved
- Easy comparison: see what changed between runs
- Reproducibility: repeat any experiment exactly
- Collaboration: the whole team sees all experiments
🗂️ Experiment Tracking Platforms
Your Options
Just like there are different notebooks (Moleskine, Field Notes, digital apps), there are different tracking platforms:
| Platform | Best For | Example Use |
|---|---|---|
| MLflow | Open source, flexible | Self-hosted tracking |
| Weights & Biases | Beautiful dashboards | Visual experiment comparison |
| Neptune.ai | Team collaboration | Enterprise ML teams |
| Comet ML | Easy integration | Quick setup projects |
| TensorBoard | Deep learning | TensorFlow projects |
How They Work
graph TD
    A[Your Training Script] -->|Logs data| B[Tracking Platform]
    B --> C[Dashboard]
    B --> D[Storage]
    C -->|View| E[Compare Experiments]
    D -->|Retrieve| F[Best Model]
Real Example with MLflow
import mlflow
mlflow.start_run()
mlflow.log_param("learning_rate", 0.001)
mlflow.log_metric("accuracy", 0.94)
mlflow.end_run()
That's it! Your experiment is now saved forever.
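If you prefer, MLflow also lets you wrap the run in a with block, which calls end_run() for you even if the script crashes partway through; this sketch is equivalent to the example above:

import mlflow

# The context manager ends the run automatically, even on errors
with mlflow.start_run():
    mlflow.log_param("learning_rate", 0.001)
    mlflow.log_metric("accuracy", 0.94)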
⚙️ Hyperparameter Logging
What Are Hyperparameters?
Back to our cake analogy:
- Data = Your ingredients (flour, eggs, chocolate)
- Hyperparameters = Your settings (oven temp, baking time, mixing speed)
Hyperparameters are the knobs you turn before training starts.
Common Hyperparameters
| Hyperparameter | What It Does | Example |
|---|---|---|
| Learning rate | How fast model learns | 0.001 |
| Batch size | Samples per update | 32 |
| Epochs | Training rounds | 100 |
| Hidden layers | Network depth | 3 |
| Dropout | Prevents overfitting | 0.2 |
Logging Example
# Log all hyperparameters at once
params = {
"learning_rate": 0.001,
"batch_size": 32,
"epochs": 100,
"optimizer": "adam"
}
mlflow.log_params(params)
Why Log Them?
Imagine your model performs amazingly. But you forgot what settings you used. Disaster!
Logging hyperparameters means you can always recreate success.
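Because they are logged, you can always pull the exact settings back out of a past run. A small sketch (the run ID is a placeholder):

import mlflow

# Fetch a past run and read back the hyperparameters it used
run = mlflow.get_run("abc123")  # placeholder run ID
print(run.data.params)          # e.g. {'learning_rate': '0.001', 'batch_size': '32', ...}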
📊 Metric Logging
What Are Metrics?
Metrics are your report card: they tell you how well your model is doing.
Common Metrics
| Metric | Measures | Good Value |
|---|---|---|
| Accuracy | Correct predictions | Higher = Better |
| Loss | Error amount | Lower = Better |
| Precision | Of predicted positives, how many are correct | Higher = Better |
| Recall | Of actual positives, how many were found | Higher = Better |
| F1 Score | Balance of precision and recall | Higher = Better |
Logging Over Time
Here's the magic: you can log metrics at every step:
for epoch in range(100):
    loss = train_one_epoch()
    accuracy = evaluate()
    # Log with step number
    mlflow.log_metric("loss", loss, step=epoch)
    mlflow.log_metric("accuracy", accuracy, step=epoch)
This creates a beautiful learning curve:
Accuracy Over Time
       │
 0.95  │            ________
 0.85  │        ___/
 0.75  │    ___/
 0.65  │___/
       └─────────────────────
              Epochs →
You can see your model getting smarter!
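Everything you log is also queryable afterwards, so you can rebuild this curve yourself. A small sketch using the MLflow client (the run ID is a placeholder for a real one from your tracking server):

import mlflow

client = mlflow.MlflowClient()

# Every (step, value) pair ever logged for "accuracy" in one run
for point in client.get_metric_history("abc123", "accuracy"):  # placeholder run ID
    print(point.step, point.value)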
🗄️ Model Registry
The Problem
After 100 experiments, you found your best model. Now what?
- Where do you save it?
- How do you name it?
- What if you need the second-best model later?
- How does your team find it?
The Solution: Model Registry
A model registry is like a library for your trained models.
graph TD
    A[Trained Models] --> B[Model Registry]
    B --> C[Version 1.0]
    B --> D[Version 2.0]
    B --> E[Version 3.0]
    C --> F[Production]
    D --> G[Staging]
    E --> H[Development]
What It Stores
| Component | Description | Example |
|---|---|---|
| Model file | The actual model | model.pkl |
| Version | Which iteration | v1.2.0 |
| Stage | Where it's deployed | Production |
| Description | What it does | "Cat classifier" |
| Tags | Labels for search | ["image", "CNN"] |
Real Example
# Register a model
mlflow.register_model(
    model_uri="runs:/abc123/model",
    name="cat-dog-classifier"
)
Now your model has a permanent home anyone can find!
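Finding it again is just as easy: any registered version can be loaded by name. A small sketch (the version number is only an example):

import mlflow

# Load version 1 of the registered model by name
model = mlflow.pyfunc.load_model("models:/cat-dog-classifier/1")
# predictions = model.predict(input_data)  # input format depends on your model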
🔄 Model Registry Workflow
The Journey of a Model
Think of it like a new employee:
- Hired (Created): the model is trained
- Training (Development): testing begins
- Probation (Staging): real-world tests
- Promoted (Production): serving users!
Typical Workflow
graph TD
    A[Train Model] --> B[Register in Registry]
    B --> C[Stage: None]
    C --> D{Tests Pass?}
    D -->|Yes| E[Stage: Staging]
    D -->|No| A
    E --> F{Production Ready?}
    F -->|Yes| G[Stage: Production]
    F -->|No| A
    G --> H[Serve Users]
Stage Transitions
# Move model to staging
client = mlflow.MlflowClient()
client.transition_model_version_stage(
    name="cat-dog-classifier",
    version=3,
    stage="Staging"
)
# After tests pass, promote to production
client.transition_model_version_stage(
    name="cat-dog-classifier",
    version=3,
    stage="Production"
)
Why This Matters
- No accidental deployments: models must pass through each stage
- Easy rollback: previous versions still exist (see the sketch below)
- Clear ownership: everyone knows what's in production
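Rolling back is just another stage transition. A minimal sketch, assuming version 2 was the previous production model and version 3 is the one misbehaving:

import mlflow

client = mlflow.MlflowClient()

# Put the previous version back into Production; the flag archives version 3
client.transition_model_version_stage(
    name="cat-dog-classifier",
    version=2,
    stage="Production",
    archive_existing_versions=True,
)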
📋 Model Metadata Management
What Is Metadata?
Metadata is data about your data (and models).
For a model, metadata includes:
- Who created it
- When it was created
- What data trained it
- What problem it solves
- How to use it
Types of Model Metadata
| Category | Examples |
|---|---|
| Identity | Name, version, ID |
| Timing | Created date, last modified |
| Performance | Accuracy, latency, size |
| Context | Training data, features used |
| Documentation | Description, usage notes |
| Tags | Custom labels for search |
Example Metadata
# Add rich metadata to your model
mlflow.log_param("model_type", "Random Forest")
mlflow.log_param("training_data", "dataset_v2.csv")
mlflow.log_param("author", "alice@company.com")
mlflow.log_param("problem", "fraud_detection")
mlflow.set_tag("team", "risk-ml")
mlflow.set_tag("compliance", "SOC2-approved")
Why It Matters
Six months later, someone asks: "What data trained the fraud model in production?"
With good metadata: a 5-second answer. Without metadata: hours of detective work.
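That "5-second answer" is really just a query over the logged metadata. A sketch, assuming the params and tags from the example above were logged (the experiment name here is hypothetical):

import mlflow

# Find every run that tackled fraud detection and show what data trained it
runs = mlflow.search_runs(
    experiment_names=["fraud-models"],                   # hypothetical experiment name
    filter_string="params.problem = 'fraud_detection'",
)
print(runs[["run_id", "params.training_data", "params.author"]])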
𧬠Model Lineage
What Is Lineage?
Lineage answers: "Where did this model come from?"
It's like a family tree for your model, showing:
- What data created it
- What code trained it
- What experiments led to it
- What other models it relates to
The Lineage Chain
graph TD
    A[Raw Data] --> B[Cleaned Data]
    B --> C[Feature Engineering]
    C --> D[Training Data]
    D --> E[Model Training]
    E --> F[Trained Model v1]
    F --> G[Fine-tuned Model v2]
    G --> H[Production Model]
Why Lineage Matters
Scenario 1: Bug in Production
Your fraud model starts making mistakes. Lineage shows it was trained on dataset_v2 which had a bug. You can trace the problem instantly.
Scenario 2: Compliance Audit
Regulators ask: "Prove your model wasn't trained on biased data." Lineage shows exactly what data was used.
Scenario 3: Reproducing Results
A colleague wants to build on your work. Lineage shows every step from raw data to final model.
Tracking Lineage
# Log data lineage
mlflow.log_param("source_data", "s3://bucket/raw/")
mlflow.log_param("preprocessing", "v2.1")
mlflow.log_param("parent_run", "run_abc123")
# Log code version
mlflow.log_param("git_commit", "a1b2c3d")
mlflow.log_param("code_version", "1.5.0")
Complete Lineage Example
Model: fraud-detector-v3
├── Data Lineage
│   ├── Source: transactions_2024.csv
│   ├── Cleaned: pipeline_v2
│   └── Features: feature_store_v1.2
├── Code Lineage
│   ├── Git commit: a1b2c3d
│   ├── Branch: main
│   └── Training script: train.py
├── Experiment Lineage
│   ├── Parent run: exp_047
│   └── Based on: fraud-detector-v2
└── Environment
    ├── Python: 3.9
    └── Libraries: requirements.txt
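One practical addition: attach the environment itself to the run, so the last branch of that tree points at files you can actually reinstall from. A small sketch, assuming a requirements.txt sits next to the training script:

import platform

import mlflow

with mlflow.start_run():
    # Record the interpreter version and the exact dependency list
    mlflow.log_param("python_version", platform.python_version())
    mlflow.log_artifact("requirements.txt")  # assumes this file exists locally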
🎯 Putting It All Together
The Complete Picture
graph TD
    A[Start Training] --> B[Log Hyperparameters]
    B --> C[Train Model]
    C --> D[Log Metrics]
    D --> E[Save Model]
    E --> F[Register in Registry]
    F --> G[Add Metadata]
    G --> H[Track Lineage]
    H --> I[Ready for Deployment!]
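Here is a compact sketch of that whole flow using a toy scikit-learn model. The dataset, hyperparameters, and registry name are placeholders, and registering the model assumes your tracking server has a model registry backend (the default local file store does not):

import mlflow
import mlflow.sklearn
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Toy data standing in for your real training set
X, y = make_classification(n_samples=500, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

params = {"n_estimators": 100, "max_depth": 5}

with mlflow.start_run():
    mlflow.log_params(params)                                   # log hyperparameters
    model = RandomForestClassifier(**params, random_state=42)   # train the model
    model.fit(X_train, y_train)
    accuracy = accuracy_score(y_test, model.predict(X_test))
    mlflow.log_metric("accuracy", accuracy)                     # log metrics
    mlflow.set_tag("team", "demo")                              # add metadata
    # save the model and register it in the registry in one call
    mlflow.sklearn.log_model(model, "model", registered_model_name="demo-classifier")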
Quick Reference
| Concept | What It Does | Like… |
|---|---|---|
| Experiment Tracking | Records all training runs | A lab notebook |
| Hyperparameters | Settings before training | Oven temperature |
| Metrics | Performance measurements | Test scores |
| Model Registry | Stores trained models | A library |
| Metadata | Information about models | A book's index |
| Lineage | Shows model origins | A family tree |
🎉 You're Ready!
You now understand how ML teams:
- Track every experiment
- Log settings and results
- Store models safely
- Manage model information
- Trace model origins
This is the foundation of professional MLOps. No more lost experiments. No more mystery models. Just organized, reproducible, traceable machine learning.
Remember: the best data scientists aren't just good at training models; they're good at managing them too.
Happy experimenting! 🧪