MLOps: Training & Experiments - Tracking and Model Registry
The Story of the Chef's Recipe Book 📖
Imagine you're a chef trying to create the perfect chocolate cake. Every time you bake, you change something: more sugar, less flour, different oven temperature. But here's the problem: after 50 tries, which recipe was actually the best? You can't remember!
This is exactly what happens in machine learning. Data scientists train hundreds of models. Without a system to track everything, they get lost.
Experiment tracking and a model registry are like your ultimate recipe book: they remember every single thing you tried, what worked, and where to find your best creations.
🧪 Experiment Tracking Basics
What Is It?
Think of experiment tracking like keeping a diary for your ML experiments.
Every time you train a model, you write down:
- What ingredients you used (data, features)
- What settings you chose (hyperparameters)
- How good the result was (metrics)
- Any notes about what happened
Without tracking: "I think the model from Tuesday was better… or was it Thursday?"
With tracking: "Run #47 on Tuesday had 94% accuracy using learning rate 0.001."
Simple Example
Experiment: Cat vs Dog Classifier
├── Run 1: accuracy=78%, lr=0.01
├── Run 2: accuracy=85%, lr=0.001  ← Better!
└── Run 3: accuracy=82%, lr=0.005
You instantly see Run 2 wins!
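For intuition, here is a tiny hand-rolled version of that recipe book in plain Python; the numbers mirror the runs above and are illustrative only:

# A minimal "recipe book": one dict per run
runs = [
    {"run": 1, "lr": 0.01,  "accuracy": 0.78},
    {"run": 2, "lr": 0.001, "accuracy": 0.85},
    {"run": 3, "lr": 0.005, "accuracy": 0.82},
]

# Pick the winner by accuracy
best = max(runs, key=lambda r: r["accuracy"])
print(f"Best run: #{best['run']} (accuracy={best['accuracy']:.0%}, lr={best['lr']})")

This is fine for three runs; at hundreds of runs across a team, you want a tracking platform to do the bookkeeping, which is exactly what the rest of this guide covers.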
Why It Matters
- Never lose work: every experiment is saved
- Easy comparison: see what changed between runs
- Reproducibility: repeat any experiment exactly
- Collaboration: the whole team sees all experiments
🗂️ Experiment Tracking Platforms
Your Options
Just like there are different notebooks (Moleskine, Field Notes, digital apps), there are different tracking platforms:
| Platform | Best For | Example Use |
|---|---|---|
| MLflow | Open source, flexible | Self-hosted tracking |
| Weights & Biases | Beautiful dashboards | Visual experiment comparison |
| Neptune.ai | Team collaboration | Enterprise ML teams |
| Comet ML | Easy integration | Quick setup projects |
| TensorBoard | Deep learning | TensorFlow projects |
How They Work
graph TD
    A[Your Training Script] -->|Logs data| B[Tracking Platform]
    B --> C[Dashboard]
    B --> D[Storage]
    C -->|View| E[Compare Experiments]
    D -->|Retrieve| F[Best Model]
Real Example with MLflow
import mlflow
mlflow.start_run()
mlflow.log_param("learning_rate", 0.001)
mlflow.log_metric("accuracy", 0.94)
mlflow.end_run()
That's it! Your experiment is now saved forever.
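If you prefer, MLflow also lets you wrap the run in a with block, which calls end_run() for you even if the script crashes partway through; this sketch is equivalent to the example above:

import mlflow

# The context manager ends the run automatically, even on errors
with mlflow.start_run():
    mlflow.log_param("learning_rate", 0.001)
    mlflow.log_metric("accuracy", 0.94)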
⚙️ Hyperparameter Logging
What Are Hyperparameters?
Back to our cake analogy:
- Data = Your ingredients (flour, eggs, chocolate)
- Hyperparameters = Your settings (oven temp, baking time, mixing speed)
Hyperparameters are the knobs you turn before training starts.
Common Hyperparameters
| Hyperparameter | What It Does | Example |
|---|---|---|
| Learning rate | How fast model learns | 0.001 |
| Batch size | Samples per update | 32 |
| Epochs | Training rounds | 100 |
| Hidden layers | Network depth | 3 |
| Dropout | Prevents overfitting | 0.2 |
Logging Example
# Log all hyperparameters at once
params = {
"learning_rate": 0.001,
"batch_size": 32,
"epochs": 100,
"optimizer": "adam"
}
mlflow.log_params(params)
Why Log Them?
Imagine your model performs amazingly. But you forgot what settings you used. Disaster!
Logging hyperparameters means you can always recreate success.
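Because they are logged, you can always pull the exact settings back out of a past run. A small sketch (the run ID is a placeholder):

import mlflow

# Fetch a past run and read back the hyperparameters it used
run = mlflow.get_run("abc123")  # placeholder run ID
print(run.data.params)          # e.g. {'learning_rate': '0.001', 'batch_size': '32', ...}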
📊 Metric Logging
What Are Metrics?
Metrics are your report card: they tell you how well your model is doing.
Common Metrics
| Metric | Measures | Good Value |
|---|---|---|
| Accuracy | Correct predictions | Higher = Better |
| Loss | Error amount | Lower = Better |
| Precision | Of predicted positives, how many are correct | Higher = Better |
| Recall | Of actual positives, how many were found | Higher = Better |
| F1 Score | Balance of precision and recall | Higher = Better |
Logging Over Time
Here's the magic: you can log metrics at every step:
for epoch in range(100):
    loss = train_one_epoch()
    accuracy = evaluate()
    # Log with step number
    mlflow.log_metric("loss", loss, step=epoch)
    mlflow.log_metric("accuracy", accuracy, step=epoch)
This creates a beautiful learning curve:
Accuracy Over Time
       │
 0.95  │            ________
 0.85  │        ___/
 0.75  │    ___/
 0.65  │___/
       └─────────────────────
              Epochs →
You can see your model getting smarter!
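Everything you log is also queryable afterwards, so you can rebuild this curve yourself. A small sketch using the MLflow client (the run ID is a placeholder for a real one from your tracking server):

import mlflow

client = mlflow.MlflowClient()

# Every (step, value) pair ever logged for "accuracy" in one run
for point in client.get_metric_history("abc123", "accuracy"):  # placeholder run ID
    print(point.step, point.value)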
🗄️ Model Registry
The Problem
After 100 experiments, you found your best model. Now what?
- Where do you save it?
- How do you name it?
- What if you need the second-best model later?
- How does your team find it?
The Solution: Model Registry
A model registry is like a library for your trained models.
graph TD
    A[Trained Models] --> B[Model Registry]
    B --> C[Version 1.0]
    B --> D[Version 2.0]
    B --> E[Version 3.0]
    C --> F[Production]
    D --> G[Staging]
    E --> H[Development]
What It Stores
| Component | Description | Example |
|---|---|---|
| Model file | The actual model | model.pkl |
| Version | Which iteration | v1.2.0 |
| Stage | Where it's deployed | Production |
| Description | What it does | "Cat classifier" |
| Tags | Labels for search | ["image", "CNN"] |
Real Example
# Register a model
mlflow.register_model(
    model_uri="runs:/abc123/model",
    name="cat-dog-classifier"
)
Now your model has a permanent home anyone can find!
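Finding it again is just as easy: any registered version can be loaded by name. A small sketch (the version number is only an example):

import mlflow

# Load version 1 of the registered model by name
model = mlflow.pyfunc.load_model("models:/cat-dog-classifier/1")
# predictions = model.predict(input_data)  # input format depends on your model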
🔄 Model Registry Workflow
The Journey of a Model
Think of it like a new employee:
- Hired (Created): the model is trained
- Training (Development): testing begins
- Probation (Staging): real-world tests
- Promoted (Production): serving users!
Typical Workflow
graph TD
    A[Train Model] --> B[Register in Registry]
    B --> C[Stage: None]
    C --> D{Tests Pass?}
    D -->|Yes| E[Stage: Staging]
    D -->|No| A
    E --> F{Production Ready?}
    F -->|Yes| G[Stage: Production]
    F -->|No| A
    G --> H[Serve Users]
Stage Transitions
# Move model to staging
client = mlflow.MlflowClient()
client.transition_model_version_stage(
    name="cat-dog-classifier",
    version=3,
    stage="Staging"
)
# After tests pass, promote to production
client.transition_model_version_stage(
    name="cat-dog-classifier",
    version=3,
    stage="Production"
)
Why This Matters
- No accidental deployments: models must pass through each stage
- Easy rollback: previous versions still exist (see the sketch below)
- Clear ownership: everyone knows what's in production
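Rolling back is just another stage transition. A minimal sketch, assuming version 2 was the previous production model and version 3 is the one misbehaving:

import mlflow

client = mlflow.MlflowClient()

# Put the previous version back into Production; the flag archives version 3
client.transition_model_version_stage(
    name="cat-dog-classifier",
    version=2,
    stage="Production",
    archive_existing_versions=True,
)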
📋 Model Metadata Management
What Is Metadata?
Metadata is data about your data (and models).
For a model, metadata includes:
- Who created it
- When it was created
- What data trained it
- What problem it solves
- How to use it
Types of Model Metadata
| Category | Examples |
|---|---|
| Identity | Name, version, ID |
| Timing | Created date, last modified |
| Performance | Accuracy, latency, size |
| Context | Training data, features used |
| Documentation | Description, usage notes |
| Tags | Custom labels for search |
Example Metadata
# Add rich metadata to your model
mlflow.log_param("model_type", "Random Forest")
mlflow.log_param("training_data", "dataset_v2.csv")
mlflow.log_param("author", "alice@company.com")
mlflow.log_param("problem", "fraud_detection")
mlflow.set_tag("team", "risk-ml")
mlflow.set_tag("compliance", "SOC2-approved")
Why It Matters
Six months later, someone asks: "What data trained the fraud model in production?"
With good metadata: a 5-second answer. Without metadata: hours of detective work.
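That "5-second answer" is really just a query over the logged metadata. A sketch, assuming the params and tags from the example above were logged (the experiment name here is hypothetical):

import mlflow

# Find every run that tackled fraud detection and show what data trained it
runs = mlflow.search_runs(
    experiment_names=["fraud-models"],                   # hypothetical experiment name
    filter_string="params.problem = 'fraud_detection'",
)
print(runs[["run_id", "params.training_data", "params.author"]])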
𧬠Model Lineage
What Is Lineage?
Lineage answers: "Where did this model come from?"
It's like a family tree for your model, showing:
- What data created it
- What code trained it
- What experiments led to it
- What other models it relates to
The Lineage Chain
graph TD
    A[Raw Data] --> B[Cleaned Data]
    B --> C[Feature Engineering]
    C --> D[Training Data]
    D --> E[Model Training]
    E --> F[Trained Model v1]
    F --> G[Fine-tuned Model v2]
    G --> H[Production Model]
Why Lineage Matters
Scenario 1: Bug in Production
Your fraud model starts making mistakes. Lineage shows it was trained on dataset_v2 which had a bug. You can trace the problem instantly.
Scenario 2: Compliance Audit
Regulators ask: "Prove your model wasn't trained on biased data." Lineage shows exactly what data was used.
Scenario 3: Reproducing Results
A colleague wants to build on your work. Lineage shows every step from raw data to final model.
Tracking Lineage
# Log data lineage
mlflow.log_param("source_data", "s3://bucket/raw/")
mlflow.log_param("preprocessing", "v2.1")
mlflow.log_param("parent_run", "run_abc123")
# Log code version
mlflow.log_param("git_commit", "a1b2c3d")
mlflow.log_param("code_version", "1.5.0")
Complete Lineage Example
Model: fraud-detector-v3
├── Data Lineage
│   ├── Source: transactions_2024.csv
│   ├── Cleaned: pipeline_v2
│   └── Features: feature_store_v1.2
├── Code Lineage
│   ├── Git commit: a1b2c3d
│   ├── Branch: main
│   └── Training script: train.py
├── Experiment Lineage
│   ├── Parent run: exp_047
│   └── Based on: fraud-detector-v2
└── Environment
    ├── Python: 3.9
    └── Libraries: requirements.txt
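One practical addition: attach the environment itself to the run, so the last branch of that tree points at files you can actually reinstall from. A small sketch, assuming a requirements.txt sits next to the training script:

import platform

import mlflow

with mlflow.start_run():
    # Record the interpreter version and the exact dependency list
    mlflow.log_param("python_version", platform.python_version())
    mlflow.log_artifact("requirements.txt")  # assumes this file exists locally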
🎯 Putting It All Together
The Complete Picture
graph TD
    A[Start Training] --> B[Log Hyperparameters]
    B --> C[Train Model]
    C --> D[Log Metrics]
    D --> E[Save Model]
    E --> F[Register in Registry]
    F --> G[Add Metadata]
    G --> H[Track Lineage]
    H --> I[Ready for Deployment!]
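Here is a compact sketch of that whole flow using a toy scikit-learn model. The dataset, hyperparameters, and registry name are placeholders, and registering the model assumes your tracking server has a model registry backend (the default local file store does not):

import mlflow
import mlflow.sklearn
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Toy data standing in for your real training set
X, y = make_classification(n_samples=500, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

params = {"n_estimators": 100, "max_depth": 5}

with mlflow.start_run():
    mlflow.log_params(params)                                   # log hyperparameters
    model = RandomForestClassifier(**params, random_state=42)   # train the model
    model.fit(X_train, y_train)
    accuracy = accuracy_score(y_test, model.predict(X_test))
    mlflow.log_metric("accuracy", accuracy)                     # log metrics
    mlflow.set_tag("team", "demo")                              # add metadata
    # save the model and register it in the registry in one call
    mlflow.sklearn.log_model(model, "model", registered_model_name="demo-classifier")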
Quick Reference
| Concept | What It Does | Like… |
|---|---|---|
| Experiment Tracking | Records all training runs | A lab notebook |
| Hyperparameters | Settings before training | Oven temperature |
| Metrics | Performance measurements | Test scores |
| Model Registry | Stores trained models | A library |
| Metadata | Information about models | A book's index |
| Lineage | Shows model origins | A family tree |
🎉 You're Ready!
You now understand how ML teams:
- Track every experiment
- Log settings and results
- Store models safely
- Manage model information
- Trace model origins
This is the foundation of professional MLOps. No more lost experiments. No more mystery models. Just organized, reproducible, traceable machine learning.
Remember: the best data scientists aren't just good at training models; they're good at managing them too.
Happy experimenting! 🧪