Monitoring and Health

Docker Monitoring & Health: Keeping Your Containers Alive and Happy 🏥

The Story of the Container Hospital

Imagine you’re running a hospital for tiny robot workers (containers). Each robot does a specific job—one serves food, another cleans, another answers phones. But how do you know if they’re healthy? What if one falls sick? How do you find out what went wrong?

That’s exactly what monitoring and health are about in Docker! We’re going to learn how to:

  • Listen to what our containers are saying (logging)
  • Watch them from a control room (monitoring)
  • Give them regular health checkups (healthchecks)

1. Logging Best Practices 📝

What is Logging?

Think of logs like a diary your container writes. Every time something happens—a visitor arrives, a task completes, an error occurs—the container writes it down.

The Golden Rules of Container Logging

Rule 1: Write to STDOUT and STDERR

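Your container should talk to the screen, not write to hidden files! Docker captures whatever the main process writes to STDOUT and STDERR, and that captured stream is what docker logs and the logging drivers read.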

# ✅ Good: App prints to console
CMD ["node", "app.js"]

# The app inside does:
# console.log("User logged in")
# console.error("Database connection failed")

Rule 2: Use JSON Format

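Write logs like a neat list, not messy paragraphs. Emit each entry as a single-line JSON object so log tools can parse it line by line; the example is spread over several lines only for readability:

{
  "time": "2024-01-15T10:30:00Z",
  "level": "info",
  "message": "User logged in",
  "userId": 123
}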

Rule 3: Include Context

Always answer: Who? What? When? Where?

{
  "timestamp": "2024-01-15T10:30:00Z",
  "service": "auth-service",
  "container_id": "abc123",
  "message": "Login successful",
  "user": "alice"
}
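
Here is a minimal Python sketch of the three rules working together: it prints to STDOUT, writes one JSON object per line, and attaches context. The names JsonFormatter and build_logger, and the "context" key passed through extra, are made up for this example; they are not part of Docker or of any logging library.

```python
import json
import logging
import sys
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object on a single line."""

    def __init__(self, service: str):
        super().__init__()
        self.service = service  # hypothetical service name, e.g. "auth-service"

    def format(self, record: logging.LogRecord) -> str:
        created = datetime.fromtimestamp(record.created, tz=timezone.utc)
        entry = {
            "timestamp": created.isoformat(timespec="seconds").replace("+00:00", "Z"),
            "level": record.levelname.lower(),
            "service": self.service,
            "message": record.getMessage(),
        }
        # Extra context arrives via logger.info(..., extra={"context": {...}})
        entry.update(getattr(record, "context", {}))
        return json.dumps(entry)


def build_logger(service: str, stream=sys.stdout) -> logging.Logger:
    """Create a logger that writes JSON lines to the given stream."""
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter(service))
    logger = logging.getLogger(service)
    logger.handlers = [handler]
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


log = build_logger("auth-service")
log.info("Login successful", extra={"context": {"user": "alice"}})
```

Because everything goes to STDOUT, docker logs picks these lines up without any extra setup.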

Quick Commands to See Logs

# See all logs from a container
docker logs my-container

# Follow logs live (like watching TV)
docker logs -f my-container

# See last 50 lines
docker logs --tail 50 my-container

# See logs with timestamps
docker logs -t my-container

2. Centralized Logging Setup 🎯

The Problem

Imagine having 100 robot workers, each writing their own diary in different rooms. Finding what went wrong becomes a nightmare!

The Solution: One Central Library

graph TD
    A["Container 1"] -->|sends logs| D["Central Log Server"]
    B["Container 2"] -->|sends logs| D
    C["Container 3"] -->|sends logs| D
    D -->|search & analyze| E["You!"]
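
To see why one central place helps, here is a toy Python sketch of what a central log server does at its core: it merges the diaries of several containers into one time-ordered stream that you can search. It assumes every entry is a JSON line with an ISO 8601 UTC "timestamp" field and that each container's lines are already in time order. The function names merge_logs and search are invented for this example.

```python
import heapq
import json
from typing import Dict, Iterable, Iterator, List


def merge_logs(streams: Dict[str, Iterable[str]]) -> Iterator[dict]:
    """Merge per-container JSON log lines into one stream ordered by timestamp.

    Assumes each stream is already in time order and uses ISO 8601 UTC
    timestamps, so comparing the strings matches comparing the times.
    """
    def tagged(name: str, lines: Iterable[str]) -> Iterator[dict]:
        for line in lines:
            entry = json.loads(line)
            entry["container"] = name  # remember which diary this came from
            yield entry

    sources = [tagged(name, lines) for name, lines in streams.items()]
    return heapq.merge(*sources, key=lambda e: e["timestamp"])


def search(entries: Iterable[dict], level: str) -> List[dict]:
    """Return only the entries with the given log level."""
    return [e for e in entries if e.get("level") == level]
```

Real tools such as the ones listed below add storage, indexing, and retention on top of this idea.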

Setting Up with Docker Logging Drivers

Tell Docker to send logs somewhere central:

# Send logs to a syslog server
docker run -d \
  --log-driver=syslog \
  --log-opt syslog-address=tcp://logs.example.com:514 \
  my-app

Popular Centralized Solutions

| Tool | Think of it as… |
| --- | --- |
| ELK Stack | A giant searchable library |
| Fluentd | A mail carrier collecting logs |
| Loki | A simple notebook system |

Docker Compose Example

version: '3.8'
services:
  web:
    image: nginx
    logging:
      driver: "fluentd"
      options:
        fluentd-address: "localhost:24224"
        tag: "web.nginx"

3. Container Monitoring Overview 📊

What is Monitoring?

Logging tells you what happened. Monitoring tells you how things are right now.

It’s like the difference between:

  • Logging: Reading yesterday’s diary
  • Monitoring: Looking at a live dashboard

What Do We Monitor?

graph TD
    A["Container Metrics"] --> B["CPU Usage"]
    A --> C["Memory Usage"]
    A --> D["Network Traffic"]
    A --> E["Disk I/O"]
    A --> F["Container Count"]

Quick Monitoring with Docker

# See live stats (like a health monitor)
docker stats

# Output looks roughly like this (some columns omitted):
# CONTAINER   CPU %   MEM USAGE   NET I/O   BLOCK I/O
# web-app     2.5%    150MB       10KB/5KB  0B/0B
# database    15.0%   512MB       1MB/500KB 10MB/5MB
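
If you want a script to read these numbers instead of your eyes, docker stats can print one JSON object per container with --no-stream --format '{{json .}}'. The Python sketch below picks out the busy containers from that output. The function name busy_containers and the 10% threshold in the usage note are arbitrary choices for this example.

```python
import json
from typing import Iterable, List, Tuple


def busy_containers(lines: Iterable[str], cpu_limit: float) -> List[Tuple[str, float]]:
    """Return (name, cpu_percent) for every container above cpu_limit percent."""
    busy = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        row = json.loads(line)
        try:
            cpu = float(row["CPUPerc"].rstrip("%"))  # "15.00%" -> 15.0
        except ValueError:
            continue  # skip rows without a numeric CPU value
        if cpu > cpu_limit:
            busy.append((row["Name"], cpu))
    return busy
```

One way to use it is to feed it the lines printed by docker stats --no-stream --format '{{json .}}' and call busy_containers(lines, 10.0).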

The Monitoring Stack

| Component | Job |
| --- | --- |
| Prometheus | Collects and stores metrics |
| cAdvisor | Watches containers specifically |
| Grafana | Shows pretty dashboards |
| Alertmanager | Sends you warnings |

4. Prometheus Integration 🔥

What is Prometheus?

Prometheus is like a reporter that visits your containers regularly, asks “How are you?”, and writes down the answers.

How Prometheus Works

graph TD
    A["Prometheus"] -->|scrapes every 15s| B["Container 1"]
    A -->|scrapes every 15s| C["Container 2"]
    A -->|scrapes every 15s| D["cAdvisor"]
    A -->|stores| E["Time Series Database"]
    E -->|displays| F["Grafana Dashboard"]

Setting Up Prometheus

prometheus.yml:

global:
  scrape_interval: 15s

scrape_configs:
  - job_name: 'docker'
    static_configs:
      - targets: ['cadvisor:8080']

  - job_name: 'my-app'
    static_configs:
      - targets: ['my-app:9090']
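
For the my-app target to work, the app has to answer on /metrics in the Prometheus text format. Here is a rough Python sketch of that using only the standard library. The metric name app_requests_total, the path label, and the handler are all invented for this example; a real app would more likely use an official Prometheus client library and keep label values to a small, fixed set.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Dict

REQUESTS: Dict[str, int] = {}  # path -> number of requests served so far


def escape_label(value: str) -> str:
    """Escape backslash, double quote and newline inside a label value."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_metrics(requests_total: Dict[str, int]) -> str:
    """Render one counter in the Prometheus text exposition format."""
    lines = [
        "# HELP app_requests_total Requests handled, by path.",
        "# TYPE app_requests_total counter",
    ]
    for path, count in sorted(requests_total.items()):
        lines.append(f'app_requests_total{{path="{escape_label(path)}"}} {count}')
    return "\n".join(lines) + "\n"


class AppHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        if self.path == "/metrics":
            body = render_metrics(REQUESTS).encode()
            content_type = "text/plain; version=0.0.4"
        else:
            REQUESTS[self.path] = REQUESTS.get(self.path, 0) + 1
            body = b"hello\n"
            content_type = "text/plain"
        self.send_response(200)
        self.send_header("Content-Type", content_type)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 9090) -> None:
    """Serve the app and its /metrics endpoint; 9090 matches the target above."""
    HTTPServer(("", port), AppHandler).serve_forever()
```

Calling serve() inside the container gives Prometheus something to scrape every 15 seconds.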

Running Prometheus in Docker

# docker-compose.yml
version: '3.8'
services:
  prometheus:
    image: prom/prometheus
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml

Basic Prometheus Queries

# How much CPU is my container using? (the metric is a running total, so take its rate)
rate(container_cpu_usage_seconds_total{name="web"}[5m])

# How much memory?
container_memory_usage_bytes{name="web"}

# How many containers are running? (name!="" skips non-container cgroups)
count(container_last_seen{name!=""})
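
One thing worth knowing about container_cpu_usage_seconds_total: it is a counter, a running total of CPU seconds that only goes up. To learn how busy a container is right now, you divide the increase by the time that passed. The Python sketch below shows that arithmetic. It is a simplified illustration of the idea behind PromQL's rate() function, not a reimplementation of it, and the function name cpu_cores_used is invented for this example.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (unix_time_seconds, counter_value)


def cpu_cores_used(samples: List[Sample]) -> float:
    """Average number of CPU cores used across samples of a cumulative counter.

    Total increase divided by elapsed time. A drop in value is treated as a
    counter reset (for example after a restart), so counting resumes from zero.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        increase += (cur - prev) if cur >= prev else cur
    elapsed = samples[-1][0] - samples[0][0]
    if elapsed <= 0:
        raise ValueError("samples must span a positive time range")
    return increase / elapsed
```

For example, a counter that grows by 7.5 CPU seconds over 30 seconds means the container used a quarter of one core on average.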

5. cAdvisor (Container Advisor) 🔍

What is cAdvisor?

cAdvisor is like a fitness tracker for containers. It watches each container and reports:

  • How hard is it working? (CPU)
  • How much memory is it using?
  • Is it talking to the network?

Running cAdvisor

docker run -d \
  --name=cadvisor \
  --volume=/:/rootfs:ro \
  --volume=/var/run:/var/run:ro \
  --volume=/sys:/sys:ro \
  --volume=/var/lib/docker/:/var/lib/docker:ro \
  --publish=8080:8080 \
  gcr.io/cadvisor/cadvisor

What cAdvisor Shows

Visit http://localhost:8080 to see:

| Metric | What It Means |
| --- | --- |
| CPU | How busy the container is |
| Memory | RAM being used |
| Network | Data in and out |
| Filesystem | Disk usage |

cAdvisor + Prometheus

cAdvisor exposes its metrics in Prometheus format at /metrics on the same port as its web UI (8080 here), so Prometheus only needs a scrape target:

scrape_configs:
  - job_name: 'cadvisor'
    static_configs:
      - targets: ['cadvisor:8080']
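
If you are curious what Prometheus actually reads from cAdvisor, the /metrics page is plain text with one sample per line. Here is a small Python sketch that parses that text and pulls out memory usage per container. It handles only the common line shape (name, optional labels, value, optional timestamp) and leaves escaped characters inside label values as they are. The function names are invented for this example.

```python
import re
from typing import Dict, Iterator, Tuple

# metric_name{label="value",...} value [timestamp]
_SAMPLE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)'
    r'(?:\{(?P<labels>.*)\})?'
    r'\s+(?P<value>\S+)'
    r'(?:\s+-?\d+)?$'
)
_LABEL = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def parse_samples(text: str) -> Iterator[Tuple[str, Dict[str, str], float]]:
    """Yield (metric_name, labels, value) for each sample line in the text."""
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = _SAMPLE.match(line)
        if match is None:
            continue
        labels = dict(_LABEL.findall(match.group("labels") or ""))
        yield match.group("name"), labels, float(match.group("value"))


def memory_by_container(text: str) -> Dict[str, float]:
    """Map container name to container_memory_usage_bytes."""
    return {
        labels["name"]: value
        for metric, labels, value in parse_samples(text)
        if metric == "container_memory_usage_bytes" and labels.get("name")
    }
```

Entries without a name label are skipped, which keeps the result to named containers only.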

6. Docker Healthchecks 💓

What is a Healthcheck?

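A healthcheck is like a doctor’s visit for your container. Docker asks: “Are you okay?” and the container answers yes or no. In practice, Docker runs a command inside the container on a schedule: exit code 0 means healthy, exit code 1 means unhealthy.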

Adding Healthcheck to Dockerfile

FROM nginx:alpine

# Check every 30 seconds if nginx is responding
HEALTHCHECK --interval=30s \
            --timeout=10s \
            --start-period=5s \
            --retries=3 \
  CMD curl -f http://localhost/ || exit 1

Healthcheck in Docker Compose

version: '3.8'
services:
  web:
    image: nginx
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 5s

Types of Health Tests

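# Each line is an alternative; a Dockerfile keeps only its last HEALTHCHECK.

# HTTP check (web servers)
HEALTHCHECK CMD curl -f http://localhost:8080/health || exit 1

# TCP check (databases)
HEALTHCHECK CMD nc -z localhost 5432 || exit 1

# Custom script
HEALTHCHECK CMD /app/health-check.sh

# Command check (exec form, no shell)
HEALTHCHECK CMD ["pg_isready", "-U", "postgres"]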
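
A custom script can be written in any language the image already ships. As one possible shape, here is a Python sketch of an HTTP check that follows the healthcheck exit-code convention (0 for healthy, 1 for unhealthy). The /health URL is a hypothetical endpoint, and the function name check is invented for this example.

```python
import http.client
import urllib.error
import urllib.request

# Talk to the local service directly, ignoring any proxy environment variables.
_opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def check(url: str, timeout: float = 5.0) -> int:
    """Return 0 if the URL answers with a 2xx status, otherwise 1."""
    try:
        with _opener.open(url, timeout=timeout) as response:
            return 0 if 200 <= response.status < 300 else 1
    except (urllib.error.URLError, http.client.HTTPException, OSError, ValueError):
        # Connection refused, timeout, HTTP error status, or a malformed URL
        return 1


# To use this file as the healthcheck command, finish it with:
#     raise SystemExit(check("http://localhost:8080/health"))
```

This can stand in for curl in images that include Python but no curl.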

7. Health Status and Intervals ⏱️

The Three Health States

graph LR
    A["starting"] -->|passes check| B["healthy"]
    A -->|fails check| C["unhealthy"]
    B -->|fails check| C
    C -->|passes check| B

| Status | What It Means |
| --- | --- |
| starting | Container just started, waiting for first check |
| healthy | All good! Container is working |
| unhealthy | Something’s wrong! |

Understanding Intervals

HEALTHCHECK --interval=30s \
            --timeout=10s \
            --start-period=5s \
            --retries=3 \
  CMD curl -f http://localhost/ || exit 1

| Setting | Meaning | Example |
| --- | --- | --- |
| interval | How often to check | Every 30 seconds |
| timeout | How long to wait for answer | Give up after 10 seconds |
| start-period | Grace period after start | Failed checks in the first 5 seconds don’t count toward retries |
| retries | Failures in a row before “unhealthy” | 3 fails = unhealthy |
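
To make the interplay of these settings concrete, here is a small Python simulation of how the status changes as check results come in. It is a simplified model based on the documented behavior (failures during the start period are ignored while the container is still starting, one pass makes it healthy, and retries consecutive failures make it unhealthy). It is not Docker's actual code, and the function name health_states is invented for this example.

```python
from typing import Iterable, List, Tuple

Probe = Tuple[float, bool]  # (seconds since container start, check passed?)


def health_states(probes: Iterable[Probe], retries: int = 3,
                  start_period: float = 5.0) -> List[str]:
    """Return the health status after each probe result."""
    status = "starting"
    failing_streak = 0
    states = []
    for elapsed, passed in probes:
        if passed:
            status = "healthy"
            failing_streak = 0
        elif status == "starting" and elapsed < start_period:
            pass  # failures inside the grace period are not counted
        else:
            failing_streak += 1
            if failing_streak >= retries:
                status = "unhealthy"
        states.append(status)
    return states
```

With checks every 30 seconds and the defaults above, three failures in a row at 30s, 60s, and 90s flip the status to unhealthy, and a single pass at 120s brings it back to healthy.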

Checking Health Status

# See health status
docker ps
# Shows: STATUS column with (healthy) or (unhealthy)

# Detailed health info
docker inspect --format='{{.State.Health.Status}}' my-container

# See health check history
docker inspect --format='{{json .State.Health}}' my-container | jq
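
The JSON from that last command holds the current Status, a FailingStreak counter, and a Log list with the most recent check results. If jq is not available, a few lines of Python can summarize it. This is a sketch; the function name summarize_health is invented for this example.

```python
import json
from typing import List


def summarize_health(raw: str) -> List[str]:
    """Turn `docker inspect` health JSON into short human-readable lines."""
    health = json.loads(raw)
    lines = [f"status={health['Status']} failing_streak={health['FailingStreak']}"]
    for probe in health.get("Log") or []:
        output = probe["Output"].strip().splitlines()
        first_line = output[0] if output else ""
        lines.append(f"{probe['End']} exit={probe['ExitCode']} {first_line}".rstrip())
    return lines
```

Each probe line shows when the check finished, its exit code, and the first line of whatever the check command printed.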

Using Health in Compose Dependencies

version: '3.8'
services:
  db:
    image: postgres
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 5s
      timeout: 5s
      retries: 5

  web:
    image: my-web-app
    depends_on:
      db:
        condition: service_healthy

This means: “Don’t start web until db is healthy!”
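
Outside Compose, for example in a deploy script, you can get a similar effect by polling the health status yourself. The Python sketch below waits until a container reports healthy or a timeout expires. The function names and the default timeout are choices made for this example, and docker_health assumes the docker CLI is on the PATH.

```python
import subprocess
import time
from typing import Callable


def docker_health(container: str) -> str:
    """Read the health status through the docker CLI."""
    result = subprocess.run(
        ["docker", "inspect", "--format", "{{.State.Health.Status}}", container],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


def wait_until_healthy(get_status: Callable[[], str], timeout: float = 60.0,
                       interval: float = 1.0,
                       sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll until the status is "healthy"; return False if the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        if get_status() == "healthy":
            return True
        if time.monotonic() + interval > deadline:
            return False
        sleep(interval)
```

A call such as wait_until_healthy(lambda: docker_health("db")) blocks until the db container is ready, then the script can start whatever depends on it.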


Quick Summary 🎯

| Topic | Think of it as… |
| --- | --- |
| Logging | Container’s diary |
| Centralized Logging | One library for all diaries |
| Monitoring | Live health dashboard |
| Prometheus | The data collector |
| cAdvisor | Container fitness tracker |
| Healthcheck | Doctor’s visit |
| Health Status | Starting → Healthy → Unhealthy |

Your Container Health Journey 🚀

You now know how to:

  1. ✅ Make containers write good logs
  2. ✅ Collect all logs in one place
  3. ✅ Watch containers in real-time
  4. ✅ Set up Prometheus for metrics
  5. ✅ Use cAdvisor for container stats
  6. ✅ Add healthchecks to containers
  7. ✅ Understand health status and timing

Your containers are no longer mysterious black boxes. You can see inside them, hear what they’re saying, and know when they need help!

Remember: A well-monitored container is a happy container! 🐳💚
