Building a Modern Monitoring System: From Metrics to Alerts
In this post, I’ll walk through building a monitoring system with the Prometheus Python client and Alertmanager. We’ll cover the path from collecting metrics to sending alerts.
Metric Collection
First, let’s set up our Python collector using the Prometheus client library (prometheus_client) and psutil:
from prometheus_client import start_http_server, Counter, Gauge
import psutil
import time
# Initialize metrics
CPU_USAGE = Gauge('cpu_usage_percent', 'CPU usage in percent')
MEMORY_USAGE = Gauge('memory_usage_percent', 'Memory usage in percent')
# Not updated by the loop below; call REQUEST_COUNT.inc() from request-handling code
REQUEST_COUNT = Counter('request_count_total', 'Total request count')
def collect_metrics():
    while True:
        # Update CPU and memory metrics every 5 seconds.
        # Note: the first non-blocking cpu_percent() call returns 0.0 and should be ignored.
        CPU_USAGE.set(psutil.cpu_percent())
        MEMORY_USAGE.set(psutil.virtual_memory().percent)
        time.sleep(5)
if __name__ == '__main__':
    # Expose the metrics over HTTP on port 8000 for Prometheus to scrape
    start_http_server(8000)
    collect_metrics()
Alert Configuration
Here’s our Alertmanager configuration in YAML. It groups alerts by alertname, cluster, and service, and routes them to a Slack channel:
global:
  resolve_timeout: 5m
  slack_api_url: 'https://hooks.slack.com/services/YOUR/SLACK/WEBHOOK'
route:
  group_by: ['alertname', 'cluster', 'service']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 4h
  receiver: 'slack-notifications'
receivers:
- name: 'slack-notifications'
  slack_configs:
  - channel: '#alerts'
    title: "\n"
    text: "🔥 Alert: \n"
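To check that the routing above actually reaches Slack, one option is to push a test alert straight into Alertmanager. The sketch below is not part of the original setup. It assumes Alertmanager is running locally on its default port 9093 and uses the v2 API endpoint `/api/v2/alerts`. The helper names `build_test_alert` and `send_alert`, and the example label values, are hypothetical. The labels mirror the `group_by` keys from the configuration so the alert is grouped the same way a real one would be.

```python
import json
import urllib.request
from datetime import datetime, timezone

# Assumption: Alertmanager runs locally on its default port.
ALERTMANAGER_URL = "http://localhost:9093/api/v2/alerts"


def build_test_alert(alertname, cluster, service, summary):
    """Build a one-alert payload (a JSON array of alert objects)."""
    return [{
        # These labels match the group_by keys in the route configuration.
        "labels": {
            "alertname": alertname,
            "cluster": cluster,
            "service": service,
        },
        "annotations": {"summary": summary},
        # RFC 3339 timestamp in UTC marking when the alert started.
        "startsAt": datetime.now(timezone.utc).isoformat(),
    }]


def send_alert(payload, url=ALERTMANAGER_URL):
    """POST the payload to Alertmanager and return the HTTP status code."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=5) as response:
        return response.status


if __name__ == "__main__":
    # Hypothetical example values, for illustration only.
    payload = build_test_alert("HighCpuUsage", "demo", "collector",
                               "CPU usage above threshold")
    print(json.dumps(payload, indent=2))
    # Uncomment once Alertmanager is reachable at ALERTMANAGER_URL:
    # print(send_alert(payload))
```

With `group_wait: 30s`, the Slack message should show up roughly 30 seconds after the alert is accepted, which makes this a quick way to confirm the receiver and channel settings.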