Building a Modern Monitoring System: From Metrics to Alerts
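In this post, I’ll walk through building a monitoring system with the Prometheus toolchain: a small Python collector that exposes host CPU and memory metrics, and an Alertmanager configuration that groups alerts and sends them to Slack. We’ll go from collecting metrics to delivering alerts.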
Metric Collection
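First, let’s set up a Python collector with the official Prometheus client library (prometheus_client) and psutil. It samples CPU and memory usage every five seconds and serves the values over HTTP on port 8000: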
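```python
from prometheus_client import start_http_server, Counter, Gauge
import psutil
import time

# Initialize metrics
CPU_USAGE = Gauge('cpu_usage_percent', 'CPU usage in percent')
MEMORY_USAGE = Gauge('memory_usage_percent', 'Memory usage in percent')
# Incremented by request-handling code via REQUEST_COUNT.inc(); not updated in this loop
REQUEST_COUNT = Counter('request_count_total', 'Total request count')

def collect_metrics():
    # Prime psutil: the first non-blocking cpu_percent() call returns a meaningless 0.0
    psutil.cpu_percent(interval=None)
    while True:
        # CPU usage since the previous call, plus current memory usage
        CPU_USAGE.set(psutil.cpu_percent(interval=None))
        MEMORY_USAGE.set(psutil.virtual_memory().percent)
        time.sleep(5)

if __name__ == '__main__':
    # Serve metrics at http://localhost:8000/metrics for Prometheus to scrape
    start_http_server(8000)
    collect_metrics()
```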
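To check what the collector actually exposes, you can scrape the endpoint yourself. The sketch below is a deliberately simplified parser for the Prometheus text exposition format, not a complete implementation: it skips the # HELP / # TYPE comment lines, keeps each label set as raw text, ignores optional timestamps, and assumes label values contain no spaces. The helper names parse_metrics and scrape are hypothetical.

```python
import urllib.request


def parse_metrics(text: str) -> dict[str, float]:
    """Parse Prometheus text exposition lines into {series: value}.

    Simplified sketch: label values containing spaces are not handled,
    and an optional trailing timestamp is ignored.
    """
    samples: dict[str, float] = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and '# HELP' / '# TYPE' comment lines
        if not line or line.startswith("#"):
            continue
        # Series (metric name plus optional {labels}) and value are space-separated
        parts = line.split()
        series, value = parts[0], parts[1]
        samples[series] = float(value)
    return samples


def scrape(url: str = "http://localhost:8000/metrics") -> dict[str, float]:
    # Fetch the endpoint opened by start_http_server(8000) in the collector
    with urllib.request.urlopen(url, timeout=5) as resp:
        return parse_metrics(resp.read().decode("utf-8"))
```

With the collector running, scrape()["cpu_usage_percent"] should return the most recent CPU reading. The output also contains the client library's default process and Python runtime metrics.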
Alert Configuration
Here’s our Alertmanager configuration in YAML. It batches related alerts and routes them to a Slack channel:
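```yaml
global:
  # Declare an alert resolved if it stops being updated for 5m (applies to alerts sent without an end time)
  resolve_timeout: 5m
  slack_api_url: 'https://hooks.slack.com/services/YOUR/SLACK/WEBHOOK'

route:
  group_by: ['alertname', 'cluster', 'service']
  group_wait: 30s       # wait before the first notification for a new group
  group_interval: 5m    # wait before notifying about new alerts added to an existing group
  repeat_interval: 4h   # wait before re-sending a notification that was already sent
  receiver: 'slack-notifications'

receivers:
  - name: 'slack-notifications'
    slack_configs:
      - channel: '#alerts'
        # title and text are Go templates rendered against the alert group
        title: "\n"
        text: "🔥 Alert: \n"
```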
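The group_by setting decides which alerts share a notification: alerts with identical values for alertname, cluster and service are batched into one group. A rough Python model of just that bucketing step could look like the following. It ignores the group_wait / group_interval timers and nested routes, and the group_alerts helper and sample alerts are hypothetical.

```python
from collections import defaultdict

GROUP_BY = ("alertname", "cluster", "service")


def group_alerts(alerts, group_by=GROUP_BY):
    """Bucket alerts by the values of the group_by labels.

    Simplified model of Alertmanager grouping: a missing label is treated
    as an empty string (Prometheus treats an empty label value as absent),
    and all timing behaviour is ignored.
    """
    groups = defaultdict(list)
    for alert in alerts:
        labels = alert["labels"]
        key = tuple(labels.get(name, "") for name in group_by)
        groups[key].append(alert)
    return dict(groups)


# Hypothetical alerts: two HighCPU alerts from different instances, one HighMemory
alerts = [
    {"labels": {"alertname": "HighCPU", "cluster": "prod", "service": "api", "instance": "a:8000"}},
    {"labels": {"alertname": "HighCPU", "cluster": "prod", "service": "api", "instance": "b:8000"}},
    {"labels": {"alertname": "HighMemory", "cluster": "prod", "service": "api", "instance": "a:8000"}},
]
groups = group_alerts(alerts)
```

Under this configuration the two HighCPU alerts would arrive in Slack as a single message while HighMemory gets its own, because instance is not one of the group_by labels.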