Prometheus Monitoring

Creates comprehensive Prometheus monitoring configurations including scrape job definitions, relabeling rules, recording rules for precomputed metrics, alerting rules with severity levels, Alertmanager routing for notifications, and service discovery configurations for dynamic environments like Kubernetes, Consul, and EC2.

Usage

Describe your infrastructure (bare metal, VMs, Kubernetes, cloud services), the applications you need to monitor, and your alerting requirements. Specify which exporters you are using (node_exporter, blackbox, etc.) and notification channels (Slack, PagerDuty, email). The skill generates prometheus.yml, alert rule files, and Alertmanager configuration.
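As a rough idea of the output, here is a minimal sketch of a prometheus.yml for a Kubernetes cluster with a node_exporter host and a Pushgateway. The job names, namespaces, target addresses, rule file path, and the prometheus.io/scrape annotation (a common convention, not something Prometheus defines) are all assumptions made for illustration.

```yaml
# prometheus.yml (illustrative sketch; names, paths, and targets are assumptions)
global:
  scrape_interval: 60s          # default, suited to infrastructure targets
  evaluation_interval: 60s      # how often rules are evaluated

rule_files:
  - rules/*.yml                 # hypothetical location of rule files

alerting:
  alertmanagers:
    - static_configs:
        - targets: ["alertmanager:9093"]   # hypothetical Alertmanager address

scrape_configs:
  - job_name: kubernetes-pods
    scrape_interval: 15s        # tighter interval for application metrics
    kubernetes_sd_configs:
      - role: pod
        namespaces:
          names: [production, staging]     # hypothetical namespace filter
    relabel_configs:
      # Applied to discovered targets before the scrape:
      # keep only pods annotated with prometheus.io/scrape: "true".
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: "true"
      # Copy discovery metadata into ordinary labels.
      - source_labels: [__meta_kubernetes_namespace]
        target_label: namespace
      - source_labels: [__meta_kubernetes_pod_name]
        target_label: pod
    metric_relabel_configs:
      # Applied to scraped samples before storage:
      # drop Go garbage-collector series to save space.
      - source_labels: [__name__]
        action: drop
        regex: go_gc_.*

  - job_name: node
    static_configs:
      - targets: ["node1.example.internal:9100"]   # hypothetical node_exporter host

  - job_name: pushgateway
    honor_labels: true          # keep the job/instance labels set by pushers
    static_configs:
      - targets: ["pushgateway:9091"]              # hypothetical address
```

A file like this can be checked with promtool check config prometheus.yml before it is loaded.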

Examples

"Configure Prometheus to scrape a Kubernetes cluster with pod auto-discovery and namespace filtering"
"Create alerting rules for high CPU usage, disk space warnings, and service down detection with escalation"
"Set up blackbox exporter probes for HTTP endpoint monitoring with certificate expiry alerts"
"Build recording rules for precomputing request rate, error rate, and latency percentiles (RED method)"
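To give a sense of what the alerting and recording rule prompts might produce, the following is a hedged sketch of a rule file. It assumes the application exposes http_requests_total with a status label and an http_request_duration_seconds histogram; those metric names, the thresholds, and the rule names are illustrative assumptions, not output of the skill.

```yaml
# rules/example.yml (illustrative sketch; metric names and thresholds are assumptions)
groups:
  - name: red_recording_rules
    rules:
      # Rate: requests per second, per job.
      - record: job:http_requests:rate5m
        expr: sum by (job) (rate(http_requests_total[5m]))
      # Errors: fraction of requests answered with a 5xx status.
      - record: job:http_requests_errors:ratio_rate5m
        expr: |
          sum by (job) (rate(http_requests_total{status=~"5.."}[5m]))
          /
          sum by (job) (rate(http_requests_total[5m]))
      # Duration: 99th percentile latency from histogram buckets.
      - record: job:http_request_duration_seconds:p99_5m
        expr: |
          histogram_quantile(
            0.99,
            sum by (job, le) (rate(http_request_duration_seconds_bucket[5m]))
          )

  - name: availability_alerts
    rules:
      - alert: InstanceDown
        expr: up == 0
        for: 5m                 # must hold for 5 minutes before firing
        labels:
          severity: critical
        annotations:
          summary: "{{ $labels.instance }} of job {{ $labels.job }} is down"
      - alert: DiskSpaceLow
        expr: |
          node_filesystem_avail_bytes{fstype!~"tmpfs|overlay"}
          /
          node_filesystem_size_bytes{fstype!~"tmpfs|overlay"}
          < 0.10
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: "Less than 10% disk space left on {{ $labels.instance }}"
```

The record names follow the level:metric:operations naming convention, which keeps precomputed series easy to tell apart from raw ones in dashboards.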

Guidelines

Use relabel_configs to filter targets and rewrite their labels before the scrape, and metric_relabel_configs to drop or rewrite scraped series before they are stored
Create recording rules for expressions that are queried frequently or that back dashboards, to reduce query load
Structure alerting rules with severity labels (critical, warning, info) for proper routing and escalation
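Set scrape intervals to match how fast the data changes: 15s for application metrics, 60s for infrastructure, and up to 300s for slow-changing metrics. Intervals at or above Prometheus's default 5m query lookback window can leave gaps in instant queries, so treat 300s as an upper bound that may require raising --query.lookback-delta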
Use honor_labels: true when scraping Pushgateway or federated Prometheus instances
Configure Alertmanager with group_wait, group_interval, and repeat_interval to prevent alert fatigue
Add inhibition rules so critical alerts suppress related warning alerts for the same component
Use the for clause in alerts to require sustained conditions before firing (avoid flapping)
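The Alertmanager guidelines above can be sketched as the following alertmanager.yml. It is a minimal example under stated assumptions: the receiver names, Slack channel, webhook URL, PagerDuty key, and the labels used in group_by and equal are placeholders, and the matchers syntax assumes Alertmanager 0.22 or later.

```yaml
# alertmanager.yml (illustrative sketch; receivers, keys, and labels are placeholders)
route:
  receiver: default-slack       # fallback for anything not matched below
  group_by: [alertname, job]
  group_wait: 30s               # wait before sending the first notification for a new group
  group_interval: 5m            # wait before notifying about new alerts added to a group
  repeat_interval: 4h           # wait before re-sending a still-firing notification
  routes:
    # Escalate critical alerts to the on-call pager.
    - matchers:
        - severity="critical"
      receiver: pagerduty-oncall

receivers:
  - name: default-slack
    slack_configs:
      - api_url: "https://hooks.slack.com/services/REPLACE/WITH/WEBHOOK"  # placeholder
        channel: "#alerts"                                               # placeholder
  - name: pagerduty-oncall
    pagerduty_configs:
      - routing_key: "REPLACE_WITH_ROUTING_KEY"                          # placeholder

inhibit_rules:
  # While a critical alert fires, mute warnings that share its job and instance.
  - source_matchers:
      - severity="critical"
    target_matchers:
      - severity="warning"
    equal: [job, instance]
```

The equal list decides what counts as the same component, so it should name labels that both the critical and the warning alerts actually carry. The file can be validated with amtool check-config alertmanager.yml.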
