📊

Prometheus Monitoring

Verified

by Community

Creates Prometheus configurations including scrape targets, recording rules, alerting rules, service discovery, and integration with Alertmanager for comprehensive infrastructure and application monitoring.

prometheusmonitoringalertingobservabilitydevops

Prometheus Monitoring

Creates comprehensive Prometheus monitoring configurations including scrape job definitions, relabeling rules, recording rules for precomputed metrics, alerting rules with severity levels, Alertmanager routing for notifications, and service discovery configurations for dynamic environments like Kubernetes, Consul, and EC2.

Usage

Describe your infrastructure (bare metal, VMs, Kubernetes, cloud services), the applications you need to monitor, and your alerting requirements. Specify which exporters you are using (node_exporter, blackbox, etc.) and notification channels (Slack, PagerDuty, email). The skill generates prometheus.yml, alert rule files, and Alertmanager configuration.

Examples

  • "Configure Prometheus to scrape a Kubernetes cluster with pod auto-discovery and namespace filtering"
  • "Create alerting rules for high CPU usage, disk space warnings, and service down detection with escalation"
  • "Set up blackbox exporter probes for HTTP endpoint monitoring with certificate expiry alerts"
  • "Build recording rules for precomputing request rate, error rate, and latency percentiles (RED method)"

Guidelines

  • Use relabel_configs to filter and transform labels at scrape time, metric_relabel_configs post-scrape
  • Create recording rules for frequently queried or dashboard-used expressions to reduce query load
  • Structure alerting rules with severity labels (critical, warning, info) for proper routing and escalation
  • Set appropriate scrape intervals: 15s for application metrics, 60s for infrastructure, 300s for slow-changing
  • Use honor_labels: true when scraping Pushgateway or federated Prometheus instances
  • Configure Alertmanager with group_wait, group_interval, and repeat_interval to prevent alert fatigue
  • Add inhibition rules so critical alerts suppress related warning alerts for the same component
  • Use the for clause in alerts to require sustained conditions before firing (avoid flapping)