System Monitoring Setup

Name: System Monitoring Setup
Author: Community

Set up comprehensive system monitoring and alerting for servers and applications. Covers metric collection, dashboard design, alert configuration, and incident response integration.

Usage

Describe your infrastructure and what you need to monitor. The guide recommends tools, configures metric collection, designs dashboards, and sets up meaningful alerts that reduce noise.

Parameters

Scale: Single server, Small cluster (2-10), or Large infrastructure (more than 10)
Stack: Prometheus+Grafana, Datadog, CloudWatch, or Lightweight (scripts)
Metrics: System resources, Application performance, or Both
Budget: Free/open-source only, Budget-friendly, or Enterprise

Examples

Single Server Monitoring: Set up lightweight monitoring for a VPS — node_exporter + Prometheus + Grafana with dashboards for CPU, memory, disk, network, and PM2 process metrics.
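For a single VPS, one lightweight way to feed extra host metrics into this stack is a small script that writes them in the Prometheus text exposition format, which node_exporter's textfile collector can then pick up. The following is a minimal sketch under assumptions: the metric names (vps_disk_used_ratio, vps_load1) and the helper functions are hypothetical, and it presumes the textfile collector is enabled and pointed at the directory where you save the output.

```python
import os
import shutil


def collect_metrics(path="/"):
    """Gather a few host metrics using only the standard library."""
    usage = shutil.disk_usage(path)
    ratio = usage.used / usage.total if usage.total else 0.0
    metrics = {"vps_disk_used_ratio": ratio}  # hypothetical metric name
    try:
        # 1-minute load average; not available on every platform.
        metrics["vps_load1"] = os.getloadavg()[0]  # hypothetical metric name
    except (AttributeError, OSError):
        pass
    return metrics


def render_metrics(metrics):
    """Render a dict of gauge values in Prometheus text exposition format."""
    lines = []
    for name, value in sorted(metrics.items()):
        lines.append(f"# TYPE {name} gauge")
        lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_metrics(collect_metrics()), end="")
```

Running the script from cron and redirecting its output to a .prom file in the collector directory would be one way to wire it in; the schedule and file location are left to your setup.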

Web Application Monitoring: Monitor a Next.js application — response times, error rates, throughput, database query performance, and user-facing availability with uptime checks.
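Response times, error rates, and throughput can all be derived from the same stream of request records. The sketch below assumes a hypothetical Request record and a hypothetical red_summary helper; it treats HTTP status 500 and above as errors and uses a nearest-rank 95th percentile for duration. A real deployment would more likely compute these in the monitoring backend rather than in application code.

```python
from dataclasses import dataclass


@dataclass
class Request:
    """Hypothetical record of one handled request."""
    duration_ms: float
    status: int


def red_summary(requests, window_seconds):
    """Compute rate, error ratio, and p95 duration for one time window."""
    if not requests:
        return {"rate_per_s": 0.0, "error_ratio": 0.0, "p95_ms": None}
    durations = sorted(r.duration_ms for r in requests)
    n = len(durations)
    # Nearest-rank percentile: ceil(0.95 * n), done in integer arithmetic.
    rank = (95 * n + 99) // 100
    errors = sum(1 for r in requests if r.status >= 500)
    return {
        "rate_per_s": n / window_seconds,
        "error_ratio": errors / n,
        "p95_ms": durations[rank - 1],
    }
```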

Docker Host Monitoring: Monitor a Docker host and all containers — cAdvisor for container metrics, per-container CPU/memory/network, and volume usage with container restart alerting.
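Container restart alerting comes down to comparing cumulative restart counts between two points in time. The sketch below assumes you already have two snapshots mapping container name to restart count; how they are collected is not covered here, and the restart_alerts helper and its max_restarts default are hypothetical.

```python
def restart_alerts(previous, current, max_restarts=3):
    """Return (name, delta) for containers that restarted too often.

    Both arguments map container name -> cumulative restart count.
    A container missing from `previous` is treated as starting from zero.
    A negative delta (for example after a container was recreated) is ignored.
    """
    alerts = []
    for name, count in sorted(current.items()):
        delta = count - previous.get(name, 0)
        if delta > max_restarts:
            alerts.append((name, delta))
    return alerts
```

The threshold applies to the interval between snapshots, so the same value means something different at a one-minute interval than at an hourly one.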

Alert Fatigue Reduction: Redesign an alerting system that sends too many notifications — implement severity levels, alert grouping, escalation chains, and root-cause-based alerts.
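Alert grouping can be illustrated by collapsing every alert for the same service into a single notification that carries the highest severity seen. This is a minimal sketch, assuming alerts arrive as dictionaries with hypothetical service, severity, and message keys; escalation chains and root-cause correlation are out of scope for it.

```python
from collections import defaultdict

# Severity levels ordered from least to most urgent.
SEVERITY_RANK = {"info": 0, "warning": 1, "critical": 2}


def group_alerts(alerts):
    """Collapse alerts per service into one notification each."""
    grouped = defaultdict(list)
    for alert in alerts:
        grouped[alert["service"]].append(alert)

    notifications = []
    for service, items in sorted(grouped.items()):
        # The most severe alert in the group sets the notification's level.
        worst = max(items, key=lambda a: SEVERITY_RANK[a["severity"]])
        notifications.append({
            "service": service,
            "severity": worst["severity"],
            "count": len(items),
            "summary": worst["message"],
        })
    return notifications
```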

Guidelines

Monitoring follows the USE method (Utilization, Saturation, Errors) for system resources
Application monitoring follows the RED method (Rate, Errors, Duration) for services
Alert thresholds are set based on historical baselines, not arbitrary numbers
Alerts use severity levels: Info (log only), Warning (investigate), Critical (page on-call)
Dashboard design shows the most important metrics at a glance with drill-down capability
Data retention balances storage cost with the need for historical trend analysis
Synthetic monitoring (uptime checks) catches issues before users report them
Log-based metrics supplement traditional monitoring for application-specific insights
Runbooks are linked to alerts so responders know what actions to take
Monitoring itself is monitored — dead monitoring is worse than no monitoring
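
Two of the guidelines above, baseline-derived thresholds and severity levels, can be sketched together. The example assumes a simple statistical baseline (mean plus a multiple of the standard deviation); that choice, the multipliers, and the function names are illustrative assumptions, and metrics with strong daily or weekly cycles may need a different model.

```python
import statistics

# Severity levels and their responses, as listed in the guidelines.
ACTIONS = {
    "info": "log only",
    "warning": "investigate",
    "critical": "page on-call",
}


def baseline_thresholds(history, warn_sigmas=2.0, crit_sigmas=3.0):
    """Derive alert thresholds from historical samples of one metric."""
    mean = statistics.fmean(history)
    stdev = statistics.pstdev(history)
    return {
        "warning": mean + warn_sigmas * stdev,
        "critical": mean + crit_sigmas * stdev,
    }


def classify(value, thresholds):
    """Map a metric value to a severity level."""
    if value >= thresholds["critical"]:
        return "critical"
    if value >= thresholds["warning"]:
        return "warning"
    return "info"
```

With a history of 48, 52, 48, 52 the mean is 50 and the standard deviation is 2, so the warning threshold lands at 54 and the critical threshold at 56.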
