# Infrastructure Alerts

These alerts monitor the Breeze platform itself — API health, database, Redis, and disk usage. For alerts about your managed devices, see RMM Alerts.

If you deploy the optional observability stack (`docker-compose.monitoring.yml`), Breeze supports infrastructure-level alerting through Prometheus and Alertmanager.
## Pre-Configured Infrastructure Alert Rules

The file `monitoring/rules/breeze-rules.yml` ships with these rules:
### API Alerts

| Alert | Severity | Condition |
|---|---|---|
| HighErrorRate | critical | Error rate > 5% for 5 minutes |
| SlowResponseTime | warning | P95 latency > 2s for 10 minutes |
| APIServiceDown | critical | API target down for 2 minutes |
| EndpointLatencyHigh | warning | Any endpoint P95 > 5s for 5 minutes |
| High4xxRate | warning | 4xx rate > 20% for 10 minutes |
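As an illustration of how a threshold like "error rate > 5% for 5 minutes" translates into a Prometheus rule, here is a minimal sketch. The metric name `breeze_http_requests_total` and its `status` label are assumptions for illustration, not the actual names in `breeze-rules.yml`:

```yaml
groups:
  - name: api-alerts
    rules:
      - alert: HighErrorRate
        # Ratio of 5xx responses to all responses over the last 5 minutes
        expr: |
          sum(rate(breeze_http_requests_total{status=~"5.."}[5m]))
            / sum(rate(breeze_http_requests_total[5m])) > 0.05
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "API error rate above 5%"
```

The `for: 5m` clause means the condition must hold continuously for five minutes before the alert fires, which avoids paging on short spikes.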
### Infrastructure Alerts

| Alert | Severity | Condition |
|---|---|---|
| RedisDown | critical | Redis exporter down for 2 minutes |
| RedisMemoryHigh | warning | Redis memory > 80% of max |
| PostgresDown | critical | Postgres exporter down for 2 minutes |
| PostgresConnectionPoolSaturated | warning | Connections > 80% of max |
| DiskSpaceLow | warning | Disk usage > 85% |
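A disk-usage rule like DiskSpaceLow is typically built on node_exporter's filesystem metrics. A sketch of one possible form (the `for` duration and filesystem filter are assumptions; the shipped rule may differ):

```yaml
groups:
  - name: infra-alerts
    rules:
      - alert: DiskSpaceLow
        # Used fraction = 1 - available/size, excluding ephemeral filesystems
        expr: |
          (1 - node_filesystem_avail_bytes{fstype!~"tmpfs|overlay"}
             / node_filesystem_size_bytes{fstype!~"tmpfs|overlay"}) > 0.85
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Disk usage above 85% on {{ $labels.mountpoint }}"
```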
### Business Alerts

| Alert | Severity | Condition |
|---|---|---|
| NoAgentHeartbeats | critical | Zero heartbeats received for 5 minutes |
| AlertProcessingBacklog | warning | Alert queue depth > 100 for 10 minutes |
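"Zero heartbeats for 5 minutes" is a slightly different PromQL pattern from a threshold check, since a counter that stops increasing still reports samples. A sketch of what such a rule might look like (the metric name `breeze_agent_heartbeats_total` is an assumption):

```yaml
groups:
  - name: business-alerts
    rules:
      - alert: NoAgentHeartbeats
        # Per-second heartbeat rate across all agents has dropped to zero
        expr: sum(rate(breeze_agent_heartbeats_total[5m])) == 0
        for: 5m
        labels:
          severity: critical
```

Note that if the metric can disappear from Prometheus entirely (e.g. the API itself stops scraping), `absent(breeze_agent_heartbeats_total)` is the more robust check, because `rate()` over a missing series returns no data rather than zero.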
## Alertmanager Routing

Alertmanager routes infrastructure alerts by severity. Edit `monitoring/alertmanager.yml` to configure receivers:

```yaml
route:
  receiver: default
  group_by: ['alertname', 'severity', 'job']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 4h
  routes:
    - match:
        severity: critical
      receiver: critical-alerts
      group_wait: 10s
      repeat_interval: 1h
    - match:
        severity: warning
      receiver: warning-alerts
    - match_re:
        alertname: '^(Redis|Postgres|DiskSpace).*'
      receiver: infrastructure-alerts
```

When a critical alert fires, related warning alerts for the same alert name are automatically suppressed via inhibition rules.
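The inhibition behavior described above corresponds to a rule along these lines in `alertmanager.yml`. This is a sketch using the matchers-style syntax (Alertmanager v0.22+); the shipped configuration may be written differently:

```yaml
inhibit_rules:
  # Mute warning-severity alerts while a critical alert
  # with the same alert name is firing
  - source_matchers:
      - severity = critical
    target_matchers:
      - severity = warning
    equal: ['alertname']
```

The `equal` list restricts suppression to alerts sharing the same `alertname` label, so a critical Redis alert does not silence unrelated Postgres warnings.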
## Configuring Infrastructure Notification Channels

Uncomment and edit the receiver blocks in `monitoring/alertmanager.yml` to enable Slack, PagerDuty, or email for infrastructure alerts:

```yaml
# Slack for critical alerts
receivers:
  - name: critical-alerts
    slack_configs:
      - api_url: 'https://hooks.slack.com/services/YOUR/WEBHOOK/URL'
        channel: '#alerts-critical'
        send_resolved: true
        title: '{{ .Status | toUpper }}: {{ .GroupLabels.alertname }}'

# Or PagerDuty for critical alerts
receivers:
  - name: critical-alerts
    pagerduty_configs:
      - service_key: 'YOUR_PAGERDUTY_SERVICE_KEY'
        severity: 'critical'

# Email for warning alerts
receivers:
  - name: warning-alerts
    email_configs:
      - smarthost: 'smtp.yourdomain.com:587'
        auth_password: 'password'
```

Each snippet above shows one receiver option; merge the entries you need into a single `receivers:` list, since the key may appear only once in the file.

After editing, restart Alertmanager:
```bash
docker compose -f docker-compose.yml -f docker-compose.monitoring.yml restart alertmanager
```

## Custom Infrastructure Alert Rules
Add custom Prometheus rules in `monitoring/rules/`:
```yaml
groups:
  - name: custom-alerts
    rules:
      - alert: HighAgentChurn
        # increase() over 1h matches the "per hour" threshold;
        # rate() would yield a per-second value
        expr: increase(breeze_device_enrollments_total[1h]) > 10
        for: 30m
        labels:
          severity: warning
        annotations:
          summary: "High agent enrollment rate"
          description: "More than 10 new enrollments per hour for 30 minutes"
```

Prometheus automatically picks up new rule files. Reload the config without restarting:

```bash
curl -X POST http://localhost:9090/-/reload
```

Note that the `/-/reload` endpoint only responds when Prometheus is started with the `--web.enable-lifecycle` flag.