Infrastructure Alerts

These alerts monitor the Breeze platform itself — API health, database, Redis, and disk usage. For alerts about your managed devices, see RMM Alerts.

If you deploy the optional observability stack (`docker-compose.monitoring.yml`), Breeze supports infrastructure-level alerting through Prometheus and Alertmanager.

The file `monitoring/rules/breeze-rules.yml` ships with these rules:

| Alert | Severity | Condition |
| --- | --- | --- |
| HighErrorRate | critical | Error rate > 5% for 5 minutes |
| SlowResponseTime | warning | P95 latency > 2s for 10 minutes |
| APIServiceDown | critical | API target down for 2 minutes |
| EndpointLatencyHigh | warning | Any endpoint P95 > 5s for 5 minutes |
| High4xxRate | warning | 4xx rate > 20% for 10 minutes |
| Alert | Severity | Condition |
| --- | --- | --- |
| RedisDown | critical | Redis exporter down for 2 minutes |
| RedisMemoryHigh | warning | Redis memory > 80% of max |
| PostgresDown | critical | Postgres exporter down for 2 minutes |
| PostgresConnectionPoolSaturated | warning | Connections > 80% of max |
| DiskSpaceLow | warning | Disk usage > 85% |
| Alert | Severity | Condition |
| --- | --- | --- |
| NoAgentHeartbeats | critical | Zero heartbeats received for 5 minutes |
| AlertProcessingBacklog | warning | Alert queue depth > 100 for 10 minutes |

Alertmanager routes infrastructure alerts by severity. Edit `monitoring/alertmanager.yml` to configure receivers:

```yaml
route:
  receiver: default
  group_by: ['alertname', 'severity', 'job']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 4h
  routes:
    - match:
        severity: critical
      receiver: critical-alerts
      group_wait: 10s
      repeat_interval: 1h
    - match:
        severity: warning
      receiver: warning-alerts
    - match_re:
        alertname: '^(Redis|Postgres|DiskSpace).*'
      receiver: infrastructure-alerts
```
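Note that Alertmanager v0.22 and later deprecates `match` and `match_re` in favor of the unified `matchers` syntax. If you are running a recent Alertmanager, the same routing tree can be written as (an equivalent sketch, not the shipped file):

```yaml
# matchers-style equivalent of the routes above (Alertmanager >= 0.22)
routes:
  - matchers:
      - severity = critical
    receiver: critical-alerts
    group_wait: 10s
    repeat_interval: 1h
  - matchers:
      - severity = warning
    receiver: warning-alerts
  - matchers:
      - alertname =~ "^(Redis|Postgres|DiskSpace).*"
    receiver: infrastructure-alerts
```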

When a critical alert fires, related warning alerts for the same alert name are automatically suppressed via inhibition rules.
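This suppression is expressed in Alertmanager's `inhibit_rules` section. A minimal sketch matching the behavior described above (the exact rule in the shipped `alertmanager.yml` may differ):

```yaml
# Mute warning-severity alerts while a critical alert
# with the same alertname is firing.
inhibit_rules:
  - source_matchers:
      - severity = critical
    target_matchers:
      - severity = warning
    equal: ['alertname']
```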

Configuring Infrastructure Notification Channels


Uncomment and edit the receiver blocks in `monitoring/alertmanager.yml` to enable Slack, PagerDuty, or email for infrastructure alerts:

```yaml
receivers:
  - name: critical-alerts
    slack_configs:
      - api_url: 'https://hooks.slack.com/services/YOUR/WEBHOOK/URL'
        channel: '#alerts-critical'
        send_resolved: true
        title: '{{ .Status | toUpper }}: {{ .GroupLabels.alertname }}'
```
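The other receivers follow the same pattern. As an illustration, here is a hedged sketch of email and PagerDuty receivers — the SMTP host, addresses, and routing key below are placeholders, not values from the shipped file:

```yaml
receivers:
  - name: warning-alerts
    email_configs:
      - to: 'ops@example.com'
        from: 'alertmanager@example.com'
        smarthost: 'smtp.example.com:587'
        auth_username: 'alertmanager@example.com'
        auth_password: 'CHANGE_ME'
        send_resolved: true
  - name: infrastructure-alerts
    pagerduty_configs:
      - routing_key: 'YOUR_PAGERDUTY_ROUTING_KEY'  # Events API v2 integration key
        severity: 'warning'
```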

After editing, restart Alertmanager:

```sh
docker compose -f docker-compose.yml -f docker-compose.monitoring.yml restart alertmanager
```

Add custom Prometheus rules in `monitoring/rules/`:

`monitoring/rules/custom-rules.yml`

```yaml
groups:
  - name: custom-alerts
    rules:
      - alert: HighAgentChurn
        # increase() gives the enrollment count over the 1h window;
        # rate() would return a per-second value, not per-hour.
        expr: increase(breeze_device_enrollments_total[1h]) > 10
        for: 30m
        labels:
          severity: warning
        annotations:
          summary: "High agent enrollment rate"
          description: "More than 10 new enrollments per hour for 30 minutes"
```
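New files in that directory are matched by a glob in the Prometheus configuration. The stack's `prometheus.yml` presumably contains something like the following excerpt (illustrative — the actual path in the shipped file may differ):

```yaml
# prometheus.yml (excerpt): any *.yml dropped into the mounted
# rules directory is evaluated after the next reload.
rule_files:
  - /etc/prometheus/rules/*.yml
```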

Prometheus picks up new rule files matching its `rule_files` glob on the next configuration reload. Trigger a reload without restarting (the reload endpoint requires Prometheus to be started with the `--web.enable-lifecycle` flag):

```sh
curl -X POST http://localhost:9090/-/reload
```