This page lists the alerts configured in Prometheus, the servers they apply to, and the time before each alert re-triggers.
[PRD] ZWE MoH - Virtual Servers
Alert | Trigger | Severity | Time to re-trigger |
---|---|---|---|
Host down | Metrics cannot be retrieved from host | CRITICAL | 15 minutes |
High CPU usage | CPU usage gets above 85% | MEDIUM | 15 minutes |
Low memory | Machine memory gets below 15% | MEDIUM | 15 minutes |
Disk space low | Machine storage gets below 10% | MEDIUM | 1 hour |
High disk I/O latency | Machine average disk input/output (I/O) exceeds 70% | MEDIUM | 30 minutes |
Network error | Machine average network error count exceeds 0 (any value above 0 counts as an error) | MEDIUM | 2 hours |
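
To illustrate how the "Time to re-trigger" column could be read, here is a minimal Python sketch. It assumes the interval is the minimum wait before a notification is sent again for an alert that is still firing; this page does not define the term, so treat that reading as an assumption. The names `AlertRule`, `RULES`, and `should_notify` are hypothetical and are not part of Prometheus or of this deployment; only the severities and intervals are taken from the table above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass(frozen=True)
class AlertRule:
    """One row of the alert table (hypothetical structure)."""
    name: str
    severity: str
    retrigger: timedelta  # the "Time to re-trigger" column


# Severities and intervals copied from the table above.
RULES = {
    rule.name: rule
    for rule in (
        AlertRule("Host down", "CRITICAL", timedelta(minutes=15)),
        AlertRule("High CPU usage", "MEDIUM", timedelta(minutes=15)),
        AlertRule("Low memory", "MEDIUM", timedelta(minutes=15)),
        AlertRule("Disk space low", "MEDIUM", timedelta(hours=1)),
        AlertRule("High disk I/O latency", "MEDIUM", timedelta(minutes=30)),
        AlertRule("Network error", "MEDIUM", timedelta(hours=2)),
    )
}


def should_notify(
    rule: AlertRule,
    still_firing: bool,
    last_sent: Optional[datetime],
    now: datetime,
) -> bool:
    """Return True if a notification should be sent now.

    Assumed semantics: notify immediately the first time the alert fires,
    then again only after the rule's re-trigger interval has elapsed.
    """
    if not still_firing:
        return False
    if last_sent is None:
        return True
    return now - last_sent >= rule.retrigger
```

Under this reading, a "Disk space low" alert first sent at 12:00 and still firing would not notify again at 12:30, but would at 13:00.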