This page lists the alerts configured in Prometheus for each server, along with each alert's trigger condition and the time before it re-triggers.
[PRD] ZWE MoH - Analytics
Alert | Trigger | Time to re-trigger |
---|---|---|
Host down | Metrics cannot be retrieved from the host | 15 minutes |
High CPU usage | CPU usage rises above 85% | 2 hours |
Low memory | Available machine memory falls below 15% | 2 hours |
Disk space low | Free machine storage falls below 10% | 2 hours |
High disk I/O latency | Average disk I/O exceeds 70% | 30 minutes |
Network error | Average network error count exceeds 0 (any value above 0 counts as an error) | 2 hours |
[PRD] ZWE MoH - eLearning
Alert | Trigger | Time to re-trigger |
---|---|---|
Host down | Metrics cannot be retrieved from the host | 15 minutes |
High CPU usage | CPU usage rises above 85% | 2 hours |
Low memory | Available machine memory falls below 15% | 2 hours |
Disk space low | Free machine storage falls below 10% | 2 hours |
High disk I/O latency | Average disk I/O exceeds 70% | 30 minutes |
Network error | Average network error count exceeds 0 (any value above 0 counts as an error) | 2 hours |
[PRD] ZWE MoH - VMMC
Alert | Trigger | Time to re-trigger |
---|---|---|
Host down | Metrics cannot be retrieved from the host | 15 minutes |
High CPU usage | CPU usage rises above 85% | 2 hours |
Low memory | Available machine memory falls below 15% | 2 hours |
Disk space low | Free machine storage falls below 10% | 2 hours |
High disk I/O latency | Average disk I/O exceeds 70% | 30 minutes |
Network error | Average network error count exceeds 0 (any value above 0 counts as an error) | 2 hours |
[PRD] ZWE MoH - Monitoring
Alert | Trigger | Time to re-trigger |
---|---|---|
Host down | Metrics cannot be retrieved from the host | 15 minutes |
High CPU usage | CPU usage rises above 85% | 2 hours |
Low memory | Available machine memory falls below 15% | 2 hours |
Disk space low | Free machine storage falls below 10% | 2 hours |
High disk I/O latency | Average disk I/O exceeds 70% | 30 minutes |
Network error | Average network error count exceeds 0 (any value above 0 counts as an error) | 2 hours |
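
All four servers share the same thresholds and re-trigger times, so the rules can be read as one small table of conditions. The sketch below is only an illustration of how those conditions and re-trigger times interact; it is not the actual Prometheus or Alertmanager configuration. The metric keys (`host_up`, `cpu_used_pct`, `mem_available_pct`, `disk_free_pct`, `disk_io_pct`, `net_errors`) and the function `due_notifications` are hypothetical names, and it assumes that "memory" and "storage" in the tables refer to the available and free percentages.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable, Dict, List


@dataclass(frozen=True)
class AlertRule:
    name: str
    fires: Callable[[Dict[str, float]], bool]  # trigger condition
    retrigger: timedelta                       # time to re-trigger


# Thresholds copied from the tables above; metric keys are hypothetical.
RULES: List[AlertRule] = [
    AlertRule("Host down", lambda m: m["host_up"] == 0, timedelta(minutes=15)),
    AlertRule("High CPU usage", lambda m: m["cpu_used_pct"] > 85, timedelta(hours=2)),
    AlertRule("Low memory", lambda m: m["mem_available_pct"] < 15, timedelta(hours=2)),
    AlertRule("Disk space low", lambda m: m["disk_free_pct"] < 10, timedelta(hours=2)),
    AlertRule("High disk I/O latency", lambda m: m["disk_io_pct"] > 70, timedelta(minutes=30)),
    AlertRule("Network error", lambda m: m["net_errors"] > 0, timedelta(hours=2)),
]


def due_notifications(
    metrics: Dict[str, float],
    now: datetime,
    last_sent: Dict[str, datetime],
) -> List[str]:
    """Return the alerts that fire now and are past their re-trigger time.

    `last_sent` maps alert name to the time it was last notified and is
    updated in place for every alert returned.
    """
    due = []
    for rule in RULES:
        if not rule.fires(metrics):
            continue
        previous = last_sent.get(rule.name)
        if previous is None or now - previous >= rule.retrigger:
            due.append(rule.name)
            last_sent[rule.name] = now
    return due


# Example: CPU at 91% and only 8% of disk free on one server.
sample = {
    "host_up": 1,
    "cpu_used_pct": 91.0,
    "mem_available_pct": 40.0,
    "disk_free_pct": 8.0,
    "disk_io_pct": 20.0,
    "net_errors": 0,
}
history: Dict[str, datetime] = {}
start = datetime(2024, 1, 1, 12, 0)
print(due_notifications(sample, start, history))
# ['High CPU usage', 'Disk space low']
print(due_notifications(sample, start + timedelta(hours=1), history))
# [] (still firing, but inside the 2-hour re-trigger window)
```

In this sketch an alert that keeps firing is reported again only once its re-trigger time has elapsed, which is how the "Time to re-trigger" column is interpreted here.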