Alerts
This page lists the alerts configured in Prometheus, the servers they apply to, and the time each alert waits before re-triggering.
ZWE MoH - Production Virtual Servers
Alert | Trigger | Severity | Time to re-trigger |
---|---|---|---|
Host down | Metrics cannot be retrieved from the host | Critical | 15 minutes |
High CPU usage | CPU usage rises above 85% | Medium | 15 minutes |
Low memory | Machine memory falls below 15% | Medium | 15 minutes |
Disk space low | Machine storage falls below 10% | Medium | 1 hour |
High disk I/O latency | Machine average disk input/output exceeds 70% | Medium | 30 minutes |
Network error | Machine average network errors exceed 0 (any value above 0 counts as an error) | Medium | 2 hours |
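
The thresholds and re-trigger times in the table above can be sketched as a small rule set. This is only an illustration, not the actual Prometheus configuration: the rule keys, the sample field names (`reachable`, `cpu_used_pct`, `mem_free_pct`, `disk_free_pct`, `disk_io_pct`, `net_errors`) and the helper functions are hypothetical. It also assumes that "Time to re-trigger" means the minimum wait before an alert that is still firing is sent again.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AlertRule:
    name: str
    severity: str
    retrigger_minutes: int


# Values copied from the table; the dictionary keys are hypothetical identifiers.
RULES = {
    "host_down": AlertRule("Host down", "critical", 15),
    "high_cpu": AlertRule("High CPU usage", "medium", 15),
    "low_memory": AlertRule("Low memory", "medium", 15),
    "disk_space_low": AlertRule("Disk space low", "medium", 60),
    "high_disk_io": AlertRule("High disk I/O latency", "medium", 30),
    "network_error": AlertRule("Network error", "medium", 120),
}


def firing_alerts(sample: dict) -> list[str]:
    """Return the keys of the rules whose trigger condition holds for one host sample."""
    # If metrics cannot be retrieved, no other condition can be evaluated.
    if not sample.get("reachable", False):
        return ["host_down"]
    firing = []
    if sample["cpu_used_pct"] > 85:
        firing.append("high_cpu")
    if sample["mem_free_pct"] < 15:
        firing.append("low_memory")
    if sample["disk_free_pct"] < 10:
        firing.append("disk_space_low")
    if sample["disk_io_pct"] > 70:
        firing.append("high_disk_io")
    if sample["net_errors"] > 0:
        firing.append("network_error")
    return firing


def should_notify(rule: AlertRule, minutes_since_last: Optional[float]) -> bool:
    """Assumed semantics: notify on first firing, then again once the re-trigger time has passed."""
    return minutes_since_last is None or minutes_since_last >= rule.retrigger_minutes
```

For example, a reachable host at 90% CPU with 5% free storage would fire `high_cpu` and `disk_space_low`. Under the assumption above, the disk alert would be sent again only after an hour.
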
VMMC MoH - Production Virtual Servers
Container | Trigger | Severity | Time to re-trigger |
---|---|---|---|
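Tomcat: DWS/WFA | Container down | Critical | 5 minutes |
Mongo | Container down | Critical | 5 minutes |
Kafka | Container down | Critical | 5 minutes |
ZooKeeper | Container down | Critical | 5 minutes |
Node.js | Container down | Critical | 5 minutes |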
e-Learning MoH - Production Virtual Servers
Container | Trigger | Severity | Time to re-trigger |
---|---|---|---|
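Postgres/Moodle | Container down | Critical | 5 minutes |
Moodle | Container down | Critical | 5 minutes |
Postgres/Warehouse | Container down | Critical | 5 minutes |
NiFi | Container down | Critical | 5 minutes |
Superset | Container down | Critical | 5 minutes |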