...
...
ZWE MoH -
...
Production Virtual Servers
Alert | Trigger | Time to re-trigger
---|---|---
Host down | Metrics cannot be retrieved from the host | 15 minutes
High CPU usage | CPU usage rises above 85% | 2 hours
Low memory | Machine memory falls below 15% | 2 hours
Alert | Trigger | Severity | Time to re-trigger
---|---|---|---
Host down | Metrics cannot be retrieved from the host | | 15 minutes
High CPU usage | CPU usage rises above 85% | | 15 minutes
Low memory | Machine memory falls below 15% | | 2 hours
Disk space low | Machine storage falls below 10% | | 2 hours
High disk I/O latency | Machine average input/output exceeds 70% | | 30 minutes
Network error | Machine average network errors exceed 0 (any value above 0 counts as an error) | | 2 hours

[PRD] ZWE MoH - eLearning

Alert | Trigger | Severity | Time to re-trigger
---|---|---|---
| | | | 15 minutes |
Disk space low | Machine storage falls below 10% | | 1 hour
High disk I/O latency | Machine average input/output exceeds 70% | | 30 minutes
Network error | Machine average network errors exceed 0 (any value above 0 counts as an error) | | 2 hours
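
The Trigger and Time to re-trigger columns can be read as a small rule: an alert fires when its threshold is crossed, and is repeated only after the re-trigger time has passed. The following is a minimal Python sketch of that reading, not the configuration of the monitoring tool actually in use (which this page does not name). The names `AlertRule`, `RULES`, `should_notify` and the metric keys are hypothetical. Treating memory and storage as "percent still free", and treating the re-trigger time as a minimum gap between repeated notifications, are both assumptions. The thresholds are taken from the tables; the intervals differ between tables on this page, so the ones used here are illustrative.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable, Optional


@dataclass(frozen=True)
class AlertRule:
    """One row of the alert table (hypothetical structure)."""
    name: str
    breached: Callable[[float], bool]  # True when the trigger condition holds
    retrigger: timedelta               # assumed minimum gap between repeats


# Thresholds come from the tables; metric keys are made up for this sketch.
# Memory and storage values are assumed to mean "percent still free".
RULES = {
    "cpu_used_pct": AlertRule("High CPU usage", lambda v: v > 85, timedelta(hours=2)),
    "memory_free_pct": AlertRule("Low memory", lambda v: v < 15, timedelta(hours=2)),
    "disk_free_pct": AlertRule("Disk space low", lambda v: v < 10, timedelta(hours=2)),
    "network_errors": AlertRule("Network error", lambda v: v > 0, timedelta(hours=2)),
}


def should_notify(rule: AlertRule, value: float, now: datetime,
                  last_sent: Optional[datetime]) -> bool:
    """Return True if a notification should go out for this sample."""
    if not rule.breached(value):
        return False  # trigger condition not met, nothing to send
    if last_sent is None:
        return True   # first breach, notify immediately
    # Already notified: repeat only once the re-trigger interval has elapsed.
    return now - last_sent >= rule.retrigger
```

For example, with a CPU sample of 90% and a notification sent 30 minutes earlier, `should_notify` returns `False`; once two hours have passed since that notification, it returns `True` again.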
...
VMMC MoH -
...
Production Virtual Servers
Alert | Trigger | Time to re-trigger
---|---|---
Host down | Metrics cannot be retrieved from the host | 15 minutes
High CPU usage | CPU usage rises above 85% | 2 hours
Low memory | Machine memory falls below 15% | 2 hours
Disk space low | Machine storage falls below 10% | 2 hours
High disk I/O latency | Machine average input/output exceeds 70% | 30 minutes
Network error | Machine average network errors exceed 0 (any value above 0 counts as an error) | 2 hours

[PRD] ZWE MoH - Monitoring

Container | Trigger | Severity | Time to re-trigger
---|---|---|---
Tomcat: DWS/WFA | Container down | | 5 minutes
Mongo | | |
Kafka | | |
ZooKeeper | | |
Node.js | | |

e-Learning MoH - Production Virtual Servers
Container | Trigger | Severity | Time to re-trigger
---|---|---|---
Postgres/Moodle | Container down | | 5 minutes
Moodle | | |
Postgres/Warehouse | | |
NiFi | | |
Superset | | |

Alert | Trigger | Time to re-trigger
---|---|---
Host down | Metrics cannot be retrieved from the host | 15 minutes
High CPU usage | CPU usage rises above 85% | 2 hours
Low memory | Machine memory falls below 15% | 2 hours
Disk space low | Machine storage falls below 10% | 2 hours
High disk I/O latency | Machine average input/output exceeds 70% | 30 minutes
Network error | Machine average network errors exceed 0 (any value above 0 counts as an error) |