Alerts

This page lists the alerts configured in Prometheus, the servers they apply to, and the time each alert waits before re-triggering.

[PRD] ZWE MoH - Virtual Servers

Alert	Trigger	Severity	Time to re-trigger
Host down	Metrics cannot be retrieved from the host	CRITICAL	15 minutes
High CPU usage	CPU usage rises above 85%	MEDIUM	15 minutes
Low memory	Available memory falls below 15%	MEDIUM	15 minutes
Disk space low	Free disk space falls below 10%	MEDIUM	1 hour
High disk I/O latency	Average disk I/O exceeds 70%	MEDIUM	30 minutes
Network error	Average network error count exceeds 0 (any value above 0 counts as an error)	MEDIUM	2 hours
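
The re-trigger times in the table can be read as a minimum wait between repeated notifications for an alert that is still firing. The sketch below encodes the table as data and checks whether a repeat notification is due. It is only an illustration: the `AlertRule` class, the `RULES` mapping and the `should_notify` helper are hypothetical names, and reading "Time to re-trigger" as a repeat interval is an assumption, not the actual Prometheus or Alertmanager configuration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass(frozen=True)
class AlertRule:
    """One row of the alert table (hypothetical structure)."""
    name: str
    severity: str
    retrigger: timedelta  # the "Time to re-trigger" column


# Severities and intervals copied from the table above.
RULES = {
    rule.name: rule
    for rule in (
        AlertRule("Host down", "CRITICAL", timedelta(minutes=15)),
        AlertRule("High CPU usage", "MEDIUM", timedelta(minutes=15)),
        AlertRule("Low memory", "MEDIUM", timedelta(minutes=15)),
        AlertRule("Disk space low", "MEDIUM", timedelta(hours=1)),
        AlertRule("High disk I/O latency", "MEDIUM", timedelta(minutes=30)),
        AlertRule("Network error", "MEDIUM", timedelta(hours=2)),
    )
}


def should_notify(rule: AlertRule, last_sent: Optional[datetime], now: datetime) -> bool:
    """Return True if a still-firing alert is due for a notification.

    Assumes the re-trigger time is the minimum gap between notifications.
    """
    if last_sent is None:
        # The alert has never been sent, so notify immediately.
        return True
    return now - last_sent >= rule.retrigger
```

For example, a "Host down" alert last sent 10 minutes ago would not be repeated yet, while one last sent 15 minutes ago would.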