This page lists the alerts configured in Prometheus for each group of servers, along with each alert's trigger condition, severity, and time to re-trigger.

[PRD] ZWE MoH - Virtual Servers

| Alert | Trigger | Severity | Time to re-trigger |
|---|---|---|---|
| Host down | Metrics cannot be retrieved from the host | CRITICAL | 15 minutes |
| High CPU usage | CPU usage rises above 85% | MEDIUM | 15 minutes |
| Low memory | Machine memory falls below 15% | MEDIUM | 15 minutes |
| Disk space low | Machine storage falls below 10% | MEDIUM | 1 hour |
| High disk I/O latency | Machine average disk I/O exceeds 70% | MEDIUM | 30 minutes |
| Network error | Machine average network errors exceed 0 (any value above 0 counts as an error) | MEDIUM | 2 hours |
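
To make the thresholds and re-trigger times above concrete, here is a minimal Python sketch of how they could be evaluated. It is not the Prometheus or Alertmanager configuration used for these servers. The metric keys (`cpu_used_pct`, `mem_available_pct`, `disk_free_pct`, `disk_io_pct`, `network_errors`) and the helper names are hypothetical, and the sketch assumes that "memory" and "storage" refer to the available/free percentage and that "time to re-trigger" is the minimum interval between repeated notifications while a condition persists.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable, Dict, List, Optional


@dataclass(frozen=True)
class AlertRule:
    name: str
    severity: str
    retrigger: timedelta  # assumed: minimum gap between repeated notifications
    condition: Callable[[Dict[str, float]], bool]


# Thresholds copied from the table; metric keys are hypothetical names.
RULES: List[AlertRule] = [
    AlertRule("High CPU usage", "MEDIUM", timedelta(minutes=15),
              lambda m: m["cpu_used_pct"] > 85),
    AlertRule("Low memory", "MEDIUM", timedelta(minutes=15),
              lambda m: m["mem_available_pct"] < 15),
    AlertRule("Disk space low", "MEDIUM", timedelta(hours=1),
              lambda m: m["disk_free_pct"] < 10),
    AlertRule("High disk I/O latency", "MEDIUM", timedelta(minutes=30),
              lambda m: m["disk_io_pct"] > 70),
    AlertRule("Network error", "MEDIUM", timedelta(hours=2),
              lambda m: m["network_errors"] > 0),
]

# "Host down" fires when no metrics can be retrieved at all.
HOST_DOWN = AlertRule("Host down", "CRITICAL", timedelta(minutes=15),
                      lambda m: True)


def firing_alerts(metrics: Optional[Dict[str, float]]) -> List[AlertRule]:
    """Return the rules whose condition holds; None means the scrape failed."""
    if metrics is None:
        return [HOST_DOWN]
    return [rule for rule in RULES if rule.condition(metrics)]


def should_notify(rule: AlertRule, last_sent: Optional[datetime],
                  now: datetime) -> bool:
    """Notify on first firing, then again once the re-trigger time has passed."""
    return last_sent is None or now - last_sent >= rule.retrigger
```

For example, a host reporting 90% CPU usage and 5% free disk would fire "High CPU usage" and "Disk space low"; under the assumption above, the CPU alert would be sent again after 15 minutes and the disk alert after 1 hour if the conditions persist.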
