This page lists the alerts configured in Prometheus for each server, along with each alert's trigger condition and the time before it re-triggers.

[PRD] ZWE MoH - Analytics

| Alert | Trigger | Time to re-trigger |
| --- | --- | --- |
| Host down | Metrics cannot be retrieved from the host | 15 minutes |
| High CPU usage | CPU usage rises above 85% | 2 hours |
| Low memory | Available machine memory falls below 15% | 2 hours |
| Disk space low | Free machine storage falls below 10% | 2 hours |
| High disk I/O latency | Machine average disk I/O exceeds 70% | 30 minutes |
| Network error | Machine average network errors exceed 0 (any value above 0 counts as an error) | 2 hours |
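
The trigger conditions in this table could be expressed as Prometheus alerting rules along the following lines. This is only a sketch: it assumes the hosts are scraped through node_exporter, and the group name, alert names, and 5-minute rate windows are hypothetical choices that do not come from this page. The disk I/O rule reads "average input output exceeds 70%" as disk busy time, which is one possible interpretation.

```yaml
# Sketch only: assumes node_exporter metrics; names and windows are hypothetical.
groups:
  - name: host-alerts            # hypothetical group name
    rules:
      - alert: HostDown
        # The scrape target did not answer, so no metrics can be retrieved.
        expr: up == 0
      - alert: HighCpuUsage
        # Percentage of non-idle CPU time per instance above 85%.
        expr: 100 * (1 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m]))) > 85
      - alert: LowMemory
        # Available memory as a percentage of total below 15%.
        expr: 100 * node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes < 15
      - alert: DiskSpaceLow
        # Free filesystem space as a percentage of size below 10%.
        expr: 100 * node_filesystem_avail_bytes / node_filesystem_size_bytes < 10
      - alert: HighDiskIoLatency
        # Assumed reading of the trigger: disk busy more than 70% of the time.
        expr: 100 * rate(node_disk_io_time_seconds_total[5m]) > 70
      - alert: NetworkError
        # Any receive or transmit error rate above 0 counts as an error.
        expr: rate(node_network_receive_errs_total[5m]) + rate(node_network_transmit_errs_total[5m]) > 0
```

The re-trigger times are not part of a rule definition; in a typical setup they belong to whatever component sends the notifications.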

[PRD] ZWE MoH - eLearning

| Alert | Trigger | Time to re-trigger |
| --- | --- | --- |
| Host down | Metrics cannot be retrieved from the host | 15 minutes |
| High CPU usage | CPU usage rises above 85% | 2 hours |
| Low memory | Available machine memory falls below 15% | 2 hours |
| Disk space low | Free machine storage falls below 10% | 2 hours |
| High disk I/O latency | Machine average disk I/O exceeds 70% | 30 minutes |
| Network error | Machine average network errors exceed 0 (any value above 0 counts as an error) | 20 hours |

[PRD] ZWE MoH - VMMC

| Alert | Trigger | Time to re-trigger |
| --- | --- | --- |
| Host down | Metrics cannot be retrieved from the host | 15 minutes |
| High CPU usage | CPU usage rises above 85% | 2 hours |
| Low memory | Available machine memory falls below 15% | 2 hours |
| Disk space low | Free machine storage falls below 10% | 2 hours |
| High disk I/O latency | Machine average disk I/O exceeds 70% | 30 minutes |
| Network error | Machine average network errors exceed 0 (any value above 0 counts as an error) | 20 hours |

[PRD] ZWE MoH - Monitoring

| Alert | Trigger | Time to re-trigger |
| --- | --- | --- |
| Host down | Metrics cannot be retrieved from the host | 15 minutes |
| High CPU usage | CPU usage rises above 85% | 2 hours |
| Low memory | Available machine memory falls below 15% | 2 hours |
| Disk space low | Free machine storage falls below 10% | 2 hours |
| High disk I/O latency | Machine average disk I/O exceeds 70% | 30 minutes |
| Network error | Machine average network errors exceed 0 (any value above 0 counts as an error) | 20 hours |
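
If notifications are sent through Alertmanager (an assumption, since this page does not say which component handles them), the re-trigger times listed above would map onto `repeat_interval`, which controls how long Alertmanager waits before re-sending a notification for an alert that is still firing. The sketch below is illustrative only: the receiver name `ops`, the alert names, and the `server` label used to tell the Analytics host apart are all hypothetical.

```yaml
# Sketch only: receiver name, alert names and the "server" label are hypothetical.
route:
  receiver: ops
  repeat_interval: 2h            # default: CPU, memory, disk space
  routes:
    - matchers:
        - 'alertname = "HostDown"'
      repeat_interval: 15m
    - matchers:
        - 'alertname = "HighDiskIoLatency"'
      repeat_interval: 30m
    # eLearning, VMMC and Monitoring list 20 hours for network errors;
    # Analytics lists 2 hours, so it falls through to the default above.
    - matchers:
        - 'alertname = "NetworkError"'
        - 'server != "analytics"'
      repeat_interval: 20h
receivers:
  - name: ops
```

Child routes are evaluated in order and the first match is used, so any alert that matches none of them inherits the 2-hour default from the top-level route.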
