This page lists the alerts configured in Prometheus, the servers they apply to, and, for each alert, the trigger condition, severity, and time to re-trigger.

ZWE MoH - Production Virtual Servers

| Alert | Trigger | Severity | Time to re-trigger |
| --- | --- | --- | --- |
| Host down | Metrics cannot be retrieved from the host | CRITICAL | 15 minutes |
| High CPU usage | CPU usage rises above 85% | MEDIUM | 15 minutes |
| Low memory | Available memory falls below 15% | MEDIUM | 15 minutes |
| Disk space low | Available storage falls below 10% | MEDIUM | 1 hour |
| High disk I/O latency | Average disk I/O exceeds 70% | MEDIUM | 30 minutes |
| Network error | Average network error count exceeds 0 (any value above 0 counts as an error) | MEDIUM | 2 hours |
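
As an illustration of how the host thresholds listed above combine, here is a minimal Python sketch that evaluates one metrics sample against them. It is not the Prometheus rule file used on these servers. The sample keys (`reachable`, `cpu_used_pct`, `mem_free_pct`, `disk_free_pct`, `disk_io_pct`, `net_errors`) and the names `AlertRule`, `HOST_RULES`, and `firing_alerts` are hypothetical and chosen only for this example; the thresholds, severities, and re-trigger times are taken from this page.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Callable, Dict, List


@dataclass(frozen=True)
class AlertRule:
    name: str
    severity: str
    retrigger: timedelta
    # Predicate over one metrics sample; returns True when the alert fires.
    fires: Callable[[Dict[str, float]], bool]


# Thresholds copied from the host alert list; the sample keys are assumptions.
HOST_RULES: List[AlertRule] = [
    AlertRule("Host down", "CRITICAL", timedelta(minutes=15),
              lambda s: s["reachable"] == 0),
    AlertRule("High CPU usage", "MEDIUM", timedelta(minutes=15),
              lambda s: s["cpu_used_pct"] > 85),
    AlertRule("Low memory", "MEDIUM", timedelta(minutes=15),
              lambda s: s["mem_free_pct"] < 15),
    AlertRule("Disk space low", "MEDIUM", timedelta(hours=1),
              lambda s: s["disk_free_pct"] < 10),
    AlertRule("High disk I/O latency", "MEDIUM", timedelta(minutes=30),
              lambda s: s["disk_io_pct"] > 70),
    AlertRule("Network error", "MEDIUM", timedelta(hours=2),
              lambda s: s["net_errors"] > 0),
]


def firing_alerts(sample: Dict[str, float]) -> List[str]:
    """Return the names of all host alerts that fire for this sample."""
    return [rule.name for rule in HOST_RULES if rule.fires(sample)]


# Example sample: CPU is above 85% and free disk is below 10%.
sample = {
    "reachable": 1, "cpu_used_pct": 91.0, "mem_free_pct": 40.0,
    "disk_free_pct": 8.0, "disk_io_pct": 12.0, "net_errors": 0,
}
print(firing_alerts(sample))
```

Note that the comparisons are strict, so a CPU reading of exactly 85% does not fire, matching the "above 85%" wording.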

VMMC MoH - Production Virtual Servers

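| Container | Trigger | Severity | Time to re-trigger |
| --- | --- | --- | --- |
| Tomcat: DWS/WFA | Container down | CRITICAL | 5 minutes |
| Mongo | Container down | CRITICAL | 5 minutes |
| Kafka | Container down | CRITICAL | 5 minutes |
| ZooKeeper | Container down | CRITICAL | 5 minutes |
| Node.js | Container down | CRITICAL | 5 minutes |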

e-Learning MoH - Production Virtual Servers

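| Container | Trigger | Severity | Time to re-trigger |
| --- | --- | --- | --- |
| Postgres/Moodle | Container down | CRITICAL | 5 minutes |
| Moodle | Container down | CRITICAL | 5 minutes |
| Postgres/Warehouse | Container down | CRITICAL | 5 minutes |
| NiFi | Container down | CRITICAL | 5 minutes |
| Superset | Container down | CRITICAL | 5 minutes |
| VMMC | Container down | CRITICAL | 5 minutes |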
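
The "Time to re-trigger" value can be read as the minimum interval between repeated notifications while an alert stays active; that reading is an assumption, since this page does not define the term. In Alertmanager, behaviour of this kind is normally set with a route's `repeat_interval`. The Python sketch below only illustrates the timing logic for the 5-minute container alerts; `CONTAINER_RETRIGGER` and `should_notify` are hypothetical names, not part of the deployed setup.

```python
from datetime import datetime, timedelta
from typing import Optional

# Re-trigger time for "container down" alerts, as listed on this page.
CONTAINER_RETRIGGER = timedelta(minutes=5)


def should_notify(now: datetime,
                  last_sent: Optional[datetime],
                  retrigger: timedelta = CONTAINER_RETRIGGER) -> bool:
    """Decide whether a still-active alert should be sent again.

    Assumption: a notification goes out the first time the alert fires,
    then again only once the re-trigger interval has fully elapsed.
    """
    return last_sent is None or now - last_sent >= retrigger


first_sent = datetime(2024, 1, 1, 12, 0)
print(should_notify(first_sent, None))                               # first notification
print(should_notify(first_sent + timedelta(minutes=3), first_sent))  # still within 5 minutes
print(should_notify(first_sent + timedelta(minutes=5), first_sent))  # interval elapsed
```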
