Troubleshooting and Restore

    Sudden shutdowns of the servers can cause multiple issues for the software running on them. The most common errors are due to corruption of Docker, its containers, or a database instance. If any of the systems running on the MoH servers is down, follow the guidelines below to troubleshoot, identify the cause, and fix the errors.


    Troubleshooting

    Assess server current status

    The application could be down for one of two main reasons:

    1. The server is not online, in which case nothing can be done until the server is back online and accessible.

    2. The server is online, but the applications could not restart because something got corrupted.

    Access the monitoring server (https://bao-zwe.monitoring.psidigital.org/grafana) to verify the status of the different servers and applications. The ‘Metrics’ dashboard displays the server status and the current resource utilization. The ‘Web apps’ dashboard displays the status of the applications running on each of the servers.

    For example, the eLearning server could be online, in which case it appears as UP in the Metrics dashboard, but if Moodle is not running, Moodle shows as DOWN in the Web apps dashboard.

    If the server is online but an application such as Moodle is not running, continue to the next step.
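
    As a command-line cross-check of the dashboards, the following is a minimal sketch, assuming curl is installed on the machine you are checking from. The helper name check_app is hypothetical, and the application URL is a placeholder that is not given on this page.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether a web application answers over HTTP.
# Usage: check_app <name> <url>
check_app() {
  local name="$1" url="$2"
  # -f: treat HTTP 4xx/5xx responses as failures; -s: silent;
  # --max-time 10: give up after 10 seconds; -o /dev/null: discard the body.
  if curl -fs --max-time 10 -o /dev/null "$url"; then
    echo "$name UP"
  else
    echo "$name DOWN"
    return 1
  fi
}

# Example (replace the placeholder with the real application URL):
# check_app Moodle "https://<moodle-url>/"
```

    A DOWN result for an application on a server that still answers SSH corresponds to the second case described above.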

    Identify the issue

    Connect to the server over SSH and start by checking the application status.

    Docker commands can only be executed with sudo privileges or as the root user.

    Execute the following command to list the running Docker containers. Any expected container that is missing from the output is not running:

    docker ps

    This is the list of containers that each server should be running:

    eLearning server

    • moodle-moodle-1

    • moodle-postgresql-1

    Analytics server

    VMMC Server
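
    The comparison between expected and running containers can be sketched as a small shell function. This is an illustration only: missing_containers is a hypothetical helper name, and the example uses the eLearning container names listed above. It must run in a shell that is allowed to call Docker (root, or a user with the required privileges).

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every expected container that is NOT running.
# Usage: missing_containers <name> [<name> ...]
missing_containers() {
  local running name
  # `docker ps` lists running containers only; --format prints just the names.
  running="$(docker ps --format '{{.Names}}')"
  for name in "$@"; do
    # -x: match whole lines; -F: fixed string, not a regex; -q: no output.
    if ! grep -qxF -- "$name" <<<"$running"; then
      echo "$name"
    fi
  done
}

# Example for the eLearning server:
# missing_containers moodle-moodle-1 moodle-postgresql-1
```

    An empty result means every named container is running; any name printed needs attention in the steps that follow.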

    1. First, try to start the stopped containers. If the containers are still up after a few minutes, it is likely that nothing is corrupted and that the containers simply had no restart policy set. If the Status column shows an uptime of several minutes, the application is most likely working.

      docker ps -a                   # display all containers, including stopped ones
      docker start <container-id>    # start a stopped Docker container

    If executing any of the above commands throws an error, it is probable that Docker itself got corrupted and needs to be reinstalled on the machine. Follow the Docker documentation to perform this process.

    There is no need to worry about losing data when uninstalling Docker: the container volumes are stored in the /opt folder and will persist even if Docker is completely removed from the machine.

    2. If the containers stop immediately after being started, something is probably corrupted. The most common causes are loss of database integrity and corruption of the container’s metadata files, both resulting from an improper, sudden shutdown of the system. Use the command below to view the Docker container logs and diagnose the issue further.

      docker logs <container-id>

      Regardless of the root cause, if something is not working as expected, re-creating the Docker containers can often solve the problem quickly.
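
    The two cases above (the container stays up, or it stops again right away) can be told apart with a short sketch like the one below. The helper name start_and_check and the default 60-second wait are assumptions made for illustration; the Docker subcommands used (start, inspect, logs) are standard.

```shell
#!/usr/bin/env bash
# Hypothetical helper: start a container, wait, then report whether it stayed up.
# Usage: start_and_check <container> [<wait-seconds>]
start_and_check() {
  local container="$1" wait_seconds="${2:-60}"
  # If `docker start` itself fails, Docker may be the problem (see above).
  docker start "$container" >/dev/null || return 2
  sleep "$wait_seconds"
  # .State.Running is "true" while the container's main process is alive.
  if [ "$(docker inspect --format '{{.State.Running}}' "$container")" = "true" ]; then
    echo "$container is still running after ${wait_seconds}s"
  else
    echo "$container stopped again; last log lines:"
    docker logs --tail 50 "$container"
    return 1
  fi
}

# Example:
# start_and_check moodle-postgresql-1
```

    A return code of 1 together with database or metadata errors in the printed log lines points to the corruption case.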

    Restore corrupted Docker containers

    Containers are isolated environments where the applications run. The important data, however, is stored in the volumes, so the containers can safely be deleted and re-created without restoring backups, as long as the volume folders are still in place.

    1. Start by deleting the Docker containers. Make sure the containers are stopped first.

    2. Navigate to the /opt folder and look for a file named docker-compose.yml. This file has all the instructions to recreate the containers and connect the existing volumes.

    3. To execute the Docker Compose file, run the following command from the same directory as the .yml file.

    4. (Analytics machine only) An additional command must be executed before and after restoring the containers.
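
    Steps 1 to 3 can be sketched as a single shell function. The exact commands are not shown on this page, so everything below is an assumption: recreate_containers is a hypothetical helper, and `docker compose up -d` is the usual invocation with the Compose plugin (older installations may use `docker-compose up -d` instead). The additional commands for the analytics machine are not covered.

```shell
#!/usr/bin/env bash
# Hypothetical helper: stop, remove, and re-create containers from a compose file.
# Usage: recreate_containers <compose-dir> <container> [<container> ...]
recreate_containers() {
  local compose_dir="$1"
  shift
  if [ ! -f "$compose_dir/docker-compose.yml" ]; then
    echo "no docker-compose.yml found in $compose_dir" >&2
    return 1
  fi
  # Containers must be stopped before they can be removed without force.
  docker stop "$@" || return 1
  # Removes the containers only; the volume folders on disk are not touched.
  docker rm "$@" || return 1
  # Re-create the containers in the background from the compose file.
  (cd "$compose_dir" && docker compose up -d)
}

# Example, assuming the compose file sits directly in /opt:
# recreate_containers /opt moodle-moodle-1 moodle-postgresql-1
```

    Afterwards, `docker ps` should list the re-created containers with a fresh uptime.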
