Introduction
Docker has evolved rapidly from a nascent technology into a ubiquitous deployment strategy for a wide variety of projects. However, its use for running long-lived, production-grade database systems is still viewed with hesitation. The primary reason for this mistrust is the lower database performance compared to native installations, which is expected given the additional networking layer (Docker's intermediate network drivers) between database clients and the server.
Another reason system architects shy away from using Docker for database deployments is a general sense of unease about keeping their precious data in a less used and less understood database setup.
While I too can be accused of having exactly the same biases against Docker, we can still achieve reasonable efficiency and production-ish robustness for early-stage POC startups and personal projects.
In this article, we will try to address some of the concerns system architects have by setting up a production-grade MongoDB system using Docker.
Scope
In this setup, we will look at establishing a MongoDB setup which:
- Can recover from failures.
- Can recover from unplanned system reboots.
- Has a reliable log rotation policy.
- Has automated backup and restore strategies.
For the sake of brevity and avoiding bloat, this article considers the following features/strategies out of scope:
- MongoDB replication and sharding.
- MongoDB authentication.
Setup
Running Docker container
Data written inside a Docker container's writable layer is lost when the container is removed. Since we obviously would not want to lose our priceless data, we will work around this problem by using Docker's bind mounts. Bind mounts allow a file or directory on the host machine to be mounted into a container.
Preparing directory structure on host.
mkdir -p /home/surender/data/mongo/db
mkdir -p /home/surender/data/mongo/backup
As the directory names suggest, /home/surender/data/mongo/db will be used to persist MongoDB's database files between container runs and /home/surender/data/mongo/backup will be used to store MongoDB backups.
Running a perennial MongoDB docker container.
docker run -d -p 27017:27017 \
--restart unless-stopped \
--log-driver json-file \
--log-opt max-size=10m \
--log-opt max-file=5 \
-e TZ=Asia/Kolkata \
-v /home/surender/data/mongo/db:/data/db \
-v /home/surender/data/mongo/backup:/data/backup \
--name mongo_container \
mongo:4.0.8 --timeStampFormat ctime
We will use the above command to run MongoDB inside a container and make sure that it can recover from crashes and reboots. Also, the container should generate usable logs but should not overrun the system's storage with log files if left running for a long time.
Let us have a look at all the options in the command:
- -d runs the container in daemon/detached mode. The container will run as a background task and will not terminate when the current shell is closed.
- -p publishes a port from inside the container and binds it to a port on the host. Here, we have kept both host and container ports identical (27017) to avoid ambiguity and simulate a native MongoDB setup.
- --restart unless-stopped sets the container's restart policy to always restart the container unless explicitly stopped. This policy is imperative for recovery from container crashes and system reboots.
- --log-driver json-file configures the container to use the default logging driver, json-file. The container will write its logs to a file as JSON-formatted lines.
- --log-opt max-size=10m configures the container's logging strategy to not let a log file grow larger than 10 megabytes. Large log files are difficult to parse in most text editors and are often unusable. If a log file's size exceeds the specified limit, the log file is rotated, i.e. a new log file is created for new entries while the existing log file is renamed and archived.
- --log-opt max-file=5 directs the Docker daemon to keep no more than 5 archived log files. When a new archived log file is created after log rotation, the oldest archived log file is deleted. This prevents the system storage from being overrun with log files.
- -e TZ=Asia/Kolkata sets the timezone environment variable TZ inside the container to Asia/Kolkata. The mongod process inside the container will use this timezone while creating Mongo logs. This can be treated as an optional configuration.
- -v, as discussed earlier, bind mounts a directory on the host to a directory inside the container. In our container, MongoDB will use the /data/db directory to store database files, and we will use the /data/backup directory to dump our MongoDB backups.
- --timeStampFormat ctime, unlike all the other options, is not an option of the docker command; it is passed through to the mongod process. It configures MongoDB to use a more human-readable timestamp format in its logs. This is an optional setting and we can choose to omit it.
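After starting the container, a couple of quick sanity checks can confirm the setup behaves as intended. This is a hedged sketch: it assumes Docker is installed and the container was started with the exact name and options above.

```shell
# Confirm the container is running and its restart policy took effect.
docker ps --filter name=mongo_container
docker inspect --format '{{ .HostConfig.RestartPolicy.Name }}' mongo_container
# Expected: unless-stopped

# Tail recent container logs to confirm mongod started cleanly and is
# using the ctime timestamp format.
docker logs --tail 20 mongo_container
```

A simple way to test reboot recovery is to restart the Docker daemon (or the host) and check that `docker ps` shows mongo_container running again without manual intervention.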
Manual backups and restores
Since the mongod process is running inside the container, we cannot directly invoke the mongodump command to back up our databases. We will use the docker exec command to run mongodump inside the container.
docker exec -it mongo_container mongodump --out /data/backup/manual
In the above command, we are dumping the backup files into the /data/backup/manual directory inside the container. Since we have already bind mounted its parent to the host, the backup is available on the host and persists even after the Docker container is stopped or removed.
We can use the mongorestore command to restore from the backed-up files using the following command.
docker exec -it mongo_container mongorestore /data/backup/manual --drop
The --drop option drops the collections from the target database before restoring the collections from the dumped backup. It does not drop collections that are not in the backup.
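Dumping and restoring everything is not always necessary. As a hedged variation on the commands above, mongodump's --db option and mongorestore's --nsInclude option (both are mongod-tools options, not Docker options) can scope the operation to a single database; "mydb" here is a hypothetical database name.

```shell
# Dump only the "mydb" database to the bind-mounted backup directory.
docker exec -it mongo_container mongodump --db mydb --out /data/backup/manual

# Restore only namespaces belonging to "mydb" from that dump, dropping
# matching collections first.
docker exec -it mongo_container mongorestore --nsInclude 'mydb.*' /data/backup/manual --drop
```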
Backup and restore scripts
It is all well and good to manually back up to a directory and restore from it, but a production-grade system needs to automate the backup process. The setup should allow us to make versioned backups and restore from any of them.
Before we schedule our backups, let us script the backup and restore processes to conform to our other requirements.
Backup Script
The following script (named mongo-backup.sh) is meant for daily runs and creates date-stamped backup tarballs like 2019-04-23.tar.gz. It also removes existing backup tarballs that are older than 15 days.
#!/bin/sh
set -e

HOST_BACKUPS_DIR=/home/surender/data/mongo/backup
DOCKER_BACKUPS_DIR=/data/backup

# Creates backup names like 2019-04-23
BACKUP_NAME=$(date +%F)
HOST_BACKUP_DEST=$HOST_BACKUPS_DIR/$BACKUP_NAME
DOCKER_BACKUP_DEST=$DOCKER_BACKUPS_DIR/$BACKUP_NAME

# Do not keep backups older than 15 days.
BACKUP_TTL_DAYS=15

echo "$(date) Backing up in $DOCKER_BACKUP_DEST"
docker exec mongo_container mongodump --out "$DOCKER_BACKUP_DEST"

echo "Compressing backup directory to $BACKUP_NAME.tar.gz"
cd "$HOST_BACKUPS_DIR"
tar -zcvf "$BACKUP_NAME.tar.gz" "$BACKUP_NAME"

echo "Removing backup directory $HOST_BACKUP_DEST"
rm -rf "$HOST_BACKUP_DEST"

# Only match tarballs, so stray files in the directory are left alone.
echo "Deleting backup tarballs older than $BACKUP_TTL_DAYS days in $HOST_BACKUPS_DIR"
find "$HOST_BACKUPS_DIR" -type f -name '*.tar.gz' -mtime +$BACKUP_TTL_DAYS -exec rm '{}' +

echo "$(date) Mongo backup successful"
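The retention step deserves a closer look, since an overzealous find invocation can delete the wrong files. The following self-contained sketch demonstrates the rule against a throwaway temporary directory rather than the real backup directory; it assumes GNU touch (for the -d option).

```shell
BACKUP_TTL_DAYS=15
DEMO_DIR=$(mktemp -d)

# One "fresh" tarball and one backdated to look 20 days old.
touch "$DEMO_DIR/2019-04-23.tar.gz"
touch -d '20 days ago' "$DEMO_DIR/2019-04-01.tar.gz"

# -mtime +15 matches files modified more than 15 days ago.
find "$DEMO_DIR" -type f -mtime +$BACKUP_TTL_DAYS -exec rm '{}' +

ls "$DEMO_DIR"   # prints only 2019-04-23.tar.gz
rm -rf "$DEMO_DIR"
```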
Restore Script
Similar to a backup script, we would also need a script to restore our databases from the backup tarballs our backup script generates.
The following script (named mongo-restore.sh) accepts the backup tarball name as command-line argument and restores it to the MongoDB databases.
#!/bin/sh
set -e

HOST_BACKUPS_DIR=/home/surender/data/mongo/backup
DOCKER_BACKUPS_DIR=/data/backup

# Command-line argument supplied by the user to specify the backup version to restore.
# Should be of the format yyyy-mm-dd
BACKUP_NAME=$1
HOST_BACKUP_DEST=$HOST_BACKUPS_DIR/$BACKUP_NAME
DOCKER_BACKUP_DEST=$DOCKER_BACKUPS_DIR/$BACKUP_NAME

if [ -f "$HOST_BACKUP_DEST.tar.gz" ]
then
    echo "Uncompressing backup tarball $HOST_BACKUP_DEST.tar.gz"
    cd "$HOST_BACKUPS_DIR"
    tar -zxvf "$BACKUP_NAME.tar.gz"
    echo "$(date) Restoring from $DOCKER_BACKUP_DEST"
    docker exec -it mongo_container mongorestore "$DOCKER_BACKUP_DEST" --drop
    echo "$(date) Restore successful"
else
    echo "$(date) Backup tarball $HOST_BACKUP_DEST.tar.gz not found!"
fi
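One hypothetical hardening worth considering: validate the command-line argument before touching Docker, so a missing or malformed backup name fails fast with a usage message instead of a confusing "not found" error. The snippet below is a sketch of what could sit near the top of mongo-restore.sh; the date pattern is a POSIX extended regular expression, and the fallback sample value is only there to make the snippet runnable on its own.

```shell
# Fall back to a sample value so this sketch runs standalone; in the real
# script this would simply be BACKUP_NAME=$1.
BACKUP_NAME=${1:-2019-04-23}

# Reject anything that is not of the form yyyy-mm-dd.
if ! echo "$BACKUP_NAME" | grep -Eq '^[0-9]{4}-[0-9]{2}-[0-9]{2}$'
then
    echo "Usage: mongo-restore.sh <yyyy-mm-dd>" >&2
    exit 1
fi
echo "Backup name $BACKUP_NAME looks valid"
```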
Scheduling Backups
We will set up a nightly crontab to run our backup script at 0100 hrs daily. However, before we start our cron we need to make it auditable (sigh). Thankfully, that is easy to achieve. Let us create a log file to record the backup script's exploits on every scheduled run.
mkdir -p /home/surender/data/mongo/logs
touch /home/surender/data/mongo/logs/backup.log
chmod 666 /home/surender/data/mongo/logs/backup.log
Now, let us add the following crontab entry, which directs the backup script's stdout and stderr streams to the log file we just created.
0 1 * * * /home/surender/mongo-backup.sh >> /home/surender/data/mongo/logs/backup.log 2>&1
It is imperative that we use >> instead of >; otherwise, on each scheduled run the script will overwrite the previous logs in the file instead of appending to them.
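The difference is easy to demonstrate with a throwaway temp file: > truncates the file on every write, while >> appends to it.

```shell
LOG=$(mktemp)
echo "run 1" >  "$LOG"
echo "run 2" >  "$LOG"   # truncates: "run 1" is lost
echo "run 3" >> "$LOG"   # appends
cat "$LOG"               # prints "run 2" then "run 3"
rm -f "$LOG"
```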
Restoring from scheduled backups
To restore from any of the available backup tarballs, simply run the following command with the tarball's name (in yyyy-mm-dd format) as the argument.
/home/surender/mongo-restore.sh <yyyy-mm-dd>