Setting up Production-grade MongoDB using Docker

15 min read

Tags: MongoDB, Docker, NoSQL, Shell Script, Logrotate, Cron

Introduction

Docker has very quickly evolved from a nascent technology to a ubiquitous deployment strategy for a wide variety of projects. However, using Docker to run long-lived, production-grade database systems is still viewed with some hesitation. The primary reason for this mistrust is lower database performance compared to native installations, which is expected given the additional networking layer (Docker's intermediate network drivers) introduced between database clients and the server.

Another reason system architects shy away from Docker for database deployments is a general unease about entrusting their precious data to a less used and less understood database setup.

While I too can be accused of having exactly the same biases against Docker, we can still achieve reasonable efficiency and production-ish robustness for early-stage POC startups and personal projects.

In this article, we will try to address some of the concerns system architects have by setting up a production-grade MongoDB system using Docker.

Scope

In this setup, we will look at establishing a MongoDB deployment which:

- persists data across container restarts
- recovers automatically from crashes and host reboots
- keeps its container logs usable but bounded in size
- takes scheduled, versioned backups with a fixed retention period
- can be restored from any of the retained backups

For the sake of brevity and to avoid bloat, more advanced features and strategies are considered out of scope for this article.

Setup

Running Docker container

Data generated inside a Docker container is lost when the container is removed. Since we obviously do not want to lose our priceless data, we will work around this problem using Docker's bind mounts. Bind mounts allow a file or directory on the host machine to be mounted into a container.

Preparing the directory structure on the host.

mkdir -p /home/surender/data/mongo/db
mkdir -p /home/surender/data/mongo/backup

As the directory names suggest, /home/surender/data/mongo/db will be used to persist MongoDB's database files between container runs and /home/surender/data/mongo/backup will be used to store MongoDB backups.

Running a perennial MongoDB Docker container.

docker run -d -p 27017:27017 \
--restart unless-stopped \
--log-driver json-file \
--log-opt max-size=10m \
--log-opt max-file=5 \
-e TZ=Asia/Kolkata \
-v /home/surender/data/mongo/db:/data/db \
-v /home/surender/data/mongo/backup:/data/backup \
--name mongo_container \
mongo:4.0.8 --timeStampFormat ctime

We will use the above command to run MongoDB inside a container and make sure that it can recover from crashes and reboots. Also, the container should generate usable logs but should not overrun the system's storage with log files if left running for a long time.

Let us have a look at all the options in the command:

- -d runs the container in detached (background) mode.
- -p 27017:27017 publishes MongoDB's default port 27017 on the host so that clients can connect to it.
- --restart unless-stopped makes Docker restart the container after crashes and host reboots, unless we stop it explicitly.
- --log-driver json-file together with --log-opt max-size=10m and --log-opt max-file=5 caps the container logs at five rotated files of 10 MB each, so they cannot overrun the host's storage.
- -e TZ=Asia/Kolkata sets the container's TZ environment variable (adjust to your own timezone).
- The two -v options bind mount the host directories we created earlier to /data/db (MongoDB's data directory) and /data/backup inside the container.
- --name mongo_container gives the container a fixed name that we can reference in later commands.
- mongo:4.0.8 pins the MongoDB image to a specific version.
- --timeStampFormat ctime is passed on to mongod and switches its log timestamps to the human-friendly ctime format.
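
As a quick sanity check, the following standard Docker commands (using the container name from above) confirm that the container is running and that the restart policy and log limits were actually applied:

# Is the container up?
docker ps --filter name=mongo_container

# Were the restart policy and log limits applied?
docker inspect --format '{{.HostConfig.RestartPolicy.Name}}' mongo_container
docker inspect --format '{{json .HostConfig.LogConfig}}' mongo_container

# Tail the (rotated) container logs
docker logs --tail 20 mongo_container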

Manual backups and restores

Since the mongod process is running inside the container, we cannot directly invoke the mongodump command to back up our databases. Instead, we will use the docker exec command to run mongodump inside the container.

docker exec -it mongo_container mongodump --out /data/backup/manual

In the above command, we are dumping the backup files into the /data/backup/manual directory inside the container. Since we have already bind mounted its parent directory to the host, the backup is available on the host and persists even after the Docker container is stopped.
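
We can also verify from the host that the dump landed in the bind-mounted directory created earlier:

ls -lh /home/surender/data/mongo/backup/manual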

We can use the mongorestore command to restore from the backed-up files:

docker exec -it mongo_container mongorestore /data/backup/manual --drop

The --drop option drops the collections from the target database before restoring the collections from the dumped backup. It does not drop collections that are not in the backup.
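
If we only want to restore a single database rather than everything in the dump, mongorestore's --nsInclude filter can be combined with the same approach; the database name appdb below is just a placeholder for one of your own databases:

# Restore only the collections of the hypothetical appdb database from the manual dump
docker exec -it mongo_container mongorestore --nsInclude 'appdb.*' --drop /data/backup/manual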

Backup and restore scripts

It is all well and good to manually back up to a directory and restore from it, but a production-grade system needs an automated backup process. The setup should let us make versioned backups and restore from any of them.

Before we schedule our backups, let us script the backup and restore processes to conform to our other requirements.

Backup Script

The following script (named mongo-backup.sh) is meant for daily runs and creates date-stamped backup tarballs like 2019-04-23.tar.gz. It also removes existing backup tarballs if they are older than 15 days.

#!/bin/sh

set -e

HOST_BACKUPS_DIR=/home/surender/data/mongo/backup
DOCKER_BACKUPS_DIR=/data/backup

# Creates backup names like 2019-04-23
BACKUP_NAME=`date +%F`

HOST_BACKUP_DEST=$HOST_BACKUPS_DIR/$BACKUP_NAME
DOCKER_BACKUP_DEST=$DOCKER_BACKUPS_DIR/$BACKUP_NAME

# Do not keep backups older than 15 days.
BACKUP_TTL_DAYS=15

echo `date` Backing up in $DOCKER_BACKUP_DEST
docker exec mongo_container mongodump --out $DOCKER_BACKUP_DEST

echo Compressing backup directory to $BACKUP_NAME.tar.gz
cd $HOST_BACKUPS_DIR
tar -zcvf $BACKUP_NAME.tar.gz $BACKUP_NAME

echo Removing backup directory $HOST_BACKUP_DEST
rm -rf $HOST_BACKUP_DEST

echo Deleting backup tarballs older than $BACKUP_TTL_DAYS days in $HOST_BACKUPS_DIR
find $HOST_BACKUPS_DIR -type f -mtime +$BACKUP_TTL_DAYS -exec rm '{}' +

echo `date` Mongo backup successful
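
Before scheduling the script, it is worth making it executable and running it once by hand to confirm that a dated tarball shows up in the backup directory:

chmod +x /home/surender/mongo-backup.sh
/home/surender/mongo-backup.sh
ls -lh /home/surender/data/mongo/backup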

Restore Script

Similar to a backup script, we would also need a script to restore our databases from the backup tarballs our backup script generates.

The following script (named mongo-restore.sh) accepts the backup tarball name as command-line argument and restores it to the MongoDB databases.

#!/bin/sh

set -e

HOST_BACKUPS_DIR=/home/surender/data/mongo/backup
DOCKER_BACKUPS_DIR=/data/backup

# Command line argument by the user to specify the backup version to restore.
# Should be of format yyyy-mm-dd
BACKUP_NAME=$1

# Fail early if no backup version was provided.
if [ -z "$BACKUP_NAME" ]
then
  echo "Usage: $0 yyyy-mm-dd"
  exit 1
fi

HOST_BACKUP_DEST=$HOST_BACKUPS_DIR/$BACKUP_NAME
DOCKER_BACKUP_DEST=$DOCKER_BACKUPS_DIR/$BACKUP_NAME

echo `date` Looking for backup tarball $HOST_BACKUP_DEST.tar.gz

if [ -f $HOST_BACKUP_DEST.tar.gz ]
then
  echo Uncompressing backup tarball $HOST_BACKUP_DEST.tar.gz
  cd $HOST_BACKUPS_DIR
  tar -zxvf $BACKUP_NAME.tar.gz

  echo Restoring from $DOCKER_BACKUP_DEST
  docker exec -it mongo_container mongorestore $DOCKER_BACKUP_DEST --drop

  echo `date` Restore successful
else
  echo `date` Backup tarball $HOST_BACKUP_DEST.tar.gz not found!
fi
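
As with the backup script, we can make it executable and test it manually; the date below is only an example and should match one of the tarballs actually present in the backup directory:

chmod +x /home/surender/mongo-restore.sh
/home/surender/mongo-restore.sh 2019-04-23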

Scheduling Backups

We will set up a nightly crontab to run our backup script at 0100 hrs daily. However, before we start our cron job, we need to make it auditable (sigh). Thankfully, that is easy to achieve. Let us create a log file to record the backup script's exploits in every scheduled run.

mkdir -p /home/surender/data/mongo/logs
touch /home/surender/data/mongo/logs/backup.log
chmod 666 /home/surender/data/mongo/logs/backup.log
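
The container's own logs are already capped by the json-file driver options, but backup.log will keep growing with every nightly run. One way to keep it in check is a logrotate rule; the file name /etc/logrotate.d/mongo-backup and the weekly, keep-4 policy below are assumptions you may want to adjust:

# Hypothetical logrotate rule for the backup audit log
sudo tee /etc/logrotate.d/mongo-backup > /dev/null <<'EOF'
/home/surender/data/mongo/logs/backup.log {
    weekly
    rotate 4
    compress
    missingok
    notifempty
}
EOF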

Now, let us make the following crontab entry, which redirects the backup script's stdout and stderr streams to the log file we just created.

0 1 * * * /home/surender/mongo-backup.sh >> /home/surender/data/mongo/logs/backup.log 2>&1

It is imperative that we use >> instead of >; otherwise, each scheduled run would overwrite the previous logs in the file instead of appending to them.
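
If you prefer not to edit the crontab interactively with crontab -e, the entry can also be appended non-interactively; this snippet assumes the crontab belongs to the same user that owns the paths above:

# Append the backup entry to the current user's crontab and verify it
( crontab -l 2>/dev/null; echo '0 1 * * * /home/surender/mongo-backup.sh >> /home/surender/data/mongo/logs/backup.log 2>&1' ) | crontab -
crontab -l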

Restoring from scheduled backups

To restore from any of the available backup tarballs, simply run the following command with the tarball's name (in yyyy-mm-dd format) as the argument.

/home/surender/mongo-restore.sh <yyyy-mm-dd>