Programster's Blog

Tutorials focusing on Linux, programming, and open-source

Backing Up S3 Bucket With Duplicity

The following guide will show you how to create incremental backups of your S3 bucket to another S3 bucket, in such a way that you never need to hold all of the contents of either bucket locally. We achieve this by using S3FS to mount the source bucket and backing up straight to another S3 bucket with duplicity.

You probably want to run this from an EC2 instance, since you will benefit from the free internal networking, which is likely to be much faster than transferring over the public internet.

This tutorial assumes you are running Ubuntu 20.04, but the steps are likely to be similar for other Linux distros.

Steps

First we need to mount the bucket we wish to back up, using s3fs, so that we can treat it like a local filesystem.

# Fill in settings here. The AWS IAM key needs permission to access the bucket you wish to back up (the source).
ACCESS_KEY_ID=xxxxxxxxxxxxx
SECRET_ACCESS_KEY=xxxxxxxxxxxxxxxx
BUCKET_NAME=dev-bae-bit-portal-bucket
S3_MOUNT_POINT=/mnt/s3-bucket

# Install the S3FS package
sudo apt update && sudo apt install s3fs -y


# Create your credentials file that we will use later to mount the filesystem
echo "${ACCESS_KEY_ID}:${SECRET_ACCESS_KEY}" > ${HOME}/.passwd-s3fs
chmod 600 ${HOME}/.passwd-s3fs

# Create the mount point (sudo is needed under /mnt) and take ownership of it
sudo mkdir -p $S3_MOUNT_POINT
sudo chown "$USER" $S3_MOUNT_POINT

# Finally, mount the S3 bucket to the mount point.
s3fs \
  $BUCKET_NAME \
  $S3_MOUNT_POINT \
  -o passwd_file=${HOME}/.passwd-s3fs
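Before moving on, it's worth checking that the mount actually succeeded, because s3fs backgrounds itself and credential problems may not show up immediately. A quick sanity check, assuming the S3_MOUNT_POINT variable from above:

```shell
# Check the bucket is really mounted before trusting it as a backup source
if mountpoint -q "$S3_MOUNT_POINT"; then
  echo "Bucket mounted at $S3_MOUNT_POINT"
else
  echo "Mount failed; re-run s3fs in the foreground with '-f -o dbglevel=info' to see why" >&2
fi

# To unmount later:
# fusermount -u "$S3_MOUNT_POINT"
```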

Create Backups

Install duplicity if you haven't already

sudo apt-get install duplicity -y

We'll also need to install "boto" through pip for duplicity to work with S3.

sudo apt-get install python3-pip -y
pip3 install boto

Run the following script to create the backup

#!/bin/bash

# Credentials for the S3 Bucket you wish to backup to.
AWS_ACCESS_KEY_ID=xxxxxxxxxxxxxxxxxxxxxx
AWS_SECRET_ACCESS_KEY=xxxxxxxxxxxxxxxxxxxxxxx
BUCKET_REGION=s3-eu-west-2

DAYS_BETWEEN_FULL_BACKUPS=30
BACKUP_LIFETIME_DAYS=400
BACKUP_BUCKET_NAME="dev-bae-bit-portal-backups"
LOCAL_FOLDER_TO_BACKUP=/mnt/s3-bucket


# Export the credentials so duplicity can read them (note: no "$" after export)
export AWS_ACCESS_KEY_ID
export AWS_SECRET_ACCESS_KEY

# Cleanup if any previous backup had issues.
/usr/bin/duplicity cleanup \
  --force \
  --no-encryption \
  "s3://${BUCKET_REGION}.amazonaws.com/${BACKUP_BUCKET_NAME}/"

# Remove outdated backups
/usr/bin/duplicity remove-older-than \
  --force \
  --no-encryption \
  ${BACKUP_LIFETIME_DAYS}D \
  "s3://${BUCKET_REGION}.amazonaws.com/${BACKUP_BUCKET_NAME}/"

# Take the backup.
/usr/bin/duplicity \
  --no-encryption \
  --s3-european-buckets \
  --s3-use-new-style \
  --verbosity 4 \
  --full-if-older-than ${DAYS_BETWEEN_FULL_BACKUPS}D \
  $LOCAL_FOLDER_TO_BACKUP \
  "s3://${BUCKET_REGION}.amazonaws.com/${BACKUP_BUCKET_NAME}/"

Docker Containers

If you are doing this within a Docker container, you need to run the container with --cap-add SYS_ADMIN --device /dev/fuse. Alternatively you can just run with --privileged, but privileged mode has serious security implications. If using docker-compose, you can specify privileged mode with:

my_service:
  privileged: true

Alternatively, the cap_add and devices keys in docker-compose should be the equivalent of --cap-add SYS_ADMIN --device /dev/fuse:

my_service:
  cap_add:
    - SYS_ADMIN
  devices:
    - /dev/fuse:/dev/fuse

You also need to set the timezone non-interactively in your Dockerfile, because installing the s3fs package pulls in tzdata, which will otherwise pause the build waiting for a timezone to be chosen.
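For example, something like the following near the top of the Dockerfile (Europe/London is just a placeholder; substitute your own timezone):

```dockerfile
# Pre-seed the timezone so that installing s3fs (which pulls in tzdata)
# cannot stop the build with an interactive prompt.
ENV DEBIAN_FRONTEND=noninteractive
ENV TZ=Europe/London
RUN ln -snf /usr/share/zoneinfo/$TZ /etc/localtime \
    && echo $TZ > /etc/timezone \
    && apt-get update \
    && apt-get install -y s3fs
```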

Finally, you will need to add --allow-source-mismatch to the backup command, because the Docker container's hostname will change between runs. E.g.

/usr/bin/duplicity \
  --allow-source-mismatch \
  --no-encryption \
  --s3-european-buckets \
  --s3-use-new-style \
  --verbosity 4 \
  --full-if-older-than ${DAYS_BETWEEN_FULL_BACKUPS}D \
  $LOCAL_FOLDER_TO_BACKUP \
  "s3://${BUCKET_REGION}.amazonaws.com/${BACKUP_BUCKET_NAME}/"

Last updated: 21st August 2020
First published: 21st August 2020