Programster's Blog

Tutorials focusing on Linux, programming, and open-source

Rsync

Rsync is a great tool for making file backups/copies or for transferring files to a remote server. This is because it will only transfer the data that has changed, and it can even compress the data before sending it, which is great if bandwidth is an issue (which it usually is).

Examples

Creating A Local Backup

Below is the command I use for taking a 'backup' of my 3TB drive. Please note that the trailing "/" on the source and destination paths matters (the reason is explained after the script). If you don't like it, you may want to use How-To Geek's alternative.

#!/bin/bash
sudo rsync \
  --recursive \
  --verbose \
  --human-readable \
  --progress \
  --links \
  --copy-unsafe-links \
  --owner \
  --group \
  --perms \
  --times \
  --force \
  --delete \
  --delete-before \
  --ignore-errors \
  --sparse \
  --info=progress2 \
  --dry-run \
  /path/to/source/folder-to-copy/ \
  /path/to/dest/parent-folder/.

You will need to remove the --dry-run line in order for it to actually transfer/delete files. That switch shows you what rsync is going to do before it goes ahead, which has saved me many times.
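
One way to make the dry run the default is a small wrapper. The following is only a minimal sketch and not part of my backup script: safe_sync and the --go argument are names I made up for illustration, and the function just prints the command it would run so that you can inspect it first.

```shell
#!/bin/bash
# Hypothetical wrapper: adds --dry-run unless the first argument is "--go".
# It only prints the rsync command; swap "echo rsync" for "rsync" to run it.
safe_sync() {
  local mode="$1" src="$2" dest="$3"
  set -- --recursive --times --delete --verbose
  if [ "$mode" != "--go" ]; then
    set -- "$@" --dry-run
  fi
  echo rsync "$@" "$src" "$dest"
}

# First call keeps the safety net, second call drops it.
safe_sync ""   /path/to/source/folder-to-copy/ /path/to/dest/parent-folder/
safe_sync --go /path/to/source/folder-to-copy/ /path/to/dest/parent-folder/
```

The point of the design is that forgetting an argument leaves you with the harmless behaviour rather than the destructive one.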

You may need to convert absolute symlinks to be relative in order to prevent making unnecessary duplicates on the destination.

If the source ends in / like in the example above, then the contents of that folder are placed in the destination specified. If the source does not end with a /, then the folder itself is created inside the destination (one more nested layer). Alternatively, drop the trailing / from the source and point the destination at the parent folder, so that the folder is recreated by name inside it and kept in sync there.
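
If you want to see the trailing-slash behaviour for yourself, here is a throwaway demo that works entirely inside a temporary directory. The folder names (with-slash, without-slash) are made up for the demo, and it assumes rsync is installed; if it is not, the demo simply does nothing.

```shell
#!/bin/bash
# Throwaway demo in a temporary directory; all paths here are made up.
demo=$(mktemp -d)
mkdir -p "$demo/src/folder" "$demo/with-slash" "$demo/without-slash"
echo "hello" > "$demo/src/folder/file.txt"

if command -v rsync >/dev/null 2>&1; then
  # Source ends in "/": the contents of folder land directly in the destination.
  rsync --recursive --times "$demo/src/folder/" "$demo/with-slash/"
  # Source has no trailing "/": the folder itself is created inside the destination.
  rsync --recursive --times "$demo/src/folder" "$demo/without-slash/"
  # List what ended up where (paths relative to the demo directory).
  (cd "$demo" && find with-slash without-slash -type f)
fi
```

The listing should show with-slash/file.txt for the first command and without-slash/folder/file.txt for the second.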

Switches Used

  • --human-readable - file sizes are output in human readable format. E.g. 1KB instead of 1024 bytes.
  • --delete - ensures that any files in the destination directory that aren't in the source directory are deleted.
  • --times - is fairly important if you are going to be running this script over and over again, as it keeps the modification times of the two copies in sync. Without it, the times would differ, which may cause an unnecessary transfer next time.
  • --force - allows the deletion of a non empty directory in order to be replaced by an empty directory.
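  • --delete-before - makes rsync perform the deletions on the destination before the transfer starts. Recent versions of rsync (3.0.0 and newer) default to deleting during the transfer instead, so specify this if you want to be certain that deletions happen first.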
  • --copy-unsafe-links - results in symlinks that are considered unsafe having the file they point to copied across rather than the symlink itself. Symlinks are considered unsafe if they contain so many ".." components that they point outside of the copied tree, or if they start with "/", which means they use an absolute path rather than a relative one.
  • --links - copies symlinks as symlinks (except unsafe ones, which are converted to the files they point to because of the --copy-unsafe-links option). Without this, symlinks would just be skipped, and only those symlinks that pointed outside of our rsync path would be copied across.
  • --info=progress2 - shows the overall progress of the entire transfer, not just of the single file that is being copied.
  • --dry-run - results in rsync not actually performing any deletions or transfers, but telling you what it would do. It is enabled in the script above, so remove the line once you are happy with the output. I highly recommend you use it the first time you run the command after writing it.
  • --owner - ensures that the owning user of the file is kept (not the permission level for the owner). This requires sudo/root.
  • --group - ensures that the owning group of the file is kept (not the permission level for the group).
  • --sparse - ensures that sparse image files (e.g. for virtual machines) are transferred efficiently (don't turn into non-sparse "disk-eaters").

Other Backup Solutions

Rsync is generally not the "best" backup solution. Use of the duplicity tool in Linux is usually better as it offers incremental backups and encryption whereas this is more like a single copy. I use this solution instead for speed and the fact that I do not have the space for incremental backups (copying an almost full 3TB drive to another 3TB drive).

Updating Remote Webserver Over SSH

Updating a remote webserver is much the same, except now you probably want to use compression and need to specify the SSH credentials.

sudo rsync \
  --recursive \
  --times \
  --verbose \
  --info=progress2 \
  --compress \
  --force \
  --delete \
  --exclude '.svn' \
  --exclude '.git' \
  -e "ssh -p12345"  \
  /my/source/ \
  mySshUser@192.168.1.1:/dest/directory/

The -e switch tells rsync which remote shell to use, which allows us to perform the transfer over SSH. I have also taken the liberty of providing an example where I specify the SSH port (which you need to do when it is not set to the default of 22). I also exclude all .svn and .git folders in this example; keep whichever one applies to the version control system you use.

The --delete switch ensures that any files in the remote directory that aren't in the source directory are deleted. This can be dangerous if your webserver "collects" files from users (such as an image uploader), in which case you may want to remove this option, or use the --exclude switch.
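
To convince yourself that --exclude also protects files from --delete, here is a small local sketch. The uploads folder and the file names are made up for the demo, and it assumes rsync is installed. Excluded paths are left alone on the destination unless you also pass --delete-excluded.

```shell
#!/bin/bash
# Throwaway local demo; "uploads" stands in for a user-upload folder.
site=$(mktemp -d)
mkdir -p "$site/src" "$site/dest/uploads"
echo "new"   > "$site/src/index.html"
echo "stale" > "$site/dest/old-page.html"
echo "photo" > "$site/dest/uploads/user.png"

if command -v rsync >/dev/null 2>&1; then
  # The leading "/" anchors the pattern to the top of the transfer.
  rsync --recursive --times --delete --exclude '/uploads/' \
    "$site/src/" "$site/dest/"
  # Show what is left on the "server" side.
  (cd "$site/dest" && find . -type f)
fi
```

After the run, old-page.html should be gone from the destination while uploads/user.png is still there.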

Simplify Your Life With SSH Config

You can simplify your life by configuring the remote host in your $HOME/.ssh/config file. The example below configures a remote host at IP 1.2.3.4 that can be accessed by the ubuntu user on port 2323 by using the SSH key at /home/programster/.ssh/myBackupServerKey.pem.

Host my-web-server
    HostName 1.2.3.4
    User ubuntu
    Port 2323
    IdentityFile /home/programster/.ssh/myBackupServerKey.pem

With this in place, you can now run a much simpler rsync command like so:

rsync \
  --recursive \
  --times \
  --verbose \
  --info=progress2 \
  --compress \
  --force \
  --delete \
  --exclude '.svn' \
  --exclude '.git' \
  -e "ssh" \
  /my/source/ \
  my-web-server:/dest/directory/

You can use my-web-server literally, as it is just a memorable alias. It does not need to be a legitimate hostname/domain name.
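
If you want to check which settings ssh (and therefore rsync) will actually use for an alias, ssh can print the resolved configuration without connecting. The sketch below writes the same Host block to a temporary file so that it does not touch your real config; it assumes an OpenSSH client new enough to support the -G option (6.8 or later).

```shell
#!/bin/bash
# Write the same Host block to a temporary file so the real config is untouched.
cfg=$(mktemp)
cat > "$cfg" <<'EOF'
Host my-web-server
    HostName 1.2.3.4
    User ubuntu
    Port 2323
    IdentityFile /home/programster/.ssh/myBackupServerKey.pem
EOF

# "ssh -G" prints the resolved settings for a host without connecting.
resolved=""
if command -v ssh >/dev/null 2>&1; then
  resolved=$(ssh -F "$cfg" -G my-web-server 2>/dev/null \
    | grep -E '^(hostname|user|port) ' || true)
fi
echo "$resolved"
```

Against your real config you would drop the -F option and just run ssh -G my-web-server. Bear in mind that options given on the command line, such as a port passed through -e "ssh -p...", take precedence over the config file.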

Backup Sparse KVM Disk Images To Remote Host

If using KVM virtual machines, then you are probably using sparse qcow2 disk images. These disk image files take up less space than they report, but need to be handled by a program that understands this, otherwise the file will use the full amount of space when transferred. Luckily, rsync has a --sparse flag which we can use for new images. For disk images that already exist on the backup host, we want to use --inplace with --no-whole-file, so that we only need to transfer the changed blocks rather than the entire file. The two cases are handled by separate commands below because older versions of rsync (before 3.1.3, as far as I am aware) refuse to combine --sparse with --inplace. If you are confused as to why you need both --inplace and --no-whole-file, please refer to this superuser post.

# Sync existing files inplace
rsync \
  --existing \
  --inplace \
  --no-whole-file \
  --recursive \
  --verbose \
  --info=progress2 \
  --times \
  --force \
  -e "ssh"  \
  /path/to/local/files/ \
  user@host:/path/to/backup/location/

# Sync all new qcow2 files
rsync \
  --ignore-existing \
  --sparse \
  --recursive \
  --verbose \
  --info=progress2 \
  --times \
  --force \
  -e "ssh"  \
  /path/to/local/files/ \
  user@host:/path/to/backup/location/
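
To see what "takes up less space than it reports" means in practice, here is a small sketch that creates a plain sparse file (standing in for a disk image, no KVM tooling involved) and compares its reported size with the space it actually occupies. The variable names are mine, and the allocated figure depends on your filesystem supporting sparse files.

```shell
#!/bin/bash
# Create a 100 MiB sparse file: seek past the end without writing any data.
img=$(mktemp)
dd if=/dev/zero of="$img" bs=1 count=0 seek=104857600 2>/dev/null

apparent_bytes=$(wc -c < "$img" | tr -d ' ')   # size the file reports
allocated_kib=$(du -k "$img" | cut -f1)        # space actually used on disk
echo "apparent: $apparent_bytes bytes, allocated: $allocated_kib KiB"
```

Running the same du check against the copy on the backup host is a quick way to confirm that --sparse did its job.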

Source And Dest Confusion

Unfortunately, when using -e "ssh" one needs to be very careful about how one specifies the local and remote paths. E.g. if one ends a path with /*, /, or /., it may not behave as you might expect. Strangely, I did not see the same behaviour when I tested with local rsync operations.

This is especially important when one is making use of the --delete flag as one does not want to accidentally delete files on the remote host, so here is my cheatsheet of source/dest combos.

Most Useful Way

Use the following if you wish to copy the contents of /local/path/to/folder (including hidden files) to /path/to/remote/folder, deleting any files in /path/to/remote/folder that are not within the local one.

/local/path/to/folder/ \
/path/to/remote/folder/

This also seems to have the same behaviour as

/local/path/to/folder/ \
/path/to/remote/folder

... and also:

/local/path/to/folder \
/path/to/remote/.

This version is close to mistake 3, so I avoid using . in paths now. Also, this version doesn't allow you to choose a different name for the folder on the remote host.

Mistake 1 - Sub-sub-folder

If you do the following:

/local/path/to/folder \
/path/to/remote/folder

... then you will end up with a folder called folder within /path/to/remote/folder. E.g. /path/to/remote/folder/folder

However, you will not accidentally delete remote files, they just won't be where you probably wanted them. Just make sure to always end folders with / as a rule of thumb to prevent this.
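
If your paths come from variables or script arguments, you can enforce the rule of thumb in code. This is just a sketch; with_trailing_slash is a hypothetical helper name of mine, not something rsync provides.

```shell
#!/bin/bash
# Hypothetical helper: makes sure a path ends with a slash, without doubling
# one that is already there (it strips a single trailing slash, then adds one).
with_trailing_slash() {
  printf '%s/\n' "${1%/}"
}

src=$(with_trailing_slash "/local/path/to/folder")
dest=$(with_trailing_slash "/path/to/remote/folder/")
echo "$src $dest"
```

Both paths come out ending in exactly one slash, so the pair matches the "most useful way" above regardless of how the caller typed them.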

Mistake 2 - Ignoring Hidden Files

If you do the following:

/local/path/to/folder/* \
/path/to/remote/folder

... then this will mostly work. However, the shell expands the * before rsync runs, and * does not match hidden files, so any hidden files within the top level of the source folder will not be copied across. Likewise, if there are any hidden files in the remote folder, then these will be ignored and not be deleted, even if they should be.

However, you will not accidentally delete remote files; the remote copy will just be incomplete.
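
The hidden-file problem comes from the shell rather than from rsync, which you can check without running rsync at all. The sketch below assumes default bash settings (dotglob not enabled); the file names and the count_args helper are made up for the demo.

```shell
#!/bin/bash
# Throwaway demo folder with one visible and one hidden file.
webroot=$(mktemp -d)
touch "$webroot/index.html" "$webroot/.htaccess"

# Helper that reports how many arguments the shell passed to it.
count_args() { echo "$#"; }

# The shell expands the glob before the command ever runs.
visible=$(count_args "$webroot"/*)
# Adding an explicit dotfile pattern picks up the hidden file too.
everything=$(count_args "$webroot"/* "$webroot"/.[!.]*)
echo "matched by *: $visible, including dotfiles: $everything"
```

Only index.html is matched by the plain *, which is exactly the list of sources rsync would receive.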

Mistake 3 - Deletes Files You Didn't Want To

If you do the following in an attempt to copy folder to /path/to/remote/folder:

/local/path/to/folder/ \
/path/to/remote/.

This will end up copying the files within /local/path/to/folder/ across to /path/to/remote/, rather than copying the folder itself. With --delete, this will likely remove files that you did not want removed from /path/to/remote/, which probably held other folders/data as well.

Last updated: 13th September 2023
First published: 16th August 2018