Programster's Blog

Tutorials focusing on Linux, programming, and open source


Rsync is a great tool for making file backups/copies or for transferring files to a remote server. This is because it will only transfer the data that has changed, and it can even compress the data before sending it, which is great if bandwidth is an issue (which it usually is).

Local Backup

Below is the command I use for taking a 'backup' of my 3TB drive. Please note that the trailing "/" is fairly important when specifying the source and destination folders: on the source, it tells rsync to copy the contents of the folder rather than the folder itself. If you don't like it, you may want to use How-To Geek's alternative.

rsync \
--recursive \
--verbose \
--human-readable \
--progress \
--links \
--copy-unsafe-links \
--owner \
--group \
--perms \
--times \
--force \
--delete \
--delete-before \
--ignore-errors \
--sparse \
--info=progress2 \
--dry-run \
/path/to/source/folder/ /path/to/dest/folder/.
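Regarding the trailing "/" mentioned before the command, the difference is easiest to see in a scratch directory. This is a minimal sketch; the folder names under the temporary directory are made up for the demonstration and are not part of my backup.

```shell
# Scratch area for the demonstration (hypothetical paths, safe to delete).
demo=$(mktemp -d)
mkdir -p "$demo/source/folder" "$demo/with_slash" "$demo/without_slash"
echo "hello" > "$demo/source/folder/file.txt"

# Trailing slash on the source: the *contents* of folder are copied.
rsync --recursive "$demo/source/folder/" "$demo/with_slash/"

# No trailing slash on the source: the folder itself is copied into the destination.
rsync --recursive "$demo/source/folder" "$demo/without_slash/"

# List what ended up where.
find "$demo/with_slash" "$demo/without_slash" -type f
```

The first destination should end up with file.txt directly inside it, while the second gets an extra folder/ level.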

You will need to remove the --dry-run line from the backup command above for it to actually transfer or delete files. That switch shows you what rsync is going to do before it goes ahead, which has saved me many times.
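If the dry-run output is long, the deletions can be hard to spot. One way to pull them out, sketched here on throwaway folders, is to add --itemize-changes (a standard rsync option that my command above does not use) and filter for the lines that start with "*deleting". The file names are hypothetical.

```shell
# Throwaway source and destination (hypothetical names).
demo=$(mktemp -d)
mkdir -p "$demo/src" "$demo/dst"
echo "keep" > "$demo/src/keep.txt"
echo "stale" > "$demo/dst/stale.txt"   # exists only on the destination

# Preview only: nothing is copied or deleted because of --dry-run.
preview=$(rsync --recursive --delete --dry-run --itemize-changes \
    "$demo/src/" "$demo/dst/")
echo "$preview"

# Lines beginning with "*deleting" are what a real run would remove.
echo "$preview" | grep '^\*deleting'
```

After this runs, stale.txt is still in place because nothing was actually changed.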

You may need to convert absolute symlinks to be relative in order to prevent making unnecessary duplicates on the destination.
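A rough sketch of one way to do that conversion is shown here. It assumes GNU coreutils (the --relative option of ln is a GNU extension), and it only touches symlinks whose target lies inside the tree being cleaned up. The folder layout is invented for the example; point root at your own source folder and try it on a copy first.

```shell
# Hypothetical tree containing one absolute symlink.
root=$(mktemp -d)
mkdir -p "$root/data" "$root/links"
echo "payload" > "$root/data/file.txt"
ln -s "$root/data/file.txt" "$root/links/abs_link"

# Re-create every symlink that points at an absolute path inside $root
# as a relative symlink to the same target.
find "$root" -type l | while IFS= read -r link; do
    target=$(readlink "$link")
    case "$target" in
        "$root"/*)
            # GNU ln works out the relative path from the link's own folder.
            ln --symbolic --relative --force --no-dereference "$target" "$link"
            ;;
    esac
done

# Show the new target of the link.
readlink "$root/links/abs_link"
```

Symlinks that point outside the tree are left alone, so --copy-unsafe-links still handles those.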

Switches Used

  • --delete - ensures that any files in the destination directory that aren't in the source directory are deleted.
  • --times - is fairly important if you are going to be running this command over and over again, as it keeps the modification times of the two copies in sync. Without it the times would differ, which may cause the file to be transferred again next time.
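  • --force - allows the deletion of a non-empty directory when it is to be replaced by something that is not a directory (such as a file). It only matters when deletions are not otherwise active, so alongside --delete it is effectively redundant.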
  • --delete-before - makes certain that deletions happen before the transfer starts. Older versions of rsync did this by default, but since version 3.0.0 the default is to delete during the transfer, so it is worth stating explicitly.
  • --copy-unsafe-links - results in symlinks that are considered unsafe having the file they point to copied across rather than the symlink itself. Symlinks are considered unsafe if they contain so many ".." components that they go out of scope of the copy, or if they start with "/", which means they use an absolute path rather than a relative one.
  • --links - copies symlinks as symlinks (except that unsafe ones are converted to the files they point to because of the copy-unsafe-links option above). Without this, symlinks would just be skipped, and only those symlinks that pointed outside of our rsync path would be copied across.
  • --info=progress2 - results in showing the overall progress of the entire transfer, not just of the single file that is being copied.
  • --dry-run - results in rsync not actually performing any deletions or transfers, but telling you what it would do. This is why it is included in the command above: remove the line once you are happy with the output. I highly recommend you use it the first time you run the command after writing it.
  • --owner - ensures that the owning user of the file is kept (not the permission level for the owner).
  • --group - ensures that the owning group of the file is kept (not the permission level for the group).
  • --sparse - ensures that sparse image files (e.g. for virtual machines) are transferred efficiently (they don't turn into non-sparse "disk-eaters").
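The interaction between --links and --copy-unsafe-links is easier to follow with a small experiment. This sketch uses invented file names in a temporary folder: one relative symlink that stays inside the source, and one absolute symlink that points outside it.

```shell
# Hypothetical layout: src is what we copy, outside is not part of the copy.
demo=$(mktemp -d)
mkdir -p "$demo/src" "$demo/dst" "$demo/outside"
echo "inside" > "$demo/src/real.txt"
echo "outside" > "$demo/outside/external.txt"

# Safe: relative link that stays within src.
ln -s real.txt "$demo/src/safe_link"
# Unsafe: absolute link to something outside src.
ln -s "$demo/outside/external.txt" "$demo/src/unsafe_link"

rsync --recursive --links --copy-unsafe-links "$demo/src/" "$demo/dst/"

# safe_link should still be a symlink; unsafe_link should now be a regular file.
ls -l "$demo/dst"
```

The copied unsafe_link holds the contents of external.txt, which is where the extra disk usage on the destination comes from.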

Other Backup Solutions

Rsync is generally not the "best" backup solution. Use of the duplicity tool in Linux is usually better as it offers incremental backups and encryption whereas this is more like a single copy. I use this solution instead for speed and the fact that I do not have the space for incremental backups (copying an almost full 3TB drive to another 3TB drive).

Updating Remote Webserver

Updating a remote webserver is much the same, except now you probably want to use compression and need to specify the ssh credentials.

rsync \
--recursive \
--times \
--verbose \
--info=progress2 \
--compress \
--force \
--delete \
-e "ssh -p12345" --exclude '.svn' /my/source/ root@

The -e switch allows the transfer over ssh. I have also taken the liberty of providing an example where I specify the ssh port (which you need to do when it is not set to the default of 22). I also exclude all .svn folders in this example, although you may want to replace it with ".git" if you use that instead.

The delete switch ensures that any files in the remote directory that aren't in the source directory are deleted. This can be dangerous if your webserver "collects" files from users (such as an image uploader), in which case you may want to remove this option, or use the exclude switch.
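To illustrate the exclude approach, here is a small sketch that uses two local folders as stand-ins for the local site and the webserver. The uploads folder name is hypothetical. Excluded paths are left alone by --delete (unless you also pass --delete-excluded), so user uploads on the receiving side survive.

```shell
# Local stand-ins for "my machine" and "the webserver" (hypothetical names).
demo=$(mktemp -d)
mkdir -p "$demo/local" "$demo/server/uploads"
echo "<html></html>" > "$demo/local/index.html"
echo "old page" > "$demo/server/old.html"            # no longer in the source
echo "user image" > "$demo/server/uploads/cat.jpg"   # collected from a user

# --delete removes old.html, but the excluded uploads folder is not touched.
rsync --recursive --times --delete --exclude 'uploads/' \
    "$demo/local/" "$demo/server/"

ls -R "$demo/server"
```

The same --exclude line can be added to the remote command above alongside the '.svn' exclusion.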

I have provided an example/fake IP address which you will need to change, and if you're a good Linux user, you will not be using "root" as I have in this example.