Rsync
Rsync is a great tool for making file backups/copies or for transferring files to a remote server. This is because it will only transfer the data that has changed, and it can even compress the data before sending it, which is great if bandwidth is an issue (which it usually is).
Examples
Creating A Local Backup
Below is the command I use for taking a 'backup' of my 3TB drive. Please note that the trailing "/" is actually fairly important when specifying the source and destination folders. If you don't like it, you may want to use How-To Geek's alternative.
#!/bin/bash
sudo rsync \
--recursive \
--verbose \
--human-readable \
--progress \
--links \
--copy-unsafe-links \
--owner \
--group \
--perms \
--times \
--force \
--delete \
--delete-before \
--ignore-errors \
--sparse \
--info=progress2 \
--dry-run \
/path/to/source/folder-to-copy/ \
/path/to/dest/parent-folder/.
If you end the source path with a / like in the example above, then the contents of that folder are placed in the destination specified. If you do not end the source with a /, then the folder itself is created inside the destination path (one more nested layer). So, to keep the source folder in sync as a folder of the same name inside the destination, strip the / and /. from the ends of the source and destination respectively.
Switches Used
--human-readable - file sizes are output in a human-readable format, e.g. 1KB instead of 1024 bytes.
--delete - ensures that any files in the destination directory that aren't in the source directory are deleted.
--times - preserves modification times. This is fairly important if you are going to be running this script over and over again, as it keeps the times of the two copies in sync, so unchanged files won't show a difference that causes them to be transferred again next time.
--force - allows a non-empty directory to be deleted when it is being replaced by a non-directory (e.g. a file of the same name). The man page notes that this is only relevant when deletions are not active, so it has little effect alongside --delete.
--delete-before - makes rsync carry out all deletions before the transfer starts. Since rsync 3.0 the default is to delete during the transfer (--delete-during), so this switch is only needed if you want to be certain that deletions happen first, e.g. when the destination is short on space.
--copy-unsafe-links - for symlinks that are considered unsafe, the file they point to is copied across rather than the symlink itself. Symlinks are considered unsafe if they contain enough ".." components to go outside the scope of the copy, or if they start with "/", which means they use an absolute path rather than a relative one.
--links - copies symlinks as symlinks (except unsafe ones, which are converted to the files they point to because of the --copy-unsafe-links option above). Without this, symlinks would just be skipped, and only the symlinks that pointed outside of our rsync path would be copied across.
--info=progress2 - shows the overall progress of the entire transfer, not just of the single file that is currently being copied.
--dry-run - makes rsync not actually perform any deletions or transfers, but tell you what it would do instead. I highly recommend keeping it in the first time you run the command after writing it, and removing it once you are happy with the output.
--owner - preserves the owner (user) of each file, not the permission level for the owner. This requires sudo/root.
--group - preserves the group of each file, not the permission level for the group.
--sparse - ensures that sparse image files (e.g. for virtual machines) are transferred efficiently and don't turn into non-sparse "disk-eaters".
Other Backup Solutions
Rsync is generally not the "best" backup solution. The duplicity tool in Linux is usually better, as it offers incremental backups and encryption, whereas this is more like a single copy. I use this solution instead for its speed, and because I do not have the space for incremental backups (I am copying an almost full 3TB drive to another 3TB drive).
Updating Remote Webserver Over SSH
Updating a remote webserver is much the same, except that now you probably want to use compression, and you need to specify the SSH credentials.
sudo rsync \
--recursive \
--times \
--verbose \
--info=progress2 \
--compress \
--force \
--delete \
--exclude '.svn' \
--exclude '.git' \
-e "ssh -p12345" \
/my/source/ \
mySshUser@192.168.1.1:/dest/directory/
The -e switch tells rsync which remote shell to use, which allows us to perform the transfer over SSH. I have also taken the liberty of specifying the SSH port in this example, which you need to do when it is not set to the default of 22. I also exclude all .svn and .git folders in this example; you can drop whichever one you don't use.
The --delete switch ensures that any files in the remote directory that aren't in the source directory are deleted. This can be dangerous if your webserver "collects" files from users (such as an image uploader), in which case you may want to remove this option, or use the --exclude switch.
Simplify Your Life With SSH Config
You can simplify your life by configuring the remote host in your $HOME/.ssh/config file. The example below configures a remote host at IP 1.2.3.4 that can be accessed by the ubuntu user on port 2323, using the SSH key at /home/programster/.ssh/myBackupServerKey.pem.
Host my-web-server
HostName 1.2.3.4
User ubuntu
Port 2323
IdentityFile /home/programster/.ssh/myBackupServerKey.pem
With this in place, you can now run a much simpler rsync command, like so:
rsync \
--recursive \
--times \
--verbose \
--info=progress2 \
--compress \
--force \
--delete \
--exclude '.svn' \
--exclude '.git' \
/my/source/ \
my-web-server:/dest/directory/
You can use any name you like in place of my-web-server, as it is just a human-memorable alias. It does not need to be a legitimate hostname/domain name.
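To check which settings ssh will actually use for the alias, OpenSSH's ssh -G option prints the fully resolved configuration without connecting anywhere. The sketch below writes the example Host block to a temporary file and reads it with -F, so your real $HOME/.ssh/config is left alone. It also shows that a port given on the command line (such as the -p12345 in the earlier webserver example) takes precedence over the Port in the config file, so the two should not be mixed.

```bash
#!/bin/bash
# Sketch: inspect how ssh resolves the my-web-server alias, without connecting.
# A temporary config file is used instead of $HOME/.ssh/config.
set -euo pipefail

config="$(mktemp)"
trap 'rm -f "$config"' EXIT
cat > "$config" <<'EOF'
Host my-web-server
    HostName 1.2.3.4
    User ubuntu
    Port 2323
    IdentityFile /home/programster/.ssh/myBackupServerKey.pem
EOF

# -G prints the final configuration ssh would use (option names are lowercase).
ssh -G -F "$config" my-web-server | grep -E '^(hostname|user|port) '

# A -p on the command line overrides the Port from the config file.
ssh -G -F "$config" -p 12345 my-web-server | grep -E '^port '
```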
Backup Sparse KVM Disk Images To Remote Host
If you are using KVM virtual machines, then you are probably using sparse qcow2 disk images. These disk image files take up less space than they report, but they need to be handled by a program that understands this, otherwise the file will use the full amount of space when transferred. Luckily, rsync has a --sparse flag which we can use. Also, since they are disk images, we want to use --inplace with --no-whole-file for existing disk images, so that we only need to transfer the changed blocks rather than the entire file. If you are confused as to why you need both --inplace and --no-whole-file, please refer to this superuser post.
# Sync existing files inplace
rsync \
--existing \
--inplace \
--no-whole-file \
--recursive \
--verbose \
--info=progress2 \
--times \
--force \
-e "ssh" \
/path/to/local/files/ \
user@host:/path/to/backup/location/
# Sync all new qcow2 files
rsync \
--ignore-existing \
--sparse \
--recursive \
--verbose \
--info=progress2 \
--times \
--force \
-e "ssh" \
/path/to/local/files/ \
user@host:/path/to/backup/location/
Source And Dest Confusion
Unfortunately, when using -e "ssh" one needs to be very careful about how one specifies the local and remote paths. For example, if one ends a path with /*, / or /., it may not behave as you might expect. Strangely, this did not happen for me when I tested with local rsync operations.
This is especially important when one is making use of the --delete flag, as one does not want to accidentally delete files on the remote host, so here is my cheatsheet of source/dest combos.
Most Useful Way
Use the following if you wish to copy the contents of /local/path/to/folder (including hidden files) to /path/to/remote/folder, deleting any files in /path/to/remote/folder that are not within the local one.
/local/path/to/folder/ \
/path/to/remote/folder/
This also seems to have the same behaviour as
/local/path/to/folder/ \
/path/to/remote/folder
... and also:
/local/path/to/folder \
/path/to/remote/.
However, I prefer not to use . in paths now.
Also, this version doesn't allow you to choose a different name for the folder on the remote host.
Mistake 1 - Sub-sub-folder
If you do the following:
/local/path/to/folder \
/path/to/remote/folder
... then you will end up with a folder called folder within /path/to/remote/folder, e.g.
/path/to/remote/folder/folder
However, you will not accidentally delete remote files; they just won't be where you probably wanted them. As a rule of thumb, always end the source folder with a / to prevent this.
Mistake 2 - Ignoring Hidden Files
If you do the following:
/local/path/to/folder/* \
/path/to/remote/folder
... then this will mostly work. However, any hidden files within the top level of the source folder will not be copied across. Likewise, --delete will not remove anything at the top level of the remote folder (hidden or not), even if it should, because the shell expands the wildcard and rsync is asked to transfer individual files rather than their parent folder.
On the plus side, you will not accidentally delete remote files.
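The hidden-file problem comes from the shell rather than rsync: bash expands the * before rsync runs, and by default * does not match names starting with a dot. A quick way to see what rsync would actually be given is to print the expansion, as in this sketch using a temporary folder.

```bash
#!/bin/bash
# Sketch: show what "folder/*" expands to before rsync ever sees it.
set -euo pipefail

folder="$(mktemp -d)"
trap 'rm -rf "$folder"' EXIT
touch "$folder/visible.txt" "$folder/.hidden"

# The shell expands the wildcard; dotfiles are not matched by default.
printf '%s\n' "$folder"/*

# Even with dotglob enabled (in a subshell here), rsync would still receive
# individual entries rather than the folder itself, so --delete would still
# not clean up the top level of the remote folder.
( shopt -s dotglob; printf '%s\n' "$folder"/* )
```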
Mistake 3 - Deletes Files You Didn't Want To
If you do the following in an attempt to copy folder to /path/to/remote/folder:
/local/path/to/folder/ \
/path/to/remote/.
... then you will end up copying the files within /local/path/to/folder/ into /path/to/remote/, rather than the folder itself. This will likely delete files that you didn't want removed from /path/to/remote/, which probably held other folders/data as well.
References
- rsync(1) - Linux man page
- Tecmint - 10 Practical Examples of Rsync
- nabble.com - symlink behaviour?
- Serverfault - Incremental backup qcow2 image with rsync
First published: 16th August 2018