GlusterFS - Distributed Volumes With Different Size Bricks
Since its 3.6 release, GlusterFS has apparently supported bricks of heterogeneous (different) sizes.
I was keen to test this out, because it removes the biggest reason I had for not using GlusterFS.
For this tutorial, I am going to set up a GlusterFS cluster on Debian 8, using VirtualBox machines with different sized virtual disks. I will also configure the cluster to use a distributed volume without any replication or striping. If I were to use this for my home storage cluster, I would be sure to use ZFS RAIDZ2 arrays to form the bricks that go into the volume, giving me disk redundancy. Because I am not using striping, if a node were lost due to a network issue, I would only lose access to the files on that node, and not the entire cluster. Recovering the data should be as easy as copying the files off the disconnected node.
Steps
Follow my tutorial on "Deploying a GlusterFS Cluster" (making sure to read the note to install version 3.8) until you get to the section titled: "Create a Storage Volume".
At this point, create a disk volume for each of your VMs, making sure to use different sizes. To keep things simple, I am going to create a 1GB, 2GB, and 3GB volume for gluster1, gluster2, and gluster3 VMs respectively.
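If you are also using VirtualBox, creating and attaching a disk from the command line looks something like this (a sketch; the VM name matches the tutorial, but the "SATA" controller name and port are assumptions from my setup):
# Create a 2GB virtual disk and attach it to the gluster2 VM.
# Repeat with sizes 1024 and 3072 for the gluster1 and gluster3 VMs.
VBoxManage createmedium disk --filename gluster2-brick.vdi --size 2048
VBoxManage storageattach gluster2 \
  --storagectl "SATA" --port 1 --device 0 --type hdd \
  --medium gluster2-brick.vdi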
On each node perform the following commands:
VOLUME_DIR_PATH="/gluster/volume1"
sudo mkdir -p "$VOLUME_DIR_PATH"
sudo mkfs -t ext4 /dev/sdb
sudo mount /dev/sdb "$VOLUME_DIR_PATH"
These commands:
- create a folder at /gluster/volume1
- create an ext4 filesystem on your disk volume (assuming it's sdb)
- mount your disk volume at this newly created location
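If you want the mount to survive a reboot, you could also add an entry to /etc/fstab. A minimal sketch, assuming the disk really is /dev/sdb:
echo "/dev/sdb /gluster/volume1 ext4 defaults 0 0" | sudo tee -a /etc/fstab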
Create The Distributed Volume
Create a distributed volume by performing the following commands on any one of the nodes.
# Settings
VOLUME_NAME="volume1"
VOLUME_DIR_PATH="/gluster/volume1"
HOST1="gluster1.programster.org"
HOST2="gluster2.programster.org"
HOST3="gluster3.programster.org"
# No "replica" argument is given, so this creates a plain distributed volume.
# The "force" flag allows brick directories that Gluster would otherwise
# warn about, such as ones sitting directly on a mount point.
sudo gluster volume create \
$VOLUME_NAME \
transport tcp \
$HOST1:$VOLUME_DIR_PATH \
$HOST2:$VOLUME_DIR_PATH \
$HOST3:$VOLUME_DIR_PATH \
force
If you were successful, you should see the following output:
volume create: volume1: success: please start the volume to access data
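Before starting the volume, you can double-check that it was created as a plain distributed volume; the Type field in the output should read "Distribute":
sudo gluster volume info volume1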
Start the Volume
Now we need to start the volume like we were just told.
sudo gluster volume start volume1
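You can also check that all three bricks came online:
sudo gluster volume status volume1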
Mount the Gluster Volume
Follow the steps in Setting up a GlusterFS Client to mount your gluster volume on a server. Since my deployment consists of a 1GB, 2GB, and 3GB disk combo, I ended up with a df -h output of:
gluster1.programster.org:volume1 5.8G 9.2M 5.4G 1% /home/stuart/gluster-mount
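For reference, the client-side steps from that tutorial boil down to something like this (a sketch; I am assuming a Debian client, and the mount path is just the one from my setup):
sudo apt-get install glusterfs-client
sudo mkdir -p /home/stuart/gluster-mount
sudo mount -t glusterfs gluster1.programster.org:/volume1 /home/stuart/gluster-mount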
You can now write files to this mount point and the files will be distributed across your GlusterFS cluster.
Testing
Hash Based Distribution
It looks like Gluster decides where to store a file based on a hash of the filename. This means that with my tiny storage pool of 6GB, and with large test files of 500MB, one of my servers filled up way before the others did, so I would sometimes fail to write more files (depending on the name). The likelihood of a file landing on a certain brick appears to depend on how large that brick is, so with large enough bricks the distribution should be fairly even. Using a hash of the filename instead of the content means a file won't try to jump to a different server every time you make a tiny change to it.
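You can observe the distribution yourself by writing a batch of test files through the mount and then inspecting each node's brick directly. A quick sketch (the file names, sizes, and paths are just my test values):
# On the client: write 20 test files of 10MB each through the gluster mount.
for i in $(seq 1 20); do
    dd if=/dev/urandom of=/home/stuart/gluster-mount/test-file-$i bs=1M count=10
done

# On each gluster node: see which files landed on this node's brick.
ls -lh /gluster/volume1/
df -h /gluster/volume1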
Everything is Kept As Files
The best thing about this setup is that the files are kept as ordinary files in the pool. This means that if a server disconnects or you run into any problems, you can easily recover the original files from the bricks on the servers themselves. Other technologies break files into chunks which then get spread across the servers; that has its own benefits, but it does make recovery slightly harder when something goes wrong.
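For example, pulling everything off one node's brick could be as simple as the following (a sketch; /mnt/recovery is a hypothetical destination):
# The brick is just an ext4 filesystem full of regular files. The .glusterfs
# directory holds Gluster's internal metadata, so exclude it from the copy.
sudo rsync -av --exclude=.glusterfs /gluster/volume1/ /mnt/recovery/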
Appendix
Volume Types
There are many different types of volumes you can configure depending on your needs. The content below is adapted from the official docs and is included here only for reference.
Distributed - Distributed volumes distribute files throughout the bricks in the volume. You can use distributed volumes where the requirement is to scale storage and redundancy is either not important or is provided by other hardware/software layers.
Replicated - Replicated volumes replicate files across bricks in the volume. You can use replicated volumes in environments where high availability and high reliability are critical (see the example command after this list).
Striped - Striped volumes stripe data across bricks in the volume. For best results, you should use striped volumes only in high-concurrency environments accessing very large files.
Distributed Striped - Distributed striped volumes stripe data across two or more nodes in the cluster. You should use distributed striped volumes where the requirement is to scale storage, and where high-concurrency access to very large files is critical.
Distributed Replicated - Distributed replicated volumes distribute files across replicated bricks in the volume. You can use distributed replicated volumes in environments where the requirement is to scale storage and high reliability is critical. Distributed replicated volumes also offer improved read performance in most environments.
Distributed Striped Replicated - Distributed striped replicated volumes distribute striped data across replicated bricks in the cluster. For best results, you should use distributed striped replicated volumes in highly concurrent environments where parallel access to very large files and performance are critical. In this release, configuration of this volume type is supported only for Map Reduce workloads.
Striped Replicated - Striped replicated volumes stripe data across replicated bricks in the cluster. For best results, you should use striped replicated volumes in highly concurrent environments where there is parallel access to very large files and performance is critical. In this release, configuration of this volume type is supported only for Map Reduce workloads.
Dispersed - Dispersed volumes are based on erasure codes, providing space-efficient protection against disk or server failures. They store an encoded fragment of the original file on each brick in such a way that only a subset of the fragments is needed to recover the original file. The number of bricks that can be missing without losing access to data is configured by the administrator at volume creation time.
Distributed Dispersed - Distributed dispersed volumes distribute files across dispersed subvolumes. This has the same advantages as distributed replicated volumes, but uses dispersal to store the data in the bricks.
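If you wanted a replicated volume across the same three nodes instead, the create command would look something like this (a sketch using my hostnames; the bricks would need to be fresh directories):
sudo gluster volume create volume2 \
replica 3 \
transport tcp \
gluster1.programster.org:/gluster/volume2 \
gluster2.programster.org:/gluster/volume2 \
gluster3.programster.org:/gluster/volume2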
First published: 16th August 2018