Programster's Blog

Tutorials focusing on Linux, programming, and open-source

Allan Jude Interview with Wendell - ZFS Talk and More

Below is an interesting talk between Allan Jude and Wendell from Level1Techs about the ZFS filesystem. No matter what level you are at with ZFS, from a novice thinking about using it for the first time to a professional who has used it for years, I can almost guarantee there is something in here for you. It starts with why you should use ZFS over other filesystems, then moves on to new features coming to ZFS and tips for improving performance, such as setting the record size appropriately and using multiple datasets.

  • 00:00 Introduction
  • 00:45 - Why use ZFS?
    • 01:34 - overwriting filesystems compared to copy-on-write.
    • 02:06 - snapshots
    • 02:17 - handling sudden power loss and shorn writes.
    • 03:49 - Features that are "enterprisey" that are built in (compression, etc)
  • 04:42 - the point was to make storage administration easy - adding storage should be as easy as adding RAM.
  • 05:06 - pooled storage
  • 07:35 - ZFS has grown beyond a filesystem; it's also a volume manager (managing multiple disks).
  • 08:33 - checksums and the uber block
  • 08:53 - handling a power loss or crash - you don't have to wait for an fsck.
  • 09:27 - synchronizing writes across drives in a large array.
  • 10:00 - drives and RAID controllers reporting that data has been written.
  • 11:16 - all storage that you add to the pool must already have its redundancy built in.
  • 12:16 - write balancing and recent changes. Whichever vdev is done first gets the next piece but because the fuller a disk is the slower it gets, they should balance out.
  • 13:33 - does fragmentation become a problem?
  • 14:28 - if you are running a database, are there things you can do to tune the pool/dataset?
    • 14:41 - the most important thing is to set the record size of the dataset to match what the database is going to do. InnoDB uses 16k pages. ZFS defaults to 128k records.
  • 15:41 - understanding how the filesystem works can really help with the performance of your application.
  • 16:00 - ZFS and DTrace - observe what is actually happening.
  • 17:25 - Wendell promises to do a video on DTrace.
  • 17:40 - debugging a ZFS performance problem: using an appropriate record size, the penalty that not doing so can have, and TRIM issues.
  • 19:20 - trims are supposed to happen when all queues are empty and there is a limit on how many trim operations can be queued up.
  • 19:48 - ZFS (remote) replication - compared against rsync.
  • 21:55 - deduplication.
    • 22:15 - avoid deduplication for now.
    • 22:33 - metadata classes - use a special/specific device to help with deduplication, such as an NVMe drive.
  • 24:50 - compression
    • 28:12 - new compression algorithm "Zstandard" from Facebook.
    • 29:48 - switching compression on for things like databases can often speed up performance.
    • 30:55 - compressed ARC and the LRU (least recently used) cache.
    • 31:44 - ZFS has 4 lists instead, so that doing something like a backup doesn't screw your cache.
    • 32:50 - compressed ARC.
    • 33:40 - another cache that holds data decompressed this second (good for databases with indexes).
    • 34:30 - can now send the compressed version of data in the replication stream to save you bandwidth and not waste time decompressing.
    • 37:18 - boot environments/snapshots - and the value of having different datasets.
  • 39:00 - break up your data into datasets!
  • 40:36 - comparing against BTRFS.
    • 41:04 - over 100 engineer years went into ZFS which has made it a lot more polished than a lot of the newer competitors.
    • 41:35 - there are still people working on ZFS in the form of OpenZFS (forked from the last open-source version).
    • 42:18 - hardware is catching up to allow ZFS features, such as Intel Optane.
    • 43:00 - the power required to run all the HDDs necessary to reach the ZFS storage limit of 128 zettabytes would boil all the water.
    • 43:50 - it's all about reliability.
      • 44:11 - embarrassing RAID 5 "thing" with BTRFS and how it was found.
  • 45:58 - book references
  • 46:08 - weekly podcast about the BSD family of operating systems.
  • 47:20 - mirrors are your best bet for IOPS compared to RAID-Z.
  • 47:40 - there is a big difference between using NVMe and HDDs in ZFS.
  • 48:10 - SLOG - at a guess it never uses more than 8 GB, and you can tune float time.
  • 49:39 - if you are using BitTorrent, which also uses 16k chunks, use two different datasets for incomplete and completed torrents to prevent fragmentation.
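
The record-size advice at 14:41 and 49:39 can be made concrete with a little arithmetic. The sketch below is a deliberately simplified model and is not taken from the talk: it assumes aligned writes, no compression and no caching, and the function name `write_amplification` is made up for illustration. Because ZFS is copy-on-write, changing part of a record means writing the whole record out again, so a 16k database page stored in 128k records costs roughly eight times the I/O.

```python
def write_amplification(record_size_kib: int, page_size_kib: int) -> float:
    """Estimate data rewritten per page update, relative to the page size.

    Simplified model (an assumption, not ZFS internals): every record
    touched by the update is rewritten in full, and pages are aligned
    to record boundaries.
    """
    if record_size_kib <= 0 or page_size_kib <= 0:
        raise ValueError("sizes must be positive")
    # Ceiling division: a page larger than a record spans several records.
    records_touched = -(-page_size_kib // record_size_kib)
    return records_touched * record_size_kib / page_size_kib


if __name__ == "__main__":
    # InnoDB-style 16k pages on the default 128k record size.
    print(write_amplification(128, 16))  # 8.0
    # The same pages on a dataset tuned to a 16k record size.
    print(write_amplification(16, 16))   # 1.0
```

On a real system the property would be set with something like `zfs set recordsize=16K tank/db`, where `tank/db` is a hypothetical dataset name. The new value only applies to data written after the change.
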
Last updated: 6th July 2019
First published: 16th August 2018