Programster's Blog

Tutorials focusing on Linux, programming, and open source

Seafile Garbage Collection

It turns out that you may need to manually run a garbage collection process every once in a while on your Seafile server. This is because the server can accumulate "blocks" of storage that are no longer referenced by anything.

Why Are There Unused Blocks Lying Around?

Unused blocks can be left behind whenever you delete files or libraries, but how many there are also depends on your history settings and on de-duplication. E.g. do you keep your history for 30 days or forever? Did both you and another person upload the same file to different places? Because of Seafile's history and de-duplication features, it cannot just delete blocks immediately whenever you remove files/libraries: a block may still be needed by an older version kept in the history, or by another copy of the same file. Manually running the garbage collection process has the server look through your blocks and check whether they are still being used. If not, they can be purged. This process can take a while depending on your hardware setup and usage.
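To make the "check whether they are still being used" idea concrete, here is a toy model in shell. It is only an illustration and not how Seafile implements GC: the function name and both input files are hypothetical, with one file listing every block ID in storage and the other listing the block IDs still referenced by history (the same ID may appear more than once because of de-duplication).

```shell
# Toy model only: this is NOT Seafile's implementation.
# $1: file listing every block ID in storage, one per line.
# $2: file listing block IDs still referenced by any kept version
#     (duplicates are fine; de-duplicated blocks are referenced many times).
unused_blocks() {
    all_file="$1"
    live_file="$2"
    # Print the lines of all_file that match no whole line of live_file.
    grep -vxF -f "$live_file" "$all_file"
}
```

A block only shows up in the output once nothing references it any more, which is why deleting a single file does not necessarily free any space.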

Steps

Turn off the Seafile server processes (not the physical server). This is because new blocks written into Seafile while GC is running may be mistakenly deleted by the GC program.

/bin/bash $HOME/seafile/seafile-server-latest/seafile.sh stop
/bin/bash $HOME/seafile/seafile-server-latest/seahub.sh stop

You should see:

Stopping seafile server ...
Done.

Stopping seahub ...
Done.

Run the garbage collection script:

cd $HOME/seafile/seafile-server-latest
./seaf-gc.sh

This may take a while, so go make a cup of tea.
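If you would rather see what would be reclaimed before committing to a real run, the Seafile manual describes a --dry-run option for seaf-gc.sh. The wrapper below is a small sketch around it: the function name is my own, and whether the option (and passing library IDs as extra arguments) is available depends on your Seafile version, so check the manual for your release first.

```shell
# Preview what GC would remove without deleting anything.
# $1 is the server directory; any further arguments (optional library IDs)
# are passed straight through to seaf-gc.sh.
seafile_gc_preview() {
    dir="$1"
    shift
    (cd "$dir" && ./seaf-gc.sh --dry-run "$@")
}
```

With the server still stopped, it could be called as seafile_gc_preview "$HOME/seafile/seafile-server-latest".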

Whilst it's working away, you will see blocks of text similar to the one below:

[09/30/17 04:15:16] gc-core.c(440): GC version 1 repo Videos(0dfcba8e-775b-493e-aa3c-ca421535d54c)
[09/30/17 04:15:16] gc-core.c(313): GC started. Total block number is 12261.
[09/30/17 04:15:16] gc-core.c(46): GC index size is 6130 Byte.
[09/30/17 04:15:16] gc-core.c(327): Populating index.
[09/30/17 04:15:16] gc-core.c(181): Populating index for repo 0dfcba8e.
[09/30/17 04:15:17] gc-core.c(234): Traversed 6 commits, 2843 blocks.
[09/30/17 04:15:17] gc-core.c(341): Scanning and deleting unused blocks.
[09/30/17 04:15:17] gc-core.c(364): GC finished. 12261 blocks total, about 2843 reachable blocks, 9368 blocks are removed.

It will finish with a list of repos deleted by users:

[09/30/17 04:15:17] gc-core.c(384): === Repos deleted by users ===
[09/30/17 04:15:17] gc-core.c(392): GC deleted repo eb0ebd0a.
[09/30/17 04:15:22] gc-core.c(392): GC deleted repo de080b85.
[09/30/17 04:15:22] gc-core.c(392): GC deleted repo db5e5bf2.
[09/30/17 04:15:23] gc-core.c(392): GC deleted repo d154a3ae.
[09/30/17 04:15:23] gc-core.c(392): GC deleted repo a8c05d26.
[09/30/17 04:15:23] gc-core.c(392): GC deleted repo a6110c27.
[09/30/17 04:15:23] gc-core.c(392): GC deleted repo 8c729d35.
[09/30/17 04:15:57] gc-core.c(392): GC deleted repo 86f10628.
[09/30/17 04:15:58] gc-core.c(392): GC deleted repo 7c450942.
[09/30/17 04:15:58] gc-core.c(392): GC deleted repo 799ff6ea.
[09/30/17 04:15:58] gc-core.c(392): GC deleted repo 6129c6fe.
[09/30/17 04:15:58] gc-core.c(392): GC deleted repo 4dbaab0f.
[09/30/17 04:15:58] gc-core.c(392): GC deleted repo 2b037e4b.
[09/30/17 04:15:58] gc-core.c(392): GC deleted repo 237d64d7.
[09/30/17 04:15:58] gc-core.c(392): GC deleted repo 1bc1dd82.
[09/30/17 04:15:58] gc-core.c(392): GC deleted repo 05661eef.
[09/30/17 04:15:59] gc-core.c(456): === GC is finished ===
seafserv-gc run done

Done.
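If you have many libraries, the per-library figures scroll past quickly. As a rough sketch, assuming you saved the output to a file (gc.log is a hypothetical name, e.g. captured with ./seaf-gc.sh 2>&1 | tee gc.log) and that your version words the "GC finished" lines the same way as the sample above, the removed-block counts can be totalled like this:

```shell
# Sum the "N blocks are removed" figures from a saved GC log.
# Assumes the log uses the same wording as the sample output above.
total_removed_blocks() {
    log_file="$1"
    # Pull out the number directly before "blocks are removed." on each
    # matching line, then add the numbers up.
    sed -n 's/.* \([0-9][0-9]*\) blocks are removed\..*/\1/p' "$log_file" |
        awk '{ total += $1 } END { print total + 0 }'
}
```

Calling total_removed_blocks gc.log prints a single number; it counts blocks, not bytes, so treat it as an indication of how much work GC did rather than an exact size.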

Start Seafile

Don't forget to start Seafile up again after completion.

/bin/bash $HOME/seafile/seafile-server-latest/seafile.sh start
/bin/bash $HOME/seafile/seafile-server-latest/seahub.sh start
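Since forgetting the restart is the easy mistake here, the whole stop / GC / start sequence can be wrapped in one function. This is a minimal sketch rather than an official script: the function name is my own, and it assumes the same directory layout and script names used in the steps above.

```shell
# Stop Seafile, run GC, and always start Seafile again afterwards.
# $1 is the server directory, e.g. $HOME/seafile/seafile-server-latest
seafile_gc_cycle() {
    dir="$1"
    gc_status=0

    # Stop both services first so no new blocks are written during GC.
    /bin/bash "$dir/seafile.sh" stop
    /bin/bash "$dir/seahub.sh" stop

    # Run GC from inside the server directory, remembering its exit code.
    (cd "$dir" && ./seaf-gc.sh) || gc_status=$?

    # Start the services again even if GC failed, so the server is not left down.
    /bin/bash "$dir/seafile.sh" start
    /bin/bash "$dir/seahub.sh" start

    # Report the GC result to the caller.
    return "$gc_status"
}
```

It would be called as seafile_gc_cycle "$HOME/seafile/seafile-server-latest". The function returns the exit code of seaf-gc.sh, so a calling script can still tell whether the collection itself succeeded.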

Extra Info

I had never run the garbage collection process on my personal server until today, so I started with 506GB and ended up with just 82GB, a saving of 84%! However, just before running garbage collection, I did go through my libraries and change the history duration from infinity to 30 days.
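For anyone wanting to work out the same figure for their own server, the percentage is just (before - after) / before. A small sketch, with a function name of my own choosing and the 506GB and 82GB figures from above as input:

```shell
# Percentage of disk space reclaimed, rounded to the nearest whole percent.
# $1 = usage before GC, $2 = usage after GC (same unit for both).
percent_saved() {
    before="$1"
    after="$2"
    awk -v b="$before" -v a="$after" 'BEGIN { printf "%.0f\n", (b - a) / b * 100 }'
}

percent_saved 506 82   # prints 84
```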

[ pydf output before running garbage collection ]

[ pydf output after running garbage collection ]
