Show HN: Compressed directories on Linux (excamera.com)
70 points by jamesbowman on May 7, 2017 | hide | past | favorite | 38 comments


Is there a way to mount loop device without sudo? Some kind of FUSE option?

It's useful as a workaround for 32-bit applications that aren't built with LFS (Large File Support) enabled and therefore fail on large XFS partitions. It's a very common bug in games that ship 32-bit only. Creating a loop device is a workaround, but needing sudo for it every time is annoying.

Also, there is probably no need to explicitly create a loop device, unless you already used up all existing ones (see man mount):

    if no explicit loop device is mentioned (but just an
    option `-o loop' is given), then mount will try to find 
    some unused loop device and use that, for example

          mount /tmp/disk.img /mnt -o loop
I.e., in such a case I simply create an empty file of a certain size, format it with some filesystem, and then mount it as a loop device.

For example:

    # fill an image file of $img_size MiB, format it, and loop-mount it
    dd if=/dev/zero of="$image" bs=1M count="$img_size"
    mkfs.xfs "$image"
    sudo mount -o loop "$image" "$mount_path"
    sudo chown "$USER:$USER" "$mount_path"


For ISO images you can use fuseiso (https://wiki.archlinux.org/index.php/Fuseiso). For all other purposes, you really should take the time to learn about udisks; the ArchWiki would be a great place to start.
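For the no-sudo loop-mount case above, a sketch with udisks2 (requires the udisks daemon running, and Polkit rules that allow your user to do this; the loop device number will vary):

```shell
# attach the image to a loop device as an unprivileged user
udisksctl loop-setup --file disk.img
# prints something like: Mapped file disk.img as /dev/loop0.

# mount the filesystem on the loop device (ends up under /run/media/$USER/)
udisksctl mount --block-device /dev/loop0

# when done: unmount and detach
udisksctl unmount --block-device /dev/loop0
udisksctl loop-delete --block-device /dev/loop0
```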


Interesting, thanks. Control with Polkit makes it flexible.


Note that it's not recommended to use journalled filesystems on file-backed loopback devices: http://loop-aes.sourceforge.net/loop-AES.README

This approach is not a useful path to lower overall disk usage IMO because you need to thick provision the directory up front, with an estimate of overall compressed disk usage. That's a recipe for usability pain. Thin provisioning with sparse files won't work well with a COW tree structured backing store like btrfs either.


btrfs isn't journalled, but all the same, the issue is data integrity in a crash scenario. That will affect btrfs, and frankly, any other filesystem (for example ext2 or ext4-without-journal); imagine if an application depended on syncing files at certain points and the underlying storage stack decided to write out a "sync2" before "sync1" was written out.


What about data safety? Say a hard drive develops a bad sector: then a whole compressed file is broken. This has always stopped me from using system-wide compression. But maybe it's not as bad as I suppose.


I've personally been running ZFS with LZ4, where checksums handle corruption. Another upside is that LZ4 is fast enough to actually improve I/O throughput, since its overhead is negligible. On my jail/container/VM datasets I've seen space reductions upwards of 80%; on backup datasets around 30%.

Another possibility is data deduplication, which is useful if you have plenty of RAM: you can deduplicate across files at the block level. However, I have only used it sporadically due to the high RAM usage.
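For reference, both are per-dataset properties; pool/dataset names below are made up, and these commands need a system with ZFS:

```shell
# enable LZ4 compression on a dataset (applies to newly written data)
zfs set compression=lz4 tank/jails

# check how well it is compressing
zfs get compressratio tank/jails

# block-level dedup, per dataset (the dedup table wants to live in RAM)
zfs set dedup=on tank/backups
```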


For files primarily composed of redundant data, keeping two or more compressed copies on the fly could still save a lot of space. It might also be useful in mixed SSD+HDD environments, where less data would be written to the SSD and the HDD would hold copies of the files for safety.


In theory you should get far better robustness by combining compression with FEC/ECC than by storing content as-is. In practice this relies on the data being at least somewhat compressible, which might be an issue.


FS compression does not compress the whole file in one block.


Will this cause double-buffering (one buffer for host FS and another for BTRFS) with doubled RAM consumption?


The host FS will only buffer compressed pages (so not double), but yeah.


I like this article style a lot. A very cool result, presented clearly and step-by-step, in less than a full screen of text.


Yes, he had a real world problem, discovered that there's not really a great solution in existence, made a new tool to solve it, and presented it in its entirety, all in a single page.


It's not really a new tool, but clever usage of existing tools.


Only thing missing is how to put it in /etc/fstab
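For reference, something like the following might work (untested; path, mount point and options are illustrative):

```
# /etc/fstab -- mount a compressed btrfs image at boot
/var/compressed.img  /mnt/compressed  btrfs  loop,compress=lzo  0  0
```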


If you're going to be using it regularly enough to want to put it in fstab, you should probably just allocate a regular partition for it.


Or you can simply use btrfs directly with LZO compression enabled.
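i.e. compression is just a mount option on btrfs (device and mount point below are placeholders):

```shell
# compress new writes, skipping data btrfs deems incompressible
sudo mount -o compress=lzo /dev/sdb1 /mnt/data

# or force compression even for data that looks incompressible
sudo mount -o compress-force=lzo /dev/sdb1 /mnt/data
```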


Or you could use a base filesystem that supports compression, such as ZFS. shrug


What does the calculation for file system compression look like these days? On one hand we've got faster SSDs, but on the other we've got more spare cores and SSD space is relatively expensive compared to HDD space.


With LZ4 the overhead is rather low; generally you'd get higher end-to-end throughput with LZ4 enabled. It's not as space-efficient as more expensive algorithms, but it's great for high-performance scenarios.


But if you just want one directory on an existing filesystem to be compressed, this might be an easier option.


The article's suggestion means eating up 50GB of allocation straight away. I don't think needing to estimate your compressed allocation up front is an easier solution at all. Normally a directory is thin-provisioned, pulling from the whole device's free space; thick-provisioning a directory is not an easier path to reducing space consumption.


On top of each pool (tank) you can make datasets (tank/logs) and mount those wherever you like.

Each of these datasets can have their own options set. So yeah it's basically the same thing as btrfs?


You could specify a dataset that mounts to a specific (sub)directory. Or even have a file-backed pool just like the example.
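A file-backed pool with a dataset mounted at a chosen directory could look like this (names and sizes are made up; requires ZFS installed and root):

```shell
# create a sparse backing file and a pool on top of it
truncate -s 50G /var/pool.img
sudo zpool create -O compression=lz4 tank /var/pool.img

# a dataset with its own mount point
sudo zfs create -o mountpoint=/srv/logs tank/logs
```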


You could also do the trick with ZFS: just create a zpool on that loop device. This also gives you LZ4 and gzip.


Yes, but ZFS is probably more memory-hungry and somewhat overkill just for compression alone. It's a pity simpler filesystems don't have transparent compression.


I'm not sure it's overkill compared to the presented solution.


One disadvantage of ZFS in this case is that it uses its own caching layer, since it's basically an OS subsystem in itself. So on a memory-constrained host (e.g. a Raspberry Pi), you might have issues with the ZFS and Linux buffers competing for RAM.


Like NTFS has had for the last 20+ years? ;-)

One of the benefits of NT's packet-driven I/O model (versus the UNIX synchronous VFS read/write blocking buffer I/O model).


BTRFS is now good for it, but having it in other filesystems would be useful too. Linux now supports some async I/O as well.


With ZFS and LZ4 or LZ4HC you might get better compression, BTRFS is more lightweight though, so it's a great solution. I wonder if encryption would be possible in a similar way.



From https://news.ycombinator.com/showhn.html :

Show HN is for something you've made that other people can play with. HN users can try it out, give you feedback, and ask questions in the thread.

This blog post is not "Show HN" material.

Please change the title to "Using BTRFS with loopback for compressed directories".


Use ZFS, create a dataset for the dir and turn compression on per dataset.


How is this different from a chattr +c on the directory? I suppose the compress option doesn't force compression (so it won't try to recompress already-compressed files).
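For comparison, per-directory compression on btrfs via the attribute (new files inherit it; existing data is only compressed when rewritten; the path is a placeholder):

```shell
# mark a directory so new files created in it get compressed
chattr +c /mnt/data/compressed

# verify: a 'c' should appear in the attribute list
lsattr -d /mnt/data/compressed
```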


I don't get it. How does this produce 1TB of storage with a 50GB backing store?


It creates a 50GB store. And because of compression, he can write 1TB of data onto it.

He/she is hoping that 50GB is enough, since there are no calculations in the article.



