ZFS on Linux still has annoying issues with ARC size (utcc.utoronto.ca)
101 points by protomyth on July 20, 2019 | hide | past | favorite | 21 comments


There's a good lesson in operability here: log reasons for decisions!

I have had to re-learn this lesson over and over again with my own software. "Tom, why did the system just do that?!" scream my users. "Er, let me check", I respond, already feeling that sinking feeling. "EDUNNOMATE" says the log. So, I add some logging around the decision (the data feeding into it, the choices made, the actions resulting), redeploy, and wait for my users to start screaming again, hoping that this time, I will be able to give them an answer.


The author calls ARC autotuning “opaque”, which surprised me given that a well-regarded paper has been published on it:

https://www.usenix.org/legacy/events/fast03/tech/full_papers...

https://youtu.be/F8sZRBdmqc0

...that said, the author has been writing about the ARC for more than 10 years judging from his blog links, so perhaps that paper did not answer his questions.


Before I ditched ZoL for persistent storage for a few hundred NGINX caches, I saw this behavior too.

Setting zfs_arc_min to something like 50% of arc_max stopped it from dumping the ARC every 10 minutes.

YMMV.
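For reference, a sketch of how that tweak can be set as a module option (the parameter names are real ZoL module parameters; the byte values here are illustrative, not a recommendation):

```
# /etc/modprobe.d/zfs.conf -- illustrative values, in bytes
options zfs zfs_arc_max=34359738368   # cap the ARC at 32 GiB
options zfs zfs_arc_min=17179869184   # pin the floor at 50% of that

# Or at runtime, without a reboot:
#   echo 17179869184 > /sys/module/zfs/parameters/zfs_arc_min
```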


Out of curiosity, what did you happen to move to (and why)? Back to ext4/xfs, or to Btrfs or something else more involved?


I just mount each SSD as its own XFS filesystem and use NGINX’s split feature to fill them up.

Not resilient on a system level, but refilling the cache is cheap.
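For the curious, a sketch of what that looks like, assuming the nginx `split_clients` approach to spreading a cache across per-SSD mounts (paths, zone names, and the upstream are all made up for illustration):

```nginx
# Illustrative: hash the request URI across two per-SSD XFS mounts.
proxy_cache_path /mnt/ssd0/cache keys_zone=ssd0:100m;
proxy_cache_path /mnt/ssd1/cache keys_zone=ssd1:100m;

split_clients $request_uri $cache_zone {
    50%     ssd0;     # half the key space on the first SSD
    *       ssd1;     # the rest on the second
}

server {
    listen 80;
    location / {
        proxy_cache $cache_zone;   # pick the disk by hash
        proxy_pass  http://origin;
    }
}
```

Losing one SSD loses only that slice of the cache, which is the "refilling is cheap" tradeoff mentioned above.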

ZFS was generally pleasant from an operability viewpoint once we ironed out the quirks, but the perf hit from no sendfile was too much.


That’s a good point; I never looked at what the perf impact of disabling sendfile would be on even a moderately-loaded webserver.

It’d be really nice to see that fixed, like the recent direct I/O additions.


Of course the people around these parts tend to have very particular needs and use cases, but for anything resembling the "common case" the performance impact of not using sendfile should be negligible.

(I'll just point out that using sendfile means that traffic is unencrypted... which is probably fine on an internal network, but I've started adopting the stance that even internal network traffic should be encrypted unless there's a very good reason not to. An absolute requirement for performance might be a good reason.)


If nginx decided to support kTLS, they could use sendfile for encrypted traffic as well. Unsure if it's worth it just to make sendfile work, however.
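For what it's worth, a sketch of what that might look like on an nginx build with kTLS support (this assumes a kernel with the tls module loaded and an OpenSSL built with kTLS enabled; the `ssl_conf_command` directive comes from newer nginx releases and may not exist in the version discussed here):

```nginx
# Hypothetical kTLS setup: the kernel handles TLS record encryption,
# so sendfile() can work on HTTPS connections again.
server {
    listen 443 ssl;
    ssl_certificate     /etc/nginx/cache.crt;   # illustrative paths
    ssl_certificate_key /etc/nginx/cache.key;
    ssl_conf_command    Options KTLS;           # hand records to the kernel
    sendfile            on;
}
```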


I was going to mention kernel TLS hopefully enabling sendfile for mostly-HTTPS workloads, as that’s the direction everything is heading anyway, and without it we don’t get zero-copy for those connections.

Now I’m more curious about the actual threshold where not having sendfile begins causing noticeable performance problems… at what point short of being Netflix does it start to matter?


If your cache can face-tank an HTTP DDoS, you don't need fragile fingerprinting techniques to distinguish bad traffic from good, which reduces user impact (fewer accidentally blocked users). The less it costs you to fill that 100 Gbit NIC with your TLS cache traffic, the more boxes you can afford. Internet exchanges are surprisingly cheap to connect to.

Of course, sharing resources between a couple of services would be good, as NICs and switch ports are still a way from free.


What about HTTP/2?


I have very weird read performance issues using the stable ZoL in the current Ubuntu LTS, on a box with over 200GB of RAM and a few TB of fast L2ARC flash.

The default settings for L2ARC fill rate are also super low.

I haven’t had time to track down exactly why it’s so slow, yet.


What’s the topology of your array? How many disks? L2ARC doesn't help much with that much RAM, because your main memory will be faster than even mirrored NVMe caches.


8x 10TB HDD, 4x 512GB flash, all 6Gbps SATA. 256GB RAM, 40 cores. The HDDs are all in raidz2, with the SSDs all as L2ARC.

I have an ubuntu mirror on the machine that's around 150GB, and doing a `tar -c $MIRRORPATH | pv > /dev/null` shows lots of reads from the HDDs, even on second, third, fourth runs. It confuses me.
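One contributing factor worth checking, as a back-of-the-envelope sketch: ZoL's default caps the ARC at half of physical RAM, which on this box is smaller than the mirror, so a full pass can never be served entirely from ARC (the sizes below are taken from the numbers in this thread; the half-of-RAM default is the stock ZoL behavior, but check your actual `zfs_arc_max`):

```python
# Does the ~150 GB mirror fit in the default ARC of a 256 GB box?
ram_bytes = 256 * 2**30          # 256 GiB of RAM
arc_max = ram_bytes // 2         # ZoL default: c_max = physmem / 2
mirror_bytes = 150 * 2**30       # ~150 GiB Ubuntu mirror

fits_in_arc = mirror_bytes <= arc_max
print(f"arc_max = {arc_max / 2**30:.0f} GiB, "
      f"mirror = {mirror_bytes / 2**30:.0f} GiB, "
      f"fits: {fits_in_arc}")
```

If the working set doesn't fit, some HDD reads on re-runs are expected even with a warm cache; the L2ARC should absorb the spillover, but only once its (slow by default) fill rate lets it warm up.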


I would remark that allocation_classes (which is in 0.8, but not what Ubuntu ships yet AFAIK) would maybe be a better use for your fast flash storage.

Of course, if the L2ARC dies, you shrug, while if allocation_classes vdevs die, your pool is gone, so there's that tradeoff to be aware of too.


Have you calculated how much RAM those L2ARC entries will occupy?
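A rough sketch of that calculation, assuming ~70 bytes of in-RAM header per L2ARC record (a commonly cited ZoL figure; the exact struct size varies by version, so treat it as an assumption) and the default 128K recordsize:

```python
# Estimate the RAM consumed by L2ARC headers for ~2 TiB of cache flash.
l2arc_bytes = 2 * 2**40          # ~2 TiB of L2ARC (4x 512GB SSDs)
recordsize = 128 * 2**10         # 128 KiB default recordsize
header_bytes = 70                # assumed per-record header cost in ARC

records = l2arc_bytes // recordsize
ram_overhead = records * header_bytes
print(f"{records} records, ~{ram_overhead / 2**30:.1f} GiB of RAM for headers")
```

With large records the overhead is modest; with lots of small blocks (many small files, small recordsize) the same flash can cost many times more RAM.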


if you have TB of L2ARC how big is the pool?


I have the same problem. Sometimes it just drops the whole cache suddenly and I don't have a clue why: https://walkman.cloud/s/zXLp7DF9sDFwr7z


You may find it informative to graph MRU/MFU - I suspect you will find that the MRU is being dumped. [1]

I personally can't decide whether I think it's a bug or not, since if the MRU is all old items there is an argument to be had that you don't want it in cache any more...but dumping 100% of it strikes me as a bug either way. :)

[1] - https://github.com/zfsonlinux/zfs/issues/7820


An L2ARC is simply a cache of blocks that you have randomly accessed recently. I wonder if, at 2am, files haven't been accessed within some cleanup period and it begins purging itself. Here's a link to other possible reasons: http://www.brendangregg.com/blog/2008-07-22/zfs-l2arc.html


I still have this bug. https://github.com/zfsonlinux/zfs/issues/8396

Page faults from the NFS client side aren't served by the server when they should be (read-only map, reading a page). I could imagine this is related.



