ZFS on Linux still has annoying issues with ARC size (utcc.utoronto.ca)
101 points by protomyth on July 20, 2019 | hide | past | favorite | 21 comments


There's a good lesson in operability here: log reasons for decisions!

I have had to re-learn this lesson over and over again with my own software. "Tom, why did the system just do that?!" scream my users. "Er, let me check", I respond, already feeling that sinking feeling. "EDUNNOMATE" says the log. So, I add some logging around the decision (the data feeding into it, the choices made, the actions resulting), redeploy, and wait for my users to start screaming again, hoping that this time, I will be able to give them an answer.


The author calls ARC autotuning “opaque”, which surprised me given that a well-regarded paper has been published on it:

https://www.usenix.org/legacy/events/fast03/tech/full_papers...

https://youtu.be/F8sZRBdmqc0

...that said, the author has been writing about the ARC for more than 10 years judging from his blog links, so perhaps that paper did not answer his questions.


Before I ditched ZoL for persistent storage for a few hundred NGINX caches, I saw this behavior too.

Setting zfs_arc_min to something like 50% of arc_max stopped it from dumping the ARC every 10 minutes.

YMMV.
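For reference, a sketch of how that tweak can be set as a module option (the parameter names are real ZoL module parameters; the byte values here are illustrative, not a recommendation):

```
# /etc/modprobe.d/zfs.conf -- illustrative values, in bytes
options zfs zfs_arc_max=34359738368   # cap the ARC at 32 GiB
options zfs zfs_arc_min=17179869184   # pin the floor at 50% of that

# Or at runtime, without a reboot:
#   echo 17179869184 > /sys/module/zfs/parameters/zfs_arc_min
```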


Out of curiosity, what did you happen to move to (and why)? Back to ext4/xfs, or to Btrfs or something else more involved?


I just mount each SSD as its own XFS filesystem and use NGINX’s split feature to fill them up.

Not resilient on a system level, but refilling the cache is cheap.
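For the curious, a sketch of what that looks like, assuming the nginx `split_clients` approach to spreading a cache across per-SSD mounts (paths, zone names, and the upstream are all made up for illustration):

```nginx
# Illustrative: hash the request URI across two per-SSD XFS mounts.
proxy_cache_path /mnt/ssd0/cache keys_zone=ssd0:100m;
proxy_cache_path /mnt/ssd1/cache keys_zone=ssd1:100m;

split_clients $request_uri $cache_zone {
    50%     ssd0;     # half the key space on the first SSD
    *       ssd1;     # the rest on the second
}

server {
    listen 80;
    location / {
        proxy_cache $cache_zone;   # pick the disk by hash
        proxy_pass  http://origin;
    }
}
```

Losing one SSD loses only that slice of the cache, which is the "refilling is cheap" tradeoff mentioned above.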

ZFS was generally pleasant from an operability viewpoint once we ironed out the quirks, but the perf hit from no sendfile was too much.


That’s a good point; I never looked at what the perf impact of disabling sendfile would be on even a moderately-loaded webserver.

It’d be really nice to see that fixed, like the recent direct I/O additions.


Of course the people around these parts tend to have very particular needs and use cases, but for anything resembling the "common case" the performance impact of not using sendfile should be negligible.

(I'll just point out that using sendfile means that traffic is unencrypted... which is probably fine on an internal network, but I've started adopting the stance that even internal network traffic should be encrypted unless there's a very good reason not to. An absolute requirement for performance might be a good reason.)


If nginx decided to support kTLS, they could use sendfile for encrypted traffic as well. Unsure if it's worth it just to make sendfile work, however.
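For what it's worth, a sketch of what that might look like on an nginx build with kTLS support (this assumes a kernel with the tls module loaded and an OpenSSL built with kTLS enabled; the `ssl_conf_command` directive comes from newer nginx releases and may not exist in the version discussed here):

```nginx
# Hypothetical kTLS setup: the kernel handles TLS record encryption,
# so sendfile() can work on HTTPS connections again.
server {
    listen 443 ssl;
    ssl_certificate     /etc/nginx/cache.crt;   # illustrative paths
    ssl_certificate_key /etc/nginx/cache.key;
    ssl_conf_command    Options KTLS;           # hand records to the kernel
    sendfile            on;
}
```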


I was going to mention kernel TLS hopefully enabling sendfile for mostly-HTTPS workloads, as that’s the direction everything is heading anyway, and without it we don’t get zero-copy for those connections.

Now I’m more curious about the actual threshold where not having sendfile begins causing noticeable performance problems… at what point short of being Netflix does it start to matter?


If your cache can face-tank an HTTP DDoS, you don't need fragile fingerprinting techniques to distinguish bad traffic from good, which reduces user impact (fewer accidentally blocked users). The less it costs you to fill that 100 Gbit NIC with your TLS cache traffic, the more boxes you can afford. Internet exchanges are surprisingly cheap to connect to.

Of course, sharing resources between a couple of services would be good, as NICs and switch ports are still a way from free.


What about HTTP/2?


I have very weird read performance issues using the stable ZoL in the current Ubuntu LTS, on a box with over 200GB of RAM and a few TB of fast L2ARC flash.

The default settings for L2ARC fill rate are also super low.

I haven’t had time to track down exactly why it’s so slow, yet.


What’s the topology of your array? How many disks? L2ARC doesn't help much with that much RAM, because your main memory will be faster than even mirrored NVMe caches.


8x 10TB HDD, 4x 512GB flash, all 6Gbps SATA. 256GB RAM, 40 cores. The HDDs are all in raidz2, with the SSDs all as L2ARC.

I have an ubuntu mirror on the machine that's around 150GB, and doing a `tar -c $MIRRORPATH | pv > /dev/null` shows lots of reads from the HDDs, even on second, third, fourth runs. It confuses me.
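One contributing factor worth checking, as a back-of-the-envelope sketch: ZoL's default caps the ARC at half of physical RAM, which on this box is smaller than the mirror, so a full pass can never be served entirely from ARC (the sizes below are taken from the numbers in this thread; the half-of-RAM default is the stock ZoL behavior, but check your actual `zfs_arc_max`):

```python
# Does the ~150 GB mirror fit in the default ARC of a 256 GB box?
ram_bytes = 256 * 2**30          # 256 GiB of RAM
arc_max = ram_bytes // 2         # ZoL default: c_max = physmem / 2
mirror_bytes = 150 * 2**30       # ~150 GiB Ubuntu mirror

fits_in_arc = mirror_bytes <= arc_max
print(f"arc_max = {arc_max / 2**30:.0f} GiB, "
      f"mirror = {mirror_bytes / 2**30:.0f} GiB, "
      f"fits: {fits_in_arc}")
```

If the working set doesn't fit, some HDD reads on re-runs are expected even with a warm cache; the L2ARC should absorb the spillover, but only once its (slow by default) fill rate lets it warm up.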


I would remark that allocation_classes (which is in 0.8, but not what Ubuntu ships yet AFAIK) would maybe be a better use for your fast flash storage.

Of course, if the L2ARC dies, you shrug, while if allocation_classes vdevs die, your pool is gone, so there's that tradeoff to be aware of too.


Have you calculated how much RAM those L2ARC entries will occupy?
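A rough sketch of that calculation, assuming ~70 bytes of in-RAM header per L2ARC record (a commonly cited ZoL figure; the exact struct size varies by version, so treat it as an assumption) and the default 128K recordsize:

```python
# Estimate the RAM consumed by L2ARC headers for ~2 TiB of cache flash.
l2arc_bytes = 2 * 2**40          # ~2 TiB of L2ARC (4x 512GB SSDs)
recordsize = 128 * 2**10         # 128 KiB default recordsize
header_bytes = 70                # assumed per-record header cost in ARC

records = l2arc_bytes // recordsize
ram_overhead = records * header_bytes
print(f"{records} records, ~{ram_overhead / 2**30:.1f} GiB of RAM for headers")
```

With large records the overhead is modest; with lots of small blocks (many small files, small recordsize) the same flash can cost many times more RAM.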


if you have TB of L2ARC how big is the pool?


I have the same problem. Sometimes it just drops the whole cache suddenly and I don't have a clue why: https://walkman.cloud/s/zXLp7DF9sDFwr7z


You may find it informative to graph MRU/MFU - I suspect you will find that the MRU is being dumped. [1]

I personally can't decide whether I think it's a bug or not, since if the MRU is all old items there is an argument to be had that you don't want it in cache any more...but dumping 100% of it strikes me as a bug either way. :)

[1] - https://github.com/zfsonlinux/zfs/issues/7820


An L2ARC is simply a cache of blocks that you have randomly accessed recently. I wonder if, at 2am, files haven't been accessed within some cleanup period and it begins purging itself. Here's a link to other possible reasons: http://www.brendangregg.com/blog/2008-07-22/zfs-l2arc.html


I still have this bug. https://github.com/zfsonlinux/zfs/issues/8396

Page faults from the NFS client side aren't served by the server when they should be (read-only map, reading a page). I could imagine this is related.



