
Both KeyKOS and EUMEL used periodic globally consistent checkpoints supplemented by an explicit fsync() call for use on things that required durability like database transaction journals. So when the system crashed you would lose your last few minutes of work; EUMEL ran on floppies, so the interval could be as long as 15 minutes, but I think even KeyKOS only took a checkpoint every minute or so, perhaps because it would pause the whole system for close to a second on the IBM 370s it ran on. (Many years later, Norm Hardy showed me a SPARC workstation with I think 32 or 64 MiB of RAM; he said it was the biggest machine KeyKOS had ever run on, twice the size of the mainframe that was the previous record holder. Physically it was small enough to hold in one hand.)

I assume L3 used the same approach as EUMEL. I don't know enough about OS/400 or Multics to comment intelligently, but OS/400 at least is evidently reasonably performant and reliable, so I'd be surprised if it didn't use a similar strategy.

There's a latency/bandwidth tradeoff: if you take checkpoints less often, more of the data you would have checkpointed gets overwritten, so your checkpoint bandwidth gets smaller. NetApp's WAFL (not a single-level store, but a filesystem using a similar globally-consistent-snapshot strategy) originally (01996) took a checkpoint ("snapshot") once a second and stored its journals in battery-backed RAM, but with modern SSDs a snapshot every 10 milliseconds or so seems feasible, depending on write bandwidth.
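The tradeoff can be sketched numerically. Here's a toy model (all parameters are hypothetical, not measurements of WAFL, KeyKOS, or any real drive): writes land uniformly at random on a hot working set of pages, so repeated writes to the same page within one checkpoint interval coalesce into a single page write at checkpoint time.

```python
import math

def checkpoint_bandwidth(interval_s, writes_per_s, hot_pages, page_bytes=4096):
    """Expected checkpoint write bandwidth (bytes/s) under the toy model.

    With writes spread uniformly over `hot_pages` pages, the expected number
    of *distinct* pages dirtied in one interval is N * (1 - exp(-w*t/N)),
    so longer intervals coalesce more overwrites into fewer page writes.
    """
    n = hot_pages
    expected_dirty = n * (1 - math.exp(-writes_per_s * interval_s / n))
    return expected_dirty * page_bytes / interval_s
```

With, say, 10,000 page writes/s against a 100,000-page hot set, a 10 ms interval costs nearly the raw write bandwidth (~41 MB/s), while a 60 s interval drops the checkpoint bandwidth to roughly 7 MB/s, because most writes within the minute hit already-dirty pages.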

Most of these systems did have "volatile segments" actually, but not for the reason you suggest, which is performance; rather, it was for device drivers. You can't swap out your device drivers (in most cases) and they don't have state that would be useful across power failures anyway. So they would be, as EUMEL called it, "resident": exempted from both the snapshot mechanism and from paging.



> So when the system crashed you would lose your last few minutes of work

So, pretty much how disk storage is managed today? But then, if you're going to checkpoint application programs from time to time, it might be better to design them so that they explicitly serialize their program state to a succinct representation that they can reload from, and only have OS-managed checkpoint and restore as a last resort for programs that aren't designed to do this... which is again quite distinct from having everything on a "single level".
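Sketched concretely (a hypothetical app; the `save_state`/`load_state` names, the JSON format, and the atomic-rename scheme are all illustrative assumptions, not any real system's API):

```python
import json
import os
import tempfile

def save_state(state, path):
    """Serialize explicit program state to a succinct on-disk representation.

    Write-then-rename so that a crash mid-save leaves the previous snapshot
    intact; fsync() is the explicit-durability step the thread mentions.
    """
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, path)   # atomic on POSIX filesystems

def load_state(path, default=None):
    """Reload serialized state on startup, or fall back to a fresh default."""
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return default
```

The point of the design is that the saved representation is small and schema-explicit, rather than a raw image of the process's memory.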

> but with modern SSDs a snapshot every 10 milliseconds or so seems feasible, depending on write bandwidth.

Not so. Even swapping to SSD is generally avoided, because every single write causes wear and tear that impacts the lifetime of the device.


Well, I don't think it's "pretty much how disk storage is managed today," among other things because you don't have a multi-minute reboot process, and you usually lose some workspace context in a power failure.

The bigger advantage of carefully designed formats for persistent data is usually not succinctness but schema upgrade.

It's true that too much write bandwidth will destroy your SSDs prematurely, and that might indeed be a drawback of a single-level store, because you might end up spending that bandwidth on things that aren't worth it. This is especially true now that DWPD has collapsed from 10 or more down to, say, 0.2. But they're fast enough that you can do submillisecond durable commits, potentially hundreds of them per second. In the limit where each snapshot writes a single 4 KiB page, 100 snapshots per second is 0.4 MB/s.
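A quick back-of-envelope check of that last claim (the numbers are the ones above, not from any particular drive's datasheet):

```python
# In the best case, a snapshot dirties only a single 4 KiB page.
page_bytes = 4 * 1024
snapshots_per_second = 100

bandwidth_bytes_per_s = page_bytes * snapshots_per_second
print(bandwidth_bytes_per_s / 1e6)  # 0.4096, i.e. ~0.4 MB/s
```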

If you have a 3-year-life 1 TiB SSD rated for 0.2 DWPD, you can write about 220 TiB to it before it fails. So if you have 4 GiB of RAM, in the worst case where every checkpoint dirties every page, you can only take about 56,000 checkpoints before killing the drive: about nine minutes' worth if you're taking one every 10 ms (which would anyway require an implausible 400 GiB/s of sustained write bandwidth). If you instead limit yourself to an Ultra DMA 2 speed of 33 MB/s on average, while still permitting bursts at the SSD's maximum speed, the drive lasts 2.8 months, but the maximum checkpoint latency is more than 2 minutes. You have to scale back to 7.7 MB/s to guarantee a disk life of a year, at which point your worst-case checkpoint time rises to over 9 minutes.
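The arithmetic above can be reproduced directly (using the same figures; the byte-unit constants are the only things I'm supplying):

```python
TiB, GiB, MB = 2**40, 2**30, 10**6

# Endurance budget: 0.2 drive-writes-per-day on a 1 TiB drive over 3 years.
endurance_bytes = 0.2 * 1 * TiB * 3 * 365            # ~220 TiB

# Worst case: every checkpoint dirties all 4 GiB of RAM.
checkpoint_bytes = 4 * GiB
max_checkpoints = endurance_bytes / checkpoint_bytes  # ~56,000 checkpoints

# At one checkpoint per 10 ms, that budget lasts ~560 s of wall clock.
budget_seconds = max_checkpoints * 0.010

# Capping average write bandwidth at the Ultra DMA 2 speed of 33 MB/s:
drive_life_days = endurance_bytes / (33 * MB) / 86400  # ~85 days, ~2.8 months
worst_latency_s = checkpoint_bytes / (33 * MB)         # ~130 s, > 2 minutes

# Bandwidth that stretches the budget to one year, and the resulting latency:
year_s = 365 * 86400
bw_for_year = endurance_bytes / year_s                 # ~7.7 MB/s
latency_for_year = checkpoint_bytes / bw_for_year      # ~560 s, > 9 minutes
```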



