Bit rot is real. Last month we had weird linker errors on one of our build servers. It turned out that one of the binary libs in the build cache had a single bit flipped in the symbol table, which changed a symbol name and caused the linker errors. If the bit flip had occurred in the .TEXT section, it wouldn’t have caused any errors at build time, and we would have released a buggy binary. It might have just crashed, but it could have silently corrupted data…
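One way to catch this kind of cache corruption before the linker does is to record a hash for every cached artifact and re-check it before use. The following is only a minimal sketch, not what the poster's build system does: the names `build_manifest` and `find_corrupted`, and the idea of a flat cache directory, are assumptions made for illustration.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def sha256_of(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large libraries are not loaded at once."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(cache_dir: Path) -> Dict[str, str]:
    """Record the expected hash of every file at the time it enters the cache."""
    return {
        str(p.relative_to(cache_dir)): sha256_of(p)
        for p in sorted(cache_dir.rglob("*"))
        if p.is_file()
    }


def find_corrupted(cache_dir: Path, manifest: Dict[str, str]) -> List[str]:
    """Return entries that are missing or whose current hash no longer matches."""
    bad = []
    for rel, expected in manifest.items():
        path = cache_dir / rel
        if not path.is_file() or sha256_of(path) != expected:
            bad.append(rel)
    return bad
```

Running `find_corrupted` before each build (or on a timer) would flag a flipped bit anywhere in the file, including code sections where the linker would not notice anything.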
I’ve had a case where a bit flip in a TCP stream was not caught because it happened in a Singapore government deep packet inspection snoop gateway that recalculated the TCP checksum for the bit-flipped segment.
Fragmentation doesn’t change the TCP checksum. The packet is reassembled and the original checksum verified against the rehydrated packet and TCP segment.
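To make the checksum point concrete, here is a rough sketch of the 16-bit one's-complement Internet checksum (RFC 1071) that TCP uses. It is simplified: a real TCP checksum also covers a pseudo-header and the TCP header, which are left out here, and `flip_bit` and the sample payload are made up for the demonstration.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement checksum over data (RFC 1071 style)."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Return a copy of data with exactly one bit inverted."""
    buf = bytearray(data)
    buf[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(buf)


payload = b"symbol_table_entry"
sender_sum = internet_checksum(payload)

# A flip in transit: the receiver's checksum no longer matches the sender's.
corrupted = flip_bit(payload, 3)
detected = internet_checksum(corrupted) != sender_sum

# A middlebox that recomputes the checksum AFTER the flip hides the damage:
# the receiver verifies the corrupted bytes against the new checksum and passes.
middlebox_sum = internet_checksum(corrupted)
hidden = internet_checksum(corrupted) == middlebox_sum
```

The checksum only protects the bytes between the point where it was computed and the point where it is verified, so any hop that rewrites it resets that protection.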
It may be a regional thing, but I have never heard “bit rot” refer to legacy code. In retro computing circles, bit rot refers to hardware defects (usually in floppies or other storage media) caused by cosmic rays or other environmental hazards.
I agree this is the primary context, but I've seen unmaintained (or very old) software being referred to as "bit rotting" by extension. As in, forward compatibility might break due to obsolete dependencies, etc.
Same here. “Bit rot” is then analogous to food rot: the longer your data sits unverified, the more likely that there will be flipped bits and therefore “rotten data”.
Well, it wasn't that hard to uncover actually. We knew that the same build succeeds on our machines. So we only had to find what the difference was between the two :)
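Finding "what the difference was between the two" can be as simple as XOR-ing the good and bad copies byte by byte. This is a small sketch of that idea rather than the poster's actual procedure; `differing_bits` is a hypothetical helper, and it only compares the overlapping prefix, so a length mismatch would need a separate check.

```python
from typing import List, Tuple


def differing_bits(good: bytes, bad: bytes) -> List[Tuple[int, int]]:
    """List (byte_offset, bit_position) pairs where the two buffers differ.

    Bit position 0 is the least significant bit of the byte.
    """
    diffs = []
    for offset, (x, y) in enumerate(zip(good, bad)):
        delta = x ^ y  # set bits mark the positions that differ
        bit = 0
        while delta:
            if delta & 1:
                diffs.append((offset, bit))
            delta >>= 1
            bit += 1
    return diffs


# Example: a single flipped bit turns the symbol "_memcpy" into "_iemcpy".
good_symbol = b"_memcpy"
bad_symbol = bytearray(good_symbol)
bad_symbol[1] ^= 0x04
result = differing_bits(good_symbol, bytes(bad_symbol))
```

A result with exactly one entry is a strong hint of a hardware-level bit flip rather than a different compiler or toolchain version.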
As Arthur Conan Doyle put it: "Once you eliminate the impossible, whatever remains, no matter how improbable, must be the truth." ¯\_(ツ)_/¯
Well done, Watson. This calls for a bit of snuff. Seriously, this is the kind of thing that keeps me up at night, and it's nice to hear a happy ending =D
I'm just thinking (of course) that you said "If the bit-flip had occurred", but it's probably already a matter of when a bit flip occurs in the .TEXT section; we don't know what one might already have caused, or whether it just passed without notice (an unreproducible bug, a bit flip in a function that's never called, or whatever).
Says it's a Mac/iOS build platform. Since it's a commercial service they're probably complying with the license and thus using actual Mac hardware, and in turn the only ECC option is the really awful value, outdated Mac Pro. Seems more likely they're using Minis instead, or at least mostly Minis. An unfortunate thing about Apple hardware (says someone still nursing along a final 5,1 Mac Pro for a last few weeks).