
You multiply the probabilities of two independent random events to get the probability that they both happen at the same time. If the probability of a bit flip is 10^-18, then two would be 10^-36 and three would be 10^-54.
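
For concreteness, here is a minimal Python sketch of that arithmetic. It assumes the flips are fully independent and uses 10^-18 as the per-bit figure; the function name is just an illustrative choice.

```python
from fractions import Fraction

def joint_probability_independent(p, k):
    """Probability that k independent events, each with probability p, all occur."""
    return p ** k

# Assumed per-bit flip probability, taken from the figure quoted above.
p_flip = Fraction(1, 10**18)

for k in (1, 2, 3):
    # Exact rational arithmetic avoids floating-point rounding at these magnitudes.
    print(k, float(joint_probability_independent(p_flip, k)))
```

The result only holds under the independence assumption; with correlated events the product is no longer the joint probability.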

At some point it becomes a philosophical question of how much of the tail of the distribution can be tolerated. We've never seen a quantum fluctuation make a whale appear in the sky.



DRAM failures are not independent events, so it’s not appropriate to multiply the probabilities like that. Faults are often clustered in a row, column, bank, page or whatever structure your DRAM has, raising the probability of multi-bit errors.
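
A rough way to see how much this matters is a toy model in Python. It assumes a hypothetical shared-fault probability `q_shared` (say, a bad column) that flips both bits at once, on top of independent flips with probability `p_independent`. Both numbers are made up for illustration, not measured rates.

```python
from fractions import Fraction

def p_both_flip(p_independent, q_shared):
    """P(both bits flip) in a toy model: with probability q_shared a common
    structural fault flips both bits; otherwise each bit flips independently."""
    return q_shared + (1 - q_shared) * p_independent ** 2

p = Fraction(1, 10**18)   # illustrative independent per-bit flip probability
q = Fraction(1, 10**20)   # illustrative (hypothetical) shared-fault probability

print("independent only:", float(p_both_flip(p, Fraction(0))))
print("with shared fault:", float(p_both_flip(p, q)))
```

Even a shared fault that is 100 times rarer than a single-bit flip dominates the two-bit probability in this model, because the independent term is squared and the shared term is not.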


I believe the usual concern is bit flips due to subatomic particles, and as far as I'm aware that only flips one bit per particle.


I don't see why a high-energy particle strike would confine itself to a single bit. The paper I posted elsewhere in this thread says that "the most likely cause of the higher single-bit, single-column, and single-bank transient fault rates in Cielo is particle strikes from high-energy neutrons". In the paper, both single-bit and multi-bit errors are sensitive to altitude.


A single particle strike would only affect a single transistor. If that transistor controls a whole column of memory, then sure it could corrupt lots of bits. With ECC, though, it would probably result in a bunch of ECC blocks with a single bit flip, rather than a single ECC block with several bit flips.
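
To illustrate the difference, here is a small Python sketch. It assumes a simplified layout where each row of the array is one 72-bit ECC word (64 data bits plus 8 check bits), which is not how real DRAM address mapping necessarily works. Under that assumption a column fault puts one flipped bit in every word, while a row fault puts many flipped bits in a single word.

```python
def flips_per_word(rows, cols, faulty_cells):
    """Count flipped bits per ECC word, assuming one word per row (a simplification)."""
    counts = [0] * rows
    for r, c in faulty_cells:
        if 0 <= r < rows and 0 <= c < cols:
            counts[r] += 1
    return counts

ROWS, COLS = 8, 72  # hypothetical tiny array: 8 words of 72 bits each

column_fault = {(r, 5) for r in range(ROWS)}   # one bad column, every row
row_fault = {(3, c) for c in range(COLS)}      # one bad row, every column

print("column fault:", flips_per_word(ROWS, COLS, column_fault))
print("row fault:   ", flips_per_word(ROWS, COLS, row_fault))
```

A single-error-correcting code can repair each word in the column case, since no word holds more than one flip. The row case concentrates all the flips in one word, which is beyond single-bit correction.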



