A few years ago I designed a way to detect bit-flips in Firefox crash reports, and last year we deployed an actual memory tester that runs on user machines after the browser crashes. Today I was looking at the data that comes out of these tests, and now I'm 100% positive that the heuristic is sound and that a lot of the crashes we see come from users with bad memory or similarly flaky hardware. Here are a few numbers to give you an idea of how large the problem is. 🧵 1/5
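The post doesn't spell out the heuristic. One plausible way to flag a likely bit-flip is to check whether the faulting address is unmapped but sits exactly one bit away from an address that was mapped. The sketch below illustrates that idea under that assumption. `likely_bit_flip`, `is_single_bit_flip` and the `valid_ranges` input are hypothetical names, not the actual Firefox crash-reporting code.

```python
from typing import Sequence, Tuple


def is_single_bit_flip(a: int, b: int) -> bool:
    """True if a and b differ in exactly one bit position."""
    diff = a ^ b
    # A nonzero power of two has exactly one bit set.
    return diff != 0 and (diff & (diff - 1)) == 0


def likely_bit_flip(crash_addr: int,
                    valid_ranges: Sequence[Tuple[int, int]],
                    width: int = 64) -> bool:
    """Hypothetical heuristic: flag a crash if the faulting address is not
    mapped, but flipping a single bit of it lands inside one of the
    half-open [start, end) ranges in valid_ranges."""
    def mapped(addr: int) -> bool:
        return any(start <= addr < end for start, end in valid_ranges)

    if mapped(crash_addr):
        # The address itself was valid, so no bit-flip is needed to explain it.
        return False
    for bit in range(width):
        if mapped(crash_addr ^ (1 << bit)):
            return True
    return False


# Example: 0x6f0000001000 is 0x7f0000001000 with bit 44 cleared.
heap = [(0x7f0000000000, 0x7f0000100000)]
print(likely_bit_flip(0x6f0000001000, heap))  # True
print(likely_bit_flip(0x1234, heap))          # False
```

A real crash-report pipeline would also need to account for addresses that are legitimately wild (e.g. use-after-free garbage), so a check like this only raises the odds that hardware is to blame; it doesn't prove it.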
You can't claim you don't get frequent file corruption when you aren't using any tools that would detect it. I can guarantee you already have plenty of corrupted data on your hard drives, and RAM bit-flips contribute to that.
You run into bugs (broken documents, things failing, freezes, crashes) in the applications you use, and some of them are caused not by developer error but by uncorrected memory errors.
If you tried a filesystem like ZFS with checksumming and regular scrubs, you'd see errors detected quite often. They probably wouldn't be corrected, though, because you'd likely skip mirroring to save space, dummy.
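To make the checksumming point concrete: a filesystem like ZFS records a checksum for every block, and a scrub re-reads the blocks and compares them against those checksums. The sketch below mimics that with SHA-256 over in-memory blocks. It only illustrates the idea and is not ZFS's on-disk format (ZFS defaults to fletcher4 and can be configured to use sha256, among others); `scrub`, `flip_bit` and the 4096-byte block size are assumptions for the example.

```python
import hashlib
from typing import List

BLOCK_SIZE = 4096  # assumed block size for the illustration


def checksum(block: bytes) -> bytes:
    return hashlib.sha256(block).digest()


def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Return a copy of data with one bit inverted (simulating a bit-flip)."""
    buf = bytearray(data)
    buf[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(buf)


def scrub(blocks: List[bytes], stored_sums: List[bytes]) -> List[int]:
    """Re-hash every block and report indices whose checksum no longer matches."""
    return [i for i, (block, expected) in enumerate(zip(blocks, stored_sums))
            if checksum(block) != expected]


# Checksums are recorded when the data is written...
blocks = [bytes([i]) * BLOCK_SIZE for i in range(4)]
sums = [checksum(b) for b in blocks]

# ...then a single bit silently flips in block 2.
blocks[2] = flip_bit(blocks[2], 12345)

print(scrub(blocks, sums))  # [2]
```

Detection is all the checksum buys you here: repairing block 2 needs a second good copy (a mirror or RAID-Z parity), which is why a pool without redundancy reports errors but can't fix them.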
And if you were using ECC memory, you'd see corrected-memory-error messages in dmesg often enough.
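On Linux, the EDAC subsystem that logs those dmesg messages also keeps per-memory-controller counters in sysfs (`ce_count` for corrected errors, `ue_count` for uncorrected ones). Here is a minimal sketch for reading them, assuming the standard `/sys/devices/system/edac/mc/mcN/` layout. `edac_error_counts` is a hypothetical helper; on machines without ECC, or without an EDAC driver loaded, the directory simply won't exist and the function returns an empty dict.

```python
from pathlib import Path
from typing import Dict, Tuple

EDAC_MC_ROOT = "/sys/devices/system/edac/mc"


def edac_error_counts(root: str = EDAC_MC_ROOT) -> Dict[str, Tuple[int, int]]:
    """Map each memory controller (mc0, mc1, ...) to (corrected, uncorrected) counts.

    Returns an empty dict when EDAC isn't available.
    """
    counts: Dict[str, Tuple[int, int]] = {}
    base = Path(root)
    if not base.is_dir():
        return counts
    for mc in sorted(base.glob("mc[0-9]*")):
        ce_file, ue_file = mc / "ce_count", mc / "ue_count"
        if ce_file.is_file() and ue_file.is_file():
            counts[mc.name] = (int(ce_file.read_text().strip()),
                               int(ue_file.read_text().strip()))
    return counts


if __name__ == "__main__":
    for name, (ce, ue) in edac_error_counts().items():
        print(f"{name}: {ce} corrected, {ue} uncorrected")
```

A steadily climbing `ce_count` on one controller is the kind of signal that, on a non-ECC machine, would instead surface as unexplained crashes or silently corrupted files.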