A note before we start: most of the data I found is kinda outdated (from 2003 and so on), so links to newer data are most welcome!
Is a random bit flip possible?
Yes. Actually it's more likely than I expected, and that's why ECC RAM modules (ECC being Error-Correcting Code) are so widely used in servers.
I found two cool papers with some statistics:
- "DRAM Errors in the Wild: A Large-Scale Field Study", Bianca Schroeder (University of Toronto), Eduardo Pinheiro (Google Inc.), Wolf-Dietrich Weber (Google Inc.), SIGMETRICS, 2009.
- Soft Errors in Electronic Memory – A White Paper, Tezzaron Semiconductor, 2003/2004.
It's worth looking at both of these for detailed information, but in short: the number 3751 appears in the first paper as the average number of Correctable Errors (correctable with ECC, that is; in non-ECC RAM these are not corrected) per DIMM over nearly 2.5 years of constant work. That gives 4.11 CE per day (i.e. ~4 random bit flips a day that were corrected thanks to ECC being used). The full table is presented below:
CE is Correctable Errors, UE is Uncorrectable Errors
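The per-day figure quoted above can be reproduced with a quick calculation. This is just a sketch of the arithmetic, assuming "nearly 2.5 years" is rounded to exactly 2.5 years of 365 days each:

```python
# Average number of correctable errors (CE) per DIMM, as quoted from the first paper.
ce_per_dimm = 3751

# Assumption: "nearly 2.5 years" is treated as exactly 2.5 years of 365 days.
observation_days = 2.5 * 365

ce_per_day = ce_per_dimm / observation_days
print(f"{ce_per_day:.2f} correctable errors per DIMM per day")
```

Keep in mind this is an average over all DIMMs, so a typical single module may behave quite differently.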
The second paper contains a table presenting collected failure rates for different types of memory:
FIT is Failures In Time: errors per 10^9 (one billion) hours of use
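To get a feeling for what a FIT value means in practice, it can be converted to errors per year. Below is a minimal sketch, taking FIT as failures per 10^9 hours of use; the function names are mine, and the value 1000 is a made-up example, not a number from the paper:

```python
HOURS_PER_YEAR = 24 * 365
FIT_HOURS = 10**9  # FIT counts failures per one billion hours of use


def fit_to_errors_per_year(fit):
    """Expected number of errors in one year of constant use."""
    return fit * HOURS_PER_YEAR / FIT_HOURS


def fit_to_years_between_errors(fit):
    """Average time between two errors, in years."""
    return FIT_HOURS / (fit * HOURS_PER_YEAR)


# Hypothetical FIT value, only for illustration.
example_fit = 1000
print(fit_to_errors_per_year(example_fit))
print(fit_to_years_between_errors(example_fit))
```

If the table gives FIT per Mbit rather than per whole device, multiply by the memory size in Mbit before converting.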
So yeah, random bit flips (actually called Soft Errors) do happen.
Also, some time ago I saw a cool case study about tracing a software error back to a random bit flip: Attack of the Cosmic Rays! by Nelson Elhage.
Linux users might want to take a look at the dmidecode command's output, specifically the memory-related sections. If you have ECC RAM it will show you the number of detected and corrected errors. Otherwise you might just see an Error Correction Type: None entry (thanks go to Tavis for showing me this).
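If you'd rather not read through the whole dmidecode dump, the Error Correction Type entries can be pulled out with a few lines of Python. This is only a sketch: the function name is mine, and the sample text below is an illustrative excerpt I typed in by hand, not output captured from a real machine:

```python
import re


def error_correction_types(dmidecode_text):
    """Return the value of every 'Error Correction Type' line in the text."""
    pattern = re.compile(r"^\s*Error Correction Type:\s*(.+?)\s*$", re.MULTILINE)
    return pattern.findall(dmidecode_text)


# Illustrative sample only (hand-written, not from a real system).
SAMPLE = """\
Physical Memory Array
\tLocation: System Board Or Motherboard
\tUse: System Memory
\tError Correction Type: None
\tMaximum Capacity: 32 GB
"""

print(error_correction_types(SAMPLE))
```

On a real system you would feed it the text produced by running dmidecode --type memory as root, e.g. saved to a file first.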
What influences the odds of a Soft Error happening?
Actually quite a few things (in random order; source: above papers and wiki pages):
- Alpha particles
- Cosmic rays
- Lower voltage
- Higher speeds
- Construction of the die
- and others... (see the papers for details)
A random bit flip being a security problem? Surely you're joking.
Actually I'm not.
Let's start with the Gameboy Color Boot ROM post, where the author described how he bypassed the anti-ROM-dump mechanism by introducing random bit flips in the CPU. It's a fascinating read!
The second paper I would like to point out here is:
- Using Memory Errors to Attack a Virtual Machine, Sudhakar Govindavajhala and Andrew W. Appel, Princeton University, 2003.
The paper is about Java and .NET VMs and describes how to create a memory layout in which most random bit flips cause a Write-What-Where condition, which is exploitable in a straightforward way and leads to code execution.
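To see why filling memory with attacker-controlled objects helps, here is a toy model (my own simplification, not the method from the paper): take one pointer, flip each of its bits in turn, and count how often the result still lands inside the filled region. The addresses and the 256 MB region size are made up, and object alignment and layout are ignored entirely:

```python
def landing_fraction(pointer, region_start, region_end, address_bits=32):
    """Fraction of single-bit flips of `pointer` that still point into
    the half-open range [region_start, region_end)."""
    hits = 0
    for bit in range(address_bits):
        flipped = pointer ^ (1 << bit)  # simulate one flipped bit
        if region_start <= flipped < region_end:
            hits += 1
    return hits / address_bits


# Hypothetical layout: 256 MB of attacker-controlled objects.
REGION_START = 0x10000000
REGION_END = 0x20000000

print(landing_fraction(0x10000000, REGION_START, REGION_END))
```

The bigger the filled region, the more of the flipped pointers end up pointing at something the attacker prepared instead of crashing the VM.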
The authors also describe how they tested the idea: using a 50W spotlight bulb to heat up the memory chips (in short: it worked and took about 1 minute of heating, though some nasty system crashes also appeared):
Huh, using a spotlight for hacking, now that's cool!
So, an interesting Sci-Fi idea (well, maybe not so "-Fi" after all) would be a "hacking gun" that, when pointed at a CPU/RAM, would flip just the right bit ;)
Did you try to flip a bit?
Actually I'm still trying. I've modified OSAmber (a pet bootloader+minimal kernel of mine) to scan memory for any bit flips, but no luck so far (even though I've heated the RAM quite a lot a few times).
I'll update this post if I get anywhere with this experiment. But for now, a screenshot will have to do:
And that's it for now. Guess the next RAM I'll buy will be ECC. Cheers ;)
P.S. Looks like this post was put on reddit (under ReverseEngineering) - the comments on reddit are often worth checking out :)
I never got around to posting the results of the experiment, so I'll do that now:
I've been testing for a few days, changing the way the detection works from time to time:
- At first the memory was all zeroed and I tried to detect any bits that flipped to 1.
- But I decided that might not work, so I changed the content of each memory cell (where a cell for me was a 32-bit block) to be equal to a simple hash of its address (this way I got different content across the memory), and checked whether the values changed.
- In both cases I scanned 2GB of memory, taking care to disable caching by the CPU...
- and I heated the RAM using a powerful flashlight (it got really hot, though I don't have exact measurements).
- In both cases, after running for a few days non-stop, I failed to find any flipped memory bits.
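The second detection method from the list above can be sketched roughly like this. It's not the actual OSAmber code: the hash function is an assumption (I'm using a simple multiplicative hash here for illustration), and a Python bytearray lives in ordinary cached memory, so this only shows the fill-and-scan logic, not a real detector:

```python
import struct

CELL_SIZE = 4  # a "cell" is a 32-bit block


def cell_pattern(address):
    """Expected cell content: a simple hash of the address.
    Assumption: multiplicative hash, chosen only for illustration."""
    return (address * 2654435761) & 0xFFFFFFFF


def fill(buffer, base_address=0):
    """Write the expected pattern into every 32-bit cell."""
    for offset in range(0, len(buffer) - CELL_SIZE + 1, CELL_SIZE):
        struct.pack_into("<I", buffer, offset, cell_pattern(base_address + offset))


def scan(buffer, base_address=0):
    """Return (address, flipped_bits_mask) for every cell that changed."""
    flips = []
    for offset in range(0, len(buffer) - CELL_SIZE + 1, CELL_SIZE):
        (value,) = struct.unpack_from("<I", buffer, offset)
        diff = value ^ cell_pattern(base_address + offset)
        if diff:
            flips.append((base_address + offset, diff))
    return flips


memory = bytearray(1024)
fill(memory)
memory[10] ^= 0x04  # simulate a single bit flip
for address, mask in scan(memory):
    print(f"cell at {address:#x}: flipped bits {mask:#010x}")
```

XOR-ing the stored value with the expected one gives a mask of exactly the bits that changed, which also works for the all-zeros variant (the expected value is then simply 0).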
So no luck with my flipping. I guess it might be fun to redo the experiment and let it run for a longer time. And make it more scientific by actually noting the temperature.
Random note: I had to restart the experiment after a few hours since I forgot to enable the A20 line in OSAmber. Oops.