A note to start with: most of the data I found is kinda outdated (year 2003, etc.), so links to newer data are most welcome!
Is a random bit flip possible?
Yes. Actually it's more likely than I expected, and that's why ECC RAM (ECC being Error-Correcting Code) modules are so widely used in servers.
I found two cool papers with some statistics:
- "DRAM Errors in the Wild: A Large-Scale Field Study", Bianca Schroeder (University of Toronto), Eduardo Pinheiro (Google Inc.), Wolf-Dietrich Weber (Google Inc.), SIGMETRICS, 2009.
- Soft Errors in Electronic Memory – A White Paper, Tezzaron Semiconductor, 2003/2004.
It's worth looking at both of these for detailed information, but in short: the number 3751 appears in the first paper as the average number of Correctable Errors (corrected when ECC is used; in non-ECC RAM these are not corrected) per DIMM over nearly 2.5 years of constant work. That gives 4.11 CE per day (i.e. ~4 random bit flips a day that were corrected thanks to ECC being used). The full table is presented below:
CE is Correctable Errors, UE is Uncorrectable Errors
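The 4.11 figure is simply 3751 divided by the number of days in 2.5 years. A quick sketch of that arithmetic, assuming "nearly 2.5y" means exactly 2.5 years of 365 days (the variable names are mine, not the paper's):

```python
# Back-of-the-envelope check of the CE-per-day figure quoted in the post.
AVG_CE_PER_DIMM = 3751   # average correctable errors per DIMM (from the post)
YEARS_OBSERVED = 2.5     # assumption: "nearly 2.5y" treated as exactly 2.5 years
DAYS_PER_YEAR = 365      # assumption: leap days ignored

ce_per_day = AVG_CE_PER_DIMM / (YEARS_OBSERVED * DAYS_PER_YEAR)
print(f"{ce_per_day:.2f} CE per day")  # prints "4.11 CE per day"
```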
The second paper contains a table presenting collected failure rates for different types of memory:
FIT is Failures in Time: errors per 10^9 hours of use
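Since FIT counts errors per billion hours of use, turning a FIT rate into an expected number of errors is a single multiplication. A minimal sketch; the 1000 FIT input is a made-up illustrative value, not a number taken from the paper's table, and the function name is hypothetical:

```python
HOURS_PER_FIT_UNIT = 1e9  # FIT is defined per one billion hours of use

def expected_errors(fit, hours):
    """Expected number of errors for a given FIT rate over `hours` of use."""
    return fit * hours / HOURS_PER_FIT_UNIT

HOURS_PER_YEAR = 24 * 365  # assumption: non-leap year, powered on non-stop

# Hypothetical rate of 1000 FIT, running for one year.
example = expected_errors(1000, HOURS_PER_YEAR)
print(f"{example:.5f} expected errors per year")
```

Whatever unit the rate is quoted for (a whole device, a megabit, etc.) carries over to the result, so per-Mbit rates need to be multiplied by the memory size as well.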
So yeah, random bit flips (actually called Soft Errors) do happen.
Also, some time ago I saw a cool case study about tracing a software error down to a random bit flip: Attack of the Cosmic Rays! by Nelson Elhage.
Linux users might want to take a look at the dmidecode command's output, specifically the memory-related sections. If you have ECC RAM it will show you the number of detected and corrected errors. Otherwise you might just see an Error Correction Type: None entry (thanks go to Tavis for showing me this).
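If you'd rather not eyeball the output, the Error Correction Type entry can be pulled out with a few lines of Python. This is only a sketch: the SAMPLE text is a made-up fragment shaped like dmidecode output, both function names are mine, and running the real command (dmidecode --type memory) normally requires root.

```python
import subprocess

PREFIX = "Error Correction Type:"

def error_correction_types(dmidecode_text):
    """Return the value of every 'Error Correction Type:' line in the text."""
    values = []
    for line in dmidecode_text.splitlines():
        stripped = line.strip()
        if stripped.startswith(PREFIX):
            values.append(stripped[len(PREFIX):].strip())
    return values

def read_dmidecode_memory():
    """Run dmidecode for the memory-related sections (usually needs root)."""
    result = subprocess.run(["dmidecode", "--type", "memory"],
                            capture_output=True, text=True, check=True)
    return result.stdout

# Made-up fragment used for illustration only; real output has more fields.
SAMPLE = (
    "Physical Memory Array\n"
    "\tLocation: System Board Or Motherboard\n"
    "\tError Correction Type: None\n"
)

print(error_correction_types(SAMPLE))  # prints ['None']
```

On a real machine you would call error_correction_types(read_dmidecode_memory()) instead of using SAMPLE.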
What influences the odds of a Soft Error happening?
Actually quite a few things (in random order; source: above papers and wiki pages):
- Temperature
- Alpha particles
- Cosmic rays
- Lower voltage
- Higher speeds
- Construction of the die
- and others... (see the papers for details)
A random bit flip being a security problem? Surely you're joking.
Actually I'm not.
Let's start with the Gameboy Color Boot ROM post, where the author described how he bypassed the anti-ROM-dump mechanism by introducing random bit flips in the CPU. It's a fascinating read!
The second paper I would like to point out here is:
- Using Memory Errors to Attack a Virtual Machine, Sudhakar Govindavajhala and Andrew W. Appel, Princeton University, 2003.
The paper is about Java and .NET VMs and describes how to create a memory layout in which most random bit flips would cause a Write-What-Where condition, which is exploitable in a straightforward way and allows the attacker to get code execution.
They also describe how they tested the idea: using a 50W spotlight to heat up the memory (in short: it worked and took about 1 minute of heating, though some nasty system crashes also appeared):
Huh, using a spotlight for hacking, now that's cool!
So, an interesting Sci-Fi idea (well, maybe not so "-Fi" after all) would be a "hacking gun" that, when pointed at a CPU/RAM, would flip just the right bit ;)
Did you try to flip a bit?
Actually I'm still trying. I've modified OSAmber (a pet bootloader+minimal kernel of mine) to scan memory for any bit flips, but no luck so far (even though I've heated the RAM quite a lot a few times).
I'll update this post if I get anywhere with this experiment. But for now, a screenshot will have to do:
And that's it for now. Guess the next RAM I'll buy will be ECC. Cheers ;)
UPDATE:
P.S. Looks like this post was put on reddit (under ReverseEngineering) - the comments on reddit are often worth checking out :)
UPDATE 2:
I never got to update the results of the experiments, so I'll do that now:
I've been testing for a few days, changing the way the detection works from time to time:
- At first the memory was all zeroed and I tried to detect any bits that flipped to 1.
- But I decided that might not work, so I changed the content of each memory cell (where a cell for me was a 32-bit block) to be equal to a simple hash of its address (this way I got different content across the memory), and checked whether the values changed.
- In both cases I scanned 2GB of memory, taking care to disable caching by the CPU...
- and I heated the RAM using a powerful flashlight (it got really hot, though I don't have exact measurements).
- In both cases, after running for a few days non-stop, I failed to find any flipped memory bits.
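The second detection scheme from the list above can be sketched in a few lines. This is not the OSAmber code: it's a Python illustration of the logic only, a plain list stands in for physical memory, and the multiplicative hash in cell_pattern is my assumption (the post only says "a simple hash of the address"). A real scanner has to run on bare metal with CPU caching disabled, as noted above.

```python
CELL_SIZE = 4  # bytes; a "cell" is a 32-bit block, as in the post

def cell_pattern(address):
    """Hypothetical 'simple hash of the address' (multiplicative hash)."""
    return (address * 2654435761) & 0xFFFFFFFF

def fill(memory, base_address=0):
    """Write the expected pattern into every 32-bit cell."""
    for i in range(len(memory)):
        memory[i] = cell_pattern(base_address + CELL_SIZE * i)

def scan(memory, base_address=0):
    """Return (address, flipped_bits_mask) for every cell that changed."""
    flips = []
    for i, value in enumerate(memory):
        address = base_address + CELL_SIZE * i
        diff = value ^ cell_pattern(address)  # set bits mark the flipped positions
        if diff:
            flips.append((address, diff))
    return flips

memory = [0] * 1024          # stand-in for the scanned RAM region
fill(memory)
clean = scan(memory)         # nothing has changed yet, so this is empty

memory[10] ^= 1 << 7         # simulate a single bit flip in cell 10
print(scan(memory))          # prints [(40, 128)]
```

XOR-ing the stored value with the expected one gives not only "something changed" but also which bits flipped, which is handy if you ever do catch one.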
So no luck with my flipping. I guess it might be fun to redo the experiment and let it run for a longer time. And make it more scientific by actually noting the temperature.
Random note: I had to restart the experiment after a few hours since I forgot to enable the A20 line in OS Amber. Oops.
Comments:
If ECC corrects 4 random bit flips per day, then the estimated probability of a flip happening in any memory cell at any particular second of computation is around 0.00005 (0.005%). Not too shabby.
But for your exploit to work, you would need a flip in one *specific* memory cell. Odds of that happening - with 4GB of RAM - are around 1.2e-14 per second. In other words, the expected number of seconds of blind hammering at the desired memory cell is around 10^14. Better be prepared for the next age then, as this is about 3 million years :)
However, these people were able to crack 1024-bit RSA with random bit flips during computation, and that involves more RAM so it is more likely:
http://www.engadget.com/2010/03/09/1024-bit-rsa-encryption-cracked-by-carefully-starving-cpu-of-ele/
;D
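The estimate in the comment above can be reproduced with a few lines of arithmetic. A sketch under stated assumptions: a "memory cell" is taken to be one byte, 4GB is taken as 4e9 bytes, and the tiny per-second rate is treated as a probability.

```python
FLIPS_PER_DAY = 4                  # ECC-corrected flips per day (from the post)
SECONDS_PER_DAY = 24 * 60 * 60
RAM_CELLS = 4 * 10**9              # assumption: 4GB as 4e9 one-byte cells

# Chance of a flip anywhere in memory during a given second (~0.00005).
p_any = FLIPS_PER_DAY / SECONDS_PER_DAY

# Chance that the flip lands in one specific cell (~1.2e-14).
p_specific = p_any / RAM_CELLS

# Expected waiting time for that one cell (~10^14 seconds).
expected_seconds = 1 / p_specific
expected_years = expected_seconds / (365 * SECONDS_PER_DAY)

print(f"p_any          = {p_any:.2e}")
print(f"p_specific     = {p_specific:.2e}")
print(f"expected wait  = {expected_seconds:.2e} s (~{expected_years / 1e6:.1f} million years)")
```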
@Xion
Agreed, that's why I've written something like this in the post:
"Is this theoretically possible? Yes. And practically? Almost impossible, due to the unlikeliness of a bit flip and even more, the unlikeliness of a bit flip in the just right place."
<joke>That's why we need the Hack Gun (TM)!!!1</joke>
But thanks for doing the math ;>
@Sam Bowne
Thanks for the link Sam!
That's essentially what Artem Dinaburg's talk at Black Hat covers:
http://blackhat.com/html/bh-us-11/bh-us-11-briefings.html#Dinaburg
;>
@Jordan
Thanks for the link, it's awesome ;)
Anyway, I agree with your suggestion, though for now I'll continue my experiment with a single machine. I have some ideas on how to gather errors from different machines to achieve a goal (well, not as good as the DNS one from your link, at least not yet), but no solid data or solid ideas to share yet ;)
Cheers,
Sometimes I am a little bit afraid when I think that all of computing is error-prone in such a way, and it's just a matter of very high probability that we get correct results :)
That's not cool! That's hot!
This flipping makes installing from self-checking installers (GOG, repacks) random: either no error, or CRC failed. I installed several copies of the same game from the same installer (I had to restart the computer several times, as sometimes not all installation instances passed the CRC checks due to the issue), and later compared them via Beyond Compare (binary and CRC). They're different((( Around 20 bytes in several files have bits flipped in comparison. Extracting very old, never-changed archives several times and comparing the results showed differences too. Yes, memtest reported errors right after, but it does not report anything when you just start Windows without making the memory do some REAL work first.
During the last two days I filed technical letters with the motherboard and memory vendors about the issue, and I'll be waiting for a reply. I believe AMD's AGESA update is at fault for some reason. They're releasing the Ryzen 5000 series at present and may have broken previous-generation compatibility in their memory Infinity Fabric.