Rowhammer: The gift that keeps on giving (2021)

Syonyk · November 15, 2021, 5:01pm

Through creative application of rowhammering techniques, the researchers got every DDR4 module they tested (all of which include the target-row-refresh hardware mitigations that were supposed to fix the issue once and for all) to fall to Rowhammer.

https://comsec.ethz.ch/wp-content/files/blacksmith_sp22.pdf is the paper link for the full gory details.

Abstract—We present the new class of non-uniform Rowhammer access patterns that bypass undocumented, proprietary in-DRAM Target Row Refresh (TRR) while operating in a production setting. We show that these patterns trigger bit flips on all 40 DDR4 DRAM devices in our test pool. We make a key observation that all published Rowhammer access patterns always hammer “aggressor” rows uniformly. While uniform accesses maximize the number of aggressor activations, we find that in-DRAM TRR exploits this behavior to catch aggressor rows and refresh neighboring “victims” before they fail. There is no reason, however, to limit Rowhammer attacks to uniform access patterns: smaller technology nodes make underlying DRAM technologies more vulnerable, and significantly fewer accesses are nowadays required to trigger bit flips, making it interesting to investigate less predictable access patterns.

The search space for non-uniform access patterns, however, is tremendous. We design experiments to explore this space with respect to the deployed mitigations, highlighting the importance of the order, regularity, and intensity of accessing aggressor rows in non-uniform access patterns. We show how randomizing parameters in the frequency domain captures these aspects and use this insight in the design of Blacksmith, a scalable Rowhammer fuzzer that generates access patterns that hammer aggressor rows with different phases, frequencies, and amplitudes. Blacksmith finds complex patterns that trigger Rowhammer bit flips on all 40 of our recently-purchased DDR4 DIMMs, 2.6× more than state of the art, and generating on average 87× more bit flips. We also demonstrate the effectiveness of these patterns on Low Power DDR4X devices. Our extensive analysis using Blacksmith further provides new insights on the properties of currently-deployed TRR mitigations. We conclude that after almost a decade of research and deployed in-DRAM mitigations, we are perhaps in a worse situation than when Rowhammer was first discovered.
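To make “phases, frequencies, and amplitudes” a bit more concrete, here is a minimal sketch of how such parameters could turn into an access sequence. This is not Blacksmith’s code; `AggressorParams`, `build_pattern`, and the row numbers are hypothetical, and the real fuzzer’s scheduling (and how it handles collisions and decoys) is described in the paper. Here, period is how often an aggressor gets a burst, phase is where its first burst starts, and amplitude is how many back-to-back activations each burst contains.

```python
from dataclasses import dataclass

@dataclass
class AggressorParams:
    row: int        # hypothetical aggressor row number
    period: int     # "frequency": one burst every `period` slots
    phase: int      # offset of the first burst within the pattern
    amplitude: int  # consecutive activations per burst

def build_pattern(aggressors, length):
    """Lay out `length` activation slots; None marks a slot left unused."""
    slots = [None] * length
    for a in aggressors:
        for start in range(a.phase, length, a.period):
            for i in range(start, min(start + a.amplitude, length)):
                if slots[i] is None:  # in this sketch, the first aggressor placed wins a slot
                    slots[i] = a.row
    return slots

pattern = build_pattern(
    [AggressorParams(row=10, period=4, phase=0, amplitude=1),
     AggressorParams(row=12, period=8, phase=1, amplitude=2)],
    length=16,
)
print(pattern)
# [10, 12, 12, None, 10, None, None, None, 10, 12, 12, None, 10, None, None, None]
```

Row 10 is hit once every 4 slots while row 12 gets a double hit every 8 slots, so the two aggressors are no longer hammered in lockstep, which is the non-uniformity the abstract is talking about.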

I like the end of their abstract.

We conclude that after almost a decade of research and deployed in-DRAM mitigations, we are perhaps in a worse situation than when Rowhammer was first discovered.

Time to curl up with this paper and see just how bad it is… though I already know the answer.

Ugh. Computers.

Vertiginous · November 15, 2021, 11:52pm

<insert phpBB popcorn munching emoji here>

They say ECC memory makes it harder, so there’s some hope yet, but harder != impossible, or even necessarily difficult, if the attack can be reduced to an algorithm.

I’ll be interested to find out what strategies they take to try to minimize the damage from this attack. My read is that it may well be impossible to mitigate entirely, and we’ll end up with a sliding scale of techniques that reduce the success rate rather than cut it to exactly zero. How close they can get is the big question.

Attacks like this are certain to continue to evolve in sophistication. There’s going to be a massive installed and actively utilized base of vulnerable memory for decades yet anyway.

milfox · November 16, 2021, 3:02am

Could ‘many smaller cores’ with non-shared memory help, or some other way of limiting memory access per core at the hypervisor level?

Syonyk · November 16, 2021, 3:43am

Yup. After chewing through it… “ugh, computers…” is still applicable.

This is a perfect example of “attacks only get better.” They tested an older Rowhammer technique (TRRespass) against their samples: ~35% of the DIMMs failed with TRRespass, versus 100% with their Blacksmith tool. Literally every one. Then they took some LPDDR4X that was behaving properly, did some analysis on its TRR, and figured out how to make it misbehave.

The takeaway should be that with a bit of effort, all DDR4 is vulnerable to Rowhammer; the only question is “Just how bad is it?” Given that one well-placed flip can compromise a system entirely… well… yeah. It sucks.
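Why can one flip be enough? A classic target is a page-table entry. The sketch below uses a simplified x86-64 4 KiB PTE layout (bit 0 present, bit 1 writable, bit 2 user, bits 12..51 physical frame number). The helper names and the example frame number are made up for illustration. Flipping a single bit in the frame-number field makes the entry point at a different physical page while keeping its permissions, which is roughly the idea behind the well-known 2015 page-table Rowhammer exploit.

```python
# Simplified x86-64 4-KiB PTE: bit 0 present, bit 1 writable, bit 2 user,
# bits 12..51 physical frame number (PFN). Other flag bits are ignored here.
PFN_SHIFT = 12
PFN_MASK = ((1 << 40) - 1) << PFN_SHIFT

def make_pte(pfn, present=True, writable=True, user=True):
    flags = int(present) | (int(writable) << 1) | (int(user) << 2)
    return ((pfn << PFN_SHIFT) & PFN_MASK) | flags

def pfn_of(pte):
    return (pte & PFN_MASK) >> PFN_SHIFT

pte = make_pte(pfn=0x12345)       # hypothetical frame the process legitimately owns
flipped = pte ^ (1 << 20)         # one bit flip: PTE bit 20 is PFN bit 8

print(hex(pfn_of(pte)), hex(pfn_of(flipped)))   # 0x12345 0x12245
print((flipped & 0b111) == (pte & 0b111))       # True: still present, writable, user
```

The flipped entry now maps a frame 256 pages (1 MiB) away, which the process never owned, and it can still read and write it.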

The “solution” to Rowhammer in the DDR3 chips was, “Oh, yeah, we’ll totally use Target Row Refresh in DDR4, that’ll solve it.” If some row has been accessed enough, refresh the rows next to it. Problem solved! Well… for trivial cases and examples, yes, but the last couple of papers have been “How to bypass TRR and still get bit flips,” and this is one of them. TRR just can’t handle complex patterns, and the DRAM is bad enough that even with less-than-optimal hammering, you can still get plenty of useful bit flips out of the system.
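Here is a toy model of why “refresh the neighbors of heavily accessed rows” breaks down. It rests entirely on assumptions, because real TRR is undocumented and vendor-specific. Each activation adds one unit of disturbance to both neighbors, and a row “flips” at a made-up `threshold`. After each window of activations, the hypothetical TRR refreshes the neighbors of only the `k` most-activated rows, and auto-refresh resets everything every `refresh_every` windows. The function and all parameter values are illustrative.

```python
from collections import Counter

def simulate(window_pattern, num_windows, k=2, threshold=500,
             refresh_every=64, num_rows=128):
    """Toy DRAM with a toy top-k TRR; returns rows whose disturbance hit threshold."""
    disturbance = [0] * num_rows
    flipped = set()
    for w in range(num_windows):
        if w % refresh_every == 0:
            disturbance = [0] * num_rows          # regular auto-refresh restores every row
        counts = Counter()
        for row in window_pattern:
            counts[row] += 1
            for victim in (row - 1, row + 1):     # activation disturbs both neighbours
                if 0 <= victim < num_rows:
                    disturbance[victim] += 1
                    if disturbance[victim] >= threshold:
                        flipped.add(victim)
        for aggressor, _ in counts.most_common(k):  # TRR only acts on the top-k rows
            for victim in (aggressor - 1, aggressor + 1):
                if 0 <= victim < num_rows:
                    disturbance[victim] = 0
    return flipped

# Uniform double-sided hammering of victim row 11: 100 activations per window.
uniform = [10, 12] * 50
# Decoy rows 50 and 60 get more activations than the real aggressors and soak up
# TRR's attention; the real aggressors get only 30 activations per window.
decoyed = [50, 60] * 35 + [10, 12] * 15

print(simulate(uniform, num_windows=1000))  # set()
print(simulate(decoyed, num_windows=20))    # {11}
```

The uniform pattern hammers harder but gets caught every window. The decoyed pattern hammers the real aggressors less, yet row 11 is never refreshed by TRR and crosses the threshold well before the next regular refresh.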

Seriously, though. If you have a DDR4-based system, your system is vulnerable. That’s how bad this is.

ECC helps some, and the early papers said something along the lines of, “Well, yeah, you could theoretically break ECC with it, but that would be really hard, you’d have to deeply understand the system… eh, probably fine.” Enter bored grad students… and ECC falls. It’s a lot harder to make work, but once you get the knack for a particular configuration, it’s not too hard, apparently.
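Why ECC helps but doesn’t close the door: server ECC commonly uses a SECDED code (single-error-correct, double-error-detect). The sketch below uses a tiny extended Hamming (8,4) code instead of a real 64-bit-word code, purely to show the failure mode. One flip is corrected and two flips are detected. Three flips in the right places make the decoder “correct” the wrong bit and report success with bad data.

```python
def encode(data):
    """Extended Hamming (8,4): 4 data bits -> 8-bit codeword (index 0 = overall parity)."""
    c = [0] * 8
    c[3], c[5], c[6], c[7] = data
    c[1] = c[3] ^ c[5] ^ c[7]
    c[2] = c[3] ^ c[6] ^ c[7]
    c[4] = c[5] ^ c[6] ^ c[7]
    c[0] = sum(c[1:]) % 2            # overall parity makes double errors detectable
    return c

def decode(word):
    c = list(word)
    syndrome = 0
    for pos in range(1, 8):
        if c[pos]:
            syndrome ^= pos          # XOR of positions holding a 1; 0 for a clean word
    overall = sum(c) % 2
    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1:               # odd number of flips: assume exactly one
        c[syndrome] ^= 1             # syndrome 0 means the overall-parity bit flipped
        status = "corrected"
    else:                            # even number of flips, nonzero syndrome
        return "uncorrectable", None
    return status, [c[3], c[5], c[6], c[7]]

def flip(word, *positions):
    w = list(word)
    for p in positions:
        w[p] ^= 1
    return w

data = [1, 0, 1, 1]
cw = encode(data)
print(decode(flip(cw, 6)))          # ('corrected', [1, 0, 1, 1])
print(decode(flip(cw, 3, 6)))       # ('uncorrectable', None)
print(decode(flip(cw, 3, 5, 7)))    # ('corrected', [0, 1, 1, 0])  silently wrong
```

The attacker’s job under ECC becomes getting three or more flips to land in the same codeword, which is harder, but it’s a search problem rather than a wall.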

As for what strategies they’ll take to minimize the damage: “Ignore it entirely and pretend it doesn’t exist” would be my bet. Respond to anyone who argues with “Damned paranoid security nerds, this’ll never work in practice,” and go on their way.

The problem is that most of the mitigations are in hardware. TRR is an “in-DIMM” mitigation; you can’t just go flashing new RAM firmware in the field. If a module is vulnerable to some pattern, it’s vulnerable, and all you can do is swap the RAM out or maybe add software mitigations, but those tend to slaughter performance and don’t work very well anyway. It’s ugly.

Partitioning memory per core? You could, sure, but how is the memory controller actually slicing up physical addresses across DIMMs? How are the DIMMs mapping row numbers to physical positions? Some of this is considered a trade secret (Intel’s triple-channel algorithm tends this way for the first question), and some is simply unknowable from the interfaces provided (for the second, you can sometimes work it out, but not in great detail).
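A sketch of why the controller’s address mapping matters, using a hypothetical mapping in the style of reverse-engineered memory-controller functions. Each bank-select bit is the XOR of two physical-address bits, and the row index comes from the upper bits. The bit pairs in `BANK_FUNCTIONS` and `ROW_SHIFT` are invented for illustration and do not describe any particular platform.

```python
# Hypothetical mapping: bank bit i = XOR of a pair of physical-address bits,
# row index = address bits from ROW_SHIFT upward. Values are made up.
BANK_FUNCTIONS = [(14, 18), (15, 19), (16, 20), (17, 21)]
ROW_SHIFT = 18

def bit(addr, n):
    return (addr >> n) & 1

def bank_of(addr):
    bank = 0
    for i, (a, b) in enumerate(BANK_FUNCTIONS):
        bank |= (bit(addr, a) ^ bit(addr, b)) << i
    return bank

def row_of(addr):
    return addr >> ROW_SHIFT

base = 0x0
naive_next = base + (1 << ROW_SHIFT)      # "next row" by plain address arithmetic
same_bank_next = naive_next | (1 << 14)   # also flip bit 14 to cancel the XOR

for addr in (base, naive_next, same_bank_next):
    print(hex(addr), "bank", bank_of(addr), "row", row_of(addr))
# 0x0 bank 0 row 0
# 0x40000 bank 1 row 1
# 0x44000 bank 0 row 1
```

Under this made-up mapping, the “obvious” next row lands in a different bank entirely. A contiguous address range handed to one core is therefore smeared across banks and rows in ways you can’t see without knowing the XOR functions.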

Even with a lot of information provided by the system (server platforms tend to be a bit more forthcoming if you know how to ask), it’s still far from easy to segment out physical chunks, and even that only helps if the row-number-to-physical mapping is sane. A row that was remapped because of a defect can sit all the way across the die from its numerically close neighbors, which opens up potential attacks.
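A small sketch of the remapped-row problem, assuming a hypothetical bank of 4096 logical rows plus spare rows at the end, with two defective rows moved onto spares. Real remapping is internal to the DRAM and not disclosed; `REMAP`, `NUM_ROWS`, and the helpers are illustrative only.

```python
NUM_ROWS = 4096                    # hypothetical logical rows 0..4095
REMAP = {1000: 4096, 2500: 4097}   # hypothetical: defective rows moved to spares

def physical(row):
    return REMAP.get(row, row)

# Inverse map: physical position -> logical row stored there.
LOGICAL_AT = {physical(r): r for r in range(NUM_ROWS)}

def physical_neighbours(row):
    """Logical rows that physically sit directly next to `row`."""
    p = physical(row)
    return sorted(LOGICAL_AT[q] for q in (p - 1, p + 1) if q in LOGICAL_AT)

print(physical_neighbours(999))    # [998]
print(physical_neighbours(1000))   # [2500, 4095]
```

Logical row 1000’s real neighbors are rows 2500 and 4095, not 999 and 1001, so a partition based on “keep these row numbers together” would leave it exposed to rows it thinks are far away.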

On the plus side, I’m pretty sure my ancient Atom netbook Clank won’t be vulnerable to this… it doesn’t have the DRAM performance, nor the CPU performance, and it’s immune to speculative vulnerabilities because it doesn’t speculate. It’s also rather slower than an RPi 4…

milfox · November 16, 2021, 5:05am

Maybe just keep the app’s access within a single page and treat DIMMs like storage, abstracted accordingly? Can you rowhammer a paging file? Yeah, there’s a perf drawback, but isn’t the whole point of Rowhammer that you get near-native access to RAM and can discover a neighboring row’s information?

‘You get one page; anything else will be abstracted like swap’ would be slower, but it would also prevent the direct memory access that allows this style of attack.

Syonyk · November 16, 2021, 3:04pm

If your goal is to emulate a 486 on modern hardware, that could do it!

Making the kernel trap on every memory access isn’t feasible, and that’s what most workloads would turn into under that sort of system.

You can’t generally discover a neighboring row’s information. You can, however, corrupt it - regardless of the page access permissions. It’s the art of writing something you can’t write by reading something you can read very aggressively.
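“Writing something you can’t write by reading something you can read” can be shown with a toy model. This is an illustration under simplifying assumptions, not how any real DRAM behaves in detail. Every read activates a row and adds disturbance to its two neighbors, and when a neighbor’s disturbance hits a made-up threshold before any refresh, one of its bits flips. Real flips depend on the data pattern, cell direction, and timing. `ToyDRAM` and its parameters are hypothetical.

```python
class ToyDRAM:
    """Toy model: every read activates a row and disturbs its two neighbours."""

    def __init__(self, num_rows, threshold):
        self.data = [0] * num_rows          # one bit of state per row, for brevity
        self.owner = ["kernel"] * num_rows
        self.disturbance = [0] * num_rows   # no refresh is modelled here
        self.threshold = threshold

    def _check(self, who, row):
        if self.owner[row] != who:
            raise PermissionError(f"{who} may not access row {row}")

    def read(self, who, row):
        self._check(who, row)               # the permission check passes...
        for victim in (row - 1, row + 1):
            if 0 <= victim < len(self.data):
                self.disturbance[victim] += 1
                if self.disturbance[victim] == self.threshold:
                    self.data[victim] ^= 1  # ...but charge leaks in a row we don't own
        return self.data[row]

    def write(self, who, row, value):
        self._check(who, row)
        self.data[row] = value

dram = ToyDRAM(num_rows=16, threshold=1000)
dram.owner[10] = dram.owner[12] = "attacker"   # attacker owns the rows around row 11

try:
    dram.write("attacker", 11, 1)
except PermissionError as e:
    print(e)                                   # attacker may not access row 11

for _ in range(500):                           # 1000 activations next to row 11
    dram.read("attacker", 10)
    dram.read("attacker", 12)

print(dram.data[11])                           # 1
```

The direct write to row 11 is refused, but only permitted reads to rows 10 and 12 change row 11 anyway. The attacker still can’t read row 11, which matches the point that Rowhammer corrupts rather than discloses.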

“Direct memory access” to memory is pretty much how modern systems work. There’s no easy way to replace that model and still have anything faintly performant on current system designs.