Syonyk
June 27, 2023, 9:31pm
1
Rowhammer is just the gift that keeps on giving. This paper digs into ways that modern DRAM chips can also be made to misbehave (“Not keep the data written to them”) by keeping a row open.
Abstract
Memory isolation is critical for system reliability, security, and safety. Unfortunately, read disturbance can break memory isolation in modern DRAM chips. For example, RowHammer is a well-studied read-disturb phenomenon where repeatedly opening and closing (i.e., hammering) a DRAM row many times causes bitflips in physically nearby rows.
This paper experimentally demonstrates and analyzes another widespread read-disturb phenomenon, RowPress, in real DDR4 DRAM chips. RowPress breaks memory isolation by keeping a DRAM row open for a long period of time, which disturbs physically nearby rows enough to cause bitflips. We show that RowPress amplifies DRAM’s vulnerability to read-disturb attacks by significantly reducing the number of row activations needed to induce a bitflip by one to two orders of magnitude under realistic conditions. In extreme cases, RowPress induces bitflips in a DRAM row when an adjacent row is activated only once. Our detailed characterization of 164 real DDR4 DRAM chips shows that RowPress 1) affects chips from all three major DRAM manufacturers, 2) gets worse as DRAM technology scales down to smaller node sizes, and 3) affects a different set of DRAM cells from RowHammer and behaves differently from RowHammer as temperature and access pattern changes. We also show that cells vulnerable to RowPress are very different from cells vulnerable to retention failures.
We demonstrate in a real DDR4-based system with RowHammer protection that 1) a user-level program induces bitflips by leveraging RowPress while conventional RowHammer cannot do so, and 2) a memory controller that adaptively keeps the DRAM row open for a longer period of time based on access pattern can facilitate RowPress-based attacks. To prevent bitflips due to RowPress, we describe and analyze four potential mitigation techniques, including a new methodology that adapts existing RowHammer mitigation techniques to also mitigate RowPress with low additional performance overhead. We evaluate this methodology and demonstrate that it is effective on a variety of workloads. We open source all our code and data to facilitate future research on RowPress.
sigh
Well that was a horrifying read.
ECC won’t save you now!
We make two key observations from our analysis. First, there are up to 25 RowPress bitflips (not shown) in a 64-bit data word. ECC schemes that are widely used in memory systems (e.g., SECDED [ 122 ] and Chipkill [ 123 – 125 ]23) cannot correct or detect all RowPress bitflips we observe, which can lead to silent data corruption [ 126 – 128 ]. Even a (7, 4) Hamming code (correcting one bitflip in a 4-bit data word) [122 ] with 75% DRAM storage overhead (3 parity bits for every 4 data bits), is not capable of correcting 25 bitflips in a 64-bit data word. Other ECC schemes that can correct all RowPress bitflips require prohibitively large storage overheads. Thus, relying on ECC alone to prevent all RowPress bitflips is a very expensive solution. Second, for all three manufacturers (Mfrs. A, B, and C), a significant fraction (up to 0.99%, 35.77%, and 10.08% for tAggON =7.8 μs, respectively) of 64-bit data words exhibit at least three Row-Press bitflips. This makes RowPress bitflips costly to prevent usingtechniques like memory page retirement (where erroneous DRAM rows are not used in the system) [ 129 , 130 ] since such techniques could render up to 35.77% of storage capacity useless.