JEDEC JEP 301-1
SYSTEM LEVEL ROWHAMMER MITIGATION
| Organization: | JEDEC |
| Publication Date: | 1 March 2021 |
| Status: | active |
| Page Count: | 14 |
Scope:
A DRAM rowhammer security exploit is a serious threat to cloud service providers, data centers, laptops, smartphones, self-driving cars and IoT devices. Hardware research and development will take time: DRAM components, DRAM DIMMs, systems-on-chip (SoCs), chipsets and system products each have their own design cycle time and overall lifetime.
1) Rowhammer vulnerability is a fundamental DRAM issue.
2) RFM (Refresh Management) is designed to alleviate rowhammer concerns, but cannot eliminate vulnerability to all possible forms of attack.
3) System companies need the maximum amount of flexibility in monitoring rowhammer and deploying the appropriate mitigation measures (HW & SW):
a) MAC (Maximum Activate Count) and blast radius, also known as DRAM victim row vulnerability to aggressor row activation count.
b) Comprehensive ECC, BIST, BISR statistics
c) The ability for the OS to report an aggressor to DRAM and ask DRAM to take mitigation actions
d) The ability for DRAM to request a pause (during which the SoC can issue no new commands) if needed. The pause can be non-deterministic (Ready or Wait).
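As an illustration of item 3a, the following minimal Python sketch shows how a system-level monitor might combine a MAC threshold with a blast radius to list potentially affected victim rows. The function name, the parameter values and the assumption that consecutive row numbers are physically adjacent are all hypothetical; real DRAM devices may remap rows internally, and the actual MAC and blast radius are device-specific.

```python
from collections import Counter


def victims_over_mac(activations, mac, blast_radius, num_rows):
    """Return the sorted victim rows of every aggressor that reached `mac`.

    activations  -- iterable of row numbers, one entry per ACTIVATE command
                    observed within a single refresh window (assumed input)
    mac          -- activation count at which a row is treated as an aggressor
    blast_radius -- how many neighbouring rows on each side may be disturbed
    num_rows     -- number of rows in the bank, used to clamp the range
    """
    counts = Counter(activations)
    victims = set()
    for row, count in counts.items():
        if count >= mac:
            low = max(0, row - blast_radius)
            high = min(num_rows - 1, row + blast_radius)
            # Every row in range except the aggressor itself is a candidate victim.
            victims.update(r for r in range(low, high + 1) if r != row)
    return sorted(victims)


# Rows 5 and 0 are each activated 10 times; row 7 only 3 times.
trace = [5] * 10 + [0] * 10 + [7] * 3
print(victims_over_mac(trace, mac=10, blast_radius=2, num_rows=8))
# prints [1, 2, 3, 4, 6, 7]
```

The sketch only identifies candidate victim rows; deciding when and how to refresh them remains the responsibility of the DRAM or memory controller mitigation scheme.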
This publication recommends the following best practices to mitigate the security risks from rowhammer attacks.
1) At the product level (e.g., cloud, smartphones), each company has its own reliability (bit error rate) and security (privilege escalation or denial of service) requirements. The following guidelines are recommended:
a) Companies share rowhammer test patterns (e.g., GitHub[1]) and tools (e.g., GitHub[2]) with each other. Security is a shared industry concern, not an arena for competitive advantage.
b) Companies invest in memory validation and a quality assurance process. The bit error rate metrics should be quantified clearly for the suppliers (e.g., DRAM package and DIMMs).
2) At the Operating System (OS) and FW level, add reliability, availability and serviceability (RAS) and security features to the most critical instructions and data structures. The following guidelines are recommended:
a) A DRAM health monitor that takes as many signals as possible from the SoC. The signals can include ECC error statistics, ECC logs, MBIST statistics, TRR (Target Row Refresh) tracking circuits and memory controller performance monitors. Heuristic-based DRAM health monitors can potentially detect a large-scale rowhammer attack.
b) Isolate the user program from the most critical data structures (e.g., Page Table Entry) in the OS kernel. Protect the kernel in a fixed address space with memory encryption and an integrity check (e.g., Hash-based Message Authentication Code). For example, carve out a 256 MB enclave region in 8 GB of addressable DRAM space for the kernel. Some SoC vendors provide integrity protection for the enclave region.
3) At the SoC and memory controller level, invest in hardware circuits that improve RAS and security. The following guidelines are recommended:
a) Built-in self-test (BIST) and built-in self-repair (BISR).
b) Post-package-repair (PPR) support in the field. If the repair resource is exhausted, provide the bad DRAM address to the OS so that it can be taken offline.
c) TRR tracking circuits to complement DRAM's mitigation scheme. The SoC and memory controller track the aggressor address; the DRAM determines the victim address and the time needed to adequately perform the mitigation.
d) System-level ECC. Detection is more important than correction. On-die single-error-correcting (SEC) ECC is not sufficient: 3 to 4 bit flips have been observed in 128-bit data. The existing ECS/patrol scrub mechanism (e.g., every 24 hours) is not responsive enough.
e) An additional buffer level in the memory hierarchy to shield main memory from rowhammer attacks. Conceptually, it is a hardware-managed cache. Without cache flush support, it will be harder (but still possible) for attackers to mount a rowhammer attack.
f) High-performance memory encryption (ME) and Hash-based Message Authentication Code (HMAC) circuits.
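The heuristic-based DRAM health monitor recommended in guideline 2a can be sketched as a sliding-window counter over ECC error reports. The Python class below is a minimal illustration under stated assumptions, not an implementation from this publication: the class name, the per-bank grouping, the window length and the threshold are all hypothetical, and a production monitor would combine several of the signals listed in 2a.

```python
from collections import defaultdict, deque


class EccHealthMonitor:
    """Flag a bank when ECC errors cluster within a short time window.

    Hypothetical sketch: only one signal (ECC error timestamps) is used,
    and the threshold values are illustrative rather than recommended.
    """

    def __init__(self, window_s, threshold):
        self.window_s = window_s        # length of the sliding window, seconds
        self.threshold = threshold      # errors in the window that raise a flag
        self._events = defaultdict(deque)  # bank id -> recent error timestamps

    def record(self, bank, timestamp_s):
        """Record one ECC error and return True if the bank looks suspicious."""
        events = self._events[bank]
        events.append(timestamp_s)
        # Drop errors that have fallen out of the sliding window.
        while events and events[0] <= timestamp_s - self.window_s:
            events.popleft()
        return len(events) >= self.threshold


monitor = EccHealthMonitor(window_s=60, threshold=3)
print(monitor.record(bank=0, timestamp_s=0))    # prints False
print(monitor.record(bank=0, timestamp_s=10))   # prints False
print(monitor.record(bank=0, timestamp_s=20))   # prints True
print(monitor.record(bank=0, timestamp_s=100))  # prints False
```

A burst of errors confined to one bank within seconds is treated here as a possible attack signature, whereas the same number of errors spread over hours is not; the right window and threshold depend on the product's bit error rate requirements.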