Title: Intelligent Cache Management for 3D Memory Systems
Committee:
Dr. Moinuddin Qureshi, ECE, Chair, Advisor
Dr. Hyesoon Kim, CoC
Dr. Sudhakar Yalamanchili, ECE
Dr. Aamer Jaleel, NVIDIA
Dr. Milos Prvulovic, ECE
Abstract:
DRAM caches are important for enabling effective heterogeneous memory systems that can transparently provide the bandwidth of high-bandwidth memories (HBM), the latency of lower-latency memories (DRAM), and the capacity of high-capacity memories (DRAM/3D-XPoint). We investigate enabling intelligent cache management for DRAM caches similar to the one already implemented in Intel's Knights Landing. Such DRAM caches use a direct-mapped design, co-locate the tag and data within the DRAM array, and stream out the tag and the data concurrently on an access. However, such a direct-mapped organization can suffer from a low hit-rate. We can attempt to improve hit-rate and performance with traditional methods such as associativity, intelligent replacement, or prefetching. Yet simply applying traditional "well-understood" cache and memory designs to stacked memory results in low bandwidth utilization, high latency, and low overall system performance. To fully utilize the potential of stacked memory, we must architect systems that exploit the unique latency and bandwidth characteristics of stacked DRAM. Throughout our work, we investigate and show how to enable associativity, intelligent replacement, and cache compression, in a bandwidth-efficient and scalable, low-SRAM-cost manner, to improve the performance of DRAM caches. (1) Associativity can help improve cache hit-rate, but we find that it must not come at the cost of latency or bandwidth, or it risks degrading performance. Through ACCORD, we show how to scale way-prediction to giga-scale DRAM caches (by coordinating way-install and way-prediction) to achieve performance similar to an ideal 2-way associative cache with less than 1KB of SRAM. (2) Intelligent replacement policies, such as RRIP, can be used to improve cache hit-rate.
Through RRIP-AOB + ETR, we show how to achieve intelligent replacement in direct-mapped caches by formulating replacement policies as bypassing policies and by reducing state-update cost through coordinating replacement across sets. (3) Cache compression can also be used to improve DRAM-cache performance. Through DICE, we show how to use cache compression to achieve bandwidth-free prefetching. (4) Following up, we find that future hybrid memory systems containing DRAM + 3D-XPoint will be even more bandwidth-bound, as the two memories are likely to share DDR4 channels. To overcome DRAM-cache bandwidth bloat, we propose a Dual-Tag approach that enables bandwidth-efficient and scalable, low-cost DRAM cache management. Finally, we combine the proposed techniques to achieve a bandwidth-efficient and scalable cache with intelligent replacement and prefetching that enables near-ideal DRAM cache performance with only 34KB of SRAM storage in the memory controller. Such scalable, low-cost, high-performance DRAM cache controllers can make DRAM caching suitable for widespread deployment and can improve future memory-technology-based caches.
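To make the idea in point (2) concrete: in a direct-mapped cache each block maps to exactly one location, so on a miss the only decision is whether to evict the resident line or bypass the incoming one. The sketch below illustrates this recasting of replacement as a bypass policy using an RRIP-style aging counter; it is a simplified illustration under assumed parameters (the `rrpv` counter and `MAX_RRPV` threshold are illustrative), not the dissertation's actual RRIP-AOB/ETR implementation.

```python
# Hedged sketch: a replacement policy recast as a bypass policy for
# one set of a direct-mapped cache. The rrpv counter and MAX_RRPV
# threshold are illustrative assumptions, not RRIP-AOB/ETR itself.

MAX_RRPV = 3  # 2-bit re-reference prediction value, as in RRIP


class DirectMappedSet:
    """One set of a direct-mapped cache: a single (tag, rrpv) slot."""

    def __init__(self):
        self.tag = None
        self.rrpv = MAX_RRPV  # empty slot starts predicted dead

    def access(self, tag):
        """Return True on a hit. On a miss, decide install vs. bypass:
        the incoming line is installed only once the resident line's
        RRPV has aged to the maximum (i.e., it is predicted dead)."""
        if self.tag == tag:
            self.rrpv = 0          # hit: predict near-term re-reference
            return True
        if self.rrpv == MAX_RRPV:  # resident line predicted dead
            self.tag = tag         # replace: install the new line
            self.rrpv = MAX_RRPV - 1
        else:
            self.rrpv += 1         # bypass the miss; age the resident line
        return False
```

Under this formulation, a line that hits repeatedly keeps its slot, while streaming misses are bypassed until the resident line ages out, which is what lets a direct-mapped cache benefit from RRIP-like intelligence without any associativity.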