Title: Techniques to Mitigate Performance Impact of Off-chip Data Migrations in Modern GPU Computing
Hyojong Kim
Ph.D. Student
School of Computer Science
College of Computing
Georgia Institute of Technology
Date: Friday, May 17, 2019
Time: 10:00 AM - 12:00 PM (EDT)
Location: Klaus 2100
Committee:
Dr. Hyesoon Kim (Advisor, School of Computer Science, Georgia Institute of Technology)
Dr. Ada Gavrilovska (School of Computer Science, Georgia Institute of Technology)
Dr. Moinuddin Qureshi (School of Electrical and Computer Engineering, Georgia Institute of Technology)
Abstract:
In response to unprecedented demand for compute and memory, modern graphics processing units (GPUs) allow the use of multiple GPUs in a system or the use of system memory (i.e., CPU memory) in a user-transparent manner. Compute capability scales out with multiple GPUs, and system memory provides an order of magnitude more memory capacity to a GPU application. However, both techniques require data to be migrated over the system bus (e.g., the PCIe bus in modern systems) on demand during execution. Because data migration over the PCIe bus takes far longer than the memory accesses traditional GPUs are designed to tolerate, the efficacy of these techniques in delivering high performance depends on mitigating the data migration overhead.
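For context, the on-demand migration path described above can be exercised with standard CUDA managed memory. The minimal sketch below (the kernel and sizes are illustrative, not from the proposal) allocates with cudaMallocManaged; on current GPUs, pages first touched by the kernel fault and migrate from CPU to GPU over PCIe, which is exactly the off-chip traffic whose cost this work targets.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Illustrative kernel: the first touch of each managed page from the
// GPU triggers a page fault and a CPU-to-GPU migration over PCIe.
__global__ void scale(float *data, float factor, size_t n) {
    size_t i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

int main() {
    const size_t n = 1 << 26;  // 64M floats (256 MiB), illustrative size
    float *data;
    // Managed allocation: accessible from both CPU and GPU; pages
    // migrate on demand rather than being copied up front.
    cudaMallocManaged(&data, n * sizeof(float));
    for (size_t i = 0; i < n; ++i) data[i] = 1.0f;  // pages now resident on CPU

    // Each GPU page fault stalls until the page crosses the PCIe bus.
    scale<<<(n + 255) / 256, 256>>>(data, 2.0f, n);
    cudaDeviceSynchronize();

    printf("data[0] = %f\n", data[0]);  // touching on CPU migrates pages back
    cudaFree(data);
    return 0;
}
```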
In this thesis proposal, I propose several ideas to mitigate the data migration overhead. First, I propose CODA, a mechanism to co-locate computation and data in multi-GPU systems. CODA estimates the amount of exclusive data and selectively allocates it to a single GPU in the presence of fine-grained memory interleaving, while distributing shared data across multiple GPUs. It uses an affinity-based thread block scheduling policy to place computation on the same GPU as the data it accesses. This enables efficient use of multiple GPUs by exploiting their aggregate compute capability while minimizing unnecessary off-chip data migrations. Next, I propose SCD, a mechanism for efficient unified memory management in modern GPUs. SCD reduces major inefficiencies in the page fault handling mechanism employed by modern GPUs. It supports CPU-like thread block context switching to reduce the number of fault batches and amortize the batch processing overhead. It takes page eviction off the critical path, with no hardware changes, by overlapping evictions with CPU-to-GPU page migrations. Finally, it shortens CPU-to-GPU page migration time through lightweight, on-the-fly compression and decompression.
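SCD itself is a hardware mechanism, but the overlap it exploits, moving pages while other work proceeds rather than paying for migration as faults on the critical path, has a software analogue in today's CUDA API. A hedged sketch under that analogy, assuming a Pascal-or-later GPU with demand paging: cudaMemPrefetchAsync stages managed pages onto the device on a stream, so the migration overlaps with host-side work and the kernel avoids faulting on prefetched pages.

```cuda
#include <cuda_runtime.h>

__global__ void scale(float *data, float factor, size_t n) {
    size_t i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

int main() {
    const size_t n = 1 << 26;  // illustrative size
    float *data;
    cudaMallocManaged(&data, n * sizeof(float));
    for (size_t i = 0; i < n; ++i) data[i] = 1.0f;

    int device;
    cudaGetDevice(&device);
    cudaStream_t stream;
    cudaStreamCreate(&stream);

    // Begin migrating pages to the GPU asynchronously; prefetched
    // pages will not fault when the kernel later touches them.
    cudaMemPrefetchAsync(data, n * sizeof(float), device, stream);

    // ... other host-side work can overlap with the migration here ...

    scale<<<(n + 255) / 256, 256, 0, stream>>>(data, 2.0f, n);
    cudaStreamSynchronize(stream);

    cudaStreamDestroy(stream);
    cudaFree(data);
    return 0;
}
```

This only illustrates the general overlap principle in software; SCD achieves the equivalent effect transparently, in the GPU's fault handling path, without application changes.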