Title: Techniques to Mitigate Performance Impact of Off-chip Data Migrations in Modern GPU Computing
Hyojong Kim
School of Computer Science
College of Computing
Georgia Institute of Technology
Date: Monday, Jan 6, 2020
Time: 12:00 PM - 2:00 PM (EST)
Location: Klaus 2100
Committee:
Dr. Hyesoon Kim (Advisor, School of Computer Science, Georgia Institute of Technology)
Dr. Ada Gavrilovska (School of Computer Science, Georgia Institute of Technology)
Dr. Milos Prvulovic (School of Computer Science, Georgia Institute of Technology)
Dr. Moinuddin Qureshi (School of Electrical and Computer Engineering, Georgia Institute of Technology)
Dr. Vivek Sarkar (School of Computer Science, Georgia Institute of Technology)
Abstract:
Graphics Processing Units (GPUs) have been used successfully to accelerate a wide variety of applications over the last decade. In response to growing compute and memory capacity requirements, modern systems can distribute work over multiple GPUs and transparently pool memory from the host (i.e., system memory) and other GPUs. Compute capacity scales out with multiple GPUs, and the memory capacity afforded by the host is an order of magnitude larger than the GPUs’ device memory. However, both approaches require data to be migrated over the system interconnect (e.g., PCIe) during program execution. Since migrating data over the system interconnect takes much longer than accessing a GPU’s internal memory hierarchy, the efficacy of these approaches in achieving high performance depends strongly on the data migration overhead. This dissertation proposes several techniques that help mitigate this overhead.
In a system with multiple GPUs, where there is a large discrepancy in access latency between local and remote memory, it is crucial to co-locate compute and data to achieve high performance. This thesis discusses how to enable such co-location. The proposed mechanism estimates the amount of exclusive data and selectively allocates it on a single GPU while distributing shared data across multiple GPUs. For this selective coarse-grained allocation, it uses a dual address mode with lightweight changes to virtual-to-physical page mappings. To place compute on the same GPU as the data it accesses, it uses an affinity-based thread block scheduling policy. Together, these enable efficient use of multiple GPUs while minimizing unnecessary off-chip data migrations.
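As a rough illustration of the affinity idea, the host-side C++ sketch below assigns each thread block to the GPU that already owns most of the pages it is expected to touch, with exclusive pages packed onto a single GPU and shared pages interleaved across GPUs. The page-ownership rule, the per-block address lists, and all names here are hypothetical simplifications for illustration; the actual mechanism described in the thesis operates in the GPU hardware and runtime.

// Hypothetical host-side sketch of an affinity-based thread block scheduling
// policy: each thread block is dispatched to the GPU that already holds the
// majority of the pages it is expected to touch. Pages believed to be touched
// by a single block ("exclusive") are packed onto one GPU; shared pages are
// interleaved across GPUs. All names and policies are illustrative.
#include <cstdint>
#include <iostream>
#include <unordered_map>
#include <vector>

constexpr int      kNumGpus = 4;
constexpr uint64_t kPage    = 4096;

// Simple placement rule: exclusive pages live on GPU 0, shared pages are
// interleaved round-robin across all GPUs.
int owner_gpu(uint64_t page_id, bool exclusive) {
  return exclusive ? 0 : static_cast<int>(page_id % kNumGpus);
}

// Pick the GPU that owns the largest fraction of a block's expected pages.
int schedule_block(const std::vector<uint64_t>& addrs,
                   const std::unordered_map<uint64_t, bool>& exclusive_map) {
  std::vector<int> votes(kNumGpus, 0);
  for (uint64_t a : addrs) {
    uint64_t page = a / kPage;
    bool excl = exclusive_map.count(page) ? exclusive_map.at(page) : false;
    ++votes[owner_gpu(page, excl)];
  }
  int best = 0;
  for (int g = 1; g < kNumGpus; ++g)
    if (votes[g] > votes[best]) best = g;
  return best;
}

int main() {
  // Toy example: block 0 mostly touches exclusive pages, block 1 shared ones.
  std::unordered_map<uint64_t, bool> exclusive_map = {{0, true}, {1, true}, {7, false}};
  std::vector<uint64_t> block0 = {0 * kPage, 1 * kPage, 7 * kPage};
  std::vector<uint64_t> block1 = {5 * kPage, 6 * kPage, 7 * kPage};
  std::cout << "block0 -> GPU " << schedule_block(block0, exclusive_map) << "\n";
  std::cout << "block1 -> GPU " << schedule_block(block1, exclusive_map) << "\n";
}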
Support for unified virtual memory and demand paging in modern GPUs provides a coherent view of a single virtual address space shared between CPUs and GPUs. This allows GPUs to access pages that reside in CPU memory as if they were local to the GPU, so applications that would otherwise fail to run because of memory capacity constraints can run seamlessly. This thesis discusses how to alleviate major inefficiencies in the page fault handling mechanism employed by contemporary GPUs. The proposed mechanism supports CPU-like thread block context switching to reduce the number of batches (i.e., groups of page faults handled together) and amortize the batch processing overhead. To take page eviction off the critical path, it modifies the runtime software to overlap page evictions with CPU-to-GPU page migrations without requiring any hardware changes.
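For context, the short CUDA sketch below shows the standard unified memory programming model that these optimizations target: an allocation made with cudaMallocManaged lives in a single virtual address space, pages migrate from the CPU on demand when the kernel faults on them, and cudaMemPrefetchAsync is the user-visible way to overlap CPU-to-GPU migrations with other work. The batching and eviction-overlap mechanisms proposed in the thesis sit inside the GPU runtime and driver and are not visible in application code; the array size and kernel here are illustrative only.

// Minimal CUDA sketch of unified virtual memory with demand paging. Memory
// allocated with cudaMallocManaged can exceed the GPU's device capacity;
// pages are migrated from the host on demand when the kernel first touches
// them, and migrate back when the CPU touches them again.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void scale(float* data, size_t n, float factor) {
  size_t i = static_cast<size_t>(blockIdx.x) * blockDim.x + threadIdx.x;
  if (i < n) data[i] *= factor;  // first touch can trigger a GPU page fault
}

int main() {
  const size_t n = 1ull << 28;                    // ~1 GiB of floats (illustrative size)
  float* data = nullptr;
  cudaMallocManaged(&data, n * sizeof(float));    // single virtual address space

  for (size_t i = 0; i < n; ++i) data[i] = 1.0f;  // pages start resident on the CPU

  // Prefetch the first half ahead of the launch so those migrations overlap
  // with other host work instead of being serviced as demand page faults.
  int device = 0;
  cudaMemPrefetchAsync(data, (n / 2) * sizeof(float), device);

  unsigned int blocks = static_cast<unsigned int>((n + 255) / 256);
  scale<<<blocks, 256>>>(data, n, 2.0f);
  cudaDeviceSynchronize();

  printf("data[0] = %f\n", data[0]);              // pages migrate back on CPU access
  cudaFree(data);
  return 0;
}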