Title: Runtime Specialization for Heterogeneous CPU-GPU Resources
Naila Farooqui
School of Computer Science
College of Computing
Georgia Institute of Technology
Date: October 19, 2015 (Monday)
Time: 12:00 PM - 2:00 PM (ET)
Location: KACB 3100
Committee:
---------------
Dr. Karsten Schwan (Advisor, School of Computer Science, Georgia Tech)
Dr. Sudhakar Yalamanchili (School of Electrical and Computer Engineering, Georgia Tech)
Dr. Ada Gavrilovska (School of Computer Science, Georgia Tech)
Dr. Richard Vuduc (School of Computational Science and Engineering, Georgia Tech)
Dr. Vanish Talwar (Research Scientist, PernixData)
Dr. Rajkishore Barik (Research Scientist, Intel Labs)
Abstract:
------------
Heterogeneous parallel architectures, such as those composed of CPUs and GPUs, are a
tantalizing compute fabric for performance-hungry developers. While these platforms
enable order-of-magnitude performance increases for many data-parallel application
domains, there remain several open challenges: (i) the distinct execution models
inherent in the heterogeneous devices present on such platforms drive the need to
dynamically match workload characteristics to the underlying resources, (ii) the complex
architecture and programming models of such systems require substantial application
knowledge and effort-intensive program tuning to achieve high performance, and (iii)
as such platforms become prevalent, there is a need to extend their utility from running
known regular data-parallel applications to the broader set of input-dependent, irregular
applications common in enterprise settings.
The key contribution of our research is to enable runtime specialization on such hybrid
CPU-GPU platforms by matching application characteristics to the underlying heterogeneous
resources for both regular and irregular workloads. Our approach enables profile-driven
resource management and optimizations for such platforms, providing high application
performance and system throughput. Towards this end, this research will: (a) enable dynamic
instrumentation for GPU-based parallel architectures, specifically targeting the complex
Single-Instruction Multiple-Data (SIMD) execution model, to gain real-time introspection into
application behavior; (b) leverage such dynamic performance data to support novel online
resource management methods that improve application performance and system throughput,
particularly for irregular, input-dependent applications; (c) automate some of the programmer
effort required to exercise specialized architectural features of such platforms via
instrumentation-driven dynamic code optimizations; and (d) propose a specialized, affinity-aware
work-stealing scheduler for integrated CPU-GPU processors that efficiently distributes work at
runtime across all CPU and GPU cores for improved load balance, taking into account
both application characteristics and architectural differences of the underlying devices.