Title: Diagnosing performance limitations in HPC applications
Kenneth Czechowski
School of Computational Science and Engineering
College of Computing
Georgia Institute of Technology
Date: Monday, December 14, 2015
Time: 3:00PM - 4:30PM EST
Location: Klaus 2100
Committee
---------
Richard Vuduc (Advisor, School of Computational Science and Engineering, Georgia Tech)
Edmond Chow (School of Computational Science and Engineering, Georgia Tech)
Hyesoon Kim (School of Computer Science, Georgia Tech)
Sudhakar Yalamanchili (School of Electrical and Computer Engineering, Georgia Tech)
Victor W. Lee (Parallel Computing Lab, Intel)
Abstract:
One of the most challenging aspects of high-performance computing (HPC) is diagnosing the performance limitations of an application in order to improve it. In many cases, identifying bottlenecks, such as latency stalls, requires a level of fidelity beyond that of traditional performance models and runtime analysis, yet architectural simulations are too cumbersome and focus on the design space of the hardware rather than on tuning software. Instead, we propose an autotuning-inspired performance analysis technique, called Pressure Point Analysis (PPA), that combines the accessibility of high-level analytical models with the precision of a simulator-based approach. The approach is founded on a technique that dynamically perturbs binary code (e.g., inserting/deleting instructions to affect utilization of functional units, altering memory access addresses to change the cache hit rate, or swapping registers to alter instruction-level dependencies) and then analyzes the effect each perturbation has on overall performance. When carried out in a principled manner, a battery of carefully designed perturbations, each targeting a specific microarchitectural feature, can yield valuable insight into pressure points in the code. The intent is to provide actionable information about hardware-software interactions that the software developer can use to manually tweak the application. In some circumstances the performance bottlenecks are unavoidable, in which case this analysis can be used to establish a rigorous performance bound for the application. In other cases, this information can identify the primary performance limitations and project the potential performance improvement if these bottlenecks are mitigated.
This dissertation argues that automated code perturbations can be used to (1) diagnose performance limitations of arbitrary HPC applications, (2) build rigorous performance bounds, and (3) identify microarchitectural weaknesses. Ultimately, this work contributes to the understanding of hardware-software codesign and improves the productivity of the software optimization process.
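
The perturb-and-measure idea in the abstract can be illustrated with a small sketch. This is not the dissertation's tool: PPA perturbs real binary code and measures actual run time, whereas the sketch below substitutes a hypothetical bottleneck model for the measurement. The names Kernel, modeled_time, and pressure_points, as well as all machine parameters, are assumptions made up for illustration.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Kernel:
    """Hypothetical per-resource work counts for one code region."""
    flops: float        # floating-point operations
    bytes_moved: float  # bytes transferred from memory
    dep_chain: float    # length of the longest dependency chain (instructions)


# Hypothetical machine parameters, not taken from any real processor.
PEAK_FLOPS = 100.0     # flops per time unit
PEAK_BANDWIDTH = 10.0  # bytes per time unit
CHAIN_LATENCY = 0.5    # time units per dependent instruction


def modeled_time(k: Kernel) -> float:
    """Stand-in for a real timing run: the slowest resource sets the time."""
    return max(
        k.flops / PEAK_FLOPS,
        k.bytes_moved / PEAK_BANDWIDTH,
        k.dep_chain * CHAIN_LATENCY,
    )


def pressure_points(k: Kernel, delta: float = 0.1) -> dict:
    """Relieve each resource by a fraction `delta`, one at a time, and
    report the relative reduction in (modeled) time for each perturbation."""
    base = modeled_time(k)
    sensitivity = {}
    for name in ("flops", "bytes_moved", "dep_chain"):
        perturbed = replace(k, **{name: getattr(k, name) * (1.0 - delta)})
        sensitivity[name] = (base - modeled_time(perturbed)) / base
    return sensitivity


if __name__ == "__main__":
    kernel = Kernel(flops=200.0, bytes_moved=80.0, dep_chain=4.0)
    for resource, gain in pressure_points(kernel).items():
        print(f"{resource}: {gain:.3f}")
```

In this toy setting, a resource whose perturbation changes the time is a pressure point, while a resource whose perturbation has no effect is not currently limiting performance. A real analysis would replace modeled_time with timed runs of the perturbed binary.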