Joint ECE and SCS Faculty Candidate Talk: John A. Stratton, University of Illinois Urbana-Champaign
Title: Performance Portability Across GPU and CPU Architectures
Abstract
Heterogeneous architectures, by definition, include multiple processing components with very different microarchitectures and execution models. In particular, computing platforms from supercomputers to smartphones can now incorporate both CPU and GPU processors. Disparities between CPU and GPU processor architectures have naturally led to distinct programming models and development patterns for each component. Developers for a specific system decompose their application, assign different parts to different heterogeneous components, and express each part in its assigned component's native model. But without additional effort, that application will not be suitable for another architecture with a different heterogeneous component balance. Developers addressing a variety of platforms must either write multiple implementations for every potential heterogeneous component or fall back to a "safe" CPU implementation, incurring a high development cost or loss of system performance, respectively. The disadvantages of developing for heterogeneous systems are vastly reduced if one source code implementation can be mapped to either a CPU or GPU architecture with high performance.
A convention has emerged from the OpenCL community defining how to write kernels for performance portability among different GPU architectures. Because this convention is well defined, OpenCL programs that follow it contain enough abstract performance information to enable effective translations to CPU architectures as well. The challenge is that an OpenCL implementation must focus on those programming conventions more than on the most natural mapping of the language specification to the target architecture. I will outline some concrete transformations that can be applied to an OpenCL kernel to suitably map the abstract performance properties to CPU execution constructs. Such transformations result in marked performance improvements over existing CPU OpenCL implementations for GPU-portable OpenCL kernels. Ultimately, the performance of GPU-portable OpenCL kernels, when using this methodology, is comparable to the performance of native multicore CPU programming models such as OpenMP.
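The abstract does not say which transformations the talk covers. As a rough illustration of the general idea only, the sketch below shows one commonly discussed approach, serializing the work-items of a kernel launch into nested loops on the CPU. It is written in Python rather than OpenCL C for brevity, and every name in it (`saxpy_work_item`, `run_on_cpu`) is hypothetical, not taken from the talk.

```python
# Hypothetical sketch: all names here are illustrative assumptions,
# not the speaker's implementation.

def saxpy_work_item(global_id, a, x, y, out):
    """Kernel body in the GPU style: one work-item handles one element."""
    if global_id < len(x):  # guard against padding work-items
        out[global_id] = a * x[global_id] + y[global_id]


def run_on_cpu(kernel, global_size, local_size, *args):
    """Serialize a 1-D kernel launch into nested loops.

    The outer loop walks work-groups (the unit a CPU runtime could hand
    to a core); the inner loop walks the work-items of one group (the
    unit a compiler could vectorize). This simple form is valid only for
    kernels without barriers, which is an assumption of this sketch.
    """
    num_groups = (global_size + local_size - 1) // local_size
    for group_id in range(num_groups):
        for local_id in range(local_size):
            kernel(group_id * local_size + local_id, *args)


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [10.0, 20.0, 30.0, 40.0, 50.0]
out = [0.0] * len(x)

# Launch 8 work-items in groups of 4; the last 3 are padding and do nothing.
run_on_cpu(saxpy_work_item, 8, 4, 2.0, x, y, out)
```

The point of the sketch is that the work-group structure written for a GPU already tells a CPU implementation where coarse-grained and fine-grained parallelism live; a real implementation would also have to handle barriers and local memory.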
Bio
John Stratton recently completed his Ph.D. at the University of Illinois Urbana-Champaign, where he is currently a Visiting Lecturer in addition to his role as Senior Architect at MulticoreWare Incorporated. He has been teaching GPU computing since the first university course on the subject in Spring 2007. John has also been working closely with the SPEC corporation to publish the first industry-standard accelerator computing benchmark suite, due to be released later this year. He has received several awards for outstanding research, teaching, and technology development, most recently the "Most Valuable Entrepreneurial Leadership in a Startup" award from the University of Illinois Research Park for his work with MulticoreWare.