Title: Managing Transient Reliability and Performance in GPU Applications
Committee:
Dr. Yalamanchili, Advisor
Dr. Wills, Chair
Dr. Kim
Abstract: The objective of the proposed research is to develop a framework for software-based, low-cost error detection in GPU applications that can adapt to dynamic changes in kernel resilience characteristics as well as environmental reliability factors. The proposed research consists of three parts: an adaptive software reliability enhancement (SRE) framework; a dynamic reliability management (DRM) scheme that leverages the SRE framework to control trade-offs between performance and reliability; and an SRE technique tailored to the unique properties of GPU execution. By accounting for variation in reliability requirements, applications can reach the same level of resilience with lower overhead than any single technique.