Title: Building Efficient Tensor Accelerators for Sparse and Irregular Workloads
Committee:
Dr. Tushar Krishna, ECE, Chair, Advisor
Dr. Hyesoon Kim, ECE
Dr. Richard Vuduc, CoC
Dr. Sivasankaran Rajamanickam, Sandia National Labs
Dr. Callie Hao, ECE
Abstract: Popular Machine Learning (ML) and High Performance Computing (HPC) workloads account for a significant portion of runtime in data centers. Applications include image classification, speech recognition, recommendation systems, social network analysis, robotics, chemical process simulations, and more. Due to the large computational demands of emerging workloads, there has recently been a surge of custom hardware accelerator development for computing tensor kernels with high performance and energy efficiency. For example, the Google Tensor Processing Unit (TPU) is a custom hardware accelerator targeting efficient matrix multiplications for Deep Neural Networks (DNNs). However, state-of-the-art accelerators have limitations stemming from (1) the vast spectrum of sparsity across workloads and (2) the irregularity of tensor dimensions (e.g., tall-skinny matrices). This thesis explores novel methodologies and architectures for building efficient accelerators for sparse tensor algebra. The first major contribution is the use of specialized on-chip interconnects to provide flexible mappings of sparse and irregular matrices onto processing elements (PEs). Building on these interconnects, the thesis presents SIGMA, a new sparse DNN accelerator targeting workloads with 30% to 100% density (percentage of nonzeros). Unlike popular DNNs, HPC workloads utilize tensors spanning from 10^-6% dense to fully dense. The second major contribution therefore explores the system-level impact of using various compression formats across all sparsity regions. The thesis proposes a predictor that determines the best compression-format combination, along with MINT, a custom hardware compression-format converter. Together, they provide significant energy-delay product (EDP) improvement over state-of-the-art accelerators. The third major contribution analyzes popular state-of-the-art sparse accelerators using a new tool named Hard TACO, which enables realistic architectural exploration of homogeneous and heterogeneous accelerators.
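For readers less familiar with the terms above, the following is a minimal Python sketch (illustrative only, not taken from the thesis; the formats, matrix sizes, and cost model are assumptions) of how density, the percentage of nonzeros, is computed, and why the cheapest compression format changes with it. The crossover in storage cost between dense, COO, and CSR layouts is the kind of sparsity-dependent trade-off that a format predictor and a hardware converter such as MINT would target.

```python
import numpy as np

def density(m: np.ndarray) -> float:
    # Fraction of nonzero entries: the "percentage of nonzeros" from the abstract.
    return np.count_nonzero(m) / m.size

def storage_words(m: np.ndarray) -> dict:
    # Approximate storage cost, in machine words, of three standard formats
    # (a simplified cost model, assumed for illustration):
    #   Dense: one word per entry.
    #   COO:   a (row, col, value) triple per nonzero.
    #   CSR:   a row-pointer array of length rows+1, plus (col, value) per nonzero.
    nnz = np.count_nonzero(m)
    rows, cols = m.shape
    return {"dense": rows * cols, "coo": 3 * nnz, "csr": (rows + 1) + 2 * nnz}

# Toy experiment: as density drops, the cheapest format shifts from dense to
# CSR and, at extreme sparsity, to COO. This is why no single fixed format is
# optimal across the sparsity spectrum described in the abstract.
rng = np.random.default_rng(0)
for d in (1.0, 0.3, 0.01, 1e-4):
    mask = rng.random((256, 256)) < d
    m = mask * rng.random((256, 256))
    costs = storage_words(m)
    cheapest = min(costs, key=costs.get)
    print(f"target density {d:.4%}: actual {density(m):.4%}, "
          f"costs {costs}, cheapest format: {cheapest}")
```

In the same spirit, the energy-delay product mentioned above is simply energy multiplied by execution time, so an accelerator improves EDP by reducing either quantity without inflating the other.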