Ph.D. Dissertation Defense - Foroozan Karimzadeh


Event Details
  • Date/Time:
    • Monday, December 12, 2022
      9:00 am - 11:00 am
  • Location: https://teams.microsoft.com/l/meetup-join/19%3ameeting_ZWM5NGRlOTMtYWU4OS00ODMzLTgzZjgtYmIwMDI0NjgzY2Q0%40thread.v2/0?context=%7b%22Tid%22%3a%22482198bb-ae7b-4b25-8b7a-6d7f32faa083%22%2c%22Oid%22%3a%22c202fd28-132c-4b10-af4c-2a2381dbdec3%22%7d
Summaries

Summary Sentence: Hardware-Friendly Model Compression for Deep Learning Accelerators

Title: Hardware-Friendly Model Compression for Deep Learning Accelerators

Committee:

Dr. Arijit Raychowdhury, ECE, Chair, Advisor

Dr. Justin Romberg, ECE

Dr. Shimeng Yu, ECE

Dr. Asif Khan, ECE

Dr. Yingyan Lin, CS

Abstract: The objective of the proposed research is to make energy-efficient Deep Neural Network (DNN) algorithms deployable on edge devices by developing hardware-aware DNN compression methods. The rising popularity of intelligent mobile devices and the computational cost of deep learning-based models call for efficient and accurate on-device inference schemes. In particular, we propose four compression techniques. In the first method, LGPS, we present a hardware-aware pruning scheme in which the locations of non-zero weights are derived in real time from a linear-feedback shift register (LFSR). Using the proposed method, we demonstrate total energy and area savings of up to 63.96% and 64.23%, respectively, for the VGG-16 network on down-sampled ImageNet at iso-compression rate and iso-accuracy. Second, we propose a novel model compression scheme that allows inference to be carried out using bit-level sparsity, which can be implemented efficiently with in-memory computing macros. We introduce a method called BitS-Net that leverages bit-sparsity (binary weight/activation representations in which zeros outnumber ones) in Compute-In-Memory (CIM) with Resistive Random Access Memory (RRAM) to develop energy-efficient DNN accelerators operating in inference mode. We demonstrate that BitS-Net improves energy efficiency by up to 5x for ResNet models on the ImageNet dataset. Third, we explore deep learning quantization by developing knowledge distillation and gradual quantization for pruned networks. Finally, to achieve highly energy-efficient DNNs, we introduce a novel twofold sparsity method that sparsifies DNN models at the bit and network levels simultaneously, adding two separate regularization terms to the loss function to achieve both forms of sparsity at the same time. We show that the proposed method sparsifies the network and enables the design of a highly energy-efficient deep learning accelerator, helping to bring artificial intelligence (AI) to our daily lives.
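A minimal Python sketch of the LFSR idea behind LGPS (the 16-bit width, the tap positions, and all function names here are illustrative assumptions, not the dissertation's design). The point is that the accelerator can regenerate the locations of non-zero weights on the fly from a small seed instead of storing an explicit index list:

def lfsr16(seed):
    # 16-bit Fibonacci LFSR with taps (16, 14, 13, 11), a maximal-length
    # configuration; the seed must be non-zero.
    state = seed & 0xFFFF
    while True:
        bit = ((state >> 0) ^ (state >> 2) ^ (state >> 3) ^ (state >> 5)) & 1
        state = (state >> 1) | (bit << 15)
        yield state

def kept_weight_indices(seed, layer_size, num_kept):
    # Derive the positions of the num_kept non-zero weights in real time;
    # replaying the same seed reproduces the same pruning pattern in hardware.
    kept = set()
    for state in lfsr16(seed):
        kept.add(state % layer_size)  # modulo maps LFSR states into the layer
        if len(kept) == num_kept:
            return sorted(kept)

print(kept_weight_indices(seed=0xACE1, layer_size=4096, num_kept=8))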
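To make the bit-sparsity notion behind BitS-Net concrete, here is a small NumPy sketch (the 8-bit unsigned quantization and the helper name are my assumptions) that measures the fraction of zero bits in a quantized tensor. In an RRAM CIM macro, zero bits contribute little or no bit-line current, so a higher fraction implies lower inference energy:

import numpy as np

def bit_sparsity(q_weights, num_bits=8):
    # Fraction of zero bits across the binary representations of a
    # quantized weight tensor.
    w = q_weights.astype(np.uint32).ravel()
    ones = sum(int(((w >> b) & 1).sum()) for b in range(num_bits))
    return 1.0 - ones / (w.size * num_bits)

# Uniformly random 8-bit weights sit near 50% bit-sparsity; a
# bit-sparsity-aware training scheme aims to push this well above 0.5.
rng = np.random.default_rng(0)
print(round(bit_sparsity(rng.integers(0, 256, size=10_000)), 3))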
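The knowledge-distillation component can be illustrated with the standard Hinton-style loss (whether the dissertation uses exactly this formulation is an assumption; the temperature T and mixing weight alpha are illustrative):

import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.7):
    # Soft-target term: KL divergence between teacher and student output
    # distributions softened by temperature T (scaled by T^2 so gradient
    # magnitudes stay comparable across temperatures).
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                    F.softmax(teacher_logits / T, dim=1),
                    reduction="batchmean") * (T * T)
    # Hard-label term: ordinary cross-entropy against the true labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard

Under gradual quantization, the pruned low-precision student would be trained against a full-precision teacher with a loss of this form while the bit-width is reduced step by step.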
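Finally, the twofold sparsity idea of adding two regularizers to the loss can be sketched as follows (a hypothetical PyTorch formulation; the actual regularizers and the weights lam_net/lam_bit in the dissertation may differ). Here the network-level term is an L1 penalty that drives whole weights to zero, and the bit-level term pulls surviving weights toward power-of-two magnitudes, whose binary codes contain a single one-bit:

import torch

def twofold_sparsity_loss(task_loss, model, lam_net=1e-4, lam_bit=1e-4):
    # Network-level sparsity: L1 penalty creates prunable zero weights.
    net_reg = sum(p.abs().sum() for p in model.parameters())
    # Bit-level sparsity: distance of each weight magnitude to the nearest
    # power of two (a maximally bit-sparse quantization level).
    bit_reg = 0.0
    for p in model.parameters():
        mag = p.abs().clamp(min=1e-8)
        nearest_pow2 = torch.exp2(torch.round(torch.log2(mag)))
        bit_reg = bit_reg + (mag - nearest_pow2).pow(2).sum()
    return task_loss + lam_net * net_reg + lam_bit * bit_reg

In a training loop this would simply wrap the ordinary criterion, e.g. loss = twofold_sparsity_loss(F.cross_entropy(out, y), model), so both sparsity types are encouraged at the same time.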

Additional Information

In Campus Calendar
No
Groups

ECE Ph.D. Dissertation Defenses

Invited Audience
Public
Categories
Other/Miscellaneous
Keywords
PhD Defense, graduate students
Status
  • Created By: Daniela Staiculescu
  • Workflow Status: Published
  • Created On: Dec 5, 2022 - 4:37pm
  • Last Updated: Dec 5, 2022 - 4:37pm