Ph.D. Proposal Oral Exam - Zhong Meng


Event Details
  • Date/Time:
    • Tuesday October 17, 2017 - Wednesday October 18, 2017
      10:00 am - 11:59 am
  • Location: Room 5126, Centergy
Contact
No contact information submitted.
Summaries

Summary Sentence: Discriminative and Adaptive Training of Deep Models for Robust Speech Recognition and Understanding

Full Summary: No summary paragraph submitted.

Title:  Discriminative and Adaptive Training of Deep Models for Robust Speech Recognition and Understanding

Committee: 

Dr. Juang, Advisor       

Dr. Lee, Chair

Dr. Moore

Abstract:

The objective of the proposed research is to build a robust speech recognition and understanding system through discriminative and adaptive training of deep acoustic models. This goal is pursued through four approaches:

1. To achieve accurate keyword spotting on conversational speech, a non-uniform error cost minimum classification error (MCE) objective is used to discriminatively train the bidirectional long short-term memory (BLSTM) recurrent neural network (RNN) acoustic model, so that only the errors on keywords are minimized.

2. To generate semantically accurate word lattices for topic spotting, a minimum semantic error cost objective function is proposed for training the BLSTM-RNN acoustic model, in which the expected semantic error cost of all possible word sequences on the lattices is minimized given the reference.

3. To cope with mismatched training and test conditions in automatic speech recognition (ASR), domain separation networks are used for unsupervised adaptation of deep neural network acoustic models through adversarial multi-task training.

4. To achieve robust far-field ASR, beamforming is performed over speech signals acquired from multiple microphones. An LSTM-RNN is used to adaptively estimate real-time beamforming filter coefficients to cope with non-stationary environmental noise and the dynamic nature of source and microphone positions. The adaptive LSTM beamformer is jointly trained with a deep LSTM acoustic model to predict senone (tri-phone state) labels.
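The non-uniform error cost MCE idea in the first approach can be illustrated with a minimal sketch. This is an assumption-laden toy, not the author's implementation: the function name, the sigmoid smoothing, and the `cost` weighting (larger for keywords than for non-keywords) are illustrative choices; real MCE training operates on string-level discriminant scores from the recognizer.

```python
import math

def mce_loss(scores, correct_idx, cost=1.0, gamma=1.0):
    """Smoothed minimum classification error loss for one token (toy sketch).

    scores      -- discriminant scores, one per candidate class
    correct_idx -- index of the reference class
    cost        -- error cost weight; non-uniform MCE assigns a larger
                   cost to keyword tokens than to filler/non-keyword tokens
    gamma       -- slope of the sigmoid that smooths the 0-1 loss
    """
    # Misclassification measure: best competing score minus the correct score.
    # Positive d means the token would be misrecognized.
    competing = max(s for i, s in enumerate(scores) if i != correct_idx)
    d = competing - scores[correct_idx]
    # Sigmoid turns the non-differentiable 0-1 error into a trainable loss;
    # scaling by `cost` realizes the non-uniform error cost.
    return cost / (1.0 + math.exp(-gamma * d))
```

With this shape, a correctly classified token contributes a loss near 0, a misclassified one near `cost`, and gradients through the sigmoid drive the model to push the correct score above its strongest competitor, with keyword tokens weighted more heavily.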

Additional Information

In Campus Calendar
No
Groups

ECE Ph.D. Proposal Oral Exams

Invited Audience
Public
Categories
Other/Miscellaneous
Keywords
Ph.D. proposal, graduate students
Status
  • Created By: Daniela Staiculescu
  • Workflow Status: Published
  • Created On: Oct 6, 2017 - 11:15am
  • Last Updated: Oct 6, 2017 - 1:59pm