Time and Date: Friday, Nov. 5, 15:00 to 16:00 EDT (12:00 to 13:00 PDT)
BlueJeans link: https://bluejeans.com/4658604304
Speaker: Shinji Watanabe, Carnegie Mellon University (CMU)
Title: Multi-Speaker Conversation Recognition based on End-to-End Neural Networks
Abstract: Recently, the end-to-end automatic speech recognition (ASR) paradigm has attracted great research interest as an alternative to the conventional hybrid framework of deep neural networks and hidden Markov models. This talk introduces extensions of the basic end-to-end architecture to tackle major problems faced by current ASR technologies in adverse environments including distant-talk and multi-speaker conditions. First, we propose a unified architecture to encompass microphone-array signal processing such as a state-of-the-art neural beamformer and dereverberation within the end-to-end framework. This architecture allows speech enhancement and ASR components to be jointly optimized to improve the ASR objective and leads to an end-to-end framework that works well in the distant-talk scenario. Next, we extend the framework to deal with multi-speaker ASR, where the system directly decodes multiple label sequences from a single speech sequence by unifying source separation and speech recognition functions in an end-to-end manner. Finally, we will introduce our open source activities, called ESPnet (https://github.com/espnet/espnet), which can reproduce various speech processing experiments including the above example.
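The multi-speaker extension described above must decode multiple label sequences whose order is not known in advance. A common way to train such systems (not necessarily the exact method used in this talk) is a permutation-invariant loss: the per-speaker losses are computed for every assignment of outputs to references, and the cheapest assignment is used. A minimal sketch, assuming a hypothetical `pit_loss` helper and a precomputed pairwise-loss matrix:

```python
from itertools import permutations

def pit_loss(pairwise_losses):
    """Permutation-invariant training loss (illustrative sketch).

    pairwise_losses[i][j] is the loss of predicted sequence i scored
    against reference sequence j. The loss used for training is the
    minimum total loss over all assignments of outputs to references.
    """
    n = len(pairwise_losses)
    return min(
        sum(pairwise_losses[i][perm[i]] for i in range(n))
        for perm in permutations(range(n))
    )

# Toy 2-speaker example: output 0 actually matches reference 1,
# and output 1 matches reference 0.
losses = [[0.9, 0.1],
          [0.2, 0.8]]
print(pit_loss(losses))  # the swapped assignment (0->1, 1->0) wins
```

In a real end-to-end system the pairwise losses would be, e.g., per-permutation CTC or attention losses, and the minimum is differentiable through the selected branch, so source separation and recognition can be optimized jointly.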
Bio: Shinji Watanabe is an Associate Professor at Carnegie Mellon University, Pittsburgh, PA. He received his B.S., M.S., and Ph.D. (Dr. Eng.) degrees from Waseda University, Tokyo, Japan. He was a research scientist at NTT Communication Science Laboratories, Kyoto, Japan, from 2001 to 2011, a visiting scholar at the Georgia Institute of Technology, Atlanta, GA, in 2009, and a senior principal research scientist at Mitsubishi Electric Research Laboratories (MERL), Cambridge, MA, USA, from 2012 to 2017. Before moving to Carnegie Mellon University, he was an associate research professor at Johns Hopkins University, Baltimore, MD, USA, from 2017 to 2020. His research interests include automatic speech recognition, speech enhancement, spoken language understanding, and machine learning for speech and language processing. He has published more than 300 papers in peer-reviewed journals and conferences and has received several awards, including the best paper award at IEEE ASRU 2019. He served as an Associate Editor of the IEEE Transactions on Audio, Speech, and Language Processing. He is or has been a member of several technical committees, including the APSIPA Speech, Language, and Audio Technical Committee (SLA), the IEEE Signal Processing Society Speech and Language Technical Committee (SLTC), and the Machine Learning for Signal Processing Technical Committee (MLSP).