Title: Development of a Neural Network-based Speech Enhancement System
Committee:
Dr. David Anderson, ECE, Chair, Advisor
Dr. Aaron Lanterman, ECE
Dr. Elliott Moore, ECE
Dr. Pamela Bhatti, ECE
Dr. Branislov Vidakovic, ISyE
Abstract:
Neural networks are powerful machine learning models that have, in the last few years, been applied to several audio and speech signal processing problems, including speech enhancement. Although neural network-based speech enhancement approaches have outperformed traditional model-based approaches, several questions remain unanswered, such as the most suitable network architectures, input features, training targets, and best practices for obtaining optimal results. This dissertation studies two approaches to the development of a neural network-based speech enhancement system. First, we investigate the use of the extreme learning machine, an algorithm that allows feed-forward networks to be trained quickly and provides good generalization, for speech enhancement. We then propose modifications to the extreme learning machine to increase its prediction accuracy on multivariate datasets, and we demonstrate the improved performance of these algorithms on several real-world datasets and in the enhancement of noisy speech. Next, with a view to obtaining improved low-SNR performance, we develop a noise prediction and time-domain subtraction framework for speech enhancement. We extend this framework by investigating different training targets and the use of noise-aware training methods. Using objective performance metrics, we show that the proposed framework compares favorably with conventional speech prediction approaches in enhancing speech quality and intelligibility in both seen and unseen noise conditions.
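
For readers unfamiliar with the extreme learning machine mentioned in the abstract, the general idea can be sketched as follows. This is only a minimal illustration of the basic, widely described form of the algorithm (random, fixed hidden-layer weights; output weights obtained in closed form by least squares), not the dissertation's implementation or its proposed modifications. The names `TinyELM` and `solve_linear`, the toy sine-fitting task, the tanh activation, and the small ridge term added for numerical stability are all assumptions made for this example.

```python
import math
import random


def solve_linear(a, b):
    """Solve the square system a x = b by Gauss-Jordan elimination
    with partial pivoting (hypothetical helper for this sketch)."""
    n = len(a)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Move the row with the largest entry in this column to the pivot position.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                for c in range(col, n + 1):
                    aug[r][c] -= factor * aug[col][c]
    return [aug[i][n] / aug[i][i] for i in range(n)]


class TinyELM:
    """Single-hidden-layer network with a scalar output, trained ELM-style."""

    def __init__(self, n_inputs, n_hidden, ridge=1e-3, seed=0):
        rng = random.Random(seed)
        # Hidden weights and biases are drawn once at random and never trained.
        self.w = [[rng.uniform(-1, 1) for _ in range(n_inputs)]
                  for _ in range(n_hidden)]
        self.b = [rng.uniform(-1, 1) for _ in range(n_hidden)]
        self.ridge = ridge  # assumed regularizer, not part of the basic algorithm
        self.beta = None

    def _hidden(self, x):
        return [math.tanh(sum(wi * xi for wi, xi in zip(w, x)) + b)
                for w, b in zip(self.w, self.b)]

    def fit(self, xs, ys):
        h = [self._hidden(x) for x in xs]
        m = len(self.b)
        # Regularized normal equations: (H^T H + ridge * I) beta = H^T y
        gram = [[sum(row[i] * row[j] for row in h) + (self.ridge if i == j else 0.0)
                 for j in range(m)] for i in range(m)]
        rhs = [sum(row[i] * y for row, y in zip(h, ys)) for i in range(m)]
        self.beta = solve_linear(gram, rhs)
        return self

    def predict(self, x):
        return sum(bi * hi for bi, hi in zip(self.beta, self._hidden(x)))


# Toy regression task (an assumption for illustration): fit sin(x) on [-3, 3].
xs = [[-3.0 + 6.0 * i / 49] for i in range(50)]
ys = [math.sin(x[0]) for x in xs]
model = TinyELM(n_inputs=1, n_hidden=12).fit(xs, ys)
mse = sum((model.predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)
baseline = sum(y ** 2 for y in ys) / len(ys)  # error of always predicting zero
print(f"training MSE: {mse:.6f} (predict-zero baseline: {baseline:.6f})")
```

Because only the output weights are fitted, training reduces to a single linear solve, which is what makes this family of methods fast to train compared with iterative gradient-based training.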