PhD Defense by Haoming Jiang

Event Details

Date/Time:
- Thursday April 15, 2021
  12:00 pm - 2:00 pm
Location: Atlanta, GA
URL: Bluejeans

Summaries

Summary Sentence: Reducing Human Labor Cost in Deep Learning for Natural Language Processing

Title: Reducing Human Labor Cost in Deep Learning for Natural Language Processing

Date: Thursday, April 15, 2021

Time: 12:00PM (EDT) / 9:00AM (PDT)

Location: (BlueJeans meeting link) https://bluejeans.com/206131916

Haoming Jiang

Machine Learning Ph.D. Student

School of Industrial and Systems Engineering
Georgia Institute of Technology

Committee

Dr. Tuo Zhao (Advisor), School of Industrial and Systems Engineering, Georgia Tech

Dr. Weizhu Chen, Microsoft Dynamics 365 AI, Microsoft

Dr. Yao Xie, School of Industrial and Systems Engineering, Georgia Tech

Dr. Diyi Yang, School of Interactive Computing, Georgia Tech

Dr. Chao Zhang, School of Computational Science and Engineering, Georgia Tech

Abstract

Deep learning has fundamentally changed the landscape of natural language processing (NLP). However, training deep learning models requires huge amounts of manually labeled data, which is prohibitively expensive to obtain in some real-world applications. In addition, accurately evaluating models requires human interaction, which is not affordable for large-scale experiments. This dissertation focuses on reducing such human labor costs in deep learning for NLP. Specifically, we develop novel frameworks for training deep learning models with limited or noisy annotation and for estimating human evaluation scores:

Training with Limited Supervision. Many state-of-the-art models are first pre-trained on a large text corpus and then fine-tuned on downstream tasks. However, because downstream tasks offer limited data and pre-trained models are extremely complex, aggressive fine-tuning often causes the fine-tuned model to overfit the downstream training data and generalize poorly to unseen data. To address this issue, we propose a new learning framework for robust and efficient fine-tuning of pre-trained models via regularized optimization. Our experiments show that the proposed framework achieves new state-of-the-art performance on many NLP tasks.
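As a rough illustration of what regularized fine-tuning can mean, the sketch below penalizes the distance between the fine-tuned weights and the pre-trained weights while fitting a tiny linear model. This proximity penalty is a generic stand-in chosen for brevity, not the framework proposed in the dissertation; the names `finetune`, `distance`, `w_pre`, and `lam` are hypothetical.

```python
def finetune(w_pre, data, lam, lr=0.1, steps=200):
    """Gradient descent on mean squared error plus lam * ||w - w_pre||^2.

    w_pre : list of floats, the (assumed) pre-trained weights
    data  : list of (features, target) pairs from the downstream task
    lam   : strength of the penalty that keeps w close to w_pre
    """
    w = list(w_pre)
    n = len(data)
    for _ in range(steps):
        grad = [0.0] * len(w)
        # Gradient of the data-fitting term (mean squared error).
        for x, y in data:
            err = sum(wi * xi for wi, xi in zip(w, x)) - y
            for j, xj in enumerate(x):
                grad[j] += 2.0 * err * xj / n
        # Gradient of the proximity penalty, then one descent step.
        for j in range(len(w)):
            grad[j] += 2.0 * lam * (w[j] - w_pre[j])
            w[j] -= lr * grad[j]
    return w


def distance(a, b):
    """Euclidean distance between two weight vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b)) ** 0.5
```

With only a handful of downstream examples, a larger `lam` keeps the solution nearer to `w_pre`, which is the intuition behind using regularization to limit overfitting during fine-tuning.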

Training with Weak Supervision. When manually labeled data is not available, we can leverage domain expert knowledge to generate weakly labeled data. Although weak supervision does not require large amounts of manual annotation, the weak labels it derives from external knowledge bases are highly incomplete and noisy. To address this challenge, we propose a two-stage self-training framework, which leverages the power of pre-trained language models to improve the prediction performance of NLP models. Thorough experiments on benchmark datasets demonstrate the superiority of the proposed framework.
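The general idea of self-training on weak labels can be sketched with a toy one-dimensional nearest-centroid classifier. The abstract does not describe what the two stages contain, so the split used here (fit on weak labels, then repeatedly refit on confident pseudo-labels) is an assumption, as are all names such as `fit_centroids`, `predict`, `self_train`, and `min_margin`. The sketch assumes at least two classes.

```python
def fit_centroids(points, labels):
    """Return the mean of the points for each class label."""
    sums, counts = {}, {}
    for x, y in zip(points, labels):
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}


def predict(centroids, x):
    """Return (label, margin): the nearest centroid and its lead over the runner-up."""
    ranked = sorted(centroids, key=lambda y: abs(x - centroids[y]))
    best, second = ranked[0], ranked[1]
    margin = abs(x - centroids[second]) - abs(x - centroids[best])
    return best, margin


def self_train(weak_x, weak_y, unlabeled, rounds=3, min_margin=1.0):
    # Stage 1 (assumed): fit an initial model on the noisy weak labels.
    centroids = fit_centroids(weak_x, weak_y)
    # Stage 2 (assumed): pseudo-label unlabeled points the model is
    # confident about, then refit on weak labels plus pseudo-labels.
    for _ in range(rounds):
        xs, ys = list(weak_x), list(weak_y)
        for x in unlabeled:
            label, margin = predict(centroids, x)
            if margin >= min_margin:
                xs.append(x)
                ys.append(label)
        centroids = fit_centroids(xs, ys)
    return centroids
```

In this toy setting, confident pseudo-labels dilute the influence of a mislabeled weak example, which pulls each class centroid back toward the bulk of its data.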

Dialogue Evaluation without Human Interaction. In addition to model training, we also address the problem of reliable, human-free automatic evaluation for dialogue systems. An ideal environment for evaluating dialogue systems, also known as the Turing Test, needs to involve human interaction, which is usually not affordable for large-scale experiments. To bridge this gap, we propose a new framework named ENIGMA for estimating human evaluation scores based on recent advances in off-policy evaluation in reinforcement learning. Our experiments show that ENIGMA strongly correlates with human evaluation scores.
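For readers unfamiliar with off-policy evaluation, the sketch below shows the textbook importance-sampling estimator: it reuses logged interactions collected under one policy to estimate the average score another policy would receive. It is not ENIGMA's estimator, which the abstract does not specify; the function and argument names are hypothetical, and `behavior_prob` is assumed to be nonzero for every logged action.

```python
def importance_sampling_estimate(trajectories, target_prob, behavior_prob):
    """Estimate the target policy's expected score from logged data.

    trajectories  : list of (steps, score) pairs, where steps is a list of
                    (state, action) pairs logged under the behavior policy
    target_prob   : function (state, action) -> probability under the
                    policy being evaluated
    behavior_prob : function (state, action) -> probability under the
                    policy that generated the logs
    """
    total = 0.0
    for steps, score in trajectories:
        # Reweight each logged trajectory by how much more (or less)
        # likely the target policy was to produce it.
        weight = 1.0
        for state, action in steps:
            weight *= target_prob(state, action) / behavior_prob(state, action)
        total += weight * score
    return total / len(trajectories)
```

In a dialogue setting, the logged scores would come from past human ratings, so a new system could be scored without collecting fresh human interaction, provided the logs cover the behavior of the system being evaluated.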

Additional Information

In Campus Calendar

Groups

Graduate Studies

Invited Audience

Faculty/Staff, Public, Undergraduate students

Categories

Other/Miscellaneous

Keywords

PhD Defense

Status

Created By: Tatianna Richardson
Workflow Status: Published
Created On: Apr 5, 2021 - 2:58pm
Last Updated: Apr 5, 2021 - 2:58pm