Ph.D. Thesis Proposal Announcement
Title: Semantic Representation Learning for Discourse Processing
Yangfeng Ji
Ph.D. Student
School of Interactive Computing
College of Computing
Georgia Institute of Technology
http://jiyfeng.github.io/
Date: April 7, 2015 (Tuesday)
Time: 1:00PM – 3:00PM EDT
Location: Klaus 1212
Committee:
Dr. Jacob Eisenstein (Advisor), School of Interactive Computing, Georgia Institute of Technology
Dr. Mark Riedl, School of Interactive Computing, Georgia Institute of Technology
Dr. Byron Boots, School of Interactive Computing, Georgia Institute of Technology
Abstract:
Discourse information describes how coherent texts are structured and how their sentences are connected by discourse relations. In natural language processing (NLP), discourse information can help systems perform better, for example by producing more accurate sentiment analysis results or more fluent machine-translated text. However, automatically extracting discourse information is difficult because it requires semantic information from texts. Existing representation methods based on surface features are too shallow to capture enough information for discourse processing.
The goal of my work is to improve the performance of discourse processing with representation learning. Instead of relying on hand-crafted surface features, I propose to learn a representation function that extracts information from texts. With supervision signals from discourse annotation, the representation function can learn semantic information automatically. In this proposal, I present three representation functions of different complexity: (i) a linear function, (ii) an upward composition function with syntactic structures, and (iii) a downward composition function with the identified entities shared in texts. Representation learning improves performance on several discourse processing tasks, including discourse parsing on the RST Discourse Treebank and implicit discourse relation identification on the Penn Discourse Treebank. Furthermore, to reduce the amount of annotated data required for representation learning, I extend the framework with learning methods based on distant supervision. I also discuss two applications, sentiment analysis and discourse-aware machine translation, that build on the state-of-the-art discourse processing system from my completed work.