PhD Defense by Yuval Pinter


Event Details
  • Date/Time:
    • Thursday, May 27, 2021
      11:30 am - 1:30 pm
  • Location: Atlanta, GA; REMOTE
  • URL: Bluejeans
Summaries

Summary Sentence: Integrating Distributional, Compositional, and Relational Approaches to Neural Word Representations


Title: Integrating Distributional, Compositional, and Relational Approaches to Neural Word Representations

 

Yuval Pinter

Ph.D. Candidate

School of Interactive Computing

Georgia Institute of Technology

www.yuvalpinter.com

 

Date: Thursday, May 27th, 2021

Time: 11:30 am to 2:00 pm (EDT)

Location: https://bluejeans.com/865539108

 

Committee:

 

Dr. Jacob Eisenstein (advisor), School of Interactive Computing, Georgia Institute of Technology

Dr. Mark Riedl, School of Interactive Computing, Georgia Institute of Technology

Dr. Wei Xu, School of Interactive Computing, Georgia Institute of Technology

Dr. Diyi Yang, School of Interactive Computing, Georgia Institute of Technology

Dr. Dan Roth, Department of Computer and Information Science, University of Pennsylvania

 

 

Abstract:

 

When the field of natural language processing (NLP) entered the era of deep neural networks, representing the basic units of language as low-dimensional real vectors, or embeddings, became crucial. For years, the dominant technique has been to segment input text sequences into space-delimited words and to train an embedding for each word over a large corpus by leveraging distributional information: a word is reducible to the set of contexts in which it appears.
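As a minimal illustration of this setup (not the dissertation's method; the words and vector values below are toy assumptions, standing in for vectors trained by a method such as word2vec or GloVe), a distributional embedding table is simply a fixed word-to-vector map, which makes the failure mode for unseen words concrete:

```python
import math

# Toy embedding table with made-up values; in practice these vectors
# would be trained on a large corpus from distributional co-occurrence.
embeddings = {
    "cat": [0.9, 0.1, 0.0],
    "dog": [0.8, 0.2, 0.1],
    "car": [0.0, 0.9, 0.7],
}

def embed(word):
    """Look up a word's vector; fails outright for unseen (OOV) words."""
    if word not in embeddings:
        raise KeyError(f"out-of-vocabulary word: {word!r}")
    return embeddings[word]

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) *
                  math.sqrt(sum(b * b for b in v)))

# Distributionally similar words land close together in the space...
print(cosine(embed("cat"), embed("dog")))   # ~0.98
print(cosine(embed("cat"), embed("car")))   # ~0.09
# ...but any word absent from the table, however ordinary, has no
# representation at all: embed("cats") would raise KeyError.
```

The hard lookup at the end is the crux of the OOV problem the abstract describes: a purely distributional vocabulary is closed at training time.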

This approach is powerful but imperfect: words not seen during the embedding learning phase, known as out-of-vocabulary words (OOVs), emerge in any plausible application where embeddings are used. One approach to combating this and other shortcomings incorporates compositional information obtained from the surface forms of words, enabling the representation of morphological regularities and increasing robustness to typographical errors. Another leverages word-sense information and relations curated in large semantic graph resources, offering a supervised signal for the structure of the embedding space and improving representations of domain-specific rare words.

 

In this dissertation, I offer several analyses and remedies for the OOV problem based on the utilization of character-level compositional information in multiple languages and the structure of semantic knowledge in English. In addition, I provide two novel datasets for the continued exploration of vocabulary expansion in English: one with a taxonomic emphasis on novel word formation, and the other generated by a real-world data-driven use case in the entity graph domain. Finally, recognizing the shift in NLP towards context-dependent representations of subword tokens, I describe the form in which the OOV problem still appears in these methods, and propose a new integrative compositional model to address it.

Additional Information

In Campus Calendar
No
Groups

Graduate Studies

Invited Audience
Faculty/Staff, Public, Graduate students, Undergraduate students
Categories
Other/Miscellaneous
Keywords
PhD Defense
Status
  • Created By: Tatianna Richardson
  • Workflow Status: Published
  • Created On: May 13, 2021 - 9:29am
  • Last Updated: May 13, 2021 - 9:29am