Title: Interpreting Neural Networks for and with Natural Language
Date: Monday, May 9th, 2022
Time: 2:00-4:00pm (ET)
Location (hybrid): CODA C1108 Brookhaven and Zoom
Sarah Wiegreffe
PhD Candidate in Computer Science
School of Interactive Computing
College of Computing
Georgia Institute of Technology
Committee:
Dr. Mark Riedl (advisor, School of Interactive Computing, Georgia Institute of Technology)
Dr. Alan Ritter (School of Interactive Computing, Georgia Institute of Technology)
Dr. Wei Xu (School of Interactive Computing, Georgia Institute of Technology)
Dr. Noah Smith (Paul G. Allen School of Computer Science & Engineering, University of Washington)
Dr. Sameer Singh (Bren School of Information and Computer Sciences, University of California at Irvine)
Abstract:
In the last decade, real-world applications of NLP technologies have become more widespread and more useful than ever before, in large part thanks to advances in deep learning. The increasing size and nonlinearity of these models result in an opacity that hinders efforts by machine learning practitioners and lay users alike to understand model internals and derive meaning or trust from their predictions.
The fields of explainable artificial intelligence and, more specifically, explainable NLP have emerged as active areas for remedying this opacity and for ensuring models' reliability and trustworthiness in high-stakes scenarios. Models that produce justifications can be inspected for the purposes of debugging, quantifying bias and fairness, understanding model behavior, and ascertaining robustness and privacy. Textual explanations, such as highlights and free-text explanations, are uniquely valuable because of the natural communicative affordances language provides over other modalities.
In this dissertation, I propose test suites for evaluating the quality of model explanations under two definitions of meaning: faithfulness and human acceptability. I introduce new ways of evaluating faithfulness of highlight explanations with model-based adversarial search and non-contextual probing models, and of free-text explanations with robustness equivalence and feature importance agreement. I show that a natural language bottleneck increases the likelihood of faithful highlights in neural architectures. I introduce new ways of evaluating human acceptability with crowdsourcing methods inspired by the psychology of explanation, and show that an overgeneration-plus-filtration system improves the acceptability of model-generated free-text explanations. This work strives to increase the likelihood of positive use and outcomes when AI systems are deployed in practice.