Title: Manipulating State Space Distributions for Sample-Efficient Imitation- and Reinforcement Learning
Yannick Schroecker
Ph.D. student in Computer Science
School of Interactive Computing
College of Computing
Georgia Institute of Technology
Date: Friday, February 8, 2019
Time: 11:00 AM - 1:00 PM EST
Location: CCB 347
---
Committee:
Dr. Charles Isbell (Advisor), School of Interactive Computing, Georgia Institute of Technology
Dr. Sonia Chernova, School of Interactive Computing, Georgia Institute of Technology
Dr. Byron Boots, School of Interactive Computing, Georgia Institute of Technology
Dr. Irfan Essa, School of Interactive Computing, Georgia Institute of Technology
Dr. Nando de Freitas, Department of Computer Science, University of Oxford and Google DeepMind
---
Summary:
Imitation learning has emerged as one of the most effective approaches to training agents to act intelligently in unstructured and unknown domains. On its own or in combination with reinforcement learning, it enables agents to copy the expert's behavior and to solve complex, long-term decision-making problems. However, to utilize demonstrations effectively and learn from a finite amount of data, the agent needs to develop an understanding of the environment. This thesis investigates estimators of the state-distribution gradient as a means to influence which states the agent will see, thereby guiding it to imitate the expert's behavior. Furthermore, this thesis will show that approaches that reason over future states in this way are able to learn from sparse signals and thus provide an effective way to program agents. Specifically, this talk proposes to validate the following thesis statement (an illustrative sketch of such a gradient estimator follows the statement):
Exploiting inherent structure in Markov chain stationary distributions allows learning agents to reason about likely future observations and enables robust and interactive imitation learning, providing an efficient way to teach agents and providing guidance for composite learning systems.
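To make the central object concrete, the sketch below estimates the gradient of the stationary state distribution, grad_theta log d_pi(s), on a toy tabular MDP using a temporal-difference-style fixed-point update, in the spirit of the speaker's earlier work on state-aware imitation learning. It is a minimal illustration only: the random MDP, the softmax policy, and all names (theta, P, psi) are assumptions made for this sketch, not material from the talk.

import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions = 5, 2

# Illustrative random MDP: P[s, a] is a distribution over successor states.
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))
# Tabular softmax policy with logits theta[s, a] (an assumption for this sketch).
theta = rng.normal(size=(n_states, n_actions))

def policy(s):
    z = np.exp(theta[s] - theta[s].max())
    return z / z.sum()

def grad_log_pi(s, a):
    # grad_theta log pi(a|s) for the softmax policy: one-hot(a) minus pi(.|s), in row s.
    g = np.zeros_like(theta)
    g[s] = -policy(s)
    g[s, a] += 1.0
    return g

# psi[s] approximates grad_theta log d_pi(s). It satisfies the fixed point
#   psi(s') = E[grad_theta log pi(a|s) + psi(s) | successor state s'],
# which we chase with TD-style updates along a single long rollout.
psi = np.zeros((n_states, n_states, n_actions))
visits = np.zeros(n_states)
lr, s = 0.05, 0
for _ in range(200_000):
    a = rng.choice(n_actions, p=policy(s))
    s_next = rng.choice(n_states, p=P[s, a])
    psi[s_next] += lr * (grad_log_pi(s, a) + psi[s] - psi[s_next])
    visits[s_next] += 1.0
    s = s_next

# The fixed point is unique only up to an additive constant; pin it down with
# the constraint E_{d_pi}[psi] = 0, which follows from sum_s d_pi(s) = 1.
d_hat = visits / visits.sum()
psi -= np.tensordot(d_hat, psi, axes=1)

print("estimated grad_theta log d_pi for state 0:\n", psi[0])

With such an estimate in hand, an imitation learner can take gradient steps that raise the stationary probability of the states the demonstrator visited, rather than matching actions alone; this is one reading of "influencing which states the agent will see" in the summary above.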