Title: Hallucinating agent experience to speed up reinforcement learning
Himanshu Sahni
Ph.D. student in Computer Science
School of Interactive Computing
College of Computing
Georgia Institute of Technology
Date: Tuesday, March 17, 2020
Time: 12:45 PM - 2:30 PM EST
Location: https://bluejeans.com/536486204
Meeting ID: 536 486 204
**Note: this proposal is remote-only due to the Institute's guidelines on COVID-19**
---
Committee:
Dr. Charles Isbell (Advisor), School of Interactive Computing, Georgia Institute of Technology
Dr. Mark Riedl, School of Interactive Computing, Georgia Institute of Technology
Dr. Judy Hoffman, School of Interactive Computing, Georgia Institute of Technology
Dr. Dhruv Batra, School of Interactive Computing, Georgia Institute of Technology
---
Summary:
Reinforcement learning (RL) has seen widespread success recently. Yet, training RL agents remains prohibitively expensive in terms of the number of environment interactions required. The overall aim of this research is to significantly reduce the sample complexity of training RL agents, making it easier to deploy them in the real world and have them learn quickly from experience. This proposal focuses on learning how to alter the experience collected by the agent during exploration, rather than on the learning algorithm itself. We define hallucinations as realistic alterations to an agent's trajectory, that is, alterations permitted by the environment's state space and dynamics. I will demonstrate that by presenting hallucinated data to off-the-shelf RL algorithms, we can significantly improve their sample efficiency.
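To make the core idea concrete, here is a minimal Python sketch of how hallucinated experience could be fed to an unmodified RL algorithm through its replay buffer. The hallucinate() model, the replay_buffer interface, and the variant count k are hypothetical illustrations, not the method as implemented in the proposal.

    def augment_replay(trajectory, hallucinate, replay_buffer, k=4):
        """Store each real transition plus k hallucinated variants."""
        for transition in trajectory:
            replay_buffer.add(transition)  # the real experience
            for _ in range(k):
                # hallucinate() should only produce alterations permitted
                # by the environment's state space and dynamics, so the
                # off-the-shelf learner treats them as ordinary data.
                replay_buffer.add(hallucinate(transition))

The RL algorithm itself is untouched; only the distribution of data it trains on changes.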
As contributions, I will outline three ways of altering agent experience to benefit learning. The first uses hallucinations to train a representation of the state of the environment when the agent has a limited field of view. Key components of this system are a short-term memory architecture for such environments and an adversarially trained attention controller. The second contribution is a method for altering visual trajectories in hindsight using learned hallucinations of goal images; combined with Hindsight Experience Replay, this significantly speeds up reinforcement learning, as shown in two navigation-based domains and as sketched below. The third, proposed, contribution outlines how to hallucinate realistic subgoals using state-based value functions.
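The sketch below illustrates the second contribution in the style of Hindsight Experience Replay: a reached state is relabeled as the goal, except here the goal is rendered as a hallucinated image. The hallucinate_goal() model, the reward_fn signature, and the transition layout are assumptions for illustration only.

    def relabel_with_hallucinated_goals(trajectory, hallucinate_goal,
                                        reward_fn, replay_buffer):
        """HER-style relabeling with hallucinated goal images."""
        for obs, action, _reward, next_obs, _orig_goal in trajectory:
            # Hindsight: treat a state the agent actually reached as if
            # it had been the goal all along, rendered as a plausible
            # goal image by the learned hallucination model.
            new_goal = hallucinate_goal(next_obs)
            new_reward = reward_fn(next_obs, new_goal)
            replay_buffer.add((obs, action, new_reward, next_obs, new_goal))

As in standard HER, relabeled transitions supply dense learning signal from otherwise unsuccessful trajectories.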
The contributions above serve to support the thesis statement: We can alter the distribution of an agent's future experiences by hallucinating realistic alterations to its collected trajectories, thereby significantly reducing the sample complexity of reinforcement learning.