Title: Task Generalized MDPs for Multi-Task Reinforcement Learning
Date: December 8th, 2021 (Wednesday)
Time: 3:00-4:30 PM Eastern Time (12:00-1:30 PM Pacific Time)
Location: Coda C1115 Druid Hills and https://bluejeans.com/556574054/8997
Himanshu Sahni
Computer Science PhD Candidate
School of Interactive Computing
Georgia Institute of Technology
Committee
1. Dr. Charles Isbell (Advisor), School of Interactive Computing, John P. Imlay, Jr. Dean of the College of Computing, Georgia Institute of Technology
2. Dr. Judy Hoffman, School of Interactive Computing, Georgia Institute of Technology
3. Dr. Mark Riedl, School of Interactive Computing, Georgia Institute of Technology
4. Dr. Dhruv Batra, School of Interactive Computing, Georgia Institute of Technology
5. Dr. Volodymyr Mnih, DeepMind
Abstract
Reinforcement learning (RL) has seen widespread success in creating intelligent agents in several challenging domains. Yet, training RL agents remains prohibitively expensive in terms of the number of environment interactions required. One of the reasons for this inefficiency is that every new task is usually learned from scratch, instead of leveraging information from similar tasks.
In this talk, I will describe task-generalized Markov Decision Processes (MDPs), which are built from a distribution of tasks: MDPs that differ only in their reward functions. This thesis demonstrates that task-generalized MDPs can provide significant speedups for reinforcement learning in multi-task settings. Specifically, I claim that by first building a task-generalized MDP from a set of training tasks, one can achieve significant speedups on later tasks drawn from that set.
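To make the setting concrete, below is a minimal, hypothetical sketch of such a task distribution: a family of MDPs that share states, actions, and transition dynamics and differ only in their reward functions. The names and the tiny chain environment are purely illustrative, not the formulation used in the thesis.

```python
from dataclasses import dataclass
from typing import Callable, List
import random

# Illustrative types: a family of tasks sharing states, actions, and dynamics,
# where each task contributes only its own reward function.
State, Action = int, int

@dataclass
class Task:
    reward_fn: Callable[[State, Action, State], float]  # the only per-task component

@dataclass
class TaskFamily:
    n_states: int
    n_actions: int
    transition: Callable[[State, Action], State]  # shared dynamics across all tasks
    tasks: List[Task]

    def sample_task(self) -> Task:
        # Draw a task from the distribution; everything except the reward is shared.
        return random.choice(self.tasks)

# Example: a 10-state chain with shared dynamics and two goal-reaching rewards.
family = TaskFamily(
    n_states=10,
    n_actions=2,
    transition=lambda s, a: max(0, min(9, s + (1 if a == 1 else -1))),
    tasks=[Task(reward_fn=lambda s, a, s2, g=g: float(s2 == g)) for g in (3, 7)],
)
task = family.sample_task()
```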
This work makes three key contributions:
1. I introduce the idea of combining attention, short-term memory, and unsupervised rewards to build a state representation in an environment with a limited field of view. By altering the underlying MDP's state space in this way, we enable reinforcement learning of tasks within it.
2. I present HALGAN, which retroactively inserts realistic goals into desired locations along the agent's trajectory while respecting the environment dynamics. This work extends the idea of Hindsight Experience Replay to visual environments, thereby speeding up reinforcement learning in them (a schematic relabeling sketch follows this list).
3. I propose a framework for task-distribution-biased unsupervised reinforcement learning. The framework learns skills that are biased towards a task distribution while remaining distinct from one another. Skills learned in this manner generalize better to downstream tasks than those produced by skill-learning methods that do not incorporate this bias.
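As a rough illustration of the hindsight idea that HALGAN builds on, here is a schematic sketch of goal relabeling on low-dimensional states. HALGAN itself operates on visual observations by generating realistic goal imagery; the function and types below are hypothetical and included only for exposition.

```python
import random
from typing import List, Tuple

# Schematic hindsight relabeling: failed trajectories are reused by pretending
# that a state the agent actually reached was the intended goal all along.
Transition = Tuple[int, int, float, int, int]  # (state, action, reward, next_state, goal)

def relabel_with_hindsight(trajectory: List[Transition]) -> List[Transition]:
    relabeled = []
    for i, (s, a, _, s2, _) in enumerate(trajectory):
        # Pick the next_state of a future transition in the same trajectory
        # and treat it as the goal, recomputing the reward accordingly.
        future_goal = random.choice(trajectory[i:])[3]
        reward = 1.0 if s2 == future_goal else 0.0
        relabeled.append((s, a, reward, s2, future_goal))
    return relabeled
```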