PhD Defense by Himanshu Sahni

Event Details
  • Date/Time:
    • Wednesday, December 8, 2021
      3:00 pm - 4:30 pm
  • Location: Coda C1115 Druid Hills
  • URL: Bluejeans
  • Fee(s):
    N/A
Summaries

Summary Sentence: Task Generalized MDPs for Multi-Task Reinforcement Learning

Title: Task Generalized MDPs for Multi-Task Reinforcement Learning

 

Date: December 8th, 2021 (Wednesday)

Time: 3:00 - 4:30 pm Eastern Time (12:00 - 1:30 pm Pacific Time)

Location: Coda C1115 Druid Hills and https://bluejeans.com/556574054/8997

 

Himanshu Sahni

Computer Science PhD Candidate

School of Interactive Computing
Georgia Institute of Technology

 

Committee

1. Dr. Charles Isbell (Advisor), School of Interactive Computing, John P. Imlay, Jr. Dean of the College of Computing, Georgia Institute of Technology

2. Dr. Judy Hoffman, School of Interactive Computing, Georgia Institute of Technology

3. Dr. Mark Riedl, School of Interactive Computing, Georgia Institute of Technology

4. Dr. Dhruv Batra, School of Interactive Computing, Georgia Institute of Technology

5. Dr. Volodymyr Mnih, DeepMind

 

Abstract

 

Reinforcement learning (RL) has seen widespread success in creating intelligent agents in several challenging domains. Yet, training RL agents remains prohibitively expensive in terms of the number of environment interactions required. One of the reasons for this inefficiency is that every new task is usually learned from scratch, instead of leveraging information from similar tasks.

 

In this talk, I will describe task-generalized Markov Decision Processes (MDPs), which are built from a distribution of tasks, that is, a family of MDPs that differ only in their reward functions. This thesis demonstrates that task-generalized MDPs can significantly speed up reinforcement learning in multi-task settings. Specifically, I claim that first building a task-generalized MDP from a set of training tasks yields significant speedups on later tasks drawn from that set.
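
As a rough illustration of what "MDPs that differ only in their reward functions" means, the sketch below defines a toy task family in Python. It is not taken from the thesis: the five-state chain, the `step` dynamics, the `Task` class, and the goal-reaching rewards are all hypothetical names and choices made for this example.

```python
from dataclasses import dataclass
from typing import Callable, List

N_STATES = 5  # hypothetical chain of states 0..4 (assumption for illustration)


def step(state: int, action: int) -> int:
    """Dynamics shared by every task: move left (-1) or right (+1), clipped."""
    return max(0, min(N_STATES - 1, state + action))


@dataclass(frozen=True)
class Task:
    """One MDP in the family; only the reward function is task-specific."""
    name: str
    reward: Callable[[int], float]


def make_goal_task(goal: int) -> Task:
    """Hypothetical task: reward 1.0 on reaching `goal`, 0.0 elsewhere."""
    return Task(name=f"reach-{goal}", reward=lambda s: 1.0 if s == goal else 0.0)


# The task family: same states, actions, and dynamics, different rewards.
tasks: List[Task] = [make_goal_task(g) for g in range(N_STATES)]


def rollout_return(task: Task, start: int, actions: List[int]) -> float:
    """Undiscounted return of a fixed action sequence under one task."""
    state, total = start, 0.0
    for action in actions:
        state = step(state, action)
        total += task.reward(state)
    return total
```

The same action sequence earns different returns under different tasks, because only the reward changes. Anything learned about the shared dynamics could in principle be reused across the whole family.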

 

There are three key contributions made in this work:

 

1. I introduce the idea of combining attention, short-term memory, and unsupervised rewards to build a state representation in an environment with a limited field of view. Altering the underlying MDP's state space in this way enables reinforcement learning of tasks within it.

 

2. HALGAN, a method that inserts realistic goals retroactively into desired locations along the agent's trajectory while respecting the environment dynamics. This work extends the idea of Hindsight Experience Replay to visual environments, thereby speeding up reinforcement learning in them.

 

3. A framework for task-distribution-biased unsupervised reinforcement learning. This framework allows for learning skills that are biased towards a task distribution and simultaneously distinct from one another. Skills learnt in this manner generalize better to downstream tasks than those from skill-learning methods that do not incorporate this bias.
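
For readers unfamiliar with Hindsight Experience Replay, which the second contribution extends, the sketch below shows the basic relabeling idea on low-dimensional integer states. It is a minimal illustration under assumed names (`Transition`, `sparse_reward`, `relabel_with_final_state`), and it is not HALGAN. In visual environments the goal is part of the image observation and cannot be swapped in this simply, which is the gap the abstract says HALGAN addresses.

```python
from typing import List, NamedTuple


class Transition(NamedTuple):
    """Goal-conditioned transition (hypothetical layout for illustration)."""
    state: int
    action: int
    next_state: int
    goal: int
    reward: float


def sparse_reward(next_state: int, goal: int) -> float:
    """Assumed sparse reward: 1.0 only when the goal is reached."""
    return 1.0 if next_state == goal else 0.0


def relabel_with_final_state(episode: List[Transition]) -> List[Transition]:
    """Relabel an episode as if its final state had been the goal.

    Rewards are recomputed for the substituted goal, so an episode that
    failed on its original goal still yields a successful training example.
    """
    achieved = episode[-1].next_state
    return [
        t._replace(goal=achieved, reward=sparse_reward(t.next_state, achieved))
        for t in episode
    ]


# A failed episode: the agent aimed for state 4 but stopped at state 2.
failed_episode = [
    Transition(state=0, action=1, next_state=1, goal=4, reward=0.0),
    Transition(state=1, action=1, next_state=2, goal=4, reward=0.0),
]
relabeled = relabel_with_final_state(failed_episode)
```

In practice both the original and the relabeled transitions would be stored in the replay buffer. Only the relabeling step is shown here.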

Additional Information

In Campus Calendar
No
Groups

Graduate Studies

Invited Audience
Faculty/Staff, Public, Undergraduate students
Categories
Other/Miscellaneous
Keywords
PhD Defense
Status
  • Created By: Tatianna Richardson
  • Workflow Status: Published
  • Created On: Dec 2, 2021 - 8:11am
  • Last Updated: Dec 2, 2021 - 8:11am