PhD Proposal by Nirbhay Modhe


Event Details
  • Date/Time:
    • Monday, October 25, 2021
      12:00 pm - 2:00 pm
  • Location: Atlanta, GA; REMOTE
  • URL: https://bluejeans.com/264974579/4014
Summaries

Summary Sentence: Task-Dependent Models for Reinforcement Learning

Title: Task-Dependent Models for Reinforcement Learning

Date: Monday, October 25th, 2021

Time: 12:00 PM - 2:00 PM

Location (virtual): https://bluejeans.com/264974579/4014

 

Nirbhay Modhe

PhD Student in Computer Science

College of Computing

Georgia Institute of Technology

 

Committee

Dr. Dhruv Batra (Advisor, School of Interactive Computing, Georgia Institute of Technology)

Dr. Zsolt Kira (School of Interactive Computing, Georgia Institute of Technology)

Dr. Mark Riedl (School of Interactive Computing, Georgia Institute of Technology)

Dr. Ashwin Kalyan (Allen Institute for AI)

Dr. Dipendra Misra (Microsoft Research)

 

Abstract

Model-based reinforcement learning (RL) lies at the intersection of planning and learning for sequential decision making in Markov Decision Processes (MDPs). It has gained popularity due to its many potential benefits, such as sample/data efficiency, optimization stability, and targeted exploration. However, most research progress in model-based RL has continued to rely on maximum-likelihood estimation for learning an accurate dynamics model of future state transitions in MDPs -- an objective that does not align with the downstream task of using the model to learn an approximately optimal control policy.
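
For context, the contrast between maximum-likelihood and value-aware model learning can be sketched as follows; this is a standard formulation from the value-aware model learning literature, not necessarily the exact objective proposed in this thesis. Maximum-likelihood estimation fits the dynamics uniformly over observed transitions, whereas a value-aware loss weights model errors by their effect on a value function V:

    \max_\theta \; \mathbb{E}_{(s,a,s') \sim \mathcal{D}} \big[ \log \hat{P}_\theta(s' \mid s, a) \big]  \qquad \text{(maximum likelihood)}

    \min_\theta \; \mathbb{E}_{(s,a) \sim \nu} \Big[ \big( \mathbb{E}_{s' \sim P(\cdot \mid s,a)}[V(s')] - \mathbb{E}_{s' \sim \hat{P}_\theta(\cdot \mid s,a)}[V(s')] \big)^{2} \Big] \qquad \text{(value-aware)}

Here \mathcal{D} is a dataset of observed transitions, \nu is a state-action distribution, P is the true dynamics, and \hat{P}_\theta is the learned model.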

 

In this thesis, we push the boundaries of task-dependent model learning -- where the model learning objective aligns with the control objective of learning a policy -- and its applications in model-based reinforcement learning for continuous control. We present (1) a novel value-aware model learning objective, derived by upper bounding the model-performance difference: the difference in performance of a policy across two MDPs that differ in their transition dynamics and reward distributions. We study the relationship between the model-performance difference, the generalization gap, and the optimality gap in reinforcement learning, and find that even a sub-optimal policy is good enough to rank and select a good model from a list of candidate models approximating the target MDP. Next, (2) we present an algorithm that deploys our proposed objective, as well as existing value-aware model learning objectives, in a model-based reinforcement learning setup, demonstrating for the first time practically significant performance on challenging continuous control simulation tasks that exceeds the performance and sample efficiency of maximum-likelihood estimation. In the proposed work, we aim to extend our task-dependent model learning framework with intelligent exploration techniques to further improve the sample efficiency of model-based reinforcement learning. A minimal code sketch of a value-aware model loss follows below.
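
As a minimal illustration, the sketch below shows how a value-aware model loss might replace maximum-likelihood estimation when fitting the dynamics model. The names ("model", "value_fn") and the particular squared-error form are illustrative assumptions, not the thesis's actual algorithm.

    # A sketch of a value-aware model loss (PyTorch), assuming a
    # deterministic learned dynamics model and a fixed value estimate.
    import torch

    def value_aware_model_loss(model, value_fn, states, actions, next_states):
        """Weight model errors by their effect on predicted values.

        A maximum-likelihood / mean-squared-error loss would penalize all
        prediction errors equally, even those that do not change returns.
        """
        pred_next = model(states, actions)   # model's next-state prediction
        v_true = value_fn(next_states)       # value under observed transitions
        v_pred = value_fn(pred_next)         # value under the learned model
        return ((v_true - v_pred) ** 2).mean()

In a Dyna-style model-based RL loop, such a loss would be minimized on replay-buffer transitions while the policy and value function are updated on model-generated rollouts.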

Additional Information

In Campus Calendar
No
Groups

Graduate Studies

Invited Audience
Faculty/Staff, Public, Undergraduate students
Categories
Other/Miscellaneous
Keywords
PhD proposal
Status
  • Created By: Tatianna Richardson
  • Workflow Status: Published
  • Created On: Oct 21, 2021 - 10:26am
  • Last Updated: Oct 21, 2021 - 10:26am