Title: Task-Dependent Models for Reinforcement Learning
Date: Monday, October 25th, 2021
Time: 12:00 PM - 2:00 PM
Location (virtual): https://bluejeans.com/264974579/4014
Nirbhay Modhe
PhD Student in Computer Science
College of Computing
Georgia Institute of Technology
Committee
Dr. Dhruv Batra (Advisor, School of Interactive Computing, Georgia Institute of Technology)
Dr. Zsolt Kira (School of Interactive Computing, Georgia Institute of Technology)
Dr. Mark Riedl (School of Interactive Computing, Georgia Institute of Technology)
Dr. Ashwin Kalyan (Allen Institute for AI)
Dr. Dipendra Misra (Microsoft Research)
Abstract
Model-based reinforcement learning (RL) lies at the intersection of planning and learning for sequential decision making in Markov Decision Processes (MDPs). It has gained popularity due to its many potential benefits, such as sample/data efficiency, optimization stability, and targeted exploration. However, most research in model-based RL still relies on maximum-likelihood estimation to learn an accurate dynamics model of future state transitions in MDPs -- an objective that does not align with the downstream task of using the model to learn an approximately optimal control policy.
In this thesis, we push the boundaries of task-dependent model learning -- where the model learning objective aligns with the control objective of learning a policy -- and its applications in model-based reinforcement learning for continuous control. First, (1) we present a novel value-aware model learning objective derived by upper bounding the model-performance difference -- the difference in performance of a policy across two MDPs that differ in their transition dynamics and reward distributions. We study the relationship between the model-performance difference, the generalization gap, and the optimality gap in reinforcement learning, and find that even a sub-optimal policy is good enough to rank and select a good model from a list of candidate models that approximate the target MDP. Next, (2) we present an algorithm that deploys our proposed objective, as well as existing value-aware model learning objectives, in a model-based reinforcement learning problem setup, demonstrating the first practically significant performance in challenging continuous control simulation tasks and exceeding the performance and sample efficiency of maximum-likelihood estimation. In the proposed work, we aim to expand our task-dependent model learning framework to incorporate intelligent exploration techniques to further improve sample efficiency in model-based reinforcement learning.
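As a rough illustration of why a likelihood-style objective and a value-aware objective can disagree, the toy sketch below scores two candidate models both ways. It is not the objective proposed in the thesis. It assumes a Gaussian likelihood with fixed variance (so maximum likelihood reduces to squared error) and uses a hypothetical value function, states, and model predictions invented purely for this example.

```python
from typing import Callable, Sequence, Tuple

State = Tuple[float, float]


def value(state: State) -> float:
    """Hypothetical value function: only the first coordinate matters for the task."""
    return -abs(state[0])


def squared_error_loss(pred: Sequence[State], true: Sequence[State]) -> float:
    """Mean squared error on next states (MLE under a fixed-variance Gaussian)."""
    total = 0.0
    for p, t in zip(pred, true):
        total += (p[0] - t[0]) ** 2 + (p[1] - t[1]) ** 2
    return total / len(true)


def value_aware_loss(
    pred: Sequence[State], true: Sequence[State], v: Callable[[State], float]
) -> float:
    """Mean squared difference between the values of predicted and true next states."""
    total = 0.0
    for p, t in zip(pred, true):
        total += (v(p) - v(t)) ** 2
    return total / len(true)


# Made-up next states observed in the target MDP.
true_next = [(1.0, 0.0), (2.0, 1.0), (3.0, -1.0)]
# Model A: exact on the task-relevant coordinate, off by 0.5 on the irrelevant one.
model_a = [(1.0, 0.5), (2.0, 1.5), (3.0, -0.5)]
# Model B: off by 0.3 on the task-relevant coordinate, exact on the irrelevant one.
model_b = [(1.3, 0.0), (2.3, 1.0), (3.3, -1.0)]

for name, pred in [("A", model_a), ("B", model_b)]:
    print(
        name,
        "squared error:", round(squared_error_loss(pred, true_next), 4),
        "value-aware:", round(value_aware_loss(pred, true_next, value), 4),
    )
```

In this toy setup, squared error ranks model B ahead of model A, while the value-aware loss ranks A ahead of B, because A's errors fall only on a coordinate the value function ignores.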