Title: Leveraging Value-awareness for Online and Offline Model-based Reinforcement Learning
Date: Thursday, October 27th, 2022
Time: 9:00 AM - 11:00 AM Eastern Time
Location (virtual): https://bluejeans.com/264974579/4014
Nirbhay Modhe
Ph.D. Candidate
School of Interactive Computing
College of Computing
Georgia Institute of Technology
Committee
Dr. Dhruv Batra (advisor), School of Interactive Computing, Georgia Institute of Technology
Dr. Zsolt Kira, School of Interactive Computing, Georgia Institute of Technology
Dr. Mark Riedl, School of Interactive Computing, Georgia Institute of Technology
Dr. Gaurav Sukhatme, University of Southern California
Dr. Ashwin Kalyan, Allen Institute for AI (AI2)
Summary
Model-based Reinforcement Learning (RL) lies at the intersection of planning and learning for sequential decision making. Value-awareness in model learning has recently emerged as a means of injecting task or reward information into the model learning objective, so that the learned model can exploit the specifics of a task. While value-aware objectives have been shown in theory to be superior to maximum likelihood estimation in the context of (online) model-based RL, they have remained impractical for most non-trivial tasks.
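
As a rough illustration of the idea (not the specific objective developed in this thesis), a value-aware model loss penalizes prediction errors only insofar as they change the learned value function, whereas a maximum-likelihood loss treats all next-state errors equally. The sketch below is a minimal, IterVAML-style example assuming a deterministic learned model, a value function, and a batch of transitions; all names are illustrative.

import torch

def mle_model_loss(model, states, actions, next_states):
    # Maximum-likelihood surrogate (mean-squared error):
    # penalizes every next-state prediction error equally.
    pred_next = model(states, actions)
    return ((pred_next - next_states) ** 2).mean()

def value_aware_model_loss(model, value_fn, states, actions, next_states):
    # Value-aware loss (IterVAML-style sketch): the model is penalized
    # only to the extent that its predictions change the value of the
    # predicted next state, i.e., errors irrelevant to the task are ignored.
    pred_next = model(states, actions)
    return ((value_fn(pred_next) - value_fn(next_states)) ** 2).mean()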
This thesis aims to bridge the gap between theory and practice by applying the principle of value-awareness to two settings: online RL and offline RL. First, within online RL, this thesis revisits value-aware model learning from the perspective of minimizing the performance difference, obtaining a novel value-aware model learning objective that directly upper-bounds it. It then investigates and remedies the issue of stale value estimates that has so far held back the practicality of value-aware model learning. With the proposed remedy, performance improvements are demonstrated over maximum-likelihood baselines and existing value-aware objectives on several continuous control tasks, while the remedy also makes existing value-aware objectives performant.
In the offline RL setting, this thesis steps back from model learning and applies value-awareness to data augmentation. Such augmentation, when applied to model-based offline RL algorithms, makes it possible to leverage unseen states with low epistemic uncertainty that were previously unreachable under the assumptions and limitations of model-based offline RL. Value-aware state augmentations are found to yield better performance on offline RL benchmarks than existing baselines and non-value-aware alternatives.
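
As a hedged sketch of what value-aware data augmentation could look like in this setting (illustrative only; it does not reproduce the thesis's actual procedure), candidate states can be proposed by perturbing dataset states, kept only when an ensemble of learned value estimators agrees on them (a common proxy for low epistemic uncertainty), and prioritized by their estimated value. The selected states would then be added to the dataset used by the offline model-based RL algorithm. All function and parameter names below are assumptions.

import torch

def value_aware_augment(states, value_ensemble,
                        noise_std=0.01, uncertainty_threshold=0.05, top_k=256):
    # Propose candidate states by perturbing states from the offline dataset.
    candidates = states + noise_std * torch.randn_like(states)

    # Value estimates from each ensemble member: shape (ensemble_size, num_candidates).
    values = torch.stack([v(candidates) for v in value_ensemble])

    # Epistemic-uncertainty proxy: disagreement across the ensemble.
    uncertainty = values.std(dim=0)
    mean_value = values.mean(dim=0)

    # Keep only candidates the ensemble is confident about (low uncertainty).
    keep = uncertainty < uncertainty_threshold
    confident = candidates[keep]
    confident_values = mean_value[keep]

    # Value-awareness: prioritize the highest-value confident states.
    order = torch.argsort(confident_values, descending=True)
    return confident[order[:top_k]]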