Title: Human-guided Exploration for Efficient Reinforcement Learning
Kaushik Subramanian
Ph.D. Student
School of Interactive Computing
College of Computing
Georgia Institute of Technology
www.cc.gatech.edu/~ksubrama
Date: Monday, April 27th, 2015
Time: 10 AM to 12 NOON ET
Location: CCB 345
Committee
--------------
Dr. Charles Isbell (School of Interactive Computing, Georgia Institute of Technology)
Dr. Andrea Thomaz (School of Interactive Computing, Georgia Institute of Technology)
Dr. Mark Riedl (School of Interactive Computing, Georgia Institute of Technology)
Dr. Thad Starner (School of Interactive Computing, Georgia Institute of Technology)
Dr. Peter Stone (Department of Computer Science, University of Texas at Austin)
Abstract
--------------
Reinforcement Learning (RL) is the field of research focused on solving sequential decision-making tasks modeled as Markov Decision Processes. Researchers have shown RL to be successful at solving a variety of problems such as system operations (logistics), robot tasks (soccer, helicopter control), and computer games (backgammon); however, it is well known that standard RL approaches do not scale well with the size of the problem. This is because RL approaches rely on obtaining samples that are useful for learning the underlying structure of the task. In this work we tackle the problem of smart exploration in RL by using human interaction. We propose policy-based methods that serve to 1) effectively bias exploration towards important aspects of the domain and 2) balance the exploration-exploitation trade-off.
We first propose a policy-based approach called Exploration from Demonstration (EfD), which uses an exploration policy learned from human demonstrations to provide performance speedups. We also show how to obtain useful samples for EfD using concepts from Active Learning. We then present an approach that uses some of the inherent structure in exploratory human demonstrations to help Monte Carlo RL overcome its limitations and efficiently solve large-scale problems. Next, we address the problem of balancing the exploration-exploitation trade-off in RL. We present a probabilistic method called Policy Shaping, which combines human evaluations with Bayesian RL, and show that it provides performance speedups while being robust to noisy, suboptimal human signals. We implement our methods on popular arcade games and highlight the improvements they achieve. Finally, we show that using humans to help agents efficiently explore sequential decision-making tasks is an important and necessary step in applying Reinforcement Learning to complex problems.
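As a rough illustration of the Policy Shaping idea mentioned above, the sketch below combines an agent's own action distribution with an estimate derived from human evaluations. It is not the thesis implementation. It assumes the form commonly given in the published Policy Shaping work, where the probability that an action is optimal depends on the difference between positive and negative human labels and on an assumed feedback consistency C, and where the two distributions are multiplied and renormalized. The function names, the parameter names, and the default C = 0.8 are hypothetical choices made for this example.

```python
import numpy as np


def feedback_policy(delta, consistency):
    """Estimate, per action, the probability that it is optimal given human labels.

    delta[a] is (# positive labels) - (# negative labels) for action a.
    consistency is the assumed probability that a human label is correct
    (hypothetical parameter, expected to lie strictly between 0 and 1).
    """
    delta = np.asarray(delta, dtype=float)
    pos = consistency ** delta
    neg = (1.0 - consistency) ** delta
    return pos / (pos + neg)


def shaped_policy(agent_probs, delta, consistency=0.8):
    """Multiply the agent's action distribution by the feedback estimate
    and renormalize so the result sums to one."""
    agent_probs = np.asarray(agent_probs, dtype=float)
    combined = agent_probs * feedback_policy(delta, consistency)
    return combined / combined.sum()


# Example: the agent is indifferent between two actions, but the human has
# labeled the first action positively twice more than negatively.
probs = shaped_policy([0.5, 0.5], delta=[2, 0])
print(probs)
```

With no net feedback (delta of zero for every action) the feedback estimate is 0.5 everywhere, so the agent's own distribution is returned unchanged. A consistency value closer to 0.5 makes the human signal count for less, which is one way to picture robustness to noisy feedback.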