Ph.D. Dissertation Defense Announcement
Title: Utilizing Negative Policy Information to Accelerate Reinforcement Learning
Arya Irani
School of Interactive Computing
College of Computing
Georgia Institute of Technology
Date: Monday, November 10, 2014
Time: 12:30pm - 2:30pm EST
Location: CCB 345
Committee:
Dr. Charles Isbell (Advisor; School of Interactive Computing, Georgia Institute of Technology)
Dr. Andrea Thomaz (School of Interactive Computing, Georgia Institute of Technology)
Dr. Mark Riedl (School of Interactive Computing, Georgia Institute of Technology)
Dr. Karen Feigh (School of Aerospace Engineering, Georgia Institute of Technology)
Dr. Doina Precup (School of Computer Science, McGill University)
Abstract:
A pilot study on Markov Decision Problem (MDP) task decomposition by humans revealed that participants decompose tasks into both short-term subgoals (with a defined end condition) and long-term considerations and invariants (with no end condition). In the context of MDPs, behaviors with clear start and end conditions are well modeled by options (Precup, 2000), but the literature offers no abstraction for continuous requirements imposed on an agent’s behavior. By modeling such policy restrictions and incorporating this information into the agent’s exploration, we can speed up learning. Two proposed representations for such continuous requirements are the state constraint (a set or predicate identifying states that the agent should avoid) and the state-action constraint (a set or predicate identifying state-action pairs that the agent should not take).
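The abstract does not say how constraints are wired into exploration, so the following is only a minimal sketch of one plausible approach, not the dissertation's method. It treats a state constraint as a predicate over states and a state-action constraint as a predicate over (state, action) pairs, and restricts epsilon-greedy action selection to the actions neither predicate rules out. The names `allowed_actions`, `constrained_epsilon_greedy`, and `predict_next` (a hypothetical one-step model needed to check a state constraint before acting), and the corridor example, are all illustrative assumptions.

```python
import random
from typing import Callable, Dict, Hashable, List, Optional, Sequence, Tuple

State = Hashable
Action = Hashable


def allowed_actions(
    state: State,
    actions: Sequence[Action],
    state_constraint: Optional[Callable[[State], bool]] = None,
    sa_constraint: Optional[Callable[[State, Action], bool]] = None,
    predict_next: Optional[Callable[[State, Action], State]] = None,
) -> List[Action]:
    """Return the actions in `state` that no constraint rules out."""
    allowed = []
    for a in actions:
        # State-action constraint: the pair (state, a) itself is forbidden.
        if sa_constraint is not None and sa_constraint(state, a):
            continue
        # State constraint: forbidden if the predicted successor is a state to avoid.
        # `predict_next` is a hypothetical one-step model assumed for this sketch.
        if state_constraint is not None and predict_next is not None:
            if state_constraint(predict_next(state, a)):
                continue
        allowed.append(a)
    # If the constraints rule out every action, fall back to the full action set.
    return allowed if allowed else list(actions)


def constrained_epsilon_greedy(
    q: Dict[Tuple[State, Action], float],
    state: State,
    actions: Sequence[Action],
    epsilon: float,
    rng: random.Random,
    **constraint_kwargs,
) -> Action:
    """Epsilon-greedy selection restricted to the actions the constraints allow."""
    candidates = allowed_actions(state, actions, **constraint_kwargs)
    if rng.random() < epsilon:
        return rng.choice(candidates)  # explore only among permitted actions
    return max(candidates, key=lambda a: q.get((state, a), 0.0))


# Example: a 1-D corridor where state 0 is a pit the agent should avoid.
def pit(s: int) -> bool:
    return s == 0


def move(s: int, a: int) -> int:
    return s + a


print(allowed_actions(1, [-1, 1], state_constraint=pit, predict_next=move))  # [1]
```

Falling back to the full action set when every action is excluded is just one simple way to keep the agent from getting stuck under an overly strict constraint; it is not necessarily one of the schemes for imperfect constraints that the dissertation discusses.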
We will demonstrate that composing options with constraints forms a powerful combination: a naïve option designed to perform well in a best-case scenario can still be beneficial in domains where the best-case scenario is not guaranteed. This separation of concerns simplifies both design and learning. We present the results of a study on two domains inspired by classic video games, in which participants with no AI experience construct and record examples of states to avoid; these examples are used to train predictors that implement a state constraint. We also demonstrate that in many cases software engineers can formulate constraints directly and supply them to the reinforcement learning (RL) system as modules, eliminating one machine learning layer. Finally, we will discuss schemes for overcoming imperfectly defined constraints that would rule out an optimal policy, considerations in designing domain-appropriate schemes, and several future directions.
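To illustrate the idea of composing a naïve option with a constraint, here is a minimal, hypothetical sketch; it is not taken from the dissertation. An option is modeled as a policy plus an end condition. Whenever the option's preferred action would violate a state-action constraint, a random permitted action is substituted. The `Option` class, `constrained_option_action`, `run_option`, and the grid world with a single hazard are all assumptions made for this example.

```python
import random
from dataclasses import dataclass
from typing import Callable, Tuple

State = Tuple[int, int]
Action = str


@dataclass
class Option:
    """A hypothetical option: a policy plus an end (termination) condition."""
    policy: Callable[[State], Action]
    terminates: Callable[[State], bool]


def constrained_option_action(option, state, actions, violates, rng):
    """Take the option's action unless the constraint forbids it; otherwise pick a random permitted action."""
    a = option.policy(state)
    if not violates(state, a):
        return a
    permitted = [b for b in actions if not violates(state, b)]
    return rng.choice(permitted) if permitted else a


def run_option(option, start, actions, violates, step, rng, max_steps=100):
    """Execute the constrained option until it terminates or max_steps is reached."""
    s = start
    trajectory = [s]
    for _ in range(max_steps):
        if option.terminates(s):
            break
        s = step(s, constrained_option_action(option, s, actions, violates, rng))
        trajectory.append(s)
    return trajectory


# Example: a naive "always go east" option in an open grid with one hazard at (3, 0).
MOVES = {"east": (1, 0), "west": (-1, 0), "north": (0, 1), "south": (0, -1)}
HAZARD = (3, 0)


def step(s: State, a: Action) -> State:
    dx, dy = MOVES[a]
    return (s[0] + dx, s[1] + dy)


def violates(s: State, a: Action) -> bool:
    # State-action constraint: never take an action that leads into the hazard.
    return step(s, a) == HAZARD


go_east = Option(policy=lambda s: "east", terminates=lambda s: s[0] >= 5)
traj = run_option(go_east, (0, 0), list(MOVES), violates, step, random.Random(0))
print(HAZARD in traj)  # False
```

The option itself knows nothing about the hazard, and the constraint knows nothing about the goal. Keeping the two concerns apart in this way mirrors the separation the abstract describes.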