Title: Perceptual Goal Specifications for Reinforcement Learning
Date: Wednesday, November 22nd, 2017
Time: 1:00pm - 3:00pm (EST)
Location: CCB 153
Ashley Edwards
Ph.D. Student
School of Interactive Computing
College of Computing
Georgia Institute of Technology
Committee:
---------------
Dr. Charles Isbell (Advisor, School of Interactive Computing, Georgia Institute of Technology)
Dr. Tucker Balch (School of Interactive Computing, Georgia Institute of Technology)
Dr. Sonia Chernova (School of Interactive Computing, Georgia Institute of Technology)
Dr. Mark Riedl (School of Interactive Computing, Georgia Institute of Technology)
Abstract:
---------------
Rewards often act as the sole feedback for reinforcement learning problems. This signal is surprisingly powerful: it can motivate agents to solve tasks without any further guidance for how to accomplish them. Nevertheless, rewards do not come for free, and are typically hand-engineered for each problem. Furthermore, rewards are often defined as a function of an agent’s state variables. These variables have traditionally been tuned to the domain and include information such as the location of the agent or other objects in the world. The reward function is then inherently based on domain-specific representations. While such reward specifications can be sufficient to produce optimal behavior, more complex tasks might be difficult to express in this manner. Suppose a robot has the task of building origami figures. The environment would need to provide a reward each time the robot made a correct figure, thus requiring the program designer to define a notion of correctness for each desired configuration. Constructing a reward function for each model might become tedious and even difficult: what should the inputs even be?
Humans regularly exploit learning materials outside of the physical realm of a task, be it through diagrams, videos, text, or speech. For example, we might look at an image of a completed origami figure to determine if our own model is correct. This proposal will describe similar approaches for presenting tasks to agents. In particular, I will introduce methods for specifying perceptual goals both within and outside of the agent’s environment, and perceptual reward functions that are derived from these goals. This will allow us to represent goals in settings where we can more easily find or construct solutions, without requiring us to modify the reward function when the task changes. In this proposal, I aim to demonstrate that rewards derived from perceptual goal specifications are: easier to specify than task-specific reward functions; more easily generalizable across tasks; and equally enable task completion. I will validate these claims with the following contributions:
1) Hand-Defined Perceptual Reward Functions specified through a hand-defined similarity metric that enable intra-domain and cross-domain goal specifications.
2) Semi-Supervised Perceptual Reward Functions learned in a semi-supervised manner that enable cross-domain goal specifications.
3) Unsupervised Perceptual Reward Functions learned from videos in an unsupervised manner that enable intra-domain and cross-domain goal specifications.
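As a rough illustration of the first contribution, a hand-defined perceptual reward can be sketched as a similarity score between the agent's current observation and a goal image. The sketch below is not the method from the proposal: the function name `perceptual_reward`, the choice of negative mean squared pixel difference as the similarity metric, and the tiny 2x2 grayscale images are all assumptions made for illustration only.

```python
from typing import List

# Assumption: a grayscale image is a list of rows of pixel values in [0, 1].
Image = List[List[float]]


def perceptual_reward(observation: Image, goal: Image) -> float:
    """Return the negative mean squared pixel difference between two images.

    Hypothetical similarity metric chosen for illustration. A reward of zero
    means the observation matches the goal image exactly; more negative
    values mean the observation looks less like the goal.
    """
    same_shape = len(observation) == len(goal) and all(
        len(o_row) == len(g_row) for o_row, g_row in zip(observation, goal)
    )
    if not same_shape:
        raise ValueError("observation and goal must have the same shape")

    total = 0.0
    count = 0
    for o_row, g_row in zip(observation, goal):
        for o, g in zip(o_row, g_row):
            total += (o - g) ** 2
            count += 1
    if count == 0:
        raise ValueError("images must contain at least one pixel")
    return -total / count


# Hypothetical example images: the goal is specified as a picture,
# not as a function of the agent's state variables.
goal_image = [[0.0, 1.0], [1.0, 0.0]]
far_observation = [[1.0, 0.0], [0.0, 1.0]]
near_observation = [[0.0, 1.0], [1.0, 0.5]]
```

Under this sketch, changing the task only requires supplying a different goal image; the reward function itself stays the same, which is the property the abstract argues for.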