Title: Emulation and Imitation via Perceptual Goal Specifications
Ashley D. Edwards
Ph.D. Student
School of Interactive Computing
College of Computing
Georgia Institute of Technology
Date: Monday, January 14th, 2019
Time: 12:30 PM to 2:30 PM (EST)
Location: TBA, College of Computing Building
Committee:
---------------
Dr. Charles Isbell (Advisor), School of Interactive Computing, Georgia Institute of Technology
Dr. Tucker Balch, School of Interactive Computing, Georgia Institute of Technology
Dr. Sonia Chernova, School of Interactive Computing, Georgia Institute of Technology
Dr. Mark Riedl, School of Interactive Computing, Georgia Institute of Technology
Dr. Pieter Abbeel, Department of Electrical Engineering and Computer Sciences, University of California, Berkeley
Summary:
---------------
Much of the power of reinforcement learning comes from the fact that a single signal, known as the reward, can indicate desired behavior. Defining these rewards, however, is often difficult. This dissertation introduces an alternative to the typical reward design mechanism. In particular, we introduce four methods that allow one to focus on specifying perceptual goals rather than scalar rewards. By removing domain-specific aspects of the problem, we demonstrate that goals can be expressed while remaining agnostic to the reward function, action space, or state space of the agent's environment.
First, we will introduce perceptual reward functions and describe how a hand-defined similarity metric enables learning from goals expressed through observations that differ from the agent's own. We show how we can use this method to train a simulated robot to learn from videos of humans.
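As an illustration of the idea (our own sketch, not the dissertation's implementation), a perceptual reward can be computed by comparing the agent's current observation against a goal image with a hand-defined similarity metric. The Gaussian over pixel distance, the array shapes, and the sigma parameter below are all assumptions:

    import numpy as np

    def perceptual_reward(observation, goal_image, sigma=1.0):
        # Score how closely the agent's current observation matches a
        # goal image; both are equally sized pixel arrays. The metric
        # (a Gaussian over RMS pixel distance) is a hand-defined
        # stand-in for whatever similarity measure fits the domain.
        diff = observation.astype(np.float64) - goal_image.astype(np.float64)
        distance = np.sqrt(np.mean(diff ** 2))            # RMS pixel distance
        return np.exp(-distance ** 2 / (2 * sigma ** 2))  # in (0, 1], 1 = exact match

    # Usage: score a rollout frame against a goal frame taken, e.g.,
    # from a video of a human demonstration (arrays here are dummies).
    goal = np.zeros((64, 64, 3))
    frame = np.full((64, 64, 3), 0.1)
    print(perceptual_reward(frame, goal))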
Next, we will introduce cross-domain perceptual reward functions and describe how a reward function can be learned for goals specified in a domain different from the agent's. We show how we can use this method to train an agent in a maze to reach goals specified through speech and hand gestures.
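One plausible construction for such a cross-domain reward (a sketch under our own assumptions, not necessarily the model the dissertation uses) trains two encoders that map goal specifications, e.g. gesture or speech features, and agent states into a shared embedding space, then scores a state by its learned similarity to the specification. All layer sizes and feature dimensions below are made up:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    # Encoders for the two domains, mapping into a shared 16-d space.
    goal_enc = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 16))
    state_enc = nn.Sequential(nn.Linear(8, 64), nn.ReLU(), nn.Linear(64, 16))
    opt = torch.optim.Adam(
        list(goal_enc.parameters()) + list(state_enc.parameters()), lr=1e-3)

    def train_step(goal_specs, matching_states):
        # One gradient step pulling paired (specification, state)
        # embeddings together; pairs are assumed to come from examples
        # where the state is known to satisfy the specification.
        g = F.normalize(goal_enc(goal_specs), dim=-1)
        s = F.normalize(state_enc(matching_states), dim=-1)
        loss = (1 - (g * s).sum(dim=-1)).mean()   # cosine-distance loss
        opt.zero_grad(); loss.backward(); opt.step()
        return loss.item()

    def cross_domain_reward(goal_spec, state):
        # Reward = learned similarity between specification and state.
        with torch.no_grad():
            g = F.normalize(goal_enc(goal_spec), dim=-1)
            s = F.normalize(state_enc(state), dim=-1)
            return (g * s).sum(dim=-1)

    # Usage on a dummy batch of paired examples:
    specs, states = torch.randn(32, 32), torch.randn(32, 8)
    print(train_step(specs, states))
    print(cross_domain_reward(specs[:1], states[:1]))

Normalizing the embeddings keeps the similarity bounded, so it can be used directly as a reward signal.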
Next, we will introduce perceptual value functions and describe how we can learn a value function from sequences of expert observations without access to ground-truth actions. We show how we can use this method to infer values from observation for a maze task and a pouring task, and to train an agent to solve unseen levels of a platform game.
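One common way to obtain such values, sketched below under the assumption of a sparse reward earned at the end of each expert demonstration (our simplification, not necessarily the dissertation's exact objective), is to label the observation at step t of a length-T demonstration with the discounted return gamma^(T - t) and fit a regressor to those targets:

    import numpy as np

    GAMMA = 0.99

    def value_targets(demo_length, gamma=GAMMA):
        # With a sparse reward of 1 at the final step of an expert
        # demonstration, the value at step t is gamma^(T - t): a
        # discounted measure of how close the expert is to the goal.
        T = demo_length - 1
        return np.array([gamma ** (T - t) for t in range(demo_length)])

    # Fit a value function V(o) by regression on (observation, target)
    # pairs pooled over demonstrations -- here a linear model via least
    # squares on made-up observation features.
    rng = np.random.default_rng(0)
    demos = [rng.normal(size=(20, 5)) for _ in range(10)]   # 10 demos, 5-dim obs
    X = np.vstack(demos)
    y = np.concatenate([value_targets(len(d)) for d in demos])
    w, *_ = np.linalg.lstsq(np.c_[X, np.ones(len(X))], y, rcond=None)

    def value(obs):
        return np.append(obs, 1.0) @ w

    print(value(demos[0][-1]))   # estimated value of a final expert observation

For image observations, a neural regressor would typically replace the linear model.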
Finally, we will introduce latent policy networks and describe how we can learn a policy from sequences of expert observations without access to ground-truth actions. We show how we can use this method to infer a policy from observation and train an agent to solve classic control tasks and a platform game.
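A rough sketch of the idea, with every detail (latent-action count, network sizes, loss weighting) assumed by us rather than taken from the dissertation: a latent forward model predicts the next observation under each of a handful of latent actions, and a latent policy is trained so that its expected prediction matches the observed transition. Penalizing only the best-matching latent action lets distinct latent actions specialize to distinct transitions. In the full approach the latent actions would still have to be aligned with the environment's real actions before the agent can act, which this sketch omits:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    N_LATENT = 4   # number of latent (unobserved) actions, a guess
    OBS_DIM = 8    # illustrative observation dimension

    # Latent forward model: predicts the next observation for every
    # latent action at once, shape (batch, N_LATENT, OBS_DIM).
    dynamics = nn.Sequential(nn.Linear(OBS_DIM, 64), nn.ReLU(),
                             nn.Linear(64, N_LATENT * OBS_DIM))
    # Latent policy: a distribution over latent actions given the state.
    policy = nn.Sequential(nn.Linear(OBS_DIM, 64), nn.ReLU(),
                           nn.Linear(64, N_LATENT))
    opt = torch.optim.Adam(
        list(dynamics.parameters()) + list(policy.parameters()), lr=1e-3)

    def train_step(obs, next_obs):
        # One step on a batch of consecutive expert observations;
        # no actions are used anywhere.
        preds = dynamics(obs).view(-1, N_LATENT, OBS_DIM)
        err = ((preds - next_obs.unsqueeze(1)) ** 2).mean(dim=-1)
        # Penalize only the best-matching latent action, so distinct
        # latent actions can specialize to distinct transitions.
        model_loss = err.min(dim=1).values.mean()
        # Train the policy so its expected prediction over latent
        # actions matches the actual next observation.
        probs = F.softmax(policy(obs), dim=-1)
        expected = (probs.unsqueeze(-1) * preds.detach()).sum(dim=1)
        policy_loss = F.mse_loss(expected, next_obs)
        loss = model_loss + policy_loss
        opt.zero_grad(); loss.backward(); opt.step()
        return loss.item()

    # Usage on a dummy batch of consecutive expert observations:
    obs, next_obs = torch.randn(32, OBS_DIM), torch.randn(32, OBS_DIM)
    print(train_step(obs, next_obs))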