Title: Teaching robots to walk using deep reinforcement learning and sim-to-real transfer
Wenhao Yu
School of Interactive Computing
College of Computing
Georgia Institute of Technology
Date: Tuesday, October 22nd, 2019
Time: 12:00pm EDT
Location: GVU Center, TSRB
Committee:
---------------
Dr. Greg Turk (Advisor, School of Interactive Computing, Georgia Tech)
Dr. C. Karen Liu (Advisor, School of Engineering, Stanford University / School of Interactive Computing, Georgia Tech)
Dr. Charlie Kemp (Department of Biomedical Engineering / School of Interactive Computing, Georgia Tech)
Dr. Sergey Levine (Department of Electrical Engineering and Computer Sciences, University of California, Berkeley)
Dr. Michiel van de Panne (Department of Computer Science, University of British Columbia)
Abstract:
------------
Deep reinforcement learning (DRL) has the potential to automate the development of controllers for complex motor skills such as locomotion. However, due to high sample complexity and safety concerns, directly applying DRL on a real robot is generally infeasible. Computer simulation provides a safe and efficient way to train robotic controllers, but a control policy trained in simulation usually fails to perform the desired task on real hardware because of discrepancies between simulation and the real world, also known as the Reality Gap. In this proposal, we investigate the problem of transferring a simulation-trained policy to a real robot, with a focus on learning locomotion skills for biped and quadruped robots. Legged locomotion requires precise coordination of the robot's motors to keep it walking forward while maintaining balance, which makes sim-to-real transfer for legged locomotion particularly challenging.
We first introduce an algorithm named Universal Policy with Online System Identification (UP-OSI), in which a model is trained to identify physics parameters from the robot's observations and to guide the control policy toward suitable actions. We demonstrate that UP-OSI can adapt to changes in dynamics parameters such as friction coefficients or body mass. However, the success of UP-OSI relies largely on OSI obtaining accurate estimates of the physical parameters: when the training and testing dynamics differ significantly, the quality of OSI's estimates drops, degrading the performance of the overall controller.
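Below is a minimal sketch of how the UP-OSI control loop could be structured; the module names, network sizes, and dimensions are illustrative assumptions rather than the exact architecture used in the proposal.

import torch
import torch.nn as nn

STATE_DIM, ACTION_DIM, PARAM_DIM, HISTORY_LEN = 24, 8, 4, 10  # illustrative sizes

class UniversalPolicy(nn.Module):
    """Policy conditioned on both the robot state and the dynamics parameters mu."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM + PARAM_DIM, 128), nn.Tanh(),
            nn.Linear(128, ACTION_DIM))

    def forward(self, state, mu):
        return self.net(torch.cat([state, mu], dim=-1))

class OnlineSystemID(nn.Module):
    """Regresses the dynamics parameters mu from a recent history of
    (state, action) pairs observed on the target system."""
    def __init__(self):
        super().__init__()
        hist_dim = HISTORY_LEN * (STATE_DIM + ACTION_DIM)
        self.net = nn.Sequential(
            nn.Linear(hist_dim, 128), nn.Tanh(),
            nn.Linear(128, PARAM_DIM))

    def forward(self, history):
        # history has shape (..., HISTORY_LEN, STATE_DIM + ACTION_DIM)
        return self.net(history.flatten(start_dim=-2))

def control_step(up, osi, state, history):
    """One UP-OSI step: OSI estimates mu online, the universal policy acts on it."""
    mu_hat = osi(history)
    return up(state, mu_hat)

At deployment time, the history buffer would hold the most recent state-action pairs observed on the hardware, so the OSI estimate, and therefore the policy's behavior, adapts online as the dynamics change.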
To overcome larger discrepancies in dynamics, we introduce a series of algorithms based on the idea of Strategy Optimization (SO), in which the policy is allowed to collect additional data in the target environment and use those experiences to explicitly search for the best input to the policy. This allows the policy to overcome a larger reality gap, and it has been successfully applied to learn locomotion controllers for a biped robot, the Robotis Darwin OP2, and a quadruped robot, the Ghost Robotics Minitaur.
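A rough sketch of the SO idea follows: the trained universal policy is kept fixed and only its dynamics-parameter input mu is searched on the target system to maximize episode return. The simple cross-entropy-style search below stands in for whatever derivative-free optimizer is actually used, and rollout_return() is a hypothetical function that runs one episode on the target robot with a given mu and reports the return.

import numpy as np

def strategy_optimization(rollout_return, param_dim, iterations=10,
                          population=16, elite_frac=0.25):
    """Search for the policy input mu that yields the highest return on the target system."""
    mean, std = np.zeros(param_dim), np.ones(param_dim)
    n_elite = max(1, int(population * elite_frac))
    for _ in range(iterations):
        # Sample candidate strategies and evaluate each with a rollout on the target system.
        candidates = mean + std * np.random.randn(population, param_dim)
        returns = np.array([rollout_return(mu) for mu in candidates])
        # Refit the search distribution to the best-performing candidates.
        elite = candidates[np.argsort(returns)[-n_elite:]]
        mean, std = elite.mean(axis=0), elite.std(axis=0) + 1e-3
    return mean

The design rationale is that searching over the low-dimensional policy input, rather than over the policy weights, keeps the amount of data that must be collected in the target environment small.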
Finally, we discuss possible paths toward a more reliable and versatile locomotion policy that can control a legged robot to walk in more challenging environments, such as a road with tiles of different friction coefficients, a path with varying slopes, or a deformable surface like a sofa. We plan to use the Darwin OP2 robot as the main testing platform. Our first step is to identify a more accurate actuator model for the robot. Then, based on the results of that step, we will choose one of three directions to explore: 1) extending UP-OSI to train locomotion controllers for unstructured environments, 2) extending SO-based methods to time-varying environments, or 3) fast fine-tuning of the control policy on hardware.
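As one concrete illustration of the planned actuator-identification step, the sketch below fits a vector of actuator parameters (for example gains, friction, or latency terms) so that simulated joint trajectories match trajectories recorded on the Darwin OP2. The simulate_joint_trajectory() function and the recorded arrays are placeholders for whatever simulator and logging setup is ultimately used, not part of the proposal itself.

import numpy as np
from scipy.optimize import minimize

def identification_loss(actuator_params, commands, measured_positions,
                        simulate_joint_trajectory):
    """Mean squared error between simulated and measured joint trajectories."""
    simulated = simulate_joint_trajectory(actuator_params, commands)
    return np.mean((simulated - measured_positions) ** 2)

def fit_actuator_model(initial_params, commands, measured_positions,
                       simulate_joint_trajectory):
    """Fit actuator parameters so simulation reproduces the recorded hardware data."""
    result = minimize(
        identification_loss, initial_params,
        args=(commands, measured_positions, simulate_joint_trajectory),
        method="Nelder-Mead")  # derivative-free: the simulator need not be differentiable
    return result.x

A derivative-free optimizer is used in this sketch because a physics simulator typically does not expose gradients with respect to its actuator parameters.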