PhD Defense by Zaiwei Chen


Event Details
  • Date/Time:
    • Thursday April 7, 2022
      1:00 pm - 3:00 pm
  • Location: Groseclose 402
  • URL: ZOOM
  • Fee(s):
    N/A
Summaries

Summary Sentence: A Unified Lyapunov Framework for Finite-Sample Analysis of Reinforcement Learning Algorithms


Title: A Unified Lyapunov Framework for Finite-Sample Analysis of Reinforcement Learning Algorithms

Date: 04/07/2022
Time: 1:00 - 2:30 pm EST
Location: Groseclose 402, or virtually at https://gatech.zoom.us/j/9849731860?pwd=K29BSStGekgvYkxlK1ZRZVp1QUlLdz09 (Meeting ID: 984 973 1860, Passcode: 7n46MA).
Student Name: Zaiwei Chen
Machine Learning PhD Student
School of Industrial & Systems Engineering
Georgia Institute of Technology

Committee
1. Dr. Siva Theja Maguluri (Advisor)
2. Dr. John-Paul Clarke (Co-advisor)
3. Dr. Justin Romberg
4. Dr. Ashwin Pananjady
5. Dr. Benjamin Van Roy

Abstract: In this thesis, we develop a unified Lyapunov approach for establishing finite-sample guarantees of reinforcement learning (RL) algorithms. Since most RL algorithms can be modeled as stochastic approximation (SA) algorithms driven by Markovian noise, we first provide a Lyapunov framework for analyzing Markovian SA algorithms. The key idea is to construct a novel Lyapunov function (called the generalized Moreau envelope) that captures the dynamics of the corresponding SA algorithm and to establish a negative drift inequality, which can then be applied repeatedly to derive finite-sample bounds. We use our SA results to design RL algorithms and perform their finite-sample analysis. Specifically, for tabular RL, we establish finite-sample bounds for Q-learning, for variants of on-policy TD-learning such as n-step TD and TD(\lambda), and for off-policy TD-learning algorithms such as Retrace(\lambda), Q^\pi(\lambda), and V-trace. As by-products, we provide theoretical insight into the efficiency of bootstrapping in on-policy TD-learning and demonstrate the bias-variance trade-off in off-policy TD-learning. For RL with linear function approximation, we design convergent variants of Q-learning and TD-learning in the presence of the deadly triad and derive finite-sample guarantees. The TD-learning algorithm is then used within a general policy-based framework (including approximate policy iteration and natural policy gradient) to find an optimal policy of the underlying RL problem with an O(\epsilon^{-2}) sample complexity.
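The central step described above is the negative drift inequality satisfied by the generalized Moreau envelope. As an illustration only (the exact statement and constants are assumptions here, not quoted from the thesis), a drift inequality for an SA iterate x_k with target x^*, step size \alpha_k, filtration \mathcal{F}_k, and Lyapunov function M(\cdot) typically takes the schematic form \mathbb{E}[M(x_{k+1} - x^*) \mid \mathcal{F}_k] \le (1 - c\,\alpha_k)\, M(x_k - x^*) + C\,\alpha_k^2; unrolling this recursion over k then yields a finite-sample bound on the expected error.

Tabular Q-learning is one of the algorithms covered by this kind of analysis. Below is a minimal sketch of the algorithm; the Gymnasium-style environment interface and the constant step size and exploration rate are illustrative placeholders, not the conditions under which the thesis's bounds are established.

```python
import numpy as np

def q_learning(env, num_episodes=500, alpha=0.1, gamma=0.99, epsilon=0.1):
    """Tabular Q-learning with an epsilon-greedy behavior policy.

    Illustrative sketch: the Gymnasium-style env interface and the constant
    step size / exploration rate are placeholder assumptions, not the
    schedules analyzed in the thesis.
    """
    Q = np.zeros((env.observation_space.n, env.action_space.n))
    for _ in range(num_episodes):
        state, _ = env.reset()
        done = False
        while not done:
            # Epsilon-greedy action selection.
            if np.random.rand() < epsilon:
                action = env.action_space.sample()
            else:
                action = int(np.argmax(Q[state]))
            next_state, reward, terminated, truncated, _ = env.step(action)
            done = terminated or truncated
            # The TD target bootstraps from the greedy value of the next state.
            target = reward + (0.0 if terminated else gamma * np.max(Q[next_state]))
            # Stochastic-approximation-style update toward the TD target.
            Q[state, action] += alpha * (target - Q[state, action])
            state = next_state
    return Q
```

For example, calling q_learning(gymnasium.make("FrozenLake-v1"), num_episodes=5000) returns an action-value table whose greedy policy approximates the optimal policy of that small MDP.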

Additional Information

In Campus Calendar
No
Groups

Graduate Studies

Invited Audience
Faculty/Staff, Public, Undergraduate students
Categories
Other/Miscellaneous
Keywords
PhD Defense
Status
  • Created By: Tatianna Richardson
  • Workflow Status: Published
  • Created On: Apr 4, 2022 - 9:56am
  • Last Updated: Apr 4, 2022 - 9:56am