Title: A Unified Lyapunov Framework for Finite-Sample Analysis of Reinforcement Learning Algorithms
Date: 04/07/2022
Time: 1:00 - 2:30 pm EST
Location: Groseclose 402, or virtually at https://gatech.zoom.us/j/9849731860?pwd=K29BSStGekgvYkxlK1ZRZVp1QUlLdz09 (Meeting ID: 984 973 1860, Passcode: 7n46MA).
Student Name: Zaiwei Chen
Machine Learning PhD Student
School of Industrial & Systems Engineering
Georgia Institute of Technology
Committee
1. Dr. Siva Theja Maguluri (Advisor)
2. Dr. John-Paul Clarke (Co-advisor)
3. Dr. Justin Romberg
4. Dr. Ashwin Pananjady
5. Dr. Benjamin Van Roy
Abstract: In this thesis, we develop a unified Lyapunov approach for establishing finite-sample guarantees of reinforcement learning (RL) algorithms. Since most RL algorithms can be modeled as stochastic approximation (SA) algorithms under Markovian noise, we first provide a Lyapunov framework for analyzing Markovian SA algorithms. The key idea is to construct a novel Lyapunov function (the generalized Moreau envelope) that captures the dynamics of the corresponding SA algorithm and to establish a negative drift inequality, which can then be applied repeatedly to derive finite-sample bounds. We use our SA results to design RL algorithms and perform finite-sample analysis. Specifically, for tabular RL, we establish finite-sample bounds for Q-learning, variants of on-policy TD-learning algorithms such as n-step TD and TD(\lambda), and off-policy TD-learning algorithms such as Retrace(\lambda), Q^\pi(\lambda), and V-trace. As by-products, we provide theoretical insight into the efficiency of bootstrapping in on-policy TD-learning and demonstrate the bias-variance trade-off in off-policy TD-learning. For RL with linear function approximation, we design convergent variants of Q-learning and TD-learning in the presence of the deadly triad and derive finite-sample guarantees. The TD-learning algorithm is further used within a general policy-based framework (including approximate policy iteration and natural policy gradient) to find an optimal policy of the RL problem with an O(\epsilon^{-2}) sample complexity.
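For readers unfamiliar with the SA viewpoint mentioned in the abstract, the sketch below illustrates how tabular Q-learning can be written as a stochastic approximation update driven by a single Markovian trajectory. It is only a minimal illustration: the toy MDP, uniform behavior policy, and step-size schedule are assumptions made here for the example, not the constructions or analysis from the thesis.

```python
# Minimal sketch (illustrative only): tabular Q-learning as a stochastic
# approximation (SA) update under Markovian noise, i.e., samples come from
# one trajectory of the Markov chain rather than i.i.d. draws.
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions, gamma = 5, 3, 0.9

# Random toy MDP (assumed for illustration): transition kernel P[s, a, s'] and rewards R[s, a].
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))
R = rng.uniform(0.0, 1.0, size=(n_states, n_actions))

Q = np.zeros((n_states, n_actions))
s = 0
for t in range(1, 50_001):
    a = rng.integers(n_actions)               # uniform behavior policy (assumed)
    s_next = rng.choice(n_states, p=P[s, a])  # Markovian noise: next state from the same trajectory
    # Asynchronous SA update at the visited pair (s, a):
    # Q <- Q + alpha_t * (noisy Bellman optimality operator applied to Q - Q).
    td_error = R[s, a] + gamma * Q[s_next].max() - Q[s, a]
    alpha_t = 10.0 / (t + 100.0)              # diminishing step size (assumed schedule)
    Q[s, a] += alpha_t * td_error
    s = s_next

print("Greedy policy from learned Q:", Q.argmax(axis=1))
```

Finite-sample analyses of this kind of update track how a Lyapunov function of the iteration error contracts in expectation at each step; the talk's contribution is a particular choice of Lyapunov function and drift inequality that makes this argument go through under Markovian sampling.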