*********************************
There is now a CONTENT FREEZE for Mercury while we switch to a new platform. It began on Friday, March 10 at 6pm and will end on Wednesday, March 15 at noon. No new content can be created during this time, but all material in the system as of the beginning of the freeze will be migrated to the new platform, including users and groups. Functionally the new site is identical to the old one. webteam@gatech.edu
*********************************
Algorithms & Randomness Center (ARC)
Ashwin Pananjady (Georgia Tech)
Monday, April 26, 2021
Virtual via Bluejeans - 11:00 am
Title: Toward instance-optimal policy evaluation: From lower bounds to algorithms
Abstract: The paradigm of reinforcement learning has now made inroads in a wide range of applied problem domains. This empirical research has revealed the limitations of our theoretical understanding: popular RL algorithms exhibit a variety of behavior across domains and problem instances, and existing theoretical bounds, which are generally based on worst-case assumptions, can often produce pessimistic predictions. An important theoretical goal is to develop instance-specific analyses that help to reveal what aspects of a given problem make it "easy" or "hard", and allow distinctions to be drawn between ostensibly similar algorithms. Taking an approach grounded in nonparametric statistics, we initiate a study of this question for the policy evaluation problem. We show via information-theoretic lower bounds that many popular variants of temporal difference (TD) learning algorithms *do not* exhibit the optimal instance-specific performance in the finite-sample regime. On the other hand, making careful modifications to these algorithms does result in automatic adaptation to the intrinsic difficulty of the problem. When there is function approximation involved, our bounds also characterize the instance-optimal tradeoff between "bias" and "variance" in solving projected fixed-point equations, a general class of problems that includes policy evaluation as a special case.
The talk is based on joint work with Koulik Khamaru, Wenlong Mou, Feng Ruan, Martin Wainwright, and Michael Jordan. No prior knowledge of reinforcement learning will be assumed.
----------------------------------
Videos of recent talks are available at: http://arc.gatech.edu/node/121
Click here to subscribe to the seminar email list: arc-colloq@Klauscc.gatech.edu