ARC Colloquium: Ashwin Pananjady (Georgia Tech)

*********************************
There is now a CONTENT FREEZE for Mercury while we switch to a new platform. It began on Friday, March 10 at 6pm and will end on Wednesday, March 15 at noon. No new content can be created during this time, but all material in the system as of the beginning of the freeze will be migrated to the new platform, including users and groups. Functionally the new site is identical to the old one. webteam@gatech.edu
*********************************

Event Details

Date/Time:
- Monday April 26, 2021
  11:00 am - 12:00 pm
Location: Virtual via Bluejeans
Phone:
URL:
Email:
Fee(s):
N/A
Extras:

Contact

No contact information submitted.

Summaries

Summary Sentence: Toward instance-optimal policy evaluation: From lower bounds to algorithms - Virtual via Bluejeans at 11:00am

Full Summary: No summary paragraph submitted.

Algorithms & Randomness Center (ARC)

Ashwin Pananjady (Georgia Tech)

Monday, April 26, 2021

Virtual via Bluejeans - 11:00 am

Title: Toward instance-optimal policy evaluation: From lower bounds to algorithms

Abstract: The paradigm of reinforcement learning has now made inroads in a wide range of applied problem domains. This empirical research has revealed the limitations of our theoretical understanding: popular RL algorithms exhibit a variety of behavior across domains and problem instances, and existing theoretical bounds, which are generally based on worst-case assumptions, can often produce pessimistic predictions. An important theoretical goal is to develop instance-specific analyses that help to reveal what aspects of a given problem make it "easy" or "hard", and allow distinctions to be drawn between ostensibly similar algorithms. Taking an approach grounded in nonparametric statistics, we initiate a study of this question for the policy evaluation problem. We show via information-theoretic lower bounds that many popular variants of temporal difference (TD) learning algorithms *do not* exhibit the optimal instance-specific performance in the finite-sample regime. On the other hand, making careful modifications to these algorithms does result in automatic adaptation to the intrinsic difficulty of the problem. When there is function approximation involved, our bounds also characterize the instance-optimal tradeoff between "bias" and "variance" in solving projected fixed-point equations, a general class of problems that includes policy evaluation as a special case.

The talk is based on joint work with Koulik Khamaru, Wenlong Mou, Feng Ruan, Martin Wainwright, and Michael Jordan. No prior knowledge of reinforcement learning will be assumed.

----------------------------------

Speaker's Webpage

Videos of recent talks are available at: http://arc.gatech.edu/node/121

Click here to subscribe to the seminar email list: arc-colloq@Klauscc.gatech.edu

Additional Information

In Campus Calendar

Groups

ARC

Invited Audience

Faculty/Staff, Postdoc, Graduate students, Undergraduate students

Categories

Seminar/Lecture/Colloquium

Keywords

No keywords were submitted.

Status

Created By: Francella Tonge
Workflow Status: Published
Created On: Mar 17, 2021 - 10:17am
Last Updated: Apr 21, 2021 - 8:00am