(POSTPONED) ISyE/The Center for Machine Learning at Georgia Tech (ML@GT) seminar - Warren Powell

Event Details
Summaries

Summary Sentence: "From Reinforcement Learning to Stochastic Optimization: A Universal Framework for Sequential Decision Analytics"

Full Summary:

Sequential decisions are an almost universal problem class, spanning dynamic resource allocation problems, control problems, discrete graph problems, and active learning problems, as well as two-agent games and multiagent problems. Application settings span engineering, the sciences, transportation, health services, medical decision making, energy, e-commerce, and finance. A rich problem class involves systems that must actively learn about the environment, possibly via drones or robots. In multiagent systems, we may need to learn about the behavior of other agents.

These problems have been addressed in the academic literature using a variety of modeling and algorithmic frameworks, including dynamic programming, stochastic programming, stochastic control, simulation optimization, approximate dynamic programming/reinforcement learning, and even multi-armed bandit problems.

I will describe a universal modeling framework that can be used for any sequential decision problem in the presence of different sources of uncertainty. The framework is centered on an optimization problem that optimizes over policies (rules for making decisions). We show that there are two fundamental strategies for designing policies (policy search and policies based on lookahead approximations), each of which further divides into two classes. This creates four (meta)classes of policies that are the foundation of any solution approach that has ever been proposed for a sequential problem. I will demonstrate these policies in two broad contexts: pure learning problems (“bandit problems”) and dynamic resource allocation problems, where I will use a simple energy storage problem to show that each of the four classes (and a hybrid) can be made to work best.
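
To make the idea of "optimizing over policies" concrete, here is a minimal sketch of policy search on a toy energy storage problem. It is not the speaker's model: the price process, battery parameters, buy-low/sell-high rule, and all function names (simulate, make_policy, sample_paths, policy_search) are assumptions made up for illustration. The policy is a simple parametric rule, and the search picks the thresholds that give the highest average profit over simulated price paths.

```python
import random


def simulate(policy, prices, capacity=10.0, rate=1.0):
    """Run a storage policy over one price path and return total profit.

    The policy maps (energy in storage, current price) to a requested
    amount: positive buys energy, negative sells it.
    """
    energy, profit = 0.0, 0.0
    for price in prices:
        x = policy(energy, price)
        # Clamp the request to what the battery can physically do.
        x = max(-min(rate, energy), min(x, rate, capacity - energy))
        energy += x
        profit -= price * x  # buying costs money, selling earns it
    return profit


def make_policy(low, high, rate=1.0):
    """Hypothetical buy-low/sell-high rule with two tunable thresholds."""
    def policy(energy, price):
        if price <= low:
            return rate    # charge when the price is low
        if price >= high:
            return -rate   # discharge when the price is high
        return 0.0         # otherwise hold
    return policy


def sample_paths(n_paths, horizon, seed=0):
    """Assumed price model: independent Gaussian prices, clipped at zero."""
    rng = random.Random(seed)
    return [[max(0.0, rng.gauss(30.0, 10.0)) for _ in range(horizon)]
            for _ in range(n_paths)]


def policy_search(paths, grid):
    """Grid search over (low, high) for the best average simulated profit."""
    best_theta, best_value = None, float("-inf")
    for low in grid:
        for high in grid:
            if low >= high:
                continue
            policy = make_policy(low, high)
            value = sum(simulate(policy, p) for p in paths) / len(paths)
            if value > best_value:
                best_theta, best_value = (low, high), value
    return best_theta, best_value


if __name__ == "__main__":
    paths = sample_paths(n_paths=20, horizon=50, seed=1)
    theta, value = policy_search(paths, [10.0, 20.0, 30.0, 40.0, 50.0])
    print("best thresholds:", theta, "average profit:", round(value, 2))
```

A grid search is used only because it is easy to read; any derivative-free or stochastic search over the two thresholds would play the same role. Lookahead-based policies, the other strategy named in the abstract, would instead choose each decision by approximating the downstream value of the resulting storage level.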

Additional Information

In Campus Calendar
Yes
Groups

School of Industrial and Systems Engineering (ISYE)

Invited Audience
Faculty/Staff, Postdoc, Public, Graduate students, Undergraduate students
Categories
Seminar/Lecture/Colloquium
Status
  • Created By: sbryantturner3
  • Workflow Status: Published
  • Created On: Mar 4, 2020 - 9:48am
  • Last Updated: Mar 12, 2020 - 9:37pm