TITLE: Resource Management and Scheduling for Emerging AI Applications
ABSTRACT:
A new class of artificial intelligence applications is emerging that imposes a challenging set of requirements on how we program the cloud and how we manage cloud resources efficiently. With the end of Moore's law and Dennard scaling, and with AI applications becoming more heterogeneous, more interactive, and subject to end-to-end latency constraints, the future of AI systems depends on advances in resource management and scheduling for these applications. First, these applications generate an increasingly heterogeneous set of tasks, both in the resources best suited to each task and in the time scales at which individual tasks run. Second, they are increasingly user-facing, imposing soft real-time constraints on the frameworks that serve these workloads. Third, they individually expect or benefit from heterogeneous and often conflicting resource allocation policies, which is a challenge for any unifying framework that aims to support them. Thus, three emergent requirements must be addressed efficiently: (1) heterogeneity awareness in space and time, (2) soft real-time end-to-end latency constraints, and (3) scheduling policy heterogeneity at the application level. To address these requirements, I will present (1) TetriSched, a mathematical framework that captures application performance as a function of resource space and timeliness requirements, enabling cost-efficient and heterogeneity-aware resource allocation; (2) InferLine, a soft real-time system that meets these requirements under unpredictable, bursty workloads when multiple ML models are composed for inference; and (3) Ray, an active open source project that brings some of these ideas together, serves as the unifying framework for distributed ML, and addresses the challenge of scheduling policy heterogeneity.
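
For readers unfamiliar with Ray, the sketch below uses only its public task and actor API (ray.init, @ray.remote, ray.get) to illustrate the mix of short-lived stateless tasks and long-lived stateful actors that makes these workloads heterogeneous in space and time. The function and class names are illustrative only and are not taken from the talk.

import ray

ray.init()  # start (or connect to) a Ray cluster; here it runs locally

@ray.remote
def preprocess(batch):
    # a short-lived, stateless task
    return [2 * x for x in batch]

@ray.remote
class Model:
    # a long-lived, stateful actor, e.g. one holding model weights
    def __init__(self):
        self.weight = 3

    def predict(self, batch):
        return [self.weight * x for x in batch]

model = Model.remote()                   # place the actor somewhere in the cluster
features = preprocess.remote([1, 2, 3])  # launch a task; returns a future immediately
preds = model.predict.remote(features)   # futures can be passed to downstream calls
print(ray.get(preds))                    # block for the result: [6, 12, 18]

Tasks and actors with different resource needs and lifetimes are scheduled by Ray across the cluster, which is the kind of spatial and temporal heterogeneity the abstract highlights.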
BIO:
Alexey Tumanov is a postdoctoral researcher at the University of California, Berkeley, where he works with Ion Stoica and collaborates closely with Joseph Gonzalez in the RISELab, Department of Computer Science. He completed his Ph.D. at Carnegie Mellon University, advised by Gregory Ganger. At Carnegie Mellon, Tumanov was awarded the NSERC Alexander Graham Bell Canada Graduate Scholarship (CGS-D3), a prestigious Canadian government fellowship, and was a member of the Intel Science and Technology Center for Cloud Computing and the Parallel Data Lab. Tumanov's systems research has spanned the entire stack, from agile stateful VM replication with paravirtualization at the University of Toronto (with Eyal de Lara) to, most recently, resource management for emerging AI applications. He received the Best Student Paper Award at EuroSys 2016 for his thesis work on TetriSched and is a committer on the Ray project.