*********************************
There is now a CONTENT FREEZE for Mercury while we switch to a new platform. It began on Friday, March 10 at 6pm and will end on Wednesday, March 15 at noon. No new content can be created during this time, but all material in the system as of the beginning of the freeze will be migrated to the new platform, including users and groups. Functionally the new site is identical to the old one. webteam@gatech.edu
*********************************
Title: Automated Reasoning for Multi-Query Optimization
Date: Tuesday, February 4, 2020
Time: 01:00 PM - 02:30 PM (EST)
Location: Klaus 1212
Qi Zhou
Ph.D. Student
School of Computer Science
Georgia Institute of Technology
Committee:
Dr. William Harris (advisor) - Galois Inc.
Dr. Joy Arulraj (co-advisor) - School of Computer Science, Georgia Institute of Technology
Dr. Shamkant B.Navathe - School of Computer Science, Georgia Institute of Technology
Dr. Alex Orso - School of Computer Science, Georgia Institute of Technology
Dr. John Regehr - School of Computing, University of Utah
Abstract:
The advent of DataBase-as-a-Service (DBaaS) platforms has increased the importance of multi-query optimization.
These services enable users to quickly create and deploy complex data processing pipelines. However, in practice, these pipelines often exhibit a significant overlap of computation due to the redundant execution of certain SQL queries. We seek to optimize the execution of a collection of queries by identifying and eliminating overlapping computations.
In this proposal, I will present two techniques for efficiently and effectively proving the equivalence of queries. I will first present a symbolic approach to tackle this problem that relies on SMT solver. While this technique covers a wider array of SQL features compared to prior algebraic approaches, it can neither support structurally-different queries nor prove equivalence under bag semantics, the underlying model of all modern database applications. I will next introduce a two-stage verification algorithm with a novel symbolic representation combined with the algebraic approach to circumvent these limitations.
In practice, even queries that are not equivalent tend to have overlapping computation. I propose to design a technique for determining containment relationships between non-equivalent queries. Furthermore, I propose to leverage this technique for augmenting a multi-query optimizer by automatically synthesizing queries that can leverage the results of prior queries.