New Toolchain Automatically Finds Database Management System Bugs


Atlanta, GA | Posted: August 28, 2020

Contact

Tess Malone, Communications Officer

tess.malone@cc.gatech.edu




Georgia Tech researchers have applied fuzzing techniques to find bugs in database management systems (DBMS). Their new toolchain, Apollo, automatically detects, reports, and diagnoses a common type of DBMS bug known as a performance regression.

Apollo automates the generation of regression-triggering queries, simplifies the bug reporting process for users, and enables developers to quickly pinpoint the root cause of performance regressions.

Using Apollo, the researchers discovered 10 previously unknown, unique performance regressions, reduced the size of regression-triggering queries by a factor of 4.2, and identified the code branches related to their root causes.

"We believe that Apollo will assist database system developers with the tedious process of testing these complex systems," said School of Computer Science (SCS) Assistant Professor Joy Arulraj. "This will allow them to focus on more important problems in developing database systems."

DBMS problems

The complexity of database management systems increases their potential for error. An upgrade to a DBMS can unexpectedly slow down certain queries, a problem known as a performance regression bug.

“A critical regression can reduce performance by orders of magnitude, in many cases converting an interactive query to an overnight execution,” said SCS Ph.D. student Jinho Jung.

To address this issue, the researchers built a toolchain: a pipeline of distinct software development tools linked together so that the output of each stage feeds the next.

The team’s new toolchain has three components:

SQLFuzz

SQLFuzz generates queries in structured query language (SQL), the language used to communicate with databases, to find performance regressions. It works by bombarding a system with many randomly generated inputs to trigger bugs, a technique known as fuzzing.
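The fuzzing loop can be sketched in a few lines of Python. This is a minimal illustration, not SQLFuzz itself. It assumes a toy grammar over a hypothetical table `t(a, b)` and assumes a regression is spotted by timing each generated query on an old and a new version of the DBMS (passed in as callables) and flagging queries that run much slower on the new one. The names `random_query` and `fuzz` and the 2.0 slowdown threshold are illustrative choices.

```python
import random
from typing import Callable, List, Tuple

# Hypothetical schema: a single table t with integer columns a and b.
COLUMNS = ["a", "b"]
OPERATORS = ["=", "<", ">", "<>"]


def random_predicate(rng: random.Random) -> str:
    """Build one random comparison such as 'a < 17'."""
    return f"{rng.choice(COLUMNS)} {rng.choice(OPERATORS)} {rng.randint(0, 100)}"


def random_query(rng: random.Random, max_predicates: int = 3) -> str:
    """Generate a random SELECT statement from a tiny grammar."""
    count = rng.randint(1, max_predicates)
    predicates = [random_predicate(rng) for _ in range(count)]
    return "SELECT * FROM t WHERE " + " AND ".join(predicates)


def fuzz(
    time_old: Callable[[str], float],
    time_new: Callable[[str], float],
    iterations: int = 100,
    slowdown_threshold: float = 2.0,
    seed: int = 0,
) -> List[Tuple[str, float]]:
    """Return (query, slowdown) pairs where the new version is much slower."""
    rng = random.Random(seed)
    candidates = []
    for _ in range(iterations):
        query = random_query(rng)
        old, new = time_old(query), time_new(query)
        slowdown = new / old if old > 0 else float("inf")
        if slowdown >= slowdown_threshold:
            candidates.append((query, slowdown))
    return candidates
```

In a real setup, `time_old` and `time_new` would execute the query against two installed DBMS versions and return wall-clock time; here they can be any functions, which keeps the generator and the comparison logic testable on their own.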

“During the fuzzing test, we noticed that validating performance regressions is challenging because the ground truth of the regression is unclear and may be heavily affected by the execution environment and lead to a lot of false-positive bugs,” Jung said.

To counter this, the researchers applied validation checks to reduce false positives.
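The article does not say which validation checks the researchers used. One common way to filter noisy timing results, shown here purely as an illustrative assumption, is to re-time a candidate several times on both versions, compare medians, and keep it only if every new-version run is slower than every old-version run. The function name `confirm_regression` and its thresholds are hypothetical.

```python
import statistics
from typing import Callable


def confirm_regression(
    query: str,
    time_old: Callable[[str], float],
    time_new: Callable[[str], float],
    runs: int = 5,
    slowdown_threshold: float = 2.0,
) -> bool:
    """Re-time a candidate query and keep it only if the slowdown is stable.

    Hypothetical checks: the median slowdown must reach the threshold, and
    the fastest new-version run must still be slower than the slowest
    old-version run, so a single noisy measurement cannot cause a report.
    """
    old_times = [time_old(query) for _ in range(runs)]
    new_times = [time_new(query) for _ in range(runs)]
    median_old = statistics.median(old_times)
    median_new = statistics.median(new_times)
    if median_old <= 0:
        return False
    stable = min(new_times) > max(old_times)
    return median_new / median_old >= slowdown_threshold and stable
```

Using the median rather than the mean keeps one unusually slow run (for example, a cold cache) from dominating the comparison.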

SQLMin

SQLMin reduces the regression-triggering query to its essentials, so developers do not have to work out by hand which parts of a long statement actually cause the regression. The researchers achieve this by combining bottom-up and top-down strategies.

The bottom-up strategy extracts one sub-query from the original query and checks whether it still causes the regression. If it does, SQLMin keeps the sub-query for further analysis. The top-down strategy removes as many expressions from the query as possible.

“This takes out as many elements of the statement as possible while ensuring that the reduced query still triggers the problem,” Jung said.
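The two strategies can be sketched as follows. This is a simplified illustration under stated assumptions, not SQLMin's code: the query is modeled as a list of WHERE predicates over a hypothetical table `t`, and a caller-supplied oracle `triggers` (which in practice would re-run the regression check) reports whether a candidate query still reproduces the slowdown. The greedy one-at-a-time removal in `top_down` is a stand-in for whatever reduction order the researchers actually use.

```python
from typing import Callable, List, Optional

# Hypothetical oracle: True if the given SQL still triggers the regression.
Oracle = Callable[[str], bool]


def build_query(predicates: List[str]) -> str:
    """Assemble a query over the hypothetical table t from WHERE predicates."""
    if not predicates:
        return "SELECT * FROM t"
    return "SELECT * FROM t WHERE " + " AND ".join(predicates)


def bottom_up(subqueries: List[str], triggers: Oracle) -> Optional[str]:
    """Return the first sub-query that still triggers the regression on its own."""
    for sub in subqueries:
        if triggers(sub):
            return sub
    return None


def top_down(predicates: List[str], triggers: Oracle) -> List[str]:
    """Greedily drop predicates while the reduced query still triggers the regression."""
    current = list(predicates)
    changed = True
    while changed:
        changed = False
        for i in range(len(current)):
            candidate = current[:i] + current[i + 1:]
            if triggers(build_query(candidate)):
                current = candidate
                changed = True
                break  # restart the scan on the smaller query
    return current
```

For example, if only the predicate `b <> 5` is responsible for the slowdown, `top_down(["a = 1", "b <> 5", "a < 90"], oracle)` strips the other two predicates and returns `["b <> 5"]`.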

SQLDebug

Once a regression report is filed, developers must diagnose its root cause. To simplify the diagnosis process, the researchers use two techniques to automatically identify the root cause.

First, they use bisection, a binary search over the project's commit history, to find the first commit (code update) pushed to the repository that introduced the regression. Second, they apply statistical debugging to identify suspicious source lines within that commit that are likely responsible for the drop in performance.
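Both techniques can be sketched in Python. This is an illustrative sketch, not SQLDebug's implementation. `first_bad_commit` is a plain binary search that assumes an ordered commit list whose oldest entry is fast and newest entry is slow, with a hypothetical caller-supplied check `is_slow` that would build each commit and time the minimized query. `rank_suspicious` uses the Ochiai formula, a standard spectrum-based fault-localization score swapped in here for illustration; Apollo's exact statistic may differ. Its inputs are the sets of code locations covered by slow and fast runs, and the location names are made up.

```python
import math
from typing import Callable, List, Set, Tuple


def first_bad_commit(commits: List[str], is_slow: Callable[[str], bool]) -> str:
    """Binary search for the first commit on which the query is slow.

    Assumes commits are ordered oldest to newest, commits[0] is fast,
    commits[-1] is slow, and once a commit is slow all later ones are too.
    """
    lo, hi = 0, len(commits) - 1  # invariant: commits[lo] fast, commits[hi] slow
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_slow(commits[mid]):
            hi = mid
        else:
            lo = mid
    return commits[hi]


def rank_suspicious(
    slow_runs: List[Set[str]], fast_runs: List[Set[str]]
) -> List[Tuple[str, float]]:
    """Rank code locations by Ochiai suspiciousness, most suspicious first.

    Each run is the set of locations (lines or branches) it executed.
    """
    total_slow = len(slow_runs)
    locations = set().union(*slow_runs, *fast_runs)
    scores = {}
    for loc in locations:
        in_slow = sum(loc in run for run in slow_runs)
        in_fast = sum(loc in run for run in fast_runs)
        denom = math.sqrt(total_slow * (in_slow + in_fast))
        scores[loc] = in_slow / denom if denom else 0.0
    # Sort by descending score, breaking ties by name for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
```

Bisection needs only about log2(n) builds to search n commits, which is why it suits long histories; the ranking step then narrows the culprit commit down to the locations executed mostly by slow runs.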

The researchers presented Apollo at the Very Large Data Bases (VLDB) conference, held Aug. 31 to Sept. 4. Jung wrote the paper, APOLLO: Automatic Detection and Diagnosis of Performance Regressions in Database Systems, with SCS postdoctoral researcher Hong Hu, Arulraj, Associate Professor Taesoo Kim, and eBay’s Woonhak Kang.

Additional Information

Groups

College of Computing, School of Computer Science


Related Core Research Areas

Data Engineering and Science


Status

Created By: Tess Malone
Workflow Status: Published
Created On: Aug 28, 2020 - 1:07pm
Last Updated: Aug 28, 2020 - 1:18pm