Ph.D. Dissertation Defense Announcement
Title: Scalable Big Data Systems: Architectures and Optimizations
Kisung Lee
School of Computer Science
College of Computing
Georgia Institute of Technology
Date: Thursday, April 30, 2015
Time: 10:00 AM - 12:00 PM EDT
Location: KACB 3402
Committee:
Dr. Ling Liu (Advisor, School of Computer Science, Georgia Institute of Technology)
Dr. Ed Omiecinski (School of Computer Science, Georgia Institute of Technology)
Dr. Calton Pu (School of Computer Science, Georgia Institute of Technology)
Dr. Karsten Schwan (School of Computer Science, Georgia Institute of Technology)
Dr. Lakshmish Ramaswamy (Department of Computer Science, University of Georgia)
Abstract:
With continued advances in computing and information technology, digital data have grown at an astonishing rate in terms of volume, variety, and velocity. Such big data have huge potential to reveal hidden insights and promote innovation in many business, science, and engineering domains. An important technical challenge is how to build efficient big data processing systems and applications that can scale with the rapid growth of digital data in the 21st century.
Dedicated to the development of architectures and optimization techniques for scaling big data processing systems, especially in the era of cloud computing, this dissertation makes three unique contributions. First, it introduces a suite of graph partitioning algorithms that can run much faster than existing data distribution methods and inherently scale to the growth of big data. The main idea of these approaches is to partition a big graph while preserving the core computational data structure as much as possible, in order to maximize intra-server computation and minimize inter-server communication. Second, it proposes a distributed iterative graph computation framework that effectively utilizes secondary storage to maximize access locality and speed up distributed iterative graph computations. The framework not only considerably reduces memory requirements for iterative graph algorithms but also significantly improves the performance of iterative graph computations. Last but not least, it establishes a suite of optimization techniques for scalable spatial data processing along three orthogonal dimensions: (i) scalable processing of spatial alarms for mobile users traveling on road networks, (ii) scalable location tagging for improving the quality of Twitter data analytics and prediction accuracy, and (iii) lightweight spatial indexing for enhancing the performance of big spatial data queries.
In this defense exam, I will briefly highlight these technical contributions and focus on presenting the distributed system for iterative graph algorithms, including system architecture, optimizations, and experimental evaluation.