*********************************
There is now a CONTENT FREEZE for Mercury while we switch to a new platform. It began on Friday, March 10 at 6pm and will end on Wednesday, March 15 at noon. No new content can be created during this time, but all material in the system as of the beginning of the freeze will be migrated to the new platform, including users and groups. Functionally the new site is identical to the old one. webteam@gatech.edu
*********************************
Title: Systems Abstractions for Big Data Processing on a Single Machine
Steffen Maass
School of Computer Science
College of Computing
Georgia Institute of Technology
Date: Thursday, November 1st, 2018
Time: 4:30pm EDT
Location: KACB 3126
Committee:
------------
Dr. Taesoo Kim (Advisor, School of Computer Science, Georgia Tech) Dr. Ada Gavrilovska (School of Computer Science, Georgia Tech) Dr. Umakishore Ramachandran (School of Computer Science, Georgia Tech) Dr. Tushar Krishna (School of Electrical Engineering, Georgia Tech) Dr. Willy Zwaenepoel (Faculty of Engineering and Information Technologies, The University of Sydney)
Abstract:
-----------
Large-scale internet services, such as Facebook or Google, are using clusters of many servers for problems such as search, machine learning, and social networks.
However, while it may be possible to apply the tools used at this scale to smaller, more common problems as well, this thesis presents approaches to large-scale data processing on only a single machine.
This approach has obvious cost benefits and lowers the barrier of entrance to large-scale data processing.
This thesis approaches this problem from both an operating systems perspective as well as a redesign of applications to enable trillion-scale graph processing on a single machine.
We first present an asynchronous scheme for clearing the processors' translation lookaside buffers (TLBs) in response to the high overhead of the current, synchronous process known as a TLB shootdown.
This process is critical for system services such as freeing memory, NUMA memory migration, and page swapping in emerging, disaggregated data centers.
The key idea of this scheme, Latr, is a lazy mechanism to remove entries from the cores' TLBs while ensuring correctness by lazily releasing virtual memory only after Latr's lazy shootdown mechanism finishes.
This scheme removes the current overhead of costly inter-processor interrupts.
We show that this mechanism has impacts on many applications from webservers and key-value stores to graph processing.
Second, this thesis presents a new out-of-core graph processing engine, called Mosaic, for executing graph algorithms on trillion-scale datasets on a single machine.
Mosaic makes use of many-core processors and PCIe-SSDs coupled with a novel graph encoding scheme to allow processing of graphs of up to one trillion edges on a single machine.
Mosaic also employs a locality-preserving curve to allow for high compression and high locality when storing graphs and executing algorithms.
Third, this thesis proposes a new engine for processing evolving graphs, Kaleidoscope, based on insights about achieving high compression and locality while reducing common overheads for keeping the graph well organized.
This is an important enabling step for emerging workloads when processing graphs that change over time.