Accelerating Advanced Analytics - Arun Kumar

Event Details

Date/Time:
- Tuesday February 2, 2016 - Wednesday February 3, 2016
  10:00 am - 10:59 am
Location: Klaus 1116 East and West
Fee(s):
0.00

Contact

Susie McClain

smcclain@cc.gatech.edu

Summaries

Summary Sentence: Accelerating Advanced Analytics - Arun Kumar

Title: Accelerating Advanced Analytics

Abstract:

Advanced analytics -- the analysis of large and complex data with machine learning (ML) -- is becoming ubiquitous, with growing demand for advanced analytics tools in enterprise domains. However, several challenging bottlenecks exist in the end-to-end process of building and deploying advanced analytics applications. My research focuses on abstractions, algorithms, and systems that mitigate such bottlenecks and accelerate advanced analytics from a data management standpoint.

In this talk, I will focus on my work on mitigating one such pervasive bottleneck in the process of feature engineering for ML: joins of multiple tables. Many real-world datasets are multi-table, connected by key-foreign key relationships, but almost all ML toolkits expect single-table inputs. This forces data scientists to join all the tables and materialize a single table that collects all the features. Alas, such joins often cause the output to blow up in size, which slows down ML, increases costs, and leads to data maintenance headaches. In my work, I show how it is possible to mitigate these issues by "avoiding joins physically," i.e., pushing ML down through joins. This reduces runtime without affecting accuracy. Going further, I apply statistical learning theory to show how it is often possible to also "avoid joins logically," i.e., to ignore entire tables outright without losing much accuracy while achieving significant runtime gains.
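To make "pushing ML down through joins" concrete, here is a minimal sketch of the general factorized-learning idea for one simple case: a linear model with squared loss, trained over a fact table S that references a dimension table R through a single foreign key. It is an illustration under those assumptions, not the speaker's system; the table layout and the function names (`materialized_gradient`, `factorized_gradient`) are hypothetical. The factorized version computes each dimension tuple's partial inner product once and groups residuals by foreign key, so R's features are read once per R tuple rather than once per S row, and no joined row is ever built.

```python
from collections import defaultdict


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def materialized_gradient(S, R, w_s, w_r):
    """Baseline: build each joined row (S features + R features), then
    accumulate the squared-loss gradient. Hypothetical helper."""
    g_s = [0.0] * len(w_s)
    g_r = [0.0] * len(w_r)
    for y, x_s, fk in S:
        x = x_s + R[fk]                 # materialized joined feature vector
        resid = dot(w_s + w_r, x) - y   # prediction error for this row
        for j, v in enumerate(x_s):
            g_s[j] += resid * v
        for j, v in enumerate(R[fk]):
            g_r[j] += resid * v
    return g_s, g_r


def factorized_gradient(S, R, w_s, w_r):
    """Same gradient, computed without constructing joined rows.
    Hypothetical helper illustrating the factorized approach."""
    # Step 1: one partial inner product per R tuple, reused by every S row
    # that points to it.
    partial = {rid: dot(w_r, x_r) for rid, x_r in R.items()}
    g_s = [0.0] * len(w_s)
    resid_sum = defaultdict(float)      # residuals grouped by foreign key
    for y, x_s, fk in S:
        resid = dot(w_s, x_s) + partial[fk] - y
        for j, v in enumerate(x_s):
            g_s[j] += resid * v
        resid_sum[fk] += resid
    # Step 2: touch each referenced R tuple's features once, scaled by the
    # sum of residuals of the S rows that reference it.
    g_r = [0.0] * len(w_r)
    for rid, total in resid_sum.items():
        for j, v in enumerate(R[rid]):
            g_r[j] += total * v
    return g_s, g_r


# Toy data (made up for illustration): R maps key -> features,
# S holds (label, own features, foreign key) triples.
R = {1: [1.0, 2.0], 2: [0.5, -1.0]}
S = [(3.0, [1.0], 1), (1.0, [2.0], 2), (0.0, [-1.0], 1)]
w_s, w_r = [0.5], [0.1, 0.2]

print(materialized_gradient(S, R, w_s, w_r))
print(factorized_gradient(S, R, w_s, w_r))
```

Both functions return the same gradient up to floating-point rounding. The savings grow with the ratio of S rows to R rows and with the number of features in R, which is exactly when the materialized join table is largest.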

Bio:

Arun Kumar is a Ph.D. candidate at the University of Wisconsin-Madison. His primary research interests are in data management and its intersection with machine learning. He is co-advised by Jeffrey Naughton and Jignesh M. Patel, and has also worked closely with Christopher Re and Xiaojin Zhu. Systems and ideas from his research have shipped in products by EMC, Oracle, Cloudera, and IBM. A paper he co-authored received the Best Paper Award at ACM SIGMOD 2014. He was awarded the Anthony C. Klug NCR Fellowship in database systems in 2015. He received his M.S. from UW-Madison in 2011 and his B.Tech. from IIT Madras in 2009.

Webpage:

http://pages.cs.wisc.edu/~arun/

Additional Information

In Campus Calendar

Groups

College of Computing, School of Computer Science

Invited Audience

Undergraduate students, Faculty/Staff, Public, Graduate students

Categories

Seminar/Lecture/Colloquium

Keywords

College of Computing, Georgia Tech, School of Computer Science

Status

Created By: Birney Robert
Workflow Status: Published
Created On: Jan 21, 2016 - 10:38am
Last Updated: Apr 13, 2017 - 5:16pm