Title: Scalable Data Mining via Constrained Low Rank Approximation
Date: Friday, July 1st, 2022
Time: 2pm - 4pm ET
Physical Location: Coda C1215 Midtown
Virtual Location: https://gatech.zoom.us/j/92347767822
Srinivas Eswar
School of Computational Science and Engineering
Georgia Institute of Technology
Committee:
Dr. Richard Vuduc (Advisor, School of Computational Science and Engineering, Georgia Institute of Technology)
Dr. Haesun Park (Co-Advisor, School of Computational Science and Engineering, Georgia Institute of Technology)
Dr. Ümit V. Çatalyürek (School of Computational Science and Engineering, Georgia Institute of Technology)
Dr. Edmond Chow (School of Computational Science and Engineering, Georgia Institute of Technology)
Dr. Grey Ballard (Department of Computer Science, Wake Forest University)
------------------------
Abstract:
Matrix and tensor approximation methods are recognised as foundational tools for modern data analytics. Their strength lies in their long history of rigorous and principled theoretical foundations, judicious formulations via various constraints, and the availability of fast computer programs. Multiple constrained low rank approximation (CLRA) formulations exist for commonly encountered tasks such as clustering, dimensionality reduction, and anomaly detection. The primary challenge in modern data analytics is the sheer volume of data to be analysed, often requiring multiple machines just to hold the dataset in memory. This dissertation presents CLRA as a key enabler of scalable data mining on distributed-memory parallel machines.
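As a point of reference for the unconstrained problem, the best rank-k approximation in the Frobenius norm has a closed-form solution via the truncated SVD (Eckart–Young); CLRA methods solve the same kind of problem with added constraints. A minimal NumPy sketch (illustrative only, not code from the dissertation):

```python
import numpy as np

def truncated_svd(A, k):
    """Best rank-k approximation of A in the Frobenius norm (Eckart-Young).
    CLRA formulations add constraints (e.g. nonnegativity) to such
    factorisations, trading optimality for interpretability."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    # Scale the first k left singular vectors by the singular values,
    # then multiply by the first k right singular vectors: (m,k) @ (k,n).
    return U[:, :k] * s[:k] @ Vt[:k]
```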
Nonnegative Matrix Factorisation (NMF) is the primary CLRA method studied in this dissertation. NMF imposes nonnegativity constraints on the factor matrices and is popular for its interpretability and clustering prowess. The major bottleneck in most NMF algorithms is a distributed matrix-multiplication kernel. We develop the PLANC software package, which includes efficient matrix-multiplication and matricised tensor times Khatri-Rao product kernels tailored to the CLRA case. It employs carefully designed parallel algorithms and data distributions to avoid unnecessary computation and communication. With these key kernels in place, we extend PLANC to a variety of settings, including symmetry constraints, second-order methods, and multiple data modalities. We demonstrate the effectiveness of PLANC via scaling studies on the supercomputers at the Oak Ridge Leadership Computing Facility.
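To make the NMF objective concrete, the classic Lee–Seung multiplicative-update algorithm can be sketched in a few lines of NumPy. This is a serial illustration of the general technique only; PLANC itself uses different (parallel, block-coordinate) algorithms:

```python
import numpy as np

def nmf_mu(A, k, iters=500, seed=0):
    """Rank-k NMF A ~= W @ H via Lee-Seung multiplicative updates.
    Nonnegativity of W and H is preserved automatically because each
    update multiplies by a ratio of nonnegative quantities."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    W = rng.random((m, k))
    H = rng.random((k, n))
    eps = 1e-12  # guard against division by zero
    for _ in range(iters):
        # Each update needs products like W.T @ A and W.T @ W @ H --
        # the matrix-multiplication kernels that dominate the cost.
        H *= (W.T @ A) / (W.T @ W @ H + eps)
        W *= (A @ H.T) / (W @ H @ H.T + eps)
    return W, H
```

Note how every iteration is dominated by matrix multiplications involving the data matrix A, which is why a fast distributed matrix-multiplication kernel is the key to scaling NMF.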