Ph.D. Defense of Dissertation: Dongryeol Lee


Event Details
  • Date/Time: Friday, May 4, 2012, 10:00 am - 12:00 pm
  • Location: KACB 1212, Atlanta, GA
Contact

Dongryeol Lee

Summaries

Summary Sentence: A Distributed Kernel Summation Framework for Machine Learning


Ph.D. Defense of Dissertation Announcement
------------------------------------------------------------------
Dongryeol Lee
School of Computational Science and Engineering
College of Computing
Georgia Institute of Technology
dongryel@cc.gatech.edu

Title: A Distributed Kernel Summation Framework for Machine Learning

Date: Friday, May 4, 2012
Time: 10 AM - 12 PM EDT
Location: KACB 1212

Committee:

  • Professor Alexander Gray (Advisor, School of Computational Science and Engineering, Georgia Tech)
  • Professor Edmond Chow (School of Computational Science and Engineering, Georgia Tech)
  • Professor Christos Faloutsos (School of Computer Science, Carnegie Mellon University)
  • Professor Haesun Park (School of Computational Science and Engineering, Georgia Tech)
  • Professor Richard Vuduc (School of Computational Science and Engineering, Georgia Tech)


Abstract:
The class of computational problems I consider in my thesis shares the common trait of requiring consideration of pairs (or higher-order tuples) of data points. For problems modeling pairwise interactions, we consider accelerating operations on $N \times N$ matrices of the form $K = \{ k(x_i, x_j) \}_{i,j}$, where $k(\cdot, \cdot)$ is a function that outputs a real value given $x_i$ and $x_j$ from the data set. I focus on the kernel summation operations ubiquitous in many data mining and scientific algorithms.
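As a concrete illustration of the operation being accelerated (a minimal sketch of my own, not code from the thesis; the function name, data layout, and choice of a Gaussian kernel are assumptions), the brute-force computation of all $N$ kernel sums looks like this in C++:

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Naive O(N^2) kernel summation: for each point x_i, compute
// S(x_i) = sum_j k(x_i, x_j), here with the Gaussian kernel
// k(x, y) = exp(-||x - y||^2 / (2 h^2)).
std::vector<double> KernelSums(const std::vector<std::vector<double>>& points,
                               double bandwidth) {
  const std::size_t n = points.size();
  std::vector<double> sums(n, 0.0);
  for (std::size_t i = 0; i < n; ++i) {
    for (std::size_t j = 0; j < n; ++j) {
      double sq_dist = 0.0;
      for (std::size_t d = 0; d < points[i].size(); ++d) {
        const double diff = points[i][d] - points[j][d];
        sq_dist += diff * diff;
      }
      sums[i] += std::exp(-sq_dist / (2.0 * bandwidth * bandwidth));
    }
  }
  return sums;
}
```

The quadratic cost of this direct computation is exactly what the approximation algorithms below are designed to avoid.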

In machine learning, kernel summations appear in popular kernel methods which can model
nonlinear structures in data. Kernel methods include many non-parametric methods such as
kernel density estimation, kernel regression, Gaussian process regression, kernel PCA, and
kernel support vector machines (SVM). In computational physics, the kernel summation appears as the classical $N$-body problem for simulating positions of a set of celestial bodies or atoms.
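For example, the standard kernel density estimate at a query point $x$ is exactly such a summation over all $N$ reference points:

$$ \hat{f}(x) = \frac{1}{N h^D} \sum_{j=1}^{N} k\!\left( \frac{x - x_j}{h} \right), $$

where $h$ is the bandwidth and $D$ the dimension, so evaluating the estimate at all $N$ query points naively costs $O(N^2)$ kernel evaluations.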

My thesis attempts to marry, for the first time, the best relevant techniques from parallel computing, where kernel summations arise in low dimensions, with the best general-dimension algorithms from the machine learning literature. We provide a unified, efficient parallel kernel summation framework that can utilize the following (a schematic sketch of the underlying tree traversal appears after this list):

  1. Various types of deterministic and probabilistic approximations that may be suitable for both low and high-dimensional problems with a large number of data points.
  2. Indexing the data using any multi-dimensional binary tree with both distributed memory (MPI) and shared memory (OpenMP/Intel TBB) parallelism.
  3. A dynamic load balancing scheme to adjust work imbalances during the computation.
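The sketch below shows the dual-tree traversal pattern that underlies this style of tree-based kernel summation. It is my own schematic, assuming C++17; the node layout is hypothetical, and `KernelBounds` and `ExactSum` are assumed helpers that would compute kernel bounds from two nodes' bounding boxes and the exact leaf-leaf sums, respectively.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical node of a multi-dimensional binary tree (e.g., a kd-tree).
struct Node {
  std::vector<std::size_t> indices;  // Points owned by this node.
  Node* left = nullptr;
  Node* right = nullptr;
  bool IsLeaf() const { return left == nullptr && right == nullptr; }
};

// Assumed helpers (not shown): bounds on k(x, y) over all pairs drawn from
// the two nodes' bounding boxes, and the exact leaf-leaf summation.
std::pair<double, double> KernelBounds(const Node& q, const Node& r);
void ExactSum(const Node& q, const Node& r, std::vector<double>& sums);

// Schematic dual-tree kernel summation with per-pair absolute tolerance tau.
void DualTree(Node* q, Node* r, double tau, std::vector<double>& sums) {
  const auto [k_min, k_max] = KernelBounds(*q, *r);
  if (k_max - k_min <= 2.0 * tau) {
    // Prune: the kernel barely varies across this node pair, so approximate
    // every pairwise contribution by the midpoint of the bounds.
    const double mid = 0.5 * (k_min + k_max);
    for (std::size_t i : q->indices) sums[i] += mid * r->indices.size();
  } else if (q->IsLeaf() && r->IsLeaf()) {
    ExactSum(*q, *r, sums);  // Base case: exact O(|q| * |r|) work.
  } else if (q->IsLeaf()) {
    DualTree(q, r->left, tau, sums);
    DualTree(q, r->right, tau, sums);
  } else if (r->IsLeaf()) {
    DualTree(q->left, r, tau, sums);
    DualTree(q->right, r, tau, sums);
  } else {
    DualTree(q->left, r->left, tau, sums);
    DualTree(q->left, r->right, tau, sums);
    DualTree(q->right, r->left, tau, sums);
    DualTree(q->right, r->right, tau, sums);
  }
}
```

In this pattern, deterministic and probabilistic approximations differ mainly in how the prune test and the approximate contribution are computed, and parallelism distributes independent subtrees of the recursion across MPI processes and threads.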

I will first summarize my previous research on serial kernel summation algorithms. This work builds on Greengard and Rokhlin's earlier fast multipole methods for approximating potential sums of many particles. The contributions of this part of my thesis include the following (a sketch of the Monte Carlo estimator behind contribution 2 appears after the list):

  1. A reinterpretation of Greengard and Rokhlin's work for the computer science community.
  2. An extension of the algorithms to a larger class of approximation strategies, i.e., probabilistic error bounds via Monte Carlo techniques.
  3. The multibody series expansion: a generalization of the theory of fast multipole methods to handle interactions of more than two entities.
  4. The first $O(N)$ proof of batch approximate kernel summation using a notion of intrinsic dimensionality.
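To illustrate the Monte Carlo idea in contribution 2 (my own sketch under stated assumptions, not the thesis algorithm), one can estimate a kernel sum from a uniform random sample and attach a CLT-based confidence half-width that an algorithm can compare against its error tolerance:

```cpp
#include <cmath>
#include <cstddef>
#include <functional>
#include <random>

// Monte Carlo estimate of S = sum_{j=1}^{n} k(x, x_j): sample m >= 2 indices
// uniformly, average the sampled kernel values (Welford's running statistics),
// and scale by n. The half-width is n * z * s / sqrt(m), e.g. z = 1.96 for
// roughly 95% confidence; kernel_at(j) is a placeholder returning k(x, x_j).
struct McEstimate {
  double sum_estimate;
  double half_width;
};

McEstimate EstimateKernelSum(const std::function<double(std::size_t)>& kernel_at,
                             std::size_t n, std::size_t m, double z,
                             std::mt19937& rng) {
  std::uniform_int_distribution<std::size_t> pick(0, n - 1);
  double mean = 0.0, m2 = 0.0;
  for (std::size_t t = 1; t <= m; ++t) {
    const double v = kernel_at(pick(rng));
    const double delta = v - mean;
    mean += delta / static_cast<double>(t);
    m2 += delta * (v - mean);
  }
  const double stddev = std::sqrt(m2 / static_cast<double>(m - 1));
  return {static_cast<double>(n) * mean,
          static_cast<double>(n) * z * stddev /
              std::sqrt(static_cast<double>(m))};
}
```

If the half-width falls below the error tolerance, the estimate can be accepted with the desired probability; otherwise the algorithm draws more samples or falls back to an exact computation.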
Then I move on to the parallelization of the kernel summations.

The software artifact of this thesis has been contributed to an open-source machine learning package called MLPACK, first demonstrated at NIPS 2008 and subsequently at the NIPS 2011 Big Learning Workshop. Completing a portion of this thesis involved high-performance computing resources at XSEDE (eXtreme Science and Engineering Discovery Environment) and NERSC (National Energy Research Scientific Computing Center).

Additional Information

In Campus Calendar
No
Groups

College of Computing, School of Computational Science and Engineering

Status
  • Created By: Jupiter
  • Workflow Status: Published
  • Created On: Apr 6, 2012 - 6:37am
  • Last Updated: Oct 7, 2016 - 9:58pm