*********************************
There is now a CONTENT FREEZE for Mercury while we switch to a new platform. It began on Friday, March 10 at 6pm and will end on Wednesday, March 15 at noon. No new content can be created during this time, but all material in the system as of the beginning of the freeze will be migrated to the new platform, including users and groups. Functionally the new site is identical to the old one. webteam@gatech.edu
*********************************
Speaker: Ryan Curtin (CSIP)
Title: Dual-tree k-means with bounded single-iteration runtime
Abstract:
k-means is a widely used clustering algorithm, but for k clusters and a dataset size of N, each iteration of Lloyd's algorithm costs O(kN) time. Although there are tree-based techniques to accelerate single Lloyd iterations, none of these techniques are tailored to the case of large k, which is increasingly common as dataset sizes grow. We propose a dual-tree algorithm that gives the exact result of Lloyd iterations; when using cover trees, we are able to bound the worst-case runtime of each algorithm as O(N + k log k) (this bound also depends on dataset-dependent quantities). To our knowledge these are the first sub-O(kN) bounds for exact Lloyd iterations. We then show that these theoretically favorable algorithms significantly outperform other approaches in practice, especially for large N and k.
Speaker Bio:
Ryan Curtin is a nearly-finished Ph.D. student in the School of Electrical and Computer Engineering at Georgia Tech; his research focuses on the acceleration of core primitives that make up machine learning algorithms, generally via the use of trees and other hierarchical structures. He is also the primary developer and maintainer of the mlpack machine learning library (http://www.mlpack.org/). He doesn't enjoy writing biographies, and also is a pinball wizard.