Title:
Kernel Nonparametric Tests of Homogeneity, Independence and Multi-Variable Interaction
Abstract:
We consider three nonparametric hypothesis testing problems: (1) Given samples from distributions p and q, a homogeneity test determines whether to accept or reject p = q; (2) Given a joint distribution p_xy over random variables x and y, an independence test investigates whether p_xy = p_x p_y; (3) Given a joint distribution over several variables, we may test whether a factorization exists (e.g., p_xyz = p_xy p_z, or, for the case of total independence, p_xyz = p_x p_y p_z). The final test (3) is of particular interest in fitting directed graphical models, as it may be used to detect cases where two independent causes individually have a weak influence on a third, dependent variable, but their combined effect has a strong influence, even when these variables have high dimension.
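Written out in display form (a restatement of the abstract's notation, added here for reference), the three null hypotheses are:

\begin{align*}
\text{(1) Homogeneity:}   \quad & H_0 : p = q \\
\text{(2) Independence:}  \quad & H_0 : p_{xy} = p_x \, p_y \\
\text{(3) Factorization:} \quad & H_0 : p_{xyz} = p_{xy} \, p_z
  \quad \text{(or } p_{xyz} = p_x \, p_y \, p_z \text{ for total independence)}
\end{align*}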
We present nonparametric tests for the three cases described, based on distances between embeddings of probability measures into reproducing kernel Hilbert spaces (RKHS), which constitute the test statistics (e.g., for independence, the distance is between the embedding of the joint distribution and that of the product of the marginals). The tests benefit from decades of machine learning research on kernels for various domains, and thus apply to distributions on high-dimensional vectors, images, strings, graphs, groups, and semigroups, among others. The energy distance and distance covariance statistics are particular instances of these RKHS statistics. Finally, the tests can be applied to time series data, using a wild bootstrap procedure to approximate the distribution of the test statistic under the null hypothesis.
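As an illustration of the simplest such statistic, the Python sketch below computes an unbiased estimate of the squared maximum mean discrepancy (MMD) with a Gaussian kernel and uses a permutation test for the homogeneity problem (1). This is a minimal sketch written for this announcement, not code from the talk; the kernel choice, the bandwidth sigma, and the permutation approach to the null distribution are all illustrative assumptions (the wild bootstrap for time series mentioned above is not shown).

import numpy as np

def gaussian_kernel(A, B, sigma=1.0):
    """Gaussian (RBF) kernel matrix between rows of A and rows of B."""
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-sq / (2.0 * sigma**2))

def mmd2_unbiased(X, Y, sigma=1.0):
    """Unbiased estimate of the squared MMD between samples X ~ p and Y ~ q."""
    m, n = len(X), len(Y)
    Kxx = gaussian_kernel(X, X, sigma)
    Kyy = gaussian_kernel(Y, Y, sigma)
    Kxy = gaussian_kernel(X, Y, sigma)
    # Diagonal terms are excluded so that the within-sample sums are U-statistics.
    return ((Kxx.sum() - np.trace(Kxx)) / (m * (m - 1))
            + (Kyy.sum() - np.trace(Kyy)) / (n * (n - 1))
            - 2.0 * Kxy.mean())

def mmd_permutation_test(X, Y, sigma=1.0, n_perm=500, seed=0):
    """Homogeneity test: p-value of MMD^2 under random relabelings of the pooled sample."""
    rng = np.random.default_rng(seed)
    stat = mmd2_unbiased(X, Y, sigma)
    Z, m = np.vstack([X, Y]), len(X)
    null = np.empty(n_perm)
    for i in range(n_perm):
        idx = rng.permutation(len(Z))
        null[i] = mmd2_unbiased(Z[idx[:m]], Z[idx[m:]], sigma)
    return stat, float(np.mean(null >= stat))

# Example: two Gaussian samples with shifted means; the null p = q should be rejected.
rng = np.random.default_rng(1)
X = rng.normal(0.0, 1.0, size=(100, 2))
Y = rng.normal(0.5, 1.0, size=(100, 2))
stat, pval = mmd_permutation_test(X, Y, sigma=1.0)
print(f"MMD^2 = {stat:.4f}, p-value = {pval:.3f}")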
Bio:
Arthur Gretton is a Reader (Associate Professor) with the Gatsby Computational Neuroscience Unit, CSML, UCL, which he joined in 2010. He received degrees in physics and systems engineering from the Australian National University, and a PhD with Microsoft Research and the Signal Processing and Communications Laboratory at the University of Cambridge. He worked from 2002 to 2012 at the MPI for Biological Cybernetics, and from 2009 to 2010 at the Machine Learning Department, Carnegie Mellon University. Arthur's research interests include machine learning, kernel methods, statistical learning theory, nonparametric hypothesis testing, blind source separation, Gaussian processes, and nonparametric techniques for neural data analysis. He was an associate editor of IEEE Transactions on Pattern Analysis and Machine Intelligence from 2009 to 2013, and has been an Action Editor for JMLR since April 2013; he was a member of the NIPS Program Committee in 2008 and 2009, an Area Chair for ICML in 2011 and 2012, and a member of the COLT Program Committee in 2013.