Title: Improving Robustness of DNS Graph Clustering Against Noise
Yizheng Chen
Ph.D. Candidate
School of Computer Science
College of Computing
Georgia Institute of Technology
Date: Friday, October 13th, 2017
Time: 10 AM - Noon (ET)
Location: Klaus 3126
Committee:
------------------------
Dr. Emmanouil Antonakakis (Co-advisor, School of Electrical and Computer Engineering, Georgia Institute of Technology)
Dr. Wenke Lee (Co-advisor, School of Computer Science, Georgia Institute of Technology)
Dr. Mustaque Ahamad (School of Computer Science, Georgia Institute of Technology)
Dr. Raheem Beyah (School of Electrical and Computer Engineering, Georgia Institute of Technology)
Dr. Roberto Perdisci (Dept. of Computer Science, University of Georgia and School of Computer Science, Georgia Tech)
Abstract
------------------------
Clustering is often the first step in finding structure within unlabeled datasets. Given a small set of labels, clustering can also propagate these labels by discovering groups of objects that are similar to each other. The ever-growing amount of data collected over long periods of time brings both opportunities and challenges for clustering. Analyzing such long-term datasets allows us to address evolving security problems, such as botnet forensic analysis, early warning of new threats, and tracking the evolution of security phenomena. However, the analysis also faces the challenge presented by noise in the data.
This thesis improves the robustness of clustering against noise by focusing on DNS graphs. Noise is either inherent in the dataset or injected by adversaries. The first goal of the thesis is to remediate the effect of the noise inherent in the data. To that end, we perform measurement studies from two different vantage points in the online advertising ecosystem. As a multi-billion-dollar industry, the online ad ecosystem naturally attracts ad abuse from miscreants. We propose a new clustering technique to automatically analyze the cost to advertisers of impression fraud generated by the botnet TDSS/TDL4 over four years. In addition, our measurement results, taken from the vantage point of a Demand Side Platform provider, show statistically significant differences between blacklisted publishers and those that were never blacklisted.
The second goal of the thesis is to increase the robustness of clustering against adversarial noise. Little work has been done on adversarial clustering to understand the weaknesses of clustering systems. We propose two novel attacks: one that injects noise into existing clusters, and one that moves data points to noisy clusters. After analyzing the effectiveness and cost of these attacks, we present defense techniques that improve the robustness of clustering in adversarial settings.