Ph.D. Defense of Dissertation Announcement
Title: Reconciling Data Privacy and Utility in the Era of Big Data
Lei Yu
Ph.D. Student
Systems
School of Computer Science
Georgia Institute of Technology
Date: Thursday, March 28
Start Time: 10:00 AM
Location: KACB 1202
Committee
———————
Dr. Ling Liu (Advisor, School of Computer Science, Georgia Institute of Technology)
Dr. Calton Pu (Co-Advisor, School of Computer Science, Georgia Institute of Technology)
Dr. Mustaque Ahamad (School of Computer Science, Georgia Institute of Technology)
Dr. Douglas M. Blough (School of Electrical and Computer Engineering, Georgia Institute of Technology)
Dr. Rachel Cummings (School of Industrial and Systems Engineering, Georgia Institute of Technology)
Abstract
———————
The widespread use of internet-connected mobile devices, the Internet of Things (IoT), and cloud computing has enabled large-scale collection of personal data, including user profiles, daily activities, locations, photos, and health status, from millions or even billions of users across scenarios such as mobile apps, smart homes, and cloud storage services. The availability of these huge datasets has driven breakthroughs in deep learning and an explosion of data-driven applications that enrich people’s lives. At the same time, however, these datasets often encode privacy-sensitive information about individuals, which raises serious privacy concerns for society. It is therefore imperative to develop principled privacy-preserving approaches to harnessing the power of big data. This dissertation research contributes original ideas and innovative techniques for applying differential privacy, a rigorous mathematical framework that offers provable privacy guarantees, to protect data privacy while improving the trade-off between privacy and utility in the era of big data, from three perspectives: data collection, data usage, and data publication.
The first contribution of this dissertation research is the development of PIVE, a two-phase dynamic differential location privacy framework that aims to protect users’ location privacy in location-based services while preserving service quality. With the popularity of location-based services for navigation, point-of-interest recommendation, and social networking, the companies that offer such services can continuously collect users’ locations. The collected location information opens the door to misuse and abuse, exposing users’ travel patterns and revealing their health status and political views. PIVE provides a differentially private location perturbation mechanism that transforms the user’s exact location into a perturbed location before reporting it to the servers. This approach augments differential location privacy by bounding the inference error of adversaries with specific prior knowledge, while enabling adaptive privacy control to improve utility and user experience.
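To make the idea of perturbing a location before reporting it concrete, here is a minimal sketch of the generic planar Laplace mechanism from the differential location privacy literature. It is not PIVE's two-phase mechanism, which this abstract does not specify. The function name, the use of planar coordinates in meters, and the parameter `epsilon` (privacy level per unit of distance) are assumptions made for illustration.

```python
import math
import random


def planar_laplace_perturb(x, y, epsilon, rng=None):
    """Return (x, y) shifted by planar Laplace noise.

    Assumption: x and y are planar coordinates (e.g. meters), and
    epsilon is the privacy parameter per unit of distance.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = rng or random.Random()
    # The noise direction is uniform on the circle.
    theta = rng.uniform(0.0, 2.0 * math.pi)
    # The radius of planar Laplace noise follows a Gamma distribution
    # with shape 2 and scale 1/epsilon.
    r = rng.gammavariate(2.0, 1.0 / epsilon)
    return x + r * math.cos(theta), y + r * math.sin(theta)


if __name__ == "__main__":
    # Hypothetical exact location; the reported one is perturbed.
    print(planar_laplace_perturb(1000.0, 2000.0, epsilon=0.01))
```

A smaller `epsilon` gives stronger privacy but moves the reported point farther from the true one on average (the expected displacement is 2/epsilon), which is the privacy versus utility trade-off the paragraph above describes.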
The second contribution of this dissertation research is the development of differentially private deep learning for protecting the privacy of the training data. Following the breakthroughs in deep learning, more companies are interested in training deep neural networks on collected data to gain a competitive edge. However, a deep neural network usually has millions of model parameters, giving it a large effective capacity that may be sufficient to encode details of individual training examples in the model parameters. Our research addresses a collection of related topics within the context of deep learning with differential privacy. We provide a more refined analysis of the privacy loss of differentially private stochastic gradient descent (SGD) algorithms under different data batching strategies. We also propose a family of methods for non-uniformly allocating the privacy budget across SGD iterations to improve model accuracy while retaining privacy guarantees.
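As a rough illustration of the ingredients mentioned above, the sketch below shows one step of the commonly used differentially private SGD recipe (per-example gradient clipping followed by Gaussian noise) together with a decaying noise schedule. The schedule is a hypothetical example of non-uniform budget allocation, not necessarily one of the methods proposed in the dissertation, and the sketch omits the privacy accounting needed to turn noise levels into an actual guarantee. All function and parameter names are assumptions.

```python
import math
import random


def clip(grad, max_norm):
    """Scale grad so that its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def dp_sgd_step(params, per_example_grads, lr, max_norm, sigma, rng):
    """One noisy step: clip each gradient, sum, add noise, average."""
    n = len(per_example_grads)
    total = [0.0] * len(params)
    for grad in per_example_grads:
        for i, g in enumerate(clip(grad, max_norm)):
            total[i] += g
    # Gaussian noise with standard deviation sigma * max_norm is added
    # to the clipped sum before dividing by the batch size.
    noisy = [(t + rng.gauss(0.0, sigma * max_norm)) / n for t in total]
    return [p - lr * g for p, g in zip(params, noisy)]


def decaying_sigma(sigma0, decay, step):
    """Hypothetical schedule: the noise multiplier shrinks over steps,
    so later iterations consume more of the privacy budget."""
    return sigma0 * math.exp(-decay * step)


if __name__ == "__main__":
    rng = random.Random(0)
    params = [0.0, 0.0]
    grads = [[3.0, 4.0], [0.0, 1.0]]  # made-up per-example gradients
    for step in range(3):
        sigma = decaying_sigma(2.0, 0.1, step)
        params = dp_sgd_step(params, grads, 0.1, 1.0, sigma, rng)
    print(params)
```

Spending less budget early (more noise) and more later (less noise) is only one possible design; whether it helps accuracy depends on the model and data, and the total privacy cost must be tracked separately.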
Finally, we propose differentially private data synthesis for data publication. Because the collection of individual data by governments and corporations can create tremendous opportunities for knowledge-based decision making, there is a demand for the exchange and publication of data among various parties. However, publishing data in its original form would violate individual privacy. Instead, releasing synthetic data that mimics the original data offers a promising route to privacy-preserving data publication while still allowing rich data analytics. We propose to use deep generative models with differentially private training for data synthesis and to examine the utility of the synthesized data on a set of data analytics tasks.
In this defense exam, I will report on the design, implementation, and evaluation of PIVE and of the differentially private mechanisms for deep learning.