Candidate:
Kumar Ashis Pati
Ph.D. in Music Technology
School of Music
Georgia Institute of Technology
Title:
Learning to Manipulate Latent Representations of Deep Generative Models
-- towards improving interactivity and controllability in automatic music creation
Date: 9th December 2020
Time: 12:00 to 14:00 (EST)
Location: https://bluejeans.com/1783153167
Note: This defense is remote-only
Abstract:
Deep generative models have emerged as a tool of choice for the design of automatic music composition systems. While these models are capable of learning complex representations from data, a limitation of many of these models is that they allow little to no control over the generated music. Latent representation-based models, such as Variational Auto-Encoders, have the potential to alleviate this limitation as they are able to encode hidden attributes of the data in a low-dimensional latent space. However, the encoded attributes are often not interpretable and cannot be explicitly controlled.
The work presented in this thesis seeks to address these challenges by learning to manipulate and design latent spaces in a way that allows control over musically meaningful attributes that humans can understand. This in turn allows explicit control of such attributes during the generation process and helps users realize their compositional goals. Specifically, three different approaches are proposed to investigate this problem. The first approach shows that we can learn to traverse the latent spaces of generative models to perform complex interactive music composition tasks. The second approach uses a novel latent space regularization technique that encodes individual musical attributes along specific dimensions of the latent space. The third approach applies attribute-informed non-linear transformations to an existing latent space such that the transformed latent space allows controllable generation of data. In addition, the problem of disentanglement learning in the context of symbolic music is investigated systematically by proposing a tailor-made dataset for the task and evaluating the performance of several methods for unsupervised and supervised disentanglement learning. Together, the proposed methods will help address critical shortcomings of deep generative models for music and pave the way toward intuitive interfaces that humans can use in real compositional settings.
Committee:
Prof. Alexander Lerch (Advisor, School of Music, Georgia Institute of Technology)
Prof. Devi Parikh (School of Interactive Computing, Georgia Institute of Technology)
Prof. Gil Weinberg (School of Music, Georgia Institute of Technology)
Prof. Jason Freeman (School of Music, Georgia Institute of Technology)
Prof. Mark Riedl (School of Interactive Computing, Georgia Institute of Technology)
Rudolph van der Merwe (Senior Engineering Manager, Advanced Computations Group, Apple Inc.)