Title: Encoding 3D Contextual Information For Dynamic Scene Understanding
Steven Hickson
Ph.D. Student in Computer Science
School of Interactive Computing
College of Computing
Georgia Institute of Technology
Date: Friday, December 13, 2019
Time: 2:00 - 3:30pm (EST)
Location: Coda C1108 Brookhaven
Committee:
------------
Dr. Irfan Essa (Advisor), Senior Associate Dean, School of Interactive Computing, Georgia Institute of Technology
Dr. Frank Dellaert, School of Interactive Computing, Georgia Institute of Technology
Dr. Zsolt Kira, School of Interactive Computing, Georgia Institute of Technology
Dr. Judy Hoffman, School of Interactive Computing, Georgia Institute of Technology
Dr. Rahul Sukthankar, Principal Scientist/Director, Google AI Perception; Robotics Institute, Carnegie Mellon University
Abstract:
-----------
Humans have an inherent understanding of the shape of their environment and of the objects in it. Given a description of a room, a person can form a reasonable approximation of the space and the objects it contains. However, our current methods lack this type of contextual understanding (e.g., a chair is shaped a particular way, and that shape indicates you can sit on it). This work is motivated by the idea that there is an inherent relationship between 3D information, such as shape, and scene understanding and object classification. Objects such as tables, chairs, and cups have specific shapes, and our models should leverage and learn that information. Depth and surface normals have frequently been used as additional signals in semantic labeling work; however, there is still limited understanding of how to use and learn shape and labels jointly. Our work examines the use of 3D cues in unsupervised and supervised approaches to segmentation and semantic labeling. We show how to use 3D information for robust unsupervised segmentation, for supervised semantic labeling that builds on segmentation, and for unsupervised object categorization. We explore this relationship further by showing how shape helps deep neural networks semantically label indoor environments, and how jointly estimating shape and labels improves both tasks while adding little model capacity.
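To make the joint shape-and-label idea concrete, the following is a minimal illustrative sketch in PyTorch, not the implementation from this work: the class name JointShapeLabelNet and all layer sizes are placeholder assumptions. It shows how a shared encoder can feed two lightweight per-pixel heads, one for semantic labels and one for surface normals, so the second task adds few extra parameters.

import torch
import torch.nn as nn
import torch.nn.functional as F

class JointShapeLabelNet(nn.Module):
    def __init__(self, num_classes: int):
        super().__init__()
        # Shared backbone: most parameters live here, so adding the
        # second task head costs little extra model capacity.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        )
        # Lightweight task-specific heads (1x1 convolutions).
        self.label_head = nn.Conv2d(64, num_classes, 1)  # per-pixel class logits
        self.normal_head = nn.Conv2d(64, 3, 1)           # per-pixel (nx, ny, nz)

    def forward(self, rgb):
        feats = self.encoder(rgb)
        logits = self.label_head(feats)
        normals = self.normal_head(feats)
        # Upsample both outputs back to the input resolution and
        # unit-normalize the predicted surface normals.
        logits = F.interpolate(logits, size=rgb.shape[-2:], mode="bilinear",
                               align_corners=False)
        normals = F.interpolate(normals, size=rgb.shape[-2:], mode="bilinear",
                                align_corners=False)
        normals = F.normalize(normals, dim=1)
        return logits, normals

# Example usage with a random RGB batch (40 classes is an arbitrary choice):
# net = JointShapeLabelNet(num_classes=40)
# logits, normals = net(torch.randn(2, 3, 240, 320))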
This proposal aims to demonstrate how 3D cues can be used to improve semantic labeling and object classification. Specifically, we consider depth, surface normals, object classification, and pixel-wise semantic labeling. The work outlined aims to validate the following thesis statement: shape can be used as additional context that improves segmentation, unsupervised clustering, object classification, and semantic labeling with little computational overhead.
The proposed work will show that (1) combining shape and object labels improves classification while requiring few extra parameters, (2) surface normal estimation is more closely related to semantic labeling than depth estimation, and (3) jointly learning shape and labels improves accuracy on each task. We describe several methods for combining shape and object classification and then discuss extensions of the proposed work that focus specifically on surface normal prediction and semantic labeling; a sketch of one possible joint training objective follows.
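As a hedged illustration of how the two predictions might be trained together, the sketch below combines a per-pixel cross-entropy term for labels with a cosine-distance term for unit surface normals. The function name joint_loss and the weight lam are hypothetical placeholders, not values from the proposed work.

import torch
import torch.nn.functional as F

def joint_loss(logits, pred_normals, gt_labels, gt_normals, lam=0.5):
    # Standard per-pixel cross-entropy for semantic labeling:
    # logits is (N, C, H, W), gt_labels is (N, H, W) of class indices.
    sem = F.cross_entropy(logits, gt_labels)
    # Shape term: 1 - cosine similarity between predicted and
    # ground-truth unit normals, averaged over all pixels.
    cos = F.cosine_similarity(pred_normals, gt_normals, dim=1)
    shape = (1.0 - cos).mean()
    # lam is a hypothetical weighting hyperparameter balancing the tasks.
    return sem + lam * shape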