Title: Egocentric Action Understanding by Learning Embodied Attention
Date: Thursday, June 30, 2022
Time: 12:00 pm to 1:30 pm (EDT)
Location: https://gatech.zoom.us/j/4156041658
Miao Liu
Robotics Ph.D. Candidate
School of Electrical and Computer Engineering
Georgia Institute of Technology
Committee:
Dr. James M. Rehg (Advisor, School of Interactive Computing, Georgia Institute of Technology)
Dr. Diyi Yang (School of Interactive Computing, Georgia Institute of Technology)
Dr. Zsolt Kira (School of Interactive Computing, Georgia Institute of Technology)
Dr. James Hays (School of Interactive Computing, Georgia Institute of Technology)
Dr. Jitendra Malik (Department of Electrical Engineering and Computer Science, University of California at Berkeley)
Abstract:
Videos captured from wearable cameras, known as egocentric videos, create a continuous record of human daily visual experience and thereby offer a new perspective on human activity understanding. Importantly, egocentric video aligns gaze, embodied movement, and action in the same “first-person” coordinate system. These rich egocentric cues reflect the attended scene context of an action and thereby provide novel means for reasoning about human daily routines.
In my thesis work, I describe my efforts to develop novel computational models that learn embodied egocentric attention for the automatic analysis of egocentric actions. First, I introduce a probabilistic model for learning gaze and actions in egocentric video and further demonstrate that attention can serve as a robust tool for learning motion-aware video representations. Second, I develop a novel deep model to address the challenging problem of jointly recognizing and localizing the actions of a mobile user on a known 3D map from egocentric videos. Third, I present a novel deep latent variable model that uses intentional body movement (motor attention) as a key representation for forecasting human-object interaction in egocentric video. Finally, I propose the novel task of future hand segmentation from egocentric videos and show how explicitly modeling future head motion can facilitate the forecasting of future hand movement.