Title: Learning Embodied Models of Actions from First Person Video
Yin Li
Computer Science Ph.D. Student
School of Interactive Computing
College of Computing
Georgia Institute of Technology
Date: Monday, June 20th, 2016
Time: 1:00pm to 3:00pm (EDT)
Location: TSRB GVU Cafe
Committee:
---------------
Dr. James M. Rehg (Advisor), School of Interactive Computing, Georgia Institute of Technology
Dr. Irfan Essa, School of Interactive Computing, Georgia Institute of Technology
Dr. James Hays, School of Interactive Computing, Georgia Institute of Technology
Dr. Kristen Grauman, Department of Computer Science, University of Texas at Austin
Abstract:
-----------
The development of wearable cameras and advances in computer vision make it possible, for the first time in history, to collect and analyze a large-scale record of our daily visual experiences in the form of first-person videos. My thesis work focuses on the automatic analysis of these first-person videos, a field known as First Person Vision (FPV). My goal is to develop novel embodied representations for understanding the camera wearer's actions by leveraging first-person visual cues derived from these videos, including body motion, hand locations and gaze. This "embodied" representation differs from traditional visual representations: it derives from the purposive body movements of the first person and captures the notion of objects within the context of actions.
By considering actions as intentional body movements, I propose to investigate three important aspects of first-person actions. First, I present a method to estimate egocentric gaze, which reveals the visual trajectory of an action. Our work demonstrates for the first time that egocentric gaze can be reliably estimated using only head motion and hand locations derived from first-person video, without the need for object or action information. Second, I develop a method for first-person action recognition. Our work demonstrates that an embodied representation combining egocentric cues and visual cues can inform the location of actions and significantly improve recognition accuracy. Finally, I propose a novel task of object interaction prediction to uncover the plan of a future object manipulation and thus explain the purposive motions. I will develop novel learning schemes for this task and learn an embodied object representation from it.