Title: Toward Grounded Spatio-Temporal Reasoning
Committee:
Dr. Al-Regib, Advisor
Dr. Kira, Chair
Dr. Vela
Abstract:
The objective of the proposed research is to leverage spatial, temporal, and language inputs to achieve weakly-grounded visual and textual reasoning. In recent years, research at the intersection of vision, temporal reasoning, and language has received considerable attention. One of the major challenges is ensuring proper grounding and performing reasoning across multiple modalities, given the heterogeneity of the data, when supervision is weak or absent. For example: (1) in Vision-and-Language Navigation, how can a navigation agent identify which parts of an instruction have been completed or are in progress, and which parts are needed for selecting the next action? (2) In visual understanding, how can object-level features be leveraged efficiently for downstream tasks such as action recognition and visual captioning, and how can interactions and relationships be detected when there is no supervision, or only weak supervision, from classification labels or ground-truth image/video descriptions? In my thesis, the goal is to leverage spatial, temporal, and language inputs for both visual and textual understanding. I showed (1) how to equip a sequence-to-sequence model with self-monitoring in order to develop a visual-textual co-grounded navigation agent that can follow human commands, (2) how to introduce a rollback mechanism to the navigation agent by leveraging self-monitoring, and (3) how to efficiently achieve object-level, fine-grained video understanding for both human action recognition and video captioning. In the proposed work, I will study how to encourage visual captioning models to generate grounded descriptions without ground-truth annotations via a novel cyclical training regimen.