Abstract:
We are witnessing unprecedented advances in computer vision and artificial intelligence (AI). What lies next for AI? We believe that the next generation of intelligent systems (say, the next generation of Google's Assistant, Facebook's M, Apple's Siri, Amazon's Alexa) will need to possess the ability to 'perceive' their environment (through vision, audition, or other sensors), 'communicate' (i.e., hold a natural language dialog with humans and other agents), and 'act' (e.g., aid humans by executing API calls or commands in a virtual or embodied environment), for tasks such as:
Speaker Bio:
Dhruv Batra is an Assistant Professor in the School of Interactive Computing at Georgia Tech and a Research Scientist at Facebook AI Research (FAIR). His research interests lie at the intersection of machine learning, computer vision, and artificial intelligence, with a focus on developing intelligent systems that can concisely summarize their beliefs about the world with diverse predictions, that integrate information and beliefs across the different sub-components or 'modules' of AI (vision, language, reasoning, dialog), and that are interpretable, providing explanations and justifications for why they believe what they believe. In the past, he has also worked on topics such as interactive co-segmentation of large image collections, human body pose estimation, action recognition, depth estimation, and distributed optimization for inference and learning in probabilistic graphical models.
He is a recipient of the Office of Naval Research (ONR) Young Investigator Program (YIP) award (2016), the National Science Foundation (NSF) CAREER award (2014), the Army Research Office (ARO) Young Investigator Program (YIP) award (2014), the Virginia Tech College of Engineering Outstanding New Assistant Professor award (2015), two Google Faculty Research Awards (2013, 2015), an Amazon Academic Research award (2016), the Carnegie Mellon Dean's Fellowship (2007), and several teaching commendations at Virginia Tech. His research is supported by NSF, ARO, ARL, ONR, DARPA, Amazon, Google, Microsoft, and NVIDIA. Research from his lab has been featured in Bloomberg Business, The Boston Globe, MIT Technology Review, Newsweek, WVTF Radio IQ, and a number of popular press magazines and newspapers. From 2013 to 2016, he was an Assistant Professor in the Bradley Department of Electrical and Computer Engineering at Virginia Tech, where he led the VT Machine Learning & Perception group and was a member of the Virginia Center for Autonomous Systems (VaCAS) and the VT Discovery Analytics Center (DAC). From 2010 to 2012, he was a Research Assistant Professor at Toyota Technological Institute at Chicago (TTIC), a philanthropically endowed academic computer science institute located on the University of Chicago campus.
He received his M.S. and Ph.D. degrees from Carnegie Mellon University in 2007 and 2010, respectively, advised by Tsuhan Chen. In the past, he has held visiting positions at the Machine Learning Department at CMU, MIT CSAIL, Microsoft Research, and Facebook AI Research.