Title: Improving Robotic Manipulation with Multi-Modal Scene Understanding
Committee:
Dr. Vela, Advisor
Dr. Yezzi, Chair
Dr. AlRegib
Abstract: The objective of the proposed research is to improve robotic manipulation with multi-modal scene understanding, which will enhance the capability of assistive robots in daily life. Modern methods for robotic grasp detection via vision-based scene understanding usually perform direct regression from visual information to the final grasp representation, which lacks sufficient supervision for capturing low-level features and can lead to a performance drop from perception to execution. Additionally, data-driven methods achieve state-of-the-art detection accuracy but fall short of real-time performance. Beyond reasoning about how to grasp, general manipulation requires robots to interpret affordance, a subset of object attributes that reveals potential interactions between object parts. Recent studies on affordance detection address the problem at the pixel level. Although segmentation-based methods can obtain where to perform affordance-related actions via post-processing, they cannot recover other execution-related information, such as how to perform those actions. In this proposal, we incorporate the concept of keypoints into both grasp detection and affordance detection. In the first work, we formulate robotic grasp detection as grasp keypoint detection. The keypoint-based grasp representation captures additional geometric information, and its simplicity improves the trade-off between detection accuracy and inference speed. In the second work, we augment the affordance segmentation with a set of five keypoints, which helps recover the full execution information needed for robotic manipulation. In the proposed work, we plan to explore multi-modal scene understanding for reasoning about what tasks to perform, which will further improve the capability of robots in daily life.
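
To illustrate why a keypoint-based grasp representation is convenient, the following minimal Python sketch shows how a pair of detected grasp keypoints can recover the usual (center, angle, width) grasp parameters. It assumes a parallel-jaw grasp described by two image keypoints; the exact keypoint layout, class names, and conversion used in the proposed work may differ, so this is an illustrative assumption rather than the author's implementation.

    # Hypothetical sketch: two grasp keypoints -> (center, angle, width).
    # The two-keypoint parallel-jaw layout is an assumed parameterization.
    from dataclasses import dataclass
    import math


    @dataclass
    class GraspKeypoints:
        """Two detected keypoints (pixel coordinates) for one grasp candidate."""
        left: tuple[float, float]
        right: tuple[float, float]

        def to_grasp(self) -> tuple[float, float, float, float]:
            """Return (cx, cy, theta, width) of the equivalent oriented grasp."""
            (x1, y1), (x2, y2) = self.left, self.right
            cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0   # grasp center
            theta = math.atan2(y2 - y1, x2 - x1)        # gripper orientation
            width = math.hypot(x2 - x1, y2 - y1)        # jaw opening in pixels
            return cx, cy, theta, width


    if __name__ == "__main__":
        kp = GraspKeypoints(left=(120.0, 80.0), right=(160.0, 110.0))
        print(kp.to_grasp())

Because the geometric quantities fall out of the keypoints directly, the detector only needs to localize a small set of points, which is what makes the accuracy/speed trade-off favorable in the abstract's framing.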