Title: Controllable Content Based Image Retrieval and Synthesis
Patsorn Sangkloy
Ph.D. Student in Computer Science
School of Interactive Computing
Georgia Institute of Technology
Date: February 4, 2022
Time: 10:00 AM to 12:00 PM (EST)
Location (remote via Bluejeans): https://bluejeans.com/508230685
Committee
Dr. James Hays (Advisor) - School of Interactive Computing, Georgia Institute of Technology
Dr. Devi Parikh - School of Interactive Computing, Georgia Institute of Technology
Dr. Diyi Yang - School of Interactive Computing, Georgia Institute of Technology
Dr. Mark Riedl - School of Interactive Computing, Georgia Institute of Technology
Dr. Subhransu Maji - College of Information and Computer Sciences, University of Massachusetts, Amherst
Abstract
Text queries are a common way to retrieve desired images. This natural form of querying has at least two drawbacks. First, the retrieval system may be language-specific, limiting its use to speakers of supported languages. Second, certain types of target images require a lengthy text query to guarantee successful retrieval; a representative example is an image containing multiple objects at precise locations. The latter drawback is the primary problem we address in this thesis.
In this thesis, we investigate the use of hand-drawn sketches as a form of query to fetch desired images. Two related but subtly different tasks are studied:
1. Content Based Image Retrieval (where target images are retrieved from a database),
2. Content Based Image Synthesis (where target images are generated).
We consider two modes of querying:
1. Visual content (where a query can be expressed as a simple line drawing sketch, an image patch, or a color scribble),
2. Language content (where a query can be expressed as a textual description of desired target images).
For sketch-based image retrieval, we propose a cross-domain network that embeds a user query (sketch) and a target image into a shared feature space, so that similarity between the two can be scored directly. We collected the Sketchy Database, a large-scale dataset of matching sketch and image pairs that can be used as training data. The dataset has been made publicly available and has become one of the few standard benchmarks for sketch-based image retrieval.
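As a rough illustration of retrieval in a shared feature space, the sketch below ranks gallery images by their similarity to a query embedding. It is not the thesis's network: the embedding vectors are made-up stand-ins for encoder outputs, the file names are hypothetical, and cosine similarity is our assumption for the scoring function.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def cosine_similarity(a, b):
    """Cosine similarity between two vectors of equal length."""
    a_n, b_n = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a_n, b_n))


def rank_images(sketch_embedding, image_embeddings):
    """Return image ids sorted from most to least similar to the sketch."""
    scores = {
        image_id: cosine_similarity(sketch_embedding, emb)
        for image_id, emb in image_embeddings.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical embeddings; in a real system these would come from the
# sketch encoder and the image encoder of a trained cross-domain network.
sketch_query = [0.9, 0.1, 0.0]
gallery = {
    "cat.jpg": [0.8, 0.2, 0.1],
    "car.jpg": [0.0, 0.1, 0.9],
    "dog.jpg": [0.5, 0.5, 0.2],
}

ranking = rank_images(sketch_query, gallery)
print(ranking)
```

Because both domains live in one space, the gallery embeddings can be computed once and reused for every new sketch query.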
To incorporate both sketch and language content as queries, we propose a late-fusion dual-encoder approach. Our method generalizes CLIP, a recent successful work on vision and language representation learning, to sketch-based input. We also collected 5,000 hand-drawn sketches, which can be combined with existing annotated captions in the COCO dataset to evaluate image retrieval with both text and sketch queries.
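One common reading of late fusion is that each modality is encoded separately and the results are combined only at scoring time. The abstract does not say how the fusion is performed, so the sketch below is an assumption: it takes a weighted sum of per-modality cosine scores. The function names, the `sketch_weight` parameter, the file names, and the toy vectors are all hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two non-zero vectors of equal length."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


def fused_scores(sketch_emb, text_emb, image_embs, sketch_weight=0.5):
    """Score each image as a weighted sum of sketch and text similarities."""
    if not 0.0 <= sketch_weight <= 1.0:
        raise ValueError("sketch_weight must be in [0, 1]")
    return {
        image_id: sketch_weight * cosine(sketch_emb, emb)
        + (1.0 - sketch_weight) * cosine(text_emb, emb)
        for image_id, emb in image_embs.items()
    }


# Toy feature axes (assumed for illustration): [red, blue, left, right].
text_query = [1.0, 0.0, 0.0, 0.0]    # stands in for "a red car"
sketch_query = [0.0, 0.0, 1.0, 0.0]  # stands in for a sketch placing the car on the left
gallery = {
    "red_left.jpg": [1.0, 0.0, 1.0, 0.0],
    "red_right.jpg": [1.0, 0.0, 0.0, 1.0],
    "blue_left.jpg": [0.0, 1.0, 1.0, 0.0],
}

scores = fused_scores(sketch_query, text_query, gallery)
best = max(scores, key=scores.get)
print(best, scores)
```

In this toy setup the text query alone cannot separate the two red cars and the sketch alone cannot separate the two left-side cars; only the fused score singles out the image that matches both.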
For image synthesis, we present a general framework that allows users to interactively control the generated images based on specification of visual features (e.g., shape, color, texture, sketch). For both retrieval and synthesis tasks, our findings reveal that using a sketch as part of the input makes it easier to succinctly describe desired images.