Telling the What while Pointing to the Where: Multimodal Queries for
Image Retrieval

Telling the What while Pointing to the Where: Multimodal Queries for Image Retrieval

9 February 2021

Soravit Changpinyo

Jordi Pont-Tuset

Papers citing "Telling the What while Pointing to the Where: Multimodal Queries for Image Retrieval"

14 / 14 papers shown

Title
Composite Sketch+Text Queries for Retrieving Objects with Elusive Names and Complex Interactions Prajwal Gatti Kshitij Parikh Dhriti Prasanna Paul Manish Gupta Anand Mishra 118 2 0 12 Feb 2025
CUE-M: Contextual Understanding and Enhanced Search with Multimodal Large Language Model Dongyoung Go Taesun Whang Chanhee Lee Hwayeon Kim Sunghoon Park Seunghwan Ji Dongchan Kim Young-Bum Kim Young-Bum Kim LRM 216 1 0 19 Nov 2024
UniIR: Training and Benchmarking Universal Multimodal Information Retrievers Cong Wei Yang Chen Haonan Chen Hexiang Hu Ge Zhang Jie Fu Alan Ritter Wenhu Chen 47 53 0 28 Nov 2023
EDIS: Entity-Driven Image Search over Multimodal Web Content Siqi Liu Weixi Feng Tsu-jui Fu Wenhu Chen Luu Anh Tuan VLM 48 9 0 23 May 2023
Who are you referring to? Coreference resolution in image narrations A. Goel Basura Fernando Frank Keller Hakan Bilen 25 3 0 26 Nov 2022
A Sketch Is Worth a Thousand Words: Image Retrieval with Text and Sketch Patsorn Sangkloy Wittawat Jitkrittum Diyi Yang James Hays 3DV 29 32 0 05 Aug 2022
Answer-Me: Multi-Task Open-Vocabulary Visual Question Answering A. Piergiovanni Wei Li Weicheng Kuo M. Saffar Fred Bertsch A. Angelova 17 16 0 02 May 2022
UIGR: Unified Interactive Garment Retrieval Xiaoping Han Sen He Li Zhang Yi-Zhe Song Tao Xiang 19 7 0 06 Apr 2022
Conceptual 12M: Pushing Web-Scale Image-Text Pre-Training To Recognize Long-Tail Visual Concepts Soravit Changpinyo P. Sharma Nan Ding Radu Soricut VLM 299 1,084 0 17 Feb 2021
Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision Chao Jia Yinfei Yang Ye Xia Yi-Ting Chen Zarana Parekh Hieu H. Pham Quoc V. Le Yun-hsuan Sung Zhen Li Tom Duerig VLM CLIP 337 3,708 0 11 Feb 2021
Decoupling the Role of Data, Attention, and Losses in Multimodal Transformers Lisa Anne Hendricks John F. J. Mellor R. Schneider Jean-Baptiste Alayrac Aida Nematzadeh 79 110 0 31 Jan 2021
Similarity Reasoning and Filtration for Image-Text Matching Haiwen Diao Ying Zhang Lingyun Ma Huchuan Lu 231 332 0 05 Jan 2021
Adaptive Offline Quintuplet Loss for Image-Text Matching Tianlang Chen Jiajun Deng Jiebo Luo 181 68 0 07 Mar 2020
Dialog-based Interactive Image Retrieval Xiaoxiao Guo Hui Wu Yu Cheng Steven J. Rennie Gerald Tesauro Rogerio Feris 55 204 0 01 May 2018