DisCLIP: Open-Vocabulary Referring Expression Generation

DisCLIP: Open-Vocabulary Referring Expression Generation

30 May 2023

Papers citing "DisCLIP: Open-Vocabulary Referring Expression Generation"

14 / 14 papers shown

Title
Vision-Language Models Are Not Pragmatically Competent in Referring Expression Generation Ziqiao Ma Jing Ding Xuejun Zhang Dezhi Luo Jiahe Ding Sihan Xu Yuchen Huang Run Peng Joyce Chai 166 0 0 22 Apr 2025
Language Models Can See: Plugging Visual Controls in Text Generation Yixuan Su Tian Lan Yahui Liu Fangyu Liu Dani Yogatama Yan Wang Lingpeng Kong Nigel Collier VLM MLLM 79 97 0 05 May 2022
ZeroCap: Zero-Shot Image-to-Text Generation for Visual-Semantic Arithmetic Yoad Tewel Yoav Shalev Idan Schwartz Lior Wolf VLM 52 194 0 29 Nov 2021
MDETR -- Modulated Detection for End-to-End Multi-Modal Understanding Aishwarya Kamath Mannat Singh Yann LeCun Gabriel Synnaeve Ishan Misra Nicolas Carion ObjD VLM 163 879 0 26 Apr 2021
CLIPScore: A Reference-free Evaluation Metric for Image Captioning Jack Hessel Ari Holtzman Maxwell Forbes Ronan Le Bras Yejin Choi CLIP 117 1,545 0 18 Apr 2021
Pragmatically Informative Image Captioning with Character-Level Inference Reuben Cohn-Gordon Noah D. Goodman Christopher Potts 43 97 0 15 Apr 2018
Discriminability objective for training descriptive captions Ruotian Luo Brian L. Price Scott D. Cohen Gregory Shakhnarovich 100 203 0 12 Mar 2018
Context-aware Captions from Context-agnostic Supervision Ramakrishna Vedantam Samy Bengio Kevin Patrick Murphy Devi Parikh Gal Chechik 72 152 0 11 Jan 2017
A Joint Speaker-Listener-Reinforcer Model for Referring Expressions Licheng Yu Hao Tan Joey Tianyi Zhou Tamara L. Berg ObjD 85 275 0 30 Dec 2016
SPICE: Semantic Propositional Image Caption Evaluation Peter Anderson Basura Fernando Mark Johnson Stephen Gould EGVM 81 1,909 0 29 Jul 2016
Reasoning About Pragmatics with Neural Listeners and Speakers Jacob Andreas Dan Klein ReLM LRM 60 174 0 02 Apr 2016
Natural Language Object Retrieval Ronghang Hu Huazhe Xu Marcus Rohrbach Jiashi Feng Kate Saenko Trevor Darrell ObjD 82 552 0 13 Nov 2015
Flickr30k Entities: Collecting Region-to-Phrase Correspondences for Richer Image-to-Sentence Models Bryan A. Plummer Liwei Wang Christopher M. Cervantes Juan C. Caicedo Julia Hockenmaier Svetlana Lazebnik 187 2,047 0 19 May 2015
CIDEr: Consensus-based Image Description Evaluation Ramakrishna Vedantam C. L. Zitnick Devi Parikh 248 4,471 0 20 Nov 2014