Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2005.01655
Cited By
Words aren't enough, their order matters: On the Robustness of Grounding Visual Referring Expressions
4 May 2020
Arjun Reddy Akula
Spandana Gella
Yaser Al-Onaizan
Song-Chun Zhu
Siva Reddy
ObjD
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Words aren't enough, their order matters: On the Robustness of Grounding Visual Referring Expressions"
35 / 35 papers shown
Title
New Dataset and Methods for Fine-Grained Compositional Referring Expression Comprehension via Specialist-MLLM Collaboration
X. J. Yang
Xiaozhong Liu
Peng Wang
Guoqing Wang
Yuqing Yang
H. Shen
ObjD
94
0
0
27 Feb 2025
FineCops-Ref: A new Dataset and Task for Fine-Grained Compositional Referring Expression Comprehension
Junzhuo Liu
X. Yang
Weiwei Li
Peng Wang
ObjD
56
3
0
23 Sep 2024
Revisiting Multi-Modal LLM Evaluation
Jian Lu
Shikhar Srivastava
Junyu Chen
Robik Shrestha
Manoj Acharya
Kushal Kafle
Christopher Kanan
30
3
0
09 Aug 2024
How and where does CLIP process negation?
Vincent Quantmeyer
Pablo Mosteiro
Albert Gatt
CoGe
29
6
0
15 Jul 2024
If CLIP Could Talk: Understanding Vision-Language Model Representations Through Their Preferred Concept Descriptions
Reza Esfandiarpoor
Cristina Menghini
Stephen H. Bach
CoGe
VLM
40
8
0
25 Mar 2024
Adversarial Testing for Visual Grounding via Image-Aware Property Reduction
Zhiyuan Chang
Mingyang Li
Junjie Wang
Cheng Li
Boyu Wu
Fanjiang Xu
Qing Wang
AAML
36
0
0
02 Mar 2024
VisoGender: A dataset for benchmarking gender bias in image-text pronoun resolution
S. Hall
F. G. Abrantes
Hanwen Zhu
Grace A. Sodunke
Aleksandar Shtedritski
Hannah Rose Kirk
CoGe
23
39
0
21 Jun 2023
Scalable Performance Analysis for Vision-Language Models
Santiago Castro
Oana Ignat
Rada Mihalcea
VLM
35
1
0
30 May 2023
KAFA: Rethinking Image Ad Understanding with Knowledge-Augmented Feature Adaptation of Vision-Language Models
Zhiwei Jia
P. Narayana
Arjun Reddy Akula
G. Pruthi
Haoran Su
Sugato Basu
Varun Jampani
VLM
OffRL
15
4
0
28 May 2023
Measuring Progress in Fine-grained Vision-and-Language Understanding
Emanuele Bugliarello
Laurent Sartran
Aishwarya Agrawal
Lisa Anne Hendricks
Aida Nematzadeh
VLM
33
22
0
12 May 2023
CREPE: Can Vision-Language Foundation Models Reason Compositionally?
Zixian Ma
Jerry Hong
Mustafa Omer Gul
Mona Gandhi
Irena Gao
Ranjay Krishna
CoGe
29
125
0
13 Dec 2022
Do Vision-and-Language Transformers Learn Grounded Predicate-Noun Dependencies?
Mitja Nikolaus
Emmanuelle Salin
Stéphane Ayache
Abdellah Fourtassi
Benoit Favre
19
13
0
21 Oct 2022
ULN: Towards Underspecified Vision-and-Language Navigation
Weixi Feng
Tsu-jui Fu
Yujie Lu
William Yang Wang
49
5
0
18 Oct 2022
Multimodal Learning with Transformers: A Survey
P. Xu
Xiatian Zhu
David A. Clifton
ViT
72
527
0
13 Jun 2022
Visual Spatial Reasoning
Fangyu Liu
Guy Edward Toh Emerson
Nigel Collier
ReLM
42
159
0
30 Apr 2022
Winoground: Probing Vision and Language Models for Visio-Linguistic Compositionality
Tristan Thrush
Ryan Jiang
Max Bartolo
Amanpreet Singh
Adina Williams
Douwe Kiela
Candace Ross
CoGe
32
400
0
07 Apr 2022
Unsupervised Vision-Language Parsing: Seamlessly Bridging Visual Scene Graphs with Language Structures via Dependency Relationships
Chao Lou
Wenjuan Han
Yuh-Chen Lin
Zilong Zheng
CoGe
23
10
0
27 Mar 2022
Attention cannot be an Explanation
Arjun Reddy Akula
Song-Chun Zhu
FAtt
XAI
15
6
0
26 Jan 2022
Question Generation for Evaluating Cross-Dataset Shifts in Multi-modal Grounding
Arjun Reddy Akula
OOD
23
3
0
24 Jan 2022
Discourse Analysis for Evaluating Coherence in Video Paragraph Captions
Arjun Reddy Akula
Song-Chun Zhu
34
3
0
17 Jan 2022
Effective Representation to Capture Collaboration Behaviors between Explainer and User
Arjun Reddy Akula
Song-Chun Zhu
24
4
0
10 Jan 2022
VALSE: A Task-Independent Benchmark for Vision and Language Models Centered on Linguistic Phenomena
Letitia Parcalabescu
Michele Cafagna
Lilitta Muradjan
Anette Frank
Iacer Calixto
Albert Gatt
CoGe
26
109
0
14 Dec 2021
Understanding and Testing Generalization of Deep Networks on Out-of-Distribution Data
Rui Hu
Jitao Sang
Jinqiang Wang
Rui Hu
Chaoquan Jiang
CML
OOD
24
7
0
17 Nov 2021
CX-ToM: Counterfactual Explanations with Theory-of-Mind for Enhancing Human Trust in Image Recognition Models
Arjun Reddy Akula
Keze Wang
Changsong Liu
Sari Saba-Sadiya
Hongjing Lu
S. Todorovic
J. Chai
Song-Chun Zhu
31
47
0
03 Sep 2021
Neural Variational Learning for Grounded Language Acquisition
Nisha Pillai
Cynthia Matuszek
Francis Ferraro
VLM
SSL
GAN
DRL
21
2
0
20 Jul 2021
Probing Image-Language Transformers for Verb Understanding
Lisa Anne Hendricks
Aida Nematzadeh
27
114
0
16 Jun 2021
Grounding 'Grounding' in NLP
Khyathi Raghavi Chandu
Yonatan Bisk
A. Black
30
51
0
04 Jun 2021
VL-NMS: Breaking Proposal Bottlenecks in Two-Stage Visual-Language Matching
Chenchi Zhang
Wenbo Ma
Jun Xiao
Hanwang Zhang
Jian Shao
Yueting Zhuang
Long Chen
26
4
0
12 May 2021
A Primer on Contrastive Pretraining in Language Processing: Methods, Lessons Learned and Perspectives
Nils Rethmeier
Isabelle Augenstein
SSL
VLM
90
90
0
25 Feb 2021
Decoupling the Role of Data, Attention, and Losses in Multimodal Transformers
Lisa Anne Hendricks
John F. J. Mellor
R. Schneider
Jean-Baptiste Alayrac
Aida Nematzadeh
79
110
0
31 Jan 2021
Ref-NMS: Breaking Proposal Bottlenecks in Two-Stage Referring Expression Grounding
Long Chen
Wenbo Ma
Jun Xiao
Hanwang Zhang
Shih-Fu Chang
ObjD
17
89
0
03 Sep 2020
An Empirical Study on Robustness to Spurious Correlations using Pre-trained Language Models
Lifu Tu
Garima Lalwani
Spandana Gella
He He
LRM
21
184
0
14 Jul 2020
On the Value of Out-of-Distribution Testing: An Example of Goodhart's Law
Damien Teney
Kushal Kafle
Robik Shrestha
Ehsan Abbasnejad
Christopher Kanan
Anton Van Den Hengel
OODD
OOD
24
145
0
19 May 2020
Unshuffling Data for Improved Generalization
Damien Teney
Ehsan Abbasnejad
Anton Van Den Hengel
OOD
25
76
0
27 Feb 2020
Discourse Parsing in Videos: A Multi-modal Appraoch
Arjun Reddy Akula
Song-Chun Zhu
19
1
0
06 Mar 2019
1