Words aren't enough, their order matters: On the Robustness of Grounding Visual Referring Expressions

4 May 2020
Arjun Reddy Akula
Spandana Gella
Yaser Al-Onaizan
Song-Chun Zhu
Siva Reddy
    ObjD

Papers citing "Words aren't enough, their order matters: On the Robustness of Grounding Visual Referring Expressions"

35 / 35 papers shown
New Dataset and Methods for Fine-Grained Compositional Referring Expression Comprehension via Specialist-MLLM Collaboration
X. J. Yang
Xiaozhong Liu
Peng Wang
Guoqing Wang
Yuqing Yang
H. Shen
ObjD
94
0
0
27 Feb 2025
FineCops-Ref: A new Dataset and Task for Fine-Grained Compositional Referring Expression Comprehension
Junzhuo Liu
X. Yang
Weiwei Li
Peng Wang
ObjD
56
3
0
23 Sep 2024
Revisiting Multi-Modal LLM Evaluation
Jian Lu
Shikhar Srivastava
Junyu Chen
Robik Shrestha
Manoj Acharya
Kushal Kafle
Christopher Kanan
30
3
0
09 Aug 2024
How and where does CLIP process negation?
Vincent Quantmeyer
Pablo Mosteiro
Albert Gatt
CoGe
29
6
0
15 Jul 2024
If CLIP Could Talk: Understanding Vision-Language Model Representations Through Their Preferred Concept Descriptions
Reza Esfandiarpoor
Cristina Menghini
Stephen H. Bach
CoGe
VLM
40
8
0
25 Mar 2024
Adversarial Testing for Visual Grounding via Image-Aware Property Reduction
Zhiyuan Chang
Mingyang Li
Junjie Wang
Cheng Li
Boyu Wu
Fanjiang Xu
Qing Wang
AAML
36
0
0
02 Mar 2024
VisoGender: A dataset for benchmarking gender bias in image-text pronoun resolution
S. Hall
F. G. Abrantes
Hanwen Zhu
Grace A. Sodunke
Aleksandar Shtedritski
Hannah Rose Kirk
CoGe
23
39
0
21 Jun 2023
Scalable Performance Analysis for Vision-Language Models
Santiago Castro
Oana Ignat
Rada Mihalcea
VLM
35
1
0
30 May 2023
KAFA: Rethinking Image Ad Understanding with Knowledge-Augmented Feature Adaptation of Vision-Language Models
Zhiwei Jia
P. Narayana
Arjun Reddy Akula
G. Pruthi
Haoran Su
Sugato Basu
Varun Jampani
VLM
OffRL
15
4
0
28 May 2023
Measuring Progress in Fine-grained Vision-and-Language Understanding
Emanuele Bugliarello
Laurent Sartran
Aishwarya Agrawal
Lisa Anne Hendricks
Aida Nematzadeh
VLM
33
22
0
12 May 2023
CREPE: Can Vision-Language Foundation Models Reason Compositionally?
Zixian Ma
Jerry Hong
Mustafa Omer Gul
Mona Gandhi
Irena Gao
Ranjay Krishna
CoGe
29
125
0
13 Dec 2022
Do Vision-and-Language Transformers Learn Grounded Predicate-Noun Dependencies?
Mitja Nikolaus
Emmanuelle Salin
Stéphane Ayache
Abdellah Fourtassi
Benoit Favre
19
13
0
21 Oct 2022
ULN: Towards Underspecified Vision-and-Language Navigation
Weixi Feng
Tsu-jui Fu
Yujie Lu
William Yang Wang
49
5
0
18 Oct 2022
Multimodal Learning with Transformers: A Survey
P. Xu
Xiatian Zhu
David A. Clifton
ViT
72
527
0
13 Jun 2022
Visual Spatial Reasoning
Fangyu Liu
Guy Edward Toh Emerson
Nigel Collier
ReLM
42
159
0
30 Apr 2022
Winoground: Probing Vision and Language Models for Visio-Linguistic Compositionality
Tristan Thrush
Ryan Jiang
Max Bartolo
Amanpreet Singh
Adina Williams
Douwe Kiela
Candace Ross
CoGe
32
400
0
07 Apr 2022
Unsupervised Vision-Language Parsing: Seamlessly Bridging Visual Scene Graphs with Language Structures via Dependency Relationships
Chao Lou
Wenjuan Han
Yuh-Chen Lin
Zilong Zheng
CoGe
23
10
0
27 Mar 2022
Attention cannot be an Explanation
Arjun Reddy Akula
Song-Chun Zhu
FAtt
XAI
15
6
0
26 Jan 2022
Question Generation for Evaluating Cross-Dataset Shifts in Multi-modal Grounding
Arjun Reddy Akula
OOD
23
3
0
24 Jan 2022
Discourse Analysis for Evaluating Coherence in Video Paragraph Captions
Arjun Reddy Akula
Song-Chun Zhu
34
3
0
17 Jan 2022
Effective Representation to Capture Collaboration Behaviors between Explainer and User
Arjun Reddy Akula
Song-Chun Zhu
24
4
0
10 Jan 2022
VALSE: A Task-Independent Benchmark for Vision and Language Models Centered on Linguistic Phenomena
Letitia Parcalabescu
Michele Cafagna
Lilitta Muradjan
Anette Frank
Iacer Calixto
Albert Gatt
CoGe
26
109
0
14 Dec 2021
Understanding and Testing Generalization of Deep Networks on Out-of-Distribution Data
Rui Hu
Jitao Sang
Jinqiang Wang
Rui Hu
Chaoquan Jiang
CML
OOD
24
7
0
17 Nov 2021
CX-ToM: Counterfactual Explanations with Theory-of-Mind for Enhancing Human Trust in Image Recognition Models
Arjun Reddy Akula
Keze Wang
Changsong Liu
Sari Saba-Sadiya
Hongjing Lu
S. Todorovic
J. Chai
Song-Chun Zhu
31
47
0
03 Sep 2021
Neural Variational Learning for Grounded Language Acquisition
Nisha Pillai
Cynthia Matuszek
Francis Ferraro
VLM
SSL
GAN
DRL
21
2
0
20 Jul 2021
Probing Image-Language Transformers for Verb Understanding
Lisa Anne Hendricks
Aida Nematzadeh
27
114
0
16 Jun 2021
Grounding 'Grounding' in NLP
Khyathi Raghavi Chandu
Yonatan Bisk
A. Black
30
51
0
04 Jun 2021
VL-NMS: Breaking Proposal Bottlenecks in Two-Stage Visual-Language Matching
Chenchi Zhang
Wenbo Ma
Jun Xiao
Hanwang Zhang
Jian Shao
Yueting Zhuang
Long Chen
26
4
0
12 May 2021
A Primer on Contrastive Pretraining in Language Processing: Methods, Lessons Learned and Perspectives
Nils Rethmeier
Isabelle Augenstein
SSL
VLM
90
90
0
25 Feb 2021
Decoupling the Role of Data, Attention, and Losses in Multimodal Transformers
Lisa Anne Hendricks
John F. J. Mellor
R. Schneider
Jean-Baptiste Alayrac
Aida Nematzadeh
79
110
0
31 Jan 2021
Ref-NMS: Breaking Proposal Bottlenecks in Two-Stage Referring Expression Grounding
Long Chen
Wenbo Ma
Jun Xiao
Hanwang Zhang
Shih-Fu Chang
ObjD
17
89
0
03 Sep 2020
An Empirical Study on Robustness to Spurious Correlations using Pre-trained Language Models
Lifu Tu
Garima Lalwani
Spandana Gella
He He
LRM
21
184
0
14 Jul 2020
On the Value of Out-of-Distribution Testing: An Example of Goodhart's Law
Damien Teney
Kushal Kafle
Robik Shrestha
Ehsan Abbasnejad
Christopher Kanan
Anton Van Den Hengel
OODD
OOD
24
145
0
19 May 2020
Unshuffling Data for Improved Generalization
Damien Teney
Ehsan Abbasnejad
Anton Van Den Hengel
OOD
25
76
0
27 Feb 2020
Discourse Parsing in Videos: A Multi-modal Appraoch
Arjun Reddy Akula
Song-Chun Zhu
19
1
0
06 Mar 2019