Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
1902.05660
Cited By
Cycle-Consistency for Robust Visual Question Answering
15 February 2019
Meet Shah
Xinlei Chen
Marcus Rohrbach
Devi Parikh
OOD
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Cycle-Consistency for Robust Visual Question Answering"
41 / 41 papers shown
Title
LOVA3: Learning to Visual Question Answering, Asking and Assessment
Henry Hengyuan Zhao
Pan Zhou
Difei Gao
Zechen Bai
Mike Zheng Shou
82
8
0
21 Feb 2025
One Perturbation is Enough: On Generating Universal Adversarial Perturbations against Vision-Language Pre-training Models
Hao Fang
Jiawei Kong
Wenbo Yu
Bin Chen
Jiawei Li
Hao Wu
Ke Xu
Ke Xu
AAML
VLM
40
13
0
08 Jun 2024
Enhancing Vision-Language Pre-Training with Jointly Learned Questioner and Dense Captioner
Zikang Liu
Sihan Chen
Longteng Guo
Handong Li
Xingjian He
Jiaheng Liu
15
1
0
19 May 2023
Iterative Adversarial Attack on Image-guided Story Ending Generation
Youze Wang
Wenbo Hu
Richang Hong
32
3
0
16 May 2023
COLA: A Benchmark for Compositional Text-to-image Retrieval
Arijit Ray
Filip Radenovic
Abhimanyu Dubey
Bryan A. Plummer
Ranjay Krishna
Kate Saenko
CoGe
VLM
41
34
0
05 May 2023
Why Did the Chicken Cross the Road? Rephrasing and Analyzing Ambiguous Questions in VQA
Elias Stengel-Eskin
Jimena Guallar-Blasco
Yi Zhou
Benjamin Van Durme
UQLM
32
11
0
14 Nov 2022
ULN: Towards Underspecified Vision-and-Language Navigation
Weixi Feng
Tsu-jui Fu
Yujie Lu
William Yang Wang
49
5
0
18 Oct 2022
Selecting Better Samples from Pre-trained LLMs: A Case Study on Question Generation
Xingdi Yuan
Tong Wang
Yen-Hsiang Wang
Emery Fine
Rania Abdelghani
Pauline Lucas
Hélene Sauzéon
Pierre-Yves Oudeyer
30
29
0
22 Sep 2022
Hierarchical Local-Global Transformer for Temporal Sentence Grounding
Xiang Fang
Daizong Liu
Pan Zhou
Zichuan Xu
Rui Li
19
28
0
31 Aug 2022
A Feature-space Multimodal Data Augmentation Technique for Text-video Retrieval
Alex Falcon
G. Serra
Oswald Lanz
VGen
42
25
0
03 Aug 2022
Consistency-preserving Visual Question Answering in Medical Imaging
Sergio Tascon-Morales
Pablo Márquez-Neila
Raphael Sznitman
MedIm
24
12
0
27 Jun 2022
Learning to Answer Visual Questions from Web Videos
Antoine Yang
Antoine Miech
Josef Sivic
Ivan Laptev
Cordelia Schmid
ViT
34
33
0
10 May 2022
All You May Need for VQA are Image Captions
Soravit Changpinyo
Doron Kukliansky
Idan Szpektor
Xi Chen
Nan Ding
Radu Soricut
32
70
0
04 May 2022
Multimodal Adaptive Distillation for Leveraging Unimodal Encoders for Vision-Language Tasks
Zhecan Wang
Noel Codella
Yen-Chun Chen
Luowei Zhou
Xiyang Dai
...
Jianwei Yang
Haoxuan You
Kai-Wei Chang
Shih-Fu Chang
Lu Yuan
VLM
OffRL
31
22
0
22 Apr 2022
CLIP-TD: CLIP Targeted Distillation for Vision-Language Tasks
Zhecan Wang
Noel Codella
Yen-Chun Chen
Luowei Zhou
Jianwei Yang
Xiyang Dai
Bin Xiao
Haoxuan You
Shih-Fu Chang
Lu Yuan
CLIP
VLM
22
39
0
15 Jan 2022
Counterfactual Samples Synthesizing and Training for Robust Visual Question Answering
Long Chen
Yuhang Zheng
Yulei Niu
Hanwang Zhang
Jun Xiao
AAML
OOD
21
36
0
03 Oct 2021
Multimodal Integration of Human-Like Attention in Visual Question Answering
Ekta Sood
Fabian Kögel
Philippe Muller
Dominike Thomas
Mihai Bâce
Andreas Bulling
41
16
0
27 Sep 2021
VQA-MHUG: A Gaze Dataset to Study Multimodal Neural Attention in Visual Question Answering
Ekta Sood
Fabian Kögel
Florian Strohm
Prajit Dhar
Andreas Bulling
40
19
0
27 Sep 2021
Discovering the Unknown Knowns: Turning Implicit Knowledge in the Dataset into Explicit Training Examples for Visual Question Answering
Jihyung Kil
Cheng Zhang
D. Xuan
Wei-Lun Chao
61
20
0
13 Sep 2021
Pulling Up by the Causal Bootstraps: Causal Data Augmentation for Pre-training Debiasing
Sindhu C. M. Gowda
Shalmali Joshi
Haoran Zhang
Marzyeh Ghassemi
CML
32
8
0
27 Aug 2021
BiaSwap: Removing dataset bias with bias-tailored swapping augmentation
Eungyeup Kim
Jihyeon Janel Lee
Jaegul Choo
27
86
0
23 Aug 2021
The Spotlight: A General Method for Discovering Systematic Errors in Deep Learning Models
G. dÉon
Jason dÉon
J. R. Wright
Kevin Leyton-Brown
25
74
0
01 Jul 2021
Are VQA Systems RAD? Measuring Robustness to Augmented Data with Focused Interventions
Daniel Rosenberg
Itai Gat
Amir Feder
Roi Reichart
AAML
39
16
0
08 Jun 2021
Contrastive Fine-tuning Improves Robustness for Neural Rankers
Xiaofei Ma
Cicero Nogueira dos Santos
Andrew O. Arnold
15
20
0
27 May 2021
gComm: An environment for investigating generalization in Grounded Language Acquisition
Rishi Hazra
Sonu Dixit
31
0
0
09 May 2021
Back to Square One: Artifact Detection, Training and Commonsense Disentanglement in the Winograd Schema
Yanai Elazar
Hongming Zhang
Yoav Goldberg
Dan Roth
ReLM
LRM
45
44
0
16 Apr 2021
Beyond Question-Based Biases: Assessing Multimodal Shortcut Learning in Visual Question Answering
Corentin Dancette
Rémi Cadène
Damien Teney
Matthieu Cord
CML
28
76
0
07 Apr 2021
Intrinsically Motivated Compositional Language Emergence
Rishi Hazra
Sonu Dixit
Sayambhu Sen
11
1
0
09 Dec 2020
An Improved Attention for Visual Question Answering
Tanzila Rahman
Shih-Han Chou
Leonid Sigal
Giuseppe Carenini
13
42
0
04 Nov 2020
COOT: Cooperative Hierarchical Transformer for Video-Text Representation Learning
Simon Ging
Mohammadreza Zolfaghari
Hamed Pirsiavash
Thomas Brox
ViT
CLIP
20
168
0
01 Nov 2020
New Ideas and Trends in Deep Multimodal Content Understanding: A Review
Wei Chen
Weiping Wang
Li Liu
M. Lew
VLM
115
31
0
16 Oct 2020
Weakly-supervised Learning of Human Dynamics
Petrissa Zell
Bodo Rosenhahn
Bastian Wandt
19
13
0
17 Jul 2020
IQ-VQA: Intelligent Visual Question Answering
Vatsal Goel
Mohit Chandak
A. Anand
Prithwijit Guha
28
5
0
08 Jul 2020
Large-Scale Adversarial Training for Vision-and-Language Representation Learning
Zhe Gan
Yen-Chun Chen
Linjie Li
Chen Zhu
Yu Cheng
Jingjing Liu
ObjD
VLM
35
488
0
11 Jun 2020
Towards Causal VQA: Revealing and Reducing Spurious Correlations by Invariant and Covariant Semantic Editing
Vedika Agarwal
Rakshith Shetty
Mario Fritz
CML
AAML
32
155
0
16 Dec 2019
Sunny and Dark Outside?! Improving Answer Consistency in VQA through Entailed Question Generation
Arijit Ray
Karan Sikka
Ajay Divakaran
Stefan Lee
Giedrius Burachas
19
65
0
10 Sep 2019
LXMERT: Learning Cross-Modality Encoder Representations from Transformers
Hao Hao Tan
Joey Tianyi Zhou
VLM
MLLM
87
2,453
0
20 Aug 2019
Trends in Integration of Vision and Language Research: A Survey of Tasks, Datasets, and Methods
Aditya Mogadala
M. Kalimuthu
Dietrich Klakow
VLM
20
132
0
22 Jul 2019
Self-Critical Reasoning for Robust Visual Question Answering
Jialin Wu
Raymond J. Mooney
OOD
NAI
32
159
0
24 May 2019
Adversarial Example Generation with Syntactically Controlled Paraphrase Networks
Mohit Iyyer
John Wieting
Kevin Gimpel
Luke Zettlemoyer
AAML
GAN
205
712
0
17 Apr 2018
Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering
Yash Goyal
Tejas Khot
D. Summers-Stay
Dhruv Batra
Devi Parikh
CoGe
104
3,126
0
02 Dec 2016
1