Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
1511.05099
Cited By
Yin and Yang: Balancing and Answering Binary Visual Questions
16 November 2015
Peng Zhang
Yash Goyal
D. Summers-Stay
Dhruv Batra
Devi Parikh
CoGe
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Yin and Yang: Balancing and Answering Binary Visual Questions"
50 / 203 papers shown
Title
AlignVE: Visual Entailment Recognition Based on Alignment Relations
Biwei Cao
Jiuxin Cao
Jie Gui
Jiayun Shen
Bo Liu
Lei He
Yuan Yan Tang
James T. Kwok
26
7
0
16 Nov 2022
Understanding ME? Multimodal Evaluation for Fine-grained Visual Commonsense
Zhecan Wang
Haoxuan You
Yicheng He
Wenhao Li
Kai-Wei Chang
Shih-Fu Chang
23
5
0
10 Nov 2022
Learn to Explain: Multimodal Reasoning via Thought Chains for Science Question Answering
Pan Lu
Swaroop Mishra
Tony Xia
Liang Qiu
Kai-Wei Chang
Song-Chun Zhu
Oyvind Tafjord
Peter Clark
Ashwin Kalyan
ELM
ReLM
LRM
211
1,124
0
20 Sep 2022
Overcoming Language Priors in Visual Question Answering via Distinguishing Superficially Similar Instances
Yike Wu
Yu Zhao
Shiwan Zhao
Ying Zhang
Xiaojie Yuan
Guoqing Zhao
Ning Jiang
90
17
0
18 Sep 2022
WildQA: In-the-Wild Video Question Answering
Santiago Castro
Naihao Deng
Pingxuan Huang
Mihai Burzo
Rada Mihalcea
76
7
0
14 Sep 2022
Generative Bias for Robust Visual Question Answering
Jae-Won Cho
Dong-Jin Kim
H. Ryu
In So Kweon
OOD
CML
36
19
0
01 Aug 2022
Rethinking Data Augmentation for Robust Visual Question Answering
Long Chen
Yuhang Zheng
Jun Xiao
OOD
35
42
0
18 Jul 2022
Revealing Single Frame Bias for Video-and-Language Learning
Jie Lei
Tamara L. Berg
Joey Tianyi Zhou
24
111
0
07 Jun 2022
Multimodal Adaptive Distillation for Leveraging Unimodal Encoders for Vision-Language Tasks
Zhecan Wang
Noel Codella
Yen-Chun Chen
Luowei Zhou
Xiyang Dai
...
Jianwei Yang
Haoxuan You
Kai-Wei Chang
Shih-Fu Chang
Lu Yuan
VLM
OffRL
31
22
0
22 Apr 2022
Clotho-AQA: A Crowdsourced Dataset for Audio Question Answering
Samuel Lipping
Parthasaarathy Sudarsanam
K. Drossos
Tuomas Virtanen
19
54
0
20 Apr 2022
CLEVR-X: A Visual Reasoning Dataset for Natural Language Explanations
Leonard Salewski
A. Sophia Koepke
Hendrik P. A. Lensch
Zeynep Akata
LRM
NAI
33
20
0
05 Apr 2022
Learning to Answer Questions in Dynamic Audio-Visual Scenarios
Guangyao Li
Yake Wei
Yapeng Tian
Chenliang Xu
Ji-Rong Wen
Di Hu
39
136
0
26 Mar 2022
CARETS: A Consistency And Robustness Evaluative Test Suite for VQA
Carlos E. Jimenez
Olga Russakovsky
Karthik Narasimhan
CoGe
29
14
0
15 Mar 2022
Catch Me if You Can: A Novel Task for Detection of Covert Geo-Locations (CGL)
Binoy Saha
Sukhendu Das
22
1
0
05 Feb 2022
CLIP-TD: CLIP Targeted Distillation for Vision-Language Tasks
Zhecan Wang
Noel Codella
Yen-Chun Chen
Luowei Zhou
Jianwei Yang
Xiyang Dai
Bin Xiao
Haoxuan You
Shih-Fu Chang
Lu Yuan
CLIP
VLM
22
39
0
15 Jan 2022
COIN: Counterfactual Image Generation for VQA Interpretation
Zeyd Boukhers
Timo Hartmann
Jan Jurjens
21
7
0
10 Jan 2022
General Greedy De-bias Learning
Xinzhe Han
Shuhui Wang
Chi Su
Qingming Huang
Qi Tian
11
7
0
20 Dec 2021
Contrastive Vision-Language Pre-training with Limited Resources
Quan Cui
Boyan Zhou
Yu Guo
Weidong Yin
Hao Wu
Osamu Yoshie
Yubo Chen
VLM
CLIP
19
33
0
17 Dec 2021
SGEITL: Scene Graph Enhanced Image-Text Learning for Visual Commonsense Reasoning
Zhecan Wang
Haoxuan You
Liunian Harold Li
Alireza Zareian
Suji Park
Yiqing Liang
Kai-Wei Chang
Shih-Fu Chang
ReLM
LRM
15
30
0
16 Dec 2021
3D Question Answering
Shuquan Ye
Dongdong Chen
Songfang Han
Jing Liao
ViT
31
47
0
15 Dec 2021
Searching the Search Space of Vision Transformer
Minghao Chen
Kan Wu
Bolin Ni
Houwen Peng
Bei Liu
Jianlong Fu
Hongyang Chao
Haibin Ling
ViT
42
52
0
29 Nov 2021
IconQA: A New Benchmark for Abstract Diagram Understanding and Visual Language Reasoning
Pan Lu
Liang Qiu
Jiaqi Chen
Tony Xia
Yizhou Zhao
Wei Zhang
Zhou Yu
Xiaodan Liang
Song-Chun Zhu
AIMat
41
184
0
25 Oct 2021
Audio-Visual Scene-Aware Dialog and Reasoning using Audio-Visual Transformers with Joint Student-Teacher Learning
Ankit Parag Shah
Shijie Geng
Peng Gao
A. Cherian
Takaaki Hori
Tim K. Marks
Jonathan Le Roux
Chiori Hori
29
22
0
13 Oct 2021
Counterfactual Samples Synthesizing and Training for Robust Visual Question Answering
Long Chen
Yuhang Zheng
Yulei Niu
Hanwang Zhang
Jun Xiao
AAML
OOD
21
36
0
03 Oct 2021
Multimodal Integration of Human-Like Attention in Visual Question Answering
Ekta Sood
Fabian Kögel
Philippe Muller
Dominike Thomas
Mihai Bâce
Andreas Bulling
41
16
0
27 Sep 2021
VQA-MHUG: A Gaze Dataset to Study Multimodal Neural Attention in Visual Question Answering
Ekta Sood
Fabian Kögel
Florian Strohm
Prajit Dhar
Andreas Bulling
40
19
0
27 Sep 2021
Data Efficient Masked Language Modeling for Vision and Language
Yonatan Bitton
Gabriel Stanovsky
Michael Elhadad
Roy Schwartz
VLM
11
20
0
05 Sep 2021
SASRA: Semantically-aware Spatio-temporal Reasoning Agent for Vision-and-Language Navigation in Continuous Environments
Muhammad Zubair Irshad
Niluthpol Chowdhury Mithun
Zachary Seymour
Han-Pang Chiu
S. Samarasekera
Rakesh Kumar
LM&Ro
26
49
0
26 Aug 2021
Greedy Gradient Ensemble for Robust Visual Question Answering
Xinzhe Han
Shuhui Wang
Chi Su
Qingming Huang
Q. Tian
26
75
0
27 Jul 2021
X-GGM: Graph Generative Modeling for Out-of-Distribution Generalization in Visual Question Answering
Jingjing Jiang
Zi-yi Liu
Yifan Liu
Zhixiong Nan
N. Zheng
OOD
36
19
0
24 Jul 2021
Deep Learning for Embodied Vision Navigation: A Survey
Fengda Zhu
Yi Zhu
Vincent CS Lee
Xiaodan Liang
Xiaojun Chang
EgoV
LM&Ro
44
0
0
07 Jul 2021
C
3
C^3
C
3
: Compositional Counterfactual Contrastive Learning for Video-grounded Dialogues
Hung Le
Nancy F. Chen
Guosheng Lin
27
2
0
16 Jun 2021
NAAQA: A Neural Architecture for Acoustic Question Answering
Jerome Abdelnour
Jean Rouat
G. Salvi
6
4
0
11 Jun 2021
Check It Again: Progressive Visual Question Answering via Visual Entailment
Q. Si
Zheng Lin
Mingyu Zheng
Peng Fu
Weiping Wang
25
48
0
08 Jun 2021
A survey on VQA_Datasets and Approaches
Yeyun Zou
Qiyu Xie
45
18
0
02 May 2021
Neuro-Symbolic VQA: A review from the perspective of AGI desiderata
Ian Berlot-Attwell
16
3
0
13 Apr 2021
MultiModalQA: Complex Question Answering over Text, Tables and Images
Alon Talmor
Ori Yoran
Amnon Catav
Dan Lahav
Yizhong Wang
Akari Asai
Gabriel Ilharco
Hannaneh Hajishirzi
Jonathan Berant
LMTD
32
150
0
13 Apr 2021
On Semantic Similarity in Video Retrieval
Michael Wray
Hazel Doughty
Dima Damen
33
66
0
18 Mar 2021
Large-Scale Zero-Shot Image Classification from Rich and Diverse Textual Descriptions
Sebastian Bujwid
Josephine Sullivan
VLM
23
28
0
17 Mar 2021
Automatic Generation of Contrast Sets from Scene Graphs: Probing the Compositional Consistency of GQA
Yonatan Bitton
Gabriel Stanovsky
Roy Schwartz
Michael Elhadad
CoGe
25
33
0
17 Mar 2021
Perspectives and Prospects on Transformer Architecture for Cross-Modal Tasks with Language and Vision
Andrew Shin
Masato Ishii
T. Narihira
35
37
0
06 Mar 2021
Rissanen Data Analysis: Examining Dataset Characteristics via Description Length
Ethan Perez
Douwe Kiela
Kyunghyun Cho
30
24
0
05 Mar 2021
SLAKE: A Semantically-Labeled Knowledge-Enhanced Dataset for Medical Visual Question Answering
Bo Liu
Li-Ming Zhan
Li Xu
Lin Ma
Y. Yang
Xiao-Ming Wu
42
236
0
18 Feb 2021
Answer Questions with Right Image Regions: A Visual Attention Regularization Approach
Yebin Liu
Yangyang Guo
Jianhua Yin
Xuemeng Song
Weifeng Liu
Liqiang Nie
29
28
0
03 Feb 2021
OpenViDial: A Large-Scale, Open-Domain Dialogue Dataset with Visual Contexts
Yuxian Meng
Shuhe Wang
Qinghong Han
Xiaofei Sun
Fei Wu
Rui Yan
Jiwei Li
27
28
0
30 Dec 2020
Object-Centric Diagnosis of Visual Reasoning
Jianwei Yang
Jiayuan Mao
Jiajun Wu
Devi Parikh
David D. Cox
J. Tenenbaum
Chuang Gan
OCL
27
16
0
21 Dec 2020
CRAFT: A Benchmark for Causal Reasoning About Forces and inTeractions
Tayfun Ates
Muhammed Samil Atesoglu
Cagatay Yigit
.Ilker Kesen
Mert Kobaş
Erkut Erdem
Aykut Erdem
T. Goksun
Deniz Yuret
27
31
0
08 Dec 2020
WeaQA: Weak Supervision via Captions for Visual Question Answering
Pratyay Banerjee
Tejas Gokhale
Yezhou Yang
Chitta Baral
25
35
0
04 Dec 2020
Multi-Label Contrastive Learning for Abstract Visual Reasoning
Mikolaj Malkiñski
Jacek Mańdziuk
8
40
0
03 Dec 2020
Rel3D: A Minimally Contrastive Benchmark for Grounding Spatial Relations in 3D
Ankit Goyal
Kaiyu Yang
Dawei Yang
Jia Deng
25
41
0
03 Dec 2020
Previous
1
2
3
4
5
Next