Yin and Yang: Balancing and Answering Binary Visual Questions

16 November 2015
Peng Zhang
Yash Goyal
D. Summers-Stay
Dhruv Batra
Devi Parikh
    CoGe
arXiv: 1511.05099

Papers citing "Yin and Yang: Balancing and Answering Binary Visual Questions"

50 / 203 papers shown
Title
AlignVE: Visual Entailment Recognition Based on Alignment Relations
Biwei Cao
Jiuxin Cao
Jie Gui
Jiayun Shen
Bo Liu
Lei He
Yuan Yan Tang
James T. Kwok
26
7
0
16 Nov 2022
Understanding ME? Multimodal Evaluation for Fine-grained Visual Commonsense
Zhecan Wang
Haoxuan You
Yicheng He
Wenhao Li
Kai-Wei Chang
Shih-Fu Chang
23
5
0
10 Nov 2022
Learn to Explain: Multimodal Reasoning via Thought Chains for Science Question Answering
Pan Lu
Swaroop Mishra
Tony Xia
Liang Qiu
Kai-Wei Chang
Song-Chun Zhu
Oyvind Tafjord
Peter Clark
Ashwin Kalyan
ELM
ReLM
LRM
211
1,124
0
20 Sep 2022
Overcoming Language Priors in Visual Question Answering via Distinguishing Superficially Similar Instances
Yike Wu
Yu Zhao
Shiwan Zhao
Ying Zhang
Xiaojie Yuan
Guoqing Zhao
Ning Jiang
90
17
0
18 Sep 2022
WildQA: In-the-Wild Video Question Answering
Santiago Castro
Naihao Deng
Pingxuan Huang
Mihai Burzo
Rada Mihalcea
76
7
0
14 Sep 2022
Generative Bias for Robust Visual Question Answering
Jae-Won Cho
Dong-Jin Kim
H. Ryu
In So Kweon
OOD
CML
36
19
0
01 Aug 2022
Rethinking Data Augmentation for Robust Visual Question Answering
Long Chen
Yuhang Zheng
Jun Xiao
OOD
35
42
0
18 Jul 2022
Revealing Single Frame Bias for Video-and-Language Learning
Jie Lei
Tamara L. Berg
Joey Tianyi Zhou
24
111
0
07 Jun 2022
Multimodal Adaptive Distillation for Leveraging Unimodal Encoders for Vision-Language Tasks
Zhecan Wang
Noel Codella
Yen-Chun Chen
Luowei Zhou
Xiyang Dai
...
Jianwei Yang
Haoxuan You
Kai-Wei Chang
Shih-Fu Chang
Lu Yuan
VLM
OffRL
31
22
0
22 Apr 2022
Clotho-AQA: A Crowdsourced Dataset for Audio Question Answering
Samuel Lipping
Parthasaarathy Sudarsanam
K. Drossos
Tuomas Virtanen
19
54
0
20 Apr 2022
CLEVR-X: A Visual Reasoning Dataset for Natural Language Explanations
Leonard Salewski
A. Sophia Koepke
Hendrik P. A. Lensch
Zeynep Akata
LRM
NAI
33
20
0
05 Apr 2022
Learning to Answer Questions in Dynamic Audio-Visual Scenarios
Guangyao Li
Yake Wei
Yapeng Tian
Chenliang Xu
Ji-Rong Wen
Di Hu
39
136
0
26 Mar 2022
CARETS: A Consistency And Robustness Evaluative Test Suite for VQA
Carlos E. Jimenez
Olga Russakovsky
Karthik Narasimhan
CoGe
29
14
0
15 Mar 2022
Catch Me if You Can: A Novel Task for Detection of Covert Geo-Locations (CGL)
Binoy Saha
Sukhendu Das
22
1
0
05 Feb 2022
CLIP-TD: CLIP Targeted Distillation for Vision-Language Tasks
Zhecan Wang
Noel Codella
Yen-Chun Chen
Luowei Zhou
Jianwei Yang
Xiyang Dai
Bin Xiao
Haoxuan You
Shih-Fu Chang
Lu Yuan
CLIP
VLM
22
39
0
15 Jan 2022
COIN: Counterfactual Image Generation for VQA Interpretation
Zeyd Boukhers
Timo Hartmann
Jan Jurjens
21
7
0
10 Jan 2022
General Greedy De-bias Learning
Xinzhe Han
Shuhui Wang
Chi Su
Qingming Huang
Qi Tian
11
7
0
20 Dec 2021
Contrastive Vision-Language Pre-training with Limited Resources
Quan Cui
Boyan Zhou
Yu Guo
Weidong Yin
Hao Wu
Osamu Yoshie
Yubo Chen
VLM
CLIP
19
33
0
17 Dec 2021
SGEITL: Scene Graph Enhanced Image-Text Learning for Visual Commonsense Reasoning
Zhecan Wang
Haoxuan You
Liunian Harold Li
Alireza Zareian
Suji Park
Yiqing Liang
Kai-Wei Chang
Shih-Fu Chang
ReLM
LRM
15
30
0
16 Dec 2021
3D Question Answering
Shuquan Ye
Dongdong Chen
Songfang Han
Jing Liao
ViT
31
47
0
15 Dec 2021
Searching the Search Space of Vision Transformer
Minghao Chen
Kan Wu
Bolin Ni
Houwen Peng
Bei Liu
Jianlong Fu
Hongyang Chao
Haibin Ling
ViT
42
52
0
29 Nov 2021
IconQA: A New Benchmark for Abstract Diagram Understanding and Visual Language Reasoning
Pan Lu
Liang Qiu
Jiaqi Chen
Tony Xia
Yizhou Zhao
Wei Zhang
Zhou Yu
Xiaodan Liang
Song-Chun Zhu
AIMat
41
184
0
25 Oct 2021
Audio-Visual Scene-Aware Dialog and Reasoning using Audio-Visual Transformers with Joint Student-Teacher Learning
Ankit Parag Shah
Shijie Geng
Peng Gao
A. Cherian
Takaaki Hori
Tim K. Marks
Jonathan Le Roux
Chiori Hori
29
22
0
13 Oct 2021
Counterfactual Samples Synthesizing and Training for Robust Visual Question Answering
Long Chen
Yuhang Zheng
Yulei Niu
Hanwang Zhang
Jun Xiao
AAML
OOD
21
36
0
03 Oct 2021
Multimodal Integration of Human-Like Attention in Visual Question Answering
Ekta Sood
Fabian Kögel
Philippe Muller
Dominike Thomas
Mihai Bâce
Andreas Bulling
41
16
0
27 Sep 2021
VQA-MHUG: A Gaze Dataset to Study Multimodal Neural Attention in Visual Question Answering
Ekta Sood
Fabian Kögel
Florian Strohm
Prajit Dhar
Andreas Bulling
40
19
0
27 Sep 2021
Data Efficient Masked Language Modeling for Vision and Language
Yonatan Bitton
Gabriel Stanovsky
Michael Elhadad
Roy Schwartz
VLM
11
20
0
05 Sep 2021
SASRA: Semantically-aware Spatio-temporal Reasoning Agent for Vision-and-Language Navigation in Continuous Environments
Muhammad Zubair Irshad
Niluthpol Chowdhury Mithun
Zachary Seymour
Han-Pang Chiu
S. Samarasekera
Rakesh Kumar
LM&Ro
26
49
0
26 Aug 2021
Greedy Gradient Ensemble for Robust Visual Question Answering
Xinzhe Han
Shuhui Wang
Chi Su
Qingming Huang
Q. Tian
26
75
0
27 Jul 2021
X-GGM: Graph Generative Modeling for Out-of-Distribution Generalization in Visual Question Answering
Jingjing Jiang
Zi-yi Liu
Yifan Liu
Zhixiong Nan
N. Zheng
OOD
36
19
0
24 Jul 2021
Deep Learning for Embodied Vision Navigation: A Survey
Fengda Zhu
Yi Zhu
Vincent CS Lee
Xiaodan Liang
Xiaojun Chang
EgoV
LM&Ro
44
0
0
07 Jul 2021
$C^3$: Compositional Counterfactual Contrastive Learning for Video-grounded Dialogues
Hung Le
Nancy F. Chen
Guosheng Lin
27
2
0
16 Jun 2021
NAAQA: A Neural Architecture for Acoustic Question Answering
Jerome Abdelnour
Jean Rouat
G. Salvi
6
4
0
11 Jun 2021
Check It Again: Progressive Visual Question Answering via Visual Entailment
Q. Si
Zheng Lin
Mingyu Zheng
Peng Fu
Weiping Wang
25
48
0
08 Jun 2021
A survey on VQA: Datasets and Approaches
Yeyun Zou
Qiyu Xie
45
18
0
02 May 2021
Neuro-Symbolic VQA: A review from the perspective of AGI desiderata
Ian Berlot-Attwell
16
3
0
13 Apr 2021
MultiModalQA: Complex Question Answering over Text, Tables and Images
Alon Talmor
Ori Yoran
Amnon Catav
Dan Lahav
Yizhong Wang
Akari Asai
Gabriel Ilharco
Hannaneh Hajishirzi
Jonathan Berant
LMTD
32
150
0
13 Apr 2021
On Semantic Similarity in Video Retrieval
Michael Wray
Hazel Doughty
Dima Damen
33
66
0
18 Mar 2021
Large-Scale Zero-Shot Image Classification from Rich and Diverse Textual Descriptions
Sebastian Bujwid
Josephine Sullivan
VLM
23
28
0
17 Mar 2021
Automatic Generation of Contrast Sets from Scene Graphs: Probing the Compositional Consistency of GQA
Yonatan Bitton
Gabriel Stanovsky
Roy Schwartz
Michael Elhadad
CoGe
25
33
0
17 Mar 2021
Perspectives and Prospects on Transformer Architecture for Cross-Modal Tasks with Language and Vision
Andrew Shin
Masato Ishii
T. Narihira
35
37
0
06 Mar 2021
Rissanen Data Analysis: Examining Dataset Characteristics via Description Length
Ethan Perez
Douwe Kiela
Kyunghyun Cho
30
24
0
05 Mar 2021
SLAKE: A Semantically-Labeled Knowledge-Enhanced Dataset for Medical Visual Question Answering
Bo Liu
Li-Ming Zhan
Li Xu
Lin Ma
Y. Yang
Xiao-Ming Wu
42
236
0
18 Feb 2021
Answer Questions with Right Image Regions: A Visual Attention Regularization Approach
Yebin Liu
Yangyang Guo
Jianhua Yin
Xuemeng Song
Weifeng Liu
Liqiang Nie
29
28
0
03 Feb 2021
OpenViDial: A Large-Scale, Open-Domain Dialogue Dataset with Visual Contexts
Yuxian Meng
Shuhe Wang
Qinghong Han
Xiaofei Sun
Fei Wu
Rui Yan
Jiwei Li
27
28
0
30 Dec 2020
Object-Centric Diagnosis of Visual Reasoning
Jianwei Yang
Jiayuan Mao
Jiajun Wu
Devi Parikh
David D. Cox
J. Tenenbaum
Chuang Gan
OCL
27
16
0
21 Dec 2020
CRAFT: A Benchmark for Causal Reasoning About Forces and inTeractions
Tayfun Ates
Muhammed Samil Atesoglu
Cagatay Yigit
İlker Kesen
Mert Kobaş
Erkut Erdem
Aykut Erdem
T. Goksun
Deniz Yuret
27
31
0
08 Dec 2020
WeaQA: Weak Supervision via Captions for Visual Question Answering
Pratyay Banerjee
Tejas Gokhale
Yezhou Yang
Chitta Baral
25
35
0
04 Dec 2020
Multi-Label Contrastive Learning for Abstract Visual Reasoning
Mikołaj Małkiński
Jacek Mańdziuk
8
40
0
03 Dec 2020
Rel3D: A Minimally Contrastive Benchmark for Grounding Spatial Relations in 3D
Ankit Goyal
Kaiyu Yang
Dawei Yang
Jia Deng
25
41
0
03 Dec 2020