1511.05099
Cited By
Yin and Yang: Balancing and Answering Binary Visual Questions
16 November 2015
Peng Zhang
Yash Goyal
D. Summers-Stay
Dhruv Batra
Devi Parikh
CoGe
Papers citing "Yin and Yang: Balancing and Answering Binary Visual Questions"
50 / 203 papers shown
BabyVLM: Data-Efficient Pretraining of VLMs Inspired by Infant Learning
Shengao Wang
Arjun Chandra
Aoming Liu
Venkatesh Saligrama
Boqing Gong
MLLM
VLM
47
0
0
13 Apr 2025
Resource-efficient Inference with Foundation Model Programs
Lunyiu Nie
Zhimin Ding
Kevin Yu
Marco Cheung
C. Jermaine
S. Chaudhuri
30
0
0
09 Apr 2025
Talking to the brain: Using Large Language Models as Proxies to Model Brain Semantic Representation
Xin Liu
Zhe Zhang
Jingxin Nie
67
0
0
26 Feb 2025
FilterRAG: Zero-Shot Informed Retrieval-Augmented Generation to Mitigate Hallucinations in VQA
S M Sarwar
80
1
0
25 Feb 2025
Beyond Benchmarks: On The False Promise of AI Regulation
Gabriel Stanovsky
Renana Keydar
Gadi Perl
Eliya Habba
41
1
0
28 Jan 2025
MASS: Overcoming Language Bias in Image-Text Matching
Jiwan Chung
Seungwon Lim
Sangkyu Lee
Youngjae Yu
VLM
32
0
0
20 Jan 2025
What makes a good metric? Evaluating automatic metrics for text-to-image consistency
Candace Ross
Melissa Hall
Adriana Romero Soriano
Adina Williams
95
3
0
18 Dec 2024
Task Progressive Curriculum Learning for Robust Visual Question Answering
Ahmed Akl
Abdelwahed Khamis
Zhe Wang
Ali Cheraghian
Sara Khalifa
Kewen Wang
OOD
83
0
0
26 Nov 2024
A Comprehensive Survey on Visual Question Answering Datasets and Algorithms
Raihan Kabir
Naznin Haque
Md. Saiful Islam
Marium-E. Jannat
CoGe
29
1
0
17 Nov 2024
Right this way: Can VLMs Guide Us to See More to Answer Questions?
Li Liu
Diji Yang
Sijia Zhong
Kalyana Suma Sree Tholeti
Lei Ding
Yi Zhang
Leilani H. Gilpin
39
2
0
01 Nov 2024
SimpsonsVQA: Enhancing Inquiry-Based Learning with a Tailored Dataset
Ngoc Dung Huynh
Mohamed Reda Bouadjenek
Sunil Aryal
Imran Razzak
Hakim Hacid
31
0
0
30 Oct 2024
Dreaming Out Loud: A Self-Synthesis Approach For Training Vision-Language Models With Developmentally Plausible Data
Badr AlKhamissi
Yingtian Tang
Abdülkadir Gökce
Johannes Mehrer
Martin Schrimpf
VLM
49
0
0
29 Oct 2024
MMAR: Towards Lossless Multi-Modal Auto-Regressive Probabilistic Modeling
Jian Yang
Dacheng Yin
Yizhou Zhou
Fengyun Rao
Wei-dong Zhai
Yang Cao
Zheng-jun Zha
DiffM
28
7
0
14 Oct 2024
Eliminating the Language Bias for Visual Question Answering with fine-grained Causal Intervention
Ying Liu
Ge Bai
Chenji Lu
Shilong Li
Zhang Zhang
Ruifang Liu
Wenbin Guo
21
0
0
14 Oct 2024
QTG-VQA: Question-Type-Guided Architectural for VideoQA Systems
Zhixian He
Pengcheng Zhao
Fuwei Zhang
Shujin Lin
41
0
0
14 Sep 2024
Evaluating Attribute Comprehension in Large Vision-Language Models
Haiwen Zhang
Zixi Yang
Yuanzhi Liu
Xinran Wang
Zheqi He
Kongming Liang
Zhanyu Ma
ELM
37
0
0
25 Aug 2024
Revisiting Multi-Modal LLM Evaluation
Jian Lu
Shikhar Srivastava
Junyu Chen
Robik Shrestha
Manoj Acharya
Kushal Kafle
Christopher Kanan
30
3
0
09 Aug 2024
ArtVLM: Attribute Recognition Through Vision-Based Prefix Language Modeling
William Y. Zhu
Keren Ye
Junjie Ke
Jiahui Yu
Leonidas J. Guibas
P. Milanfar
Feng Yang
48
2
0
07 Aug 2024
Fairness and Bias Mitigation in Computer Vision: A Survey
Sepehr Dehdashtian
Ruozhen He
Yi Li
Guha Balakrishnan
Nuno Vasconcelos
Vicente Ordonez
Vishnu Naresh Boddeti
40
4
0
05 Aug 2024
Causal Understanding For Video Question Answering
Bhanu Prakash Reddy Guda
Tanmay Kulkarni
Adithya Sampath
Swarnashree Mysore Sathyendra
CML
54
0
0
23 Jul 2024
On the Role of Visual Grounding in VQA
Daniel Reich
Tanja Schultz
21
1
0
26 Jun 2024
FoodieQA: A Multimodal Dataset for Fine-Grained Understanding of Chinese Food Culture
Wenyan Li
Xinyu Crystina Zhang
Jiaang Li
Qiwei Peng
Raphael Tang
...
Guimin Hu
Yifei Yuan
Anders Søgaard
Daniel Hershcovich
Desmond Elliott
CoGe
35
7
0
16 Jun 2024
3AM: An Ambiguity-Aware Multi-Modal Machine Translation Dataset
Xinyu Ma
Xuebo Liu
Derek F. Wong
Jun Rao
Bei Li
Liang Ding
Lidia S. Chao
Dacheng Tao
Min Zhang
41
2
0
29 Apr 2024
VideoDistill: Language-aware Vision Distillation for Video Question Answering
Bo Zou
Chao Yang
Yu Qiao
Chengbin Quan
Youjian Zhao
VGen
50
1
0
01 Apr 2024
Few-Shot VQA with Frozen LLMs: A Tale of Two Approaches
Igor Sterner
Weizhe Lin
Jinghong Chen
Bill Byrne
25
2
0
17 Mar 2024
Contrastive Region Guidance: Improving Grounding in Vision-Language Models without Training
David Wan
Jaemin Cho
Elias Stengel-Eskin
Mohit Bansal
VLM
ObjD
53
29
0
04 Mar 2024
Measuring Vision-Language STEM Skills of Neural Models
Jianhao Shen
Ye Yuan
Srbuhi Mirzoyan
Ming Zhang
Chenguang Wang
VLM
33
8
0
27 Feb 2024
Multimodal Transformer With a Low-Computational-Cost Guarantee
Sungjin Park
Edward Choi
52
1
0
23 Feb 2024
Proximity QA: Unleashing the Power of Multi-Modal Large Language Models for Spatial Proximity Analysis
Jianing Li
Xi Nan
Ming Lu
Li Du
Shanghang Zhang
50
1
0
31 Jan 2024
MIVC: Multiple Instance Visual Component for Visual-Language Models
Wenyi Wu
Qi Li
Leon Wenliang Zhong
Junzhou Huang
33
3
0
28 Dec 2023
Understanding Unimodal Bias in Multimodal Deep Linear Networks
Yedi Zhang
Peter E. Latham
Andrew Saxe
31
6
0
01 Dec 2023
Debiasing Multimodal Models via Causal Information Minimization
Vaidehi Patil
A. Maharana
Mohit Bansal
CML
38
2
0
28 Nov 2023
From Image to Language: A Critical Analysis of Visual Question Answering (VQA) Approaches, Challenges, and Opportunities
Md Farhan Ishmam
Md Sakib Hossain Shovon
M. F. Mridha
Nilanjan Dey
43
36
0
01 Nov 2023
CAD -- Contextual Multi-modal Alignment for Dynamic AVQA
Asmar Nadeem
Adrian Hilton
R. Dawes
Graham A. Thomas
A. Mustafa
30
9
0
25 Oct 2023
Dataset Bias Mitigation in Multiple-Choice Visual Question Answering and Beyond
Zhecan Wang
Long Chen
Haoxuan You
Keyang Xu
Yicheng He
Wenhao Li
Noel Codella
Kai-Wei Chang
Shih-Fu Chang
27
3
0
23 Oct 2023
UNK-VQA: A Dataset and a Probe into the Abstention Ability of Multi-modal Large Models
Yanyang Guo
Fangkai Jiao
Zhiqi Shen
Liqiang Nie
Mohan S. Kankanhalli
MLLM
30
5
0
17 Oct 2023
Tackling Data Bias in MUSIC-AVQA: Crafting a Balanced Dataset for Unbiased Question-Answering
Xiulong Liu
Zhikang Dong
Peng Zhang
24
21
0
10 Oct 2023
Learning the meanings of function words from grounded language using a visual question answering model
Eva Portelance
Michael C. Frank
Dan Jurafsky
NAI
33
7
0
16 Aug 2023
Robust Visual Question Answering: Datasets, Methods, and Future Challenges
Jie Ma
Pinghui Wang
Dechen Kong
Zewei Wang
Jun Liu
Hongbin Pei
Junzhou Zhao
OOD
29
18
0
21 Jul 2023
Read, Look or Listen? What's Needed for Solving a Multimodal Dataset
Netta Madvil
Yonatan Bitton
Roy Schwartz
30
2
0
06 Jul 2023
What Matters in Training a GPT4-Style Language Model with Multimodal Inputs?
Yan Zeng
Hanbo Zhang
Jiani Zheng
Jiangnan Xia
Guoqiang Wei
Yang Wei
Yuchen Zhang
Tao Kong
MLLM
27
71
0
05 Jul 2023
Learning to Imagine: Visually-Augmented Natural Language Generation
Tianyi Tang
Yushuo Chen
Yifan Du
Junyi Li
Wayne Xin Zhao
Ji-Rong Wen
DiffM
16
9
0
26 May 2023
Evaluating Object Hallucination in Large Vision-Language Models
Yifan Li
Yifan Du
Kun Zhou
Jinpeng Wang
Wayne Xin Zhao
Ji-Rong Wen
MLLM
LRM
119
699
0
17 May 2023
Fairness in AI Systems: Mitigating gender bias from language-vision models
Lavisha Aggarwal
Shruti Bhargava
19
4
0
03 May 2023
Visual Reasoning: from State to Transformation
Xin Hong
Yanyan Lan
Liang Pang
J. Guo
Xueqi Cheng
LRM
16
3
0
02 May 2023
RoCOCO: Robustness Benchmark of MS-COCO to Stress-test Image-Text Matching Models
Seulki Park
Daeho Um
Hajung Yoon
Sanghyuk Chun
Sangdoo Yun
Jin Young Choi
38
2
0
21 Apr 2023
Improving Visual Question Answering Models through Robustness Analysis and In-Context Learning with a Chain of Basic Questions
Jia-Hong Huang
Modar Alfadly
Guohao Li
M. Worring
OOD
AAML
44
5
0
06 Apr 2023
Breaking Common Sense: WHOOPS! A Vision-and-Language Benchmark of Synthetic and Compositional Images
Nitzan Bitton-Guetta
Yonatan Bitton
Jack Hessel
Ludwig Schmidt
Yuval Elovici
Gabriel Stanovsky
Roy Schwartz
VLM
121
66
0
13 Mar 2023
MAQA: A Multimodal QA Benchmark for Negation
Judith Yue Li
Aren Jansen
Qingqing Huang
Joonseok Lee
Ravi Ganti
Dima Kuzmin
33
5
0
09 Jan 2023
VQA and Visual Reasoning: An Overview of Recent Datasets, Methods and Challenges
R. Zakari
Jim Wilson Owusu
Hailin Wang
Ke Qin
Zaharaddeen Karami Lawal
Yue-hong Dong
LRM
33
16
0
26 Dec 2022