Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
1612.00837
Cited By
v1
v2
v3 (latest)
Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering
2 December 2016
Yash Goyal
Tejas Khot
D. Summers-Stay
Dhruv Batra
Devi Parikh
CoGe
Re-assign community
ArXiv (abs)
PDF
HTML
Papers citing
"Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering"
50 / 2,037 papers shown
Title
MuMuQA: Multimedia Multi-Hop News Question Answering via Cross-Media Knowledge Extraction and Grounding
Revanth Reddy Gangi Reddy
Xilin Rui
Manling Li
Xudong Lin
Haoyang Wen
...
Joey Tianyi Zhou
Avirup Sil
Shih-Fu Chang
Alex Schwing
Heng Ji
82
32
0
20 Dec 2021
General Greedy De-bias Learning
Xinzhe Han
Shuhui Wang
Chi Su
Qingming Huang
Qi Tian
119
9
0
20 Dec 2021
ScanQA: 3D Question Answering for Spatial Scene Understanding
Daich Azuma
Taiki Miyanishi
Shuhei Kurita
M. Kawanabe
115
208
0
20 Dec 2021
Contrastive Vision-Language Pre-training with Limited Resources
Quan Cui
Boyan Zhou
Yu Guo
Weidong Yin
Hao Wu
Osamu Yoshie
Yubo Chen
VLM
CLIP
53
34
0
17 Dec 2021
Distilled Dual-Encoder Model for Vision-Language Understanding
Zekun Wang
Wenhui Wang
Haichao Zhu
Ming Liu
Bing Qin
Furu Wei
VLM
FedML
92
33
0
16 Dec 2021
SGEITL: Scene Graph Enhanced Image-Text Learning for Visual Commonsense Reasoning
Zhecan Wang
Haoxuan You
Liunian Harold Li
Alireza Zareian
Suji Park
Yiqing Liang
Kai-Wei Chang
Shih-Fu Chang
ReLM
LRM
69
33
0
16 Dec 2021
3D Question Answering
Shuquan Ye
Dongdong Chen
Songfang Han
Jing Liao
ViT
94
49
0
15 Dec 2021
Dual-Key Multimodal Backdoors for Visual Question Answering
Matthew Walmer
Karan Sikka
Indranil Sur
Abhinav Shrivastava
Susmit Jha
AAML
78
38
0
14 Dec 2021
VALSE: A Task-Independent Benchmark for Vision and Language Models Centered on Linguistic Phenomena
Letitia Parcalabescu
Michele Cafagna
Lilitta Muradjan
Anette Frank
Iacer Calixto
Albert Gatt
CoGe
110
118
0
14 Dec 2021
Bilateral Cross-Modality Graph Matching Attention for Feature Fusion in Visual Question Answering
Jianjian Cao
Xiameng Qin
Sanyuan Zhao
Jianbing Shen
77
21
0
14 Dec 2021
VL-Adapter: Parameter-Efficient Transfer Learning for Vision-and-Language Tasks
Yi-Lin Sung
Jaemin Cho
Joey Tianyi Zhou
VLM
VPVLM
130
360
0
13 Dec 2021
Unified Multimodal Pre-training and Prompt-based Tuning for Vision-Language Understanding and Generation
Tianyi Liu
Zuxuan Wu
Wenhan Xiong
Jingjing Chen
Yu-Gang Jiang
VLM
MLLM
88
10
0
10 Dec 2021
PTR: A Benchmark for Part-based Conceptual, Relational, and Physical Reasoning
Yining Hong
Li Yi
J. Tenenbaum
Antonio Torralba
Chuang Gan
76
40
0
09 Dec 2021
FLAVA: A Foundational Language And Vision Alignment Model
Amanpreet Singh
Ronghang Hu
Vedanuj Goswami
Guillaume Couairon
Wojciech Galuba
Marcus Rohrbach
Douwe Kiela
CLIP
VLM
159
719
0
08 Dec 2021
MLP Architectures for Vision-and-Language Modeling: An Empirical Study
Yi-Liang Nie
Linjie Li
Zhe Gan
Shuohang Wang
Chenguang Zhu
Michael Zeng
Zicheng Liu
Joey Tianyi Zhou
Lijuan Wang
66
6
0
08 Dec 2021
Joint Learning of Localized Representations from Medical Images and Reports
Philipp Muller
Georgios Kaissis
Cong Zou
Daniel Munich
225
87
0
06 Dec 2021
Uni-Perceiver: Pre-training Unified Architecture for Generic Perception for Zero-shot and Few-shot Tasks
Xizhou Zhu
Jinguo Zhu
Hao Li
Xiaoshi Wu
Xiaogang Wang
Hongsheng Li
Xiaohua Wang
Jifeng Dai
132
133
0
02 Dec 2021
Querying Labelled Data with Scenario Programs for Sim-to-Real Validation
Edward Kim
Jay Shenoy
Sebastian Junges
Daniel J. Fremont
Alberto L. Sangiovanni-Vincentelli
Sanjit A. Seshia
81
3
0
01 Dec 2021
Classification-Regression for Chart Comprehension
Matan Levy
Rami Ben-Ari
Dani Lischinski
67
16
0
29 Nov 2021
Two-stage Rule-induction Visual Reasoning on RPMs with an Application to Video Prediction
Wentao He
Jianfeng Ren
Ruibin Bai
Xudong Jiang
LRM
70
5
0
24 Nov 2021
Florence: A New Foundation Model for Computer Vision
Lu Yuan
Dongdong Chen
Yi-Ling Chen
Noel Codella
Xiyang Dai
...
Zhen Xiao
Jianwei Yang
Michael Zeng
Luowei Zhou
Pengchuan Zhang
VLM
213
907
0
22 Nov 2021
Many Heads but One Brain: Fusion Brain -- a Competition and a Single Multimodal Multitask Architecture
Daria Bakshandaeva
Denis Dimitrov
V.Ya. Arkhipkin
Alex Shonenkov
M. Potanin
...
Mikhail Martynov
Anton Voronov
Vera Davydova
E. Tutubalina
Aleksandr Petiushko
110
0
0
22 Nov 2021
TraVLR: Now You See It, Now You Don't! A Bimodal Dataset for Evaluating Visio-Linguistic Reasoning
Keng Ji Chow
Samson Tan
MingSung Kan
LRM
65
4
0
21 Nov 2021
Medical Visual Question Answering: A Survey
Zhihong Lin
Donghao Zhang
Qingyi Tao
Danli Shi
Gholamreza Haffari
Qi Wu
M. He
Z. Ge
116
122
0
19 Nov 2021
UFO: A UniFied TransfOrmer for Vision-Language Representation Learning
Jianfeng Wang
Xiaowei Hu
Zhe Gan
Zhengyuan Yang
Xiyang Dai
Zicheng Liu
Yumao Lu
Lijuan Wang
ViT
78
57
0
19 Nov 2021
Multi-Grained Vision Language Pre-Training: Aligning Texts with Visual Concepts
Yan Zeng
Xinsong Zhang
Hang Li
VLM
CLIP
106
308
0
16 Nov 2021
LiT: Zero-Shot Transfer with Locked-image text Tuning
Xiaohua Zhai
Tianlin Li
Basil Mustafa
Andreas Steiner
Daniel Keysers
Alexander Kolesnikov
Lucas Beyer
VLM
200
561
0
15 Nov 2021
Sentiment Analysis of Fashion Related Posts in Social Media
Yifei Yuan
W. Lam
75
8
0
15 Nov 2021
A Survey of Visual Transformers
Yang Liu
Yao Zhang
Yixin Wang
Feng Hou
Jin Yuan
Jiang Tian
Yang Zhang
Zhongchao Shi
Jianping Fan
Zhiqiang He
3DGS
ViT
207
356
0
11 Nov 2021
Graph Relation Transformer: Incorporating pairwise object features into the Transformer architecture
Michael Yang
Aditya Anantharaman
Zach Kitowski
Derik Clive Robert
ViT
64
4
0
11 Nov 2021
Towards Debiasing Temporal Sentence Grounding in Video
Hao Zhang
Aixin Sun
Wei Jing
Qiufeng Wang
103
16
0
08 Nov 2021
VLMo: Unified Vision-Language Pre-Training with Mixture-of-Modality-Experts
Hangbo Bao
Wenhui Wang
Li Dong
Qiang Liu
Owais Khan Mohammed
Kriti Aggarwal
Subhojit Som
Furu Wei
VLM
MLLM
MoE
106
560
0
03 Nov 2021
Introspective Distillation for Robust Question Answering
Yulei Niu
Hanwang Zhang
94
60
0
01 Nov 2021
Perceptual Score: What Data Modalities Does Your Model Perceive?
Itai Gat
Idan Schwartz
Alex Schwing
99
32
0
27 Oct 2021
IconQA: A New Benchmark for Abstract Diagram Understanding and Visual Language Reasoning
Pan Lu
Liang Qiu
Jiaqi Chen
Tony Xia
Yizhou Zhao
Wei Zhang
Zhou Yu
Xiaodan Liang
Song-Chun Zhu
AIMat
177
206
0
25 Oct 2021
Alignment Attention by Matching Key and Query Distributions
Shujian Zhang
Xinjie Fan
Huangjie Zheng
Korawat Tanwisuth
Mingyuan Zhou
OOD
124
10
0
25 Oct 2021
Robustness through Data Augmentation Loss Consistency
Tianjian Huang
Shaunak Halbe
Chinnadhurai Sankar
P. Amini
Satwik Kottur
A. Geramifard
Meisam Razaviyayn
Ahmad Beirami
OOD
135
8
0
21 Oct 2021
Single-Modal Entropy based Active Learning for Visual Question Answering
Dong-Jin Kim
Jae-Won Cho
Jinsoo Choi
Yunjae Jung
In So Kweon
63
12
0
21 Oct 2021
Domain Generalization through Audio-Visual Relative Norm Alignment in First Person Action Recognition
M. Planamente
Chiara Plizzari
Emanuele Alberti
Barbara Caputo
EgoV
121
35
0
19 Oct 2021
Towards Language-guided Visual Recognition via Dynamic Convolutions
Gen Luo
Yiyi Zhou
Xiaoshuai Sun
Yongjian Wu
Yue Gao
Rongrong Ji
ObjD
98
19
0
17 Oct 2021
A Good Prompt Is Worth Millions of Parameters: Low-resource Prompt-based Learning for Vision-Language Models
Woojeong Jin
Yu Cheng
Yelong Shen
Weizhu Chen
Xiang Ren
VLM
VPVLM
MLLM
128
138
0
16 Oct 2021
The World of an Octopus: How Reporting Bias Influences a Language Model's Perception of Color
Cory Paik
Stéphane Aroca-Ouellette
Alessandro Roncone
Katharina Kann
71
34
0
15 Oct 2021
Content Preserving Image Translation with Texture Co-occurrence and Spatial Self-Similarity for Texture Debiasing and Domain Adaptation
Hao Li
Dongkyu Won
Wei Lu
Philip Chikontwe
Pengjun Xie
June Hong Ahn
Sang Hyun Park
116
37
0
15 Oct 2021
Semantically Distributed Robust Optimization for Vision-and-Language Inference
Tejas Gokhale
A. Chaudhary
Pratyay Banerjee
Chitta Baral
Yezhou Yang
126
17
0
14 Oct 2021
Audio-Visual Scene-Aware Dialog and Reasoning using Audio-Visual Transformers with Joint Student-Teacher Learning
Ankit Parag Shah
Shijie Geng
Peng Gao
A. Cherian
Takaaki Hori
Tim K. Marks
Jonathan Le Roux
Chiori Hori
68
24
0
13 Oct 2021
Improving Users' Mental Model with Attention-directed Counterfactual Edits
Kamran Alipour
Arijit Ray
Xiaoyu Lin
Michael Cogswell
J. Schulze
Yi Yao
Giedrius Burachas
OOD
61
9
0
13 Oct 2021
Beyond Accuracy: A Consolidated Tool for Visual Question Answering Benchmarking
Dirk Vath
Pascal Tilli
Ngoc Thang Vu
87
4
0
11 Oct 2021
Coarse-to-Fine Reasoning for Visual Question Answering
Binh X. Nguyen
Tuong Khanh Long Do
Huy Tran
Erman Tjiputra
Quang-Dieu Tran
A. Nguyen
NAI
138
40
0
06 Oct 2021
Let there be a clock on the beach: Reducing Object Hallucination in Image Captioning
Ali Furkan Biten
L. G. I. Bigorda
Dimosthenis Karatzas
168
63
0
04 Oct 2021
Counterfactual Samples Synthesizing and Training for Robust Visual Question Answering
Long Chen
Yuhang Zheng
Yulei Niu
Hanwang Zhang
Jun Xiao
AAML
OOD
119
37
0
03 Oct 2021
Previous
1
2
3
...
29
30
31
...
39
40
41
Next