Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
1612.00837
Cited By
v1
v2
v3 (latest)
Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering
2 December 2016
Yash Goyal
Tejas Khot
D. Summers-Stay
Dhruv Batra
Devi Parikh
CoGe
Re-assign community
ArXiv (abs)
PDF
HTML
Papers citing
"Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering"
50 / 2,037 papers shown
Title
Asking questions on handwritten document collections
Minesh Mathew
Lluís Gómez
Dimosthenis Karatzas
C. V. Jawahar
RALM
130
11
0
02 Oct 2021
Calibrating Concepts and Operations: Towards Symbolic Reasoning on Real Images
Zhuowan Li
Elias Stengel-Eskin
Yixiao Zhang
Cihang Xie
Q. Tran
Benjamin Van Durme
Alan Yuille
VLM
73
15
0
01 Oct 2021
Visually Grounded Reasoning across Languages and Cultures
Fangyu Liu
Emanuele Bugliarello
Edoardo Ponti
Siva Reddy
Nigel Collier
Desmond Elliott
VLM
LRM
175
180
0
28 Sep 2021
Multimodal Integration of Human-Like Attention in Visual Question Answering
Ekta Sood
Fabian Kögel
Philippe Muller
Dominike Thomas
Mihai Bâce
Andreas Bulling
66
17
0
27 Sep 2021
VQA-MHUG: A Gaze Dataset to Study Multimodal Neural Attention in Visual Question Answering
Ekta Sood
Fabian Kögel
Florian Strohm
Prajit Dhar
Andreas Bulling
67
19
0
27 Sep 2021
Robustness and Sensitivity of BERT Models Predicting Alzheimer's Disease from Text
Jekaterina Novikova
OOD
79
10
0
24 Sep 2021
Dense Contrastive Visual-Linguistic Pretraining
Lei Shi
Kai Shuang
Shijie Geng
Peng Gao
Zuohui Fu
Gerard de Melo
Yunpeng Chen
Sen Su
VLM
SSL
129
11
0
24 Sep 2021
Transferring Knowledge from Vision to Language: How to Achieve it and how to Measure it?
Tobias Norlund
Lovisa Hagström
Richard Johansson
72
25
0
23 Sep 2021
Learning Natural Language Generation from Scratch
Alice Martin Donati
Guillaume Quispe
Charles Ollion
Sylvain Le Corff
Florian Strub
Olivier Pietquin
LRM
58
4
0
20 Sep 2021
Image Captioning for Effective Use of Language Models in Knowledge-Based Visual Question Answering
Ander Salaberria
Gorka Azkune
Oier López de Lacalle
Aitor Soroa Etxabe
Eneko Agirre
103
61
0
15 Sep 2021
Broaden the Vision: Geo-Diverse Visual Commonsense Reasoning
Da Yin
Liunian Harold Li
Ziniu Hu
Nanyun Peng
Kai-Wei Chang
159
56
0
14 Sep 2021
Discovering the Unknown Knowns: Turning Implicit Knowledge in the Dataset into Explicit Training Examples for Visual Question Answering
Jihyung Kil
Cheng Zhang
D. Xuan
Wei-Lun Chao
116
20
0
13 Sep 2021
An Empirical Study of GPT-3 for Few-Shot Knowledge-Based VQA
Zhengyuan Yang
Zhe Gan
Jianfeng Wang
Xiaowei Hu
Yumao Lu
Zicheng Liu
Lijuan Wang
299
423
0
10 Sep 2021
Towards Developing a Multilingual and Code-Mixed Visual Question Answering System by Knowledge Distillation
H. Khan
D. Gupta
Asif Ekbal
57
14
0
10 Sep 2021
TxT: Crossmodal End-to-End Learning with Transformers
Jan-Martin O. Steitz
Jonas Pfeiffer
Iryna Gurevych
Stefan Roth
LRM
41
2
0
09 Sep 2021
Vision Guided Generative Pre-trained Language Models for Multimodal Abstractive Summarization
Tiezheng Yu
Wenliang Dai
Zihan Liu
Pascale Fung
105
74
0
06 Sep 2021
Improved RAMEN: Towards Domain Generalization for Visual Question Answering
Bhanuka Gamage
Lim Chern Hong
77
1
0
06 Sep 2021
Data Efficient Masked Language Modeling for Vision and Language
Yonatan Bitton
Gabriel Stanovsky
Michael Elhadad
Roy Schwartz
VLM
82
20
0
05 Sep 2021
Weakly Supervised Relative Spatial Reasoning for Visual Question Answering
Pratyay Banerjee
Tejas Gokhale
Yezhou Yang
Chitta Baral
LRM
85
19
0
04 Sep 2021
WebQA: Multihop and Multimodal QA
Yingshan Chang
M. Narang
Hisami Suzuki
Guihong Cao
Jianfeng Gao
Yonatan Bisk
LRM
91
87
0
01 Sep 2021
Behind the Scenes: An Exploration of Trigger Biases Problem in Few-Shot Event Classification
Peiyi Wang
Runxin Xu
Tianyu Liu
Damai Dai
Baobao Chang
Zhifang Sui
77
16
0
29 Aug 2021
On the Significance of Question Encoder Sequence Model in the Out-of-Distribution Performance in Visual Question Answering
K. Gouthaman
Anurag Mittal
CML
79
0
0
28 Aug 2021
QACE: Asking Questions to Evaluate an Image Caption
Hwanhee Lee
Thomas Scialom
Seunghyun Yoon
Franck Dernoncourt
Kyomin Jung
CoGe
89
19
0
28 Aug 2021
Automated Generation of Accurate \& Fluent Medical X-ray Reports
Hoang T.N. Nguyen
Dong Nie
Taivanbat Badamdorj
Yujie Liu
Yingying Zhu
J. Truong
Li Cheng
MedIm
LM&MA
73
40
0
27 Aug 2021
Vision-Language Navigation: A Survey and Taxonomy
Wansen Wu
Tao Chang
Xinmeng Li
LM&Ro
81
24
0
26 Aug 2021
SimVLM: Simple Visual Language Model Pretraining with Weak Supervision
Zirui Wang
Jiahui Yu
Adams Wei Yu
Zihang Dai
Yulia Tsvetkov
Yuan Cao
VLM
MLLM
183
801
0
24 Aug 2021
EKTVQA: Generalized use of External Knowledge to empower Scene Text in Text-VQA
Arka Ujjal Dey
Ernest Valveny
Gaurav Harit
47
3
0
22 Aug 2021
The Multi-Modal Video Reasoning and Analyzing Competition
Haoran Peng
He Huang
Li Xu
Tianjiao Li
Jing Liu
...
Yuanzhong Liu
Tao He
Fuwei Zhang
Xianbin Liu
Tao Lin
54
2
0
18 Aug 2021
ROSITA: Enhancing Vision-and-Language Semantic Alignments via Cross- and Intra-modal Knowledge Integration
Yuhao Cui
Zhou Yu
Chunqi Wang
Zhongzhou Zhao
Ji Zhang
Meng Wang
Jun-chen Yu
VLM
77
56
0
16 Aug 2021
Screen2Words: Automatic Mobile UI Summarization with Multimodal Learning
Bryan Wang
Gang Li
Xin Zhou
Zhourong Chen
Tovi Grossman
Yang Li
223
160
0
07 Aug 2021
Sparse Continuous Distributions and Fenchel-Young Losses
André F. T. Martins
Marcos Vinícius Treviso
António Farinhas
P. Aguiar
Mário A. T. Figueiredo
Mathieu Blondel
Vlad Niculae
76
12
0
04 Aug 2021
ReFormer: The Relational Transformer for Image Captioning
Xuewen Yang
Yingru Liu
Xin Wang
ViT
103
57
0
29 Jul 2021
Greedy Gradient Ensemble for Robust Visual Question Answering
Xinzhe Han
Shuhui Wang
Chi Su
Qingming Huang
Q. Tian
65
78
0
27 Jul 2021
X-GGM: Graph Generative Modeling for Out-of-Distribution Generalization in Visual Question Answering
Jingjing Jiang
Zi-yi Liu
Yifan Liu
Zhixiong Nan
N. Zheng
OOD
89
19
0
24 Jul 2021
Multi-stage Pre-training over Simplified Multimodal Pre-training Models
Tongtong Liu
Fangxiang Feng
Xiaojie Wang
43
13
0
22 Jul 2021
Neural Abstructions: Abstractions that Support Construction for Grounded Language Learning
Kaylee Burns
Christopher D. Manning
Li Fei-Fei
58
0
0
20 Jul 2021
Separating Skills and Concepts for Novel Visual Question Answering
Spencer Whitehead
Hui Wu
Heng Ji
Rogerio Feris
Kate Saenko
CoGe
100
34
0
19 Jul 2021
Align before Fuse: Vision and Language Representation Learning with Momentum Distillation
Junnan Li
Ramprasaath R. Selvaraju
Akhilesh Deepak Gotmare
Shafiq Joty
Caiming Xiong
Guosheng Lin
FaML
358
1,991
0
16 Jul 2021
MultiBench: Multiscale Benchmarks for Multimodal Representation Learning
Paul Pu Liang
Yiwei Lyu
Xiang Fan
Zetian Wu
Yun Cheng
...
Peter Wu
Michelle A. Lee
Yuke Zhu
Ruslan Salakhutdinov
Louis-Philippe Morency
VLM
111
172
0
15 Jul 2021
How Much Can CLIP Benefit Vision-and-Language Tasks?
Sheng Shen
Liunian Harold Li
Hao Tan
Joey Tianyi Zhou
Anna Rohrbach
Kai-Wei Chang
Z. Yao
Kurt Keutzer
CLIP
VLM
MLLM
278
412
0
13 Jul 2021
Deep Learning for Embodied Vision Navigation: A Survey
Fengda Zhu
Yi Zhu
Vincent CS Lee
Xiaodan Liang
Xiaojun Chang
EgoV
LM&Ro
101
0
0
07 Jul 2021
Mind Your Outliers! Investigating the Negative Impact of Outliers on Active Learning for Visual Question Answering
Siddharth Karamcheti
Ranjay Krishna
Li Fei-Fei
Christopher D. Manning
99
92
0
06 Jul 2021
Multimodal Few-Shot Learning with Frozen Language Models
Maria Tsimpoukelli
Jacob Menick
Serkan Cabi
S. M. Ali Eslami
Oriol Vinyals
Felix Hill
MLLM
272
792
0
25 Jun 2021
Probing Inter-modality: Visual Parsing with Self-Attention for Vision-Language Pre-training
Hongwei Xue
Yupan Huang
Bei Liu
Houwen Peng
Jianlong Fu
Houqiang Li
Jiebo Luo
101
89
0
25 Jun 2021
A Picture May Be Worth a Hundred Words for Visual Question Answering
Yusuke Hirota
Noa Garcia
Mayu Otani
Chenhui Chu
Yuta Nakashima
Ittetsu Taniguchi
Takao Onoye
ViT
35
4
0
25 Jun 2021
A Transformer-based Cross-modal Fusion Model with Adversarial Training for VQA Challenge 2021
Keda Lu
Bo Fang
Kuan-Yu Chen
ViT
47
2
0
24 Jun 2021
Attend What You Need: Motion-Appearance Synergistic Networks for Video Question Answering
Ahjeong Seo
Gi-Cheon Kang
J. Park
Byoung-Tak Zhang
82
54
0
19 Jun 2021
Probing Image-Language Transformers for Verb Understanding
Lisa Anne Hendricks
Aida Nematzadeh
90
119
0
16 Jun 2021
C
3
C^3
C
3
: Compositional Counterfactual Contrastive Learning for Video-grounded Dialogues
Hung Le
Nancy F. Chen
Guosheng Lin
67
2
0
16 Jun 2021
Examining and Combating Spurious Features under Distribution Shift
Chunting Zhou
Xuezhe Ma
Paul Michel
Graham Neubig
OOD
96
68
0
14 Jun 2021
Previous
1
2
3
...
30
31
32
...
39
40
41
Next