Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
1612.00837
Cited By
Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering
2 December 2016
Yash Goyal
Tejas Khot
D. Summers-Stay
Dhruv Batra
Devi Parikh
CoGe
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering"
50 / 1,977 papers shown
Title
VLMAE: Vision-Language Masked Autoencoder
Su He
Taian Guo
Tao Dai
Ruizhi Qiao
Chen Wu
Xiujun Shu
Bohan Ren
VLM
51
11
0
19 Aug 2022
ILLUME: Rationalizing Vision-Language Models through Human Interactions
Manuel Brack
P. Schramowski
Bjorn Deiseroth
Kristian Kersting
VLM
MLLM
36
3
0
17 Aug 2022
Aesthetic Visual Question Answering of Photographs
Xin Jin
Wu Zhou
Xinghui Zhou
Shuai Cui
Le Zhang
Jianwen Lv
Shu Zhao
CoGe
31
0
0
10 Aug 2022
GRIT-VLP: Grouped Mini-batch Sampling for Efficient Vision and Language Pre-training
Jaeseok Byun
Taebaek Hwang
Jianlong Fu
Taesup Moon
VLM
28
11
0
08 Aug 2022
ChiQA: A Large Scale Image-based Real-World Question Answering Dataset for Multi-Modal Understanding
Bingning Wang
Feiya Lv
Ting Yao
Yiming Yuan
Jin Ma
Yu Luo
Haijin Liang
36
3
0
05 Aug 2022
Prompt Tuning for Generative Multimodal Pretrained Models
Han Yang
Junyang Lin
An Yang
Peng Wang
Chang Zhou
Hongxia Yang
VLM
LRM
VPVLM
37
30
0
04 Aug 2022
Masked Vision and Language Modeling for Multi-modal Representation Learning
Gukyeong Kwon
Zhaowei Cai
Avinash Ravichandran
Erhan Bas
Rahul Bhotika
Stefano Soatto
41
67
0
03 Aug 2022
TAG: Boosting Text-VQA via Text-aware Visual Question-answer Generation
Jun Wang
M. Gao
Yuqian Hu
Ramprasaath R. Selvaraju
Chetan Ramaiah
Ran Xu
Joseph Jaja
Larry S. Davis
ViT
35
17
0
03 Aug 2022
Video Question Answering with Iterative Video-Text Co-Tokenization
A. Piergiovanni
K. Morton
Weicheng Kuo
Michael S. Ryoo
A. Angelova
39
18
0
01 Aug 2022
Generative Bias for Robust Visual Question Answering
Jae-Won Cho
Dong-Jin Kim
H. Ryu
In So Kweon
OOD
CML
44
19
0
01 Aug 2022
Augmenting Vision Language Pretraining by Learning Codebook with Visual Semantics
Xiaoyuan Guo
Jiali Duan
C.-C. Jay Kuo
J. Gichoya
Imon Banerjee
VLM
30
1
0
31 Jul 2022
Towards Complex Document Understanding By Discrete Reasoning
Fengbin Zhu
Wenqiang Lei
Fuli Feng
Chao Wang
Haozhou Zhang
Tat-Seng Chua
45
45
0
25 Jul 2022
Chunk-aware Alignment and Lexical Constraint for Visual Entailment with Natural Language Explanations
Qian Yang
Yunxin Li
Baotian Hu
Lin Ma
Yuxin Ding
Min Zhang
57
10
0
23 Jul 2022
Semantic-aware Modular Capsule Routing for Visual Question Answering
Yudong Han
Jianhua Yin
Jianlong Wu
Yin-wei Wei
Liqiang Nie
40
7
0
21 Jul 2022
Rethinking Data Augmentation for Robust Visual Question Answering
Long Chen
Yuhang Zheng
Jun Xiao
OOD
40
42
0
18 Jul 2022
Unifying Event Detection and Captioning as Sequence Generation via Pre-Training
Qi Zhang
Yuqing Song
Qin Jin
30
24
0
18 Jul 2022
A Multibias-mitigated and Sentiment Knowledge Enriched Transformer for Debiasing in Multimodal Conversational Emotion Recognition
Jinglin Wang
Fang Ma
Yazhou Zhang
Dawei Song
21
4
0
17 Jul 2022
A Skeleton-aware Graph Convolutional Network for Human-Object Interaction Detection
Manli Zhu
Edmond S. L. Ho
Hubert P. H. Shum
3DH
42
3
0
11 Jul 2022
Contrastive Cross-Modal Knowledge Sharing Pre-training for Vision-Language Representation Learning and Retrieval
Keyu Wen
Zhenshan Tan
Qingrong Cheng
Cheng Chen
X. Gu
VLM
37
0
0
02 Jul 2022
Modern Question Answering Datasets and Benchmarks: A Survey
Zhen Wang
58
23
0
30 Jun 2022
EBMs vs. CL: Exploring Self-Supervised Visual Pretraining for Visual Question Answering
Violetta Shevchenko
Ehsan Abbasnejad
A. Dick
Anton Van Den Hengel
Damien Teney
53
0
0
29 Jun 2022
Consistency-preserving Visual Question Answering in Medical Imaging
Sergio Tascon-Morales
Pablo Márquez-Neila
Raphael Sznitman
MedIm
46
12
0
27 Jun 2022
CLiMB: A Continual Learning Benchmark for Vision-and-Language Tasks
Tejas Srinivasan
Ting-Yun Chang
Leticia Pinto-Alva
Georgios Chochlakis
Mohammad Rostami
Jesse Thomason
VLM
CLL
47
73
0
18 Jun 2022
VLMixer: Unpaired Vision-Language Pre-training via Cross-Modal CutMix
Teng Wang
Wenhao Jiang
Zhichao Lu
Feng Zheng
Ran Cheng
Chengguo Yin
Ping Luo
VLM
48
42
0
17 Jun 2022
Unified-IO: A Unified Model for Vision, Language, and Multi-Modal Tasks
Jiasen Lu
Christopher Clark
Rowan Zellers
Roozbeh Mottaghi
Aniruddha Kembhavi
ObjD
VLM
MLLM
106
395
0
17 Jun 2022
BridgeTower: Building Bridges Between Encoders in Vision-Language Representation Learning
Xiao Xu
Chenfei Wu
Shachar Rosenman
Vasudev Lal
Wanxiang Che
Nan Duan
51
64
0
17 Jun 2022
MixGen: A New Multi-Modal Data Augmentation
Xiaoshuai Hao
Yi Zhu
Srikar Appalaraju
Aston Zhang
Wanqian Zhang
Boyang Li
Mu Li
VLM
35
85
0
16 Jun 2022
Write and Paint: Generative Vision-Language Models are Unified Modal Learners
Shizhe Diao
Wangchunshu Zhou
Xinsong Zhang
Jiawei Wang
MLLM
AI4CE
39
16
0
15 Jun 2022
LAVENDER: Unifying Video-Language Understanding as Masked Language Modeling
Linjie Li
Zhe Gan
Kevin Qinghong Lin
Chung-Ching Lin
Zicheng Liu
Ce Liu
Lijuan Wang
MLLM
VLM
33
82
0
14 Jun 2022
LST: Ladder Side-Tuning for Parameter and Memory Efficient Transfer Learning
Yi-Lin Sung
Jaemin Cho
Joey Tianyi Zhou
VLM
47
238
0
13 Jun 2022
Language Models are General-Purpose Interfaces
Y. Hao
Haoyu Song
Li Dong
Shaohan Huang
Zewen Chi
Wenhui Wang
Shuming Ma
Furu Wei
MLLM
35
96
0
13 Jun 2022
GLIPv2: Unifying Localization and Vision-Language Understanding
Haotian Zhang
Pengchuan Zhang
Xiaowei Hu
Yen-Chun Chen
Liunian Harold Li
Xiyang Dai
Lijuan Wang
Lu Yuan
Lei Li
Jianfeng Gao
ObjD
VLM
30
292
0
12 Jun 2022
A Unified Continuous Learning Framework for Multi-modal Knowledge Discovery and Pre-training
Zhihao Fan
Zhongyu Wei
Jingjing Chen
Siyuan Wang
Zejun Li
Jiarong Xu
Xuanjing Huang
CLL
22
6
0
11 Jun 2022
Revealing Single Frame Bias for Video-and-Language Learning
Jie Lei
Tamara L. Berg
Joey Tianyi Zhou
29
111
0
07 Jun 2022
cViL: Cross-Lingual Training of Vision-Language Models using Knowledge Distillation
Kshitij Gupta
Devansh Gautam
R. Mamidi
VLM
41
3
0
07 Jun 2022
From Pixels to Objects: Cubic Visual Attention for Visual Question Answering
Jingkuan Song
Pengpeng Zeng
Lianli Gao
Heng Tao Shen
32
62
0
04 Jun 2022
Visual Clues: Bridging Vision and Language Foundations for Image Paragraph Captioning
Yujia Xie
Luowei Zhou
Xiyang Dai
Lu Yuan
Nguyen Bach
Ce Liu
Michael Zeng
VLM
MLLM
37
28
0
03 Jun 2022
Revisiting the "Video" in Video-Language Understanding
S. Buch
Cristobal Eyzaguirre
Adrien Gaidon
Jiajun Wu
L. Fei-Fei
Juan Carlos Niebles
46
158
0
03 Jun 2022
A-OKVQA: A Benchmark for Visual Question Answering using World Knowledge
Dustin Schwenk
Apoorv Khandelwal
Christopher Clark
Kenneth Marino
Roozbeh Mottaghi
21
521
0
03 Jun 2022
VL-BEiT: Generative Vision-Language Pretraining
Hangbo Bao
Wenhui Wang
Li Dong
Furu Wei
VLM
18
45
0
02 Jun 2022
Mitigating Dataset Bias by Using Per-sample Gradient
Sumyeong Ahn
Seongyoon Kim
Se-Young Yun
54
21
0
31 May 2022
VLUE: A Multi-Task Benchmark for Evaluating Vision-Language Models
Wangchunshu Zhou
Yan Zeng
Shizhe Diao
Xinsong Zhang
CoGe
VLM
65
13
0
30 May 2022
UPB at SemEval-2022 Task 5: Enhancing UNITER with Image Sentiment and Graph Convolutional Networks for Multimedia Automatic Misogyny Identification
Andrei Paraschiv
M. Dascalu
Dumitru-Clementin Cercel
32
3
0
29 May 2022
GIT: A Generative Image-to-text Transformer for Vision and Language
Jianfeng Wang
Zhengyuan Yang
Xiaowei Hu
Linjie Li
Kevin Qinghong Lin
Zhe Gan
Zicheng Liu
Ce Liu
Lijuan Wang
VLM
71
533
0
27 May 2022
Multimodal Knowledge Alignment with Reinforcement Learning
Youngjae Yu
Jiwan Chung
Heeseung Yun
Jack Hessel
Jinho Park
...
Prithviraj Ammanabrolu
Rowan Zellers
Ronan Le Bras
Gunhee Kim
Yejin Choi
VLM
123
36
0
25 May 2022
DisinfoMeme: A Multimodal Dataset for Detecting Meme Intentionally Spreading Out Disinformation
Jingnong Qu
Liunian Harold Li
Jieyu Zhao
Sunipa Dev
Kai-Wei Chang
26
12
0
25 May 2022
Guiding Visual Question Answering with Attention Priors
T. Le
Vuong Le
Sunil R. Gupta
Svetha Venkatesh
T. Tran
38
6
0
25 May 2022
Less Learn Shortcut: Analyzing and Mitigating Learning of Spurious Feature-Label Correlation
Yanrui Du
Jing Yang
Yan Chen
Jing Liu
Sendong Zhao
Qiaoqiao She
Huaqin Wu
Haifeng Wang
Bing Qin
69
10
0
25 May 2022
Reassessing Evaluation Practices in Visual Question Answering: A Case Study on Out-of-Distribution Generalization
Aishwarya Agrawal
Ivana Kajić
Emanuele Bugliarello
Elnaz Davoodi
Anita Gergely
Phil Blunsom
Aida Nematzadeh
OOD
45
17
0
24 May 2022
mPLUG: Effective and Efficient Vision-Language Learning by Cross-modal Skip-connections
Chenliang Li
Haiyang Xu
Junfeng Tian
Wei Wang
Ming Yan
...
Ji Zhang
Songfang Huang
Feiran Huang
Jingren Zhou
Luo Si
VLM
MLLM
36
215
0
24 May 2022
Previous
1
2
3
...
25
26
27
...
38
39
40
Next