ResearchTrend.AI
© 2025 ResearchTrend.AI, All rights reserved.

Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering

2 December 2016
Yash Goyal
Tejas Khot
D. Summers-Stay
Dhruv Batra
Devi Parikh
    CoGe

Papers citing "Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering"

Showing 50 of 2,037 citing papers.
Multimodal Pretraining Unmasked: A Meta-Analysis and a Unified Framework of Vision-and-Language BERTs
Emanuele Bugliarello
Ryan Cotterell
Naoaki Okazaki
Desmond Elliott
102
120
0
30 Nov 2020
Point and Ask: Incorporating Pointing into Visual Question Answering
Arjun Mani
Nobline Yoo
William Fu-Hinthorn
Olga Russakovsky
3DPC
82
38
0
27 Nov 2020
Learning from Lexical Perturbations for Consistent Visual Question Answering
Spencer Whitehead
Hui Wu
Yi R. Fung
Heng Ji
Rogerio Feris
Kate Saenko
75
11
0
26 Nov 2020
Transformation Driven Visual Reasoning
Xin Hong
Yanyan Lan
Liang Pang
Jiafeng Guo
Xueqi Cheng
LRM
85
23
0
26 Nov 2020
Multimodal Learning for Hateful Memes Detection
Yi Zhou
Zhenhao Chen
102
61
0
25 Nov 2020
A Review of Uncertainty Quantification in Deep Learning: Techniques, Applications and Challenges
Moloud Abdar
Farhad Pourpanah
Sadiq Hussain
Dana Rezazadegan
Li Liu
...
Xiaochun Cao
Abbas Khosravi
U. Acharya
V. Makarenkov
S. Nahavandi
BDL UQ CV
382
1,952
0
12 Nov 2020
CapWAP: Captioning with a Purpose
Adam Fisch
Kenton Lee
Ming-Wei Chang
J. Clark
Regina Barzilay
53
11
0
09 Nov 2020
Learning to Model and Ignore Dataset Bias with Mixed Capacity Ensembles
Christopher Clark
Mark Yatskar
Luke Zettlemoyer
86
62
0
07 Nov 2020
Learning to Respond with Your Favorite Stickers: A Framework of Unifying Multi-Modality and User Preference in Multi-Turn Dialog
Shen Gao
Preslav Nakov
Li Liu
Dongyan Zhao
Rui Yan
81
15
0
05 Nov 2020
An Improved Attention for Visual Question Answering
Tanzila Rahman
Shih-Han Chou
Leonid Sigal
Giuseppe Carenini
55
45
0
04 Nov 2020
Loss re-scaling VQA: Revisiting the Language Prior Problem from a Class-imbalance View
Yangyang Guo
Liqiang Nie
Zhiyong Cheng
Q. Tian
Min Zhang
116
70
0
30 Oct 2020
Leveraging Visual Question Answering to Improve Text-to-Image Synthesis
Stanislav Frolov
Shailza Jolly
Jörn Hees
Andreas Dengel
EGVM
50
5
0
28 Oct 2020
MMFT-BERT: Multimodal Fusion Transformer with BERT Encodings for Visual Question Answering
Aisha Urooj Khan
Amir Mazaheri
N. Lobo
M. Shah
97
57
0
27 Oct 2020
RUArt: A Novel Text-Centered Solution for Text-Based Visual Question Answering
Zanxia Jin
Heran Wu
Chun Yang
Fang Zhou
Jingyan Qin
Lei Xiao
Xu-Cheng Yin
90
31
0
24 Oct 2020
Beyond VQA: Generating Multi-word Answer and Rationale to Visual Questions
Radhika Dua
Sai Srinivas Kancheti
V. Balasubramanian
LRM
88
22
0
24 Oct 2020
Unsupervised Vision-and-Language Pre-training Without Parallel Images and Captions
Liunian Harold Li
Haoxuan You
Zhecan Wang
Alireza Zareian
Shih-Fu Chang
Kai-Wei Chang
SSL VLM
101
12
0
24 Oct 2020
Removing Bias in Multi-modal Classifiers: Regularization by Maximizing Functional Entropies
Itai Gat
Idan Schwartz
Alex Schwing
Tamir Hazan
106
92
0
21 Oct 2020
Bayesian Attention Modules
Xinjie Fan
Shujian Zhang
Bo Chen
Mingyuan Zhou
183
62
0
20 Oct 2020
BiST: Bi-directional Spatio-Temporal Reasoning for Video-Grounded Dialogues
Hung Le
Doyen Sahoo
Nancy F. Chen
Guosheng Lin
117
31
0
20 Oct 2020
SOrT-ing VQA Models: Contrastive Gradient Learning for Improved Consistency
Sameer Dharur
Purva Tendulkar
Dhruv Batra
Devi Parikh
Ramprasaath R. Selvaraju
63
2
0
20 Oct 2020
Word Shape Matters: Robust Machine Translation with Visual Embedding
Haohan Wang
Peiyan Zhang
Eric Xing
203
13
0
20 Oct 2020
Answer-checking in Context: A Multi-modal Fully Attention Network for Visual Question Answering
Hantao Huang
Tao Han
Wei Han
D. Yap
Cheng-Ming Chiang
33
4
0
17 Oct 2020
What is More Likely to Happen Next? Video-and-Language Future Event Prediction
Jie Lei
Licheng Yu
Tamara L. Berg
Joey Tianyi Zhou
104
73
0
15 Oct 2020
Natural Language Rationales with Full-Stack Visual Reasoning: From Pixels to Semantic Frames to Commonsense Graphs
Ana Marasović
Chandra Bhagavatula
J. S. Park
Ronan Le Bras
Noah A. Smith
Yejin Choi
ReLM LRM
101
62
0
15 Oct 2020
Geometry matters: Exploring language examples at the decision boundary
Debajyoti Datta
Shashwat Kumar
Laura E. Barnes
Tom Fletcher
AAML
47
3
0
14 Oct 2020
Does my multimodal model learn cross-modal interactions? It's harder to tell than you might think!
Jack Hessel
Lillian Lee
113
76
0
13 Oct 2020
CAPT: Contrastive Pre-Training for Learning Denoised Sequence Representations
Fuli Luo
Pengcheng Yang
Shicheng Li
Xuancheng Ren
Xu Sun
VLM SSL
73
16
0
13 Oct 2020
Contrast and Classify: Training Robust VQA Models
Yash Kant
A. Moudgil
Dhruv Batra
Devi Parikh
Harsh Agrawal
55
5
0
13 Oct 2020
VMSMO: Learning to Generate Multimodal Summary for Video-based News Articles
Li Mingzhe
Preslav Nakov
Shen Gao
Zhangming Chan
Dongyan Zhao
Rui Yan
110
84
0
12 Oct 2020
An Empirical Study on Model-agnostic Debiasing Strategies for Robust Natural Language Inference
Tianyu Liu
Xin Zheng
Xiaoan Ding
Baobao Chang
Zhifang Sui
73
25
0
08 Oct 2020
Vision Skills Needed to Answer Visual Questions
Xiaoyu Zeng
Yanan Wang
Tai-Yin Chiu
Nilavra Bhattacharya
Danna Gurari
66
18
0
07 Oct 2020
A Linguistic Analysis of Visually Grounded Dialogues Based on Spatial Expressions
Takuma Udagawa
T. Yamazaki
Akiko Aizawa
62
11
0
07 Oct 2020
Pathological Visual Question Answering
Xuehai He
Zhuo Cai
Wenlan Wei
Yichen Zhang
Luntian Mou
Eric Xing
P. Xie
140
24
0
06 Oct 2020
Attention Guided Semantic Relationship Parsing for Visual Question Answering
M. Farazi
Salman Khan
Nick Barnes
48
2
0
05 Oct 2020
Attention that does not Explain Away
Nan Ding
Xinjie Fan
Zhenzhong Lan
Dale Schuurmans
Radu Soricut
54
3
0
29 Sep 2020
Where is the Model Looking At?--Concentrate and Explain the Network Attention
Wenjia Xu
Jiuniu Wang
Yang Wang
Guangluan Xu
Wei Dai
Yirong Wu
XAI
90
17
0
29 Sep 2020
X-LXMERT: Paint, Caption and Answer Questions with Multi-Modal Transformers
Jaemin Cho
Jiasen Lu
Dustin Schwenk
Hannaneh Hajishirzi
Aniruddha Kembhavi
VLM MLLM
95
102
0
23 Sep 2020
Multiple interaction learning with question-type prior knowledge for constraining answer search space in visual question answering
Tuong Khanh Long Do
Binh X. Nguyen
Huy Tran
Erman Tjiputra
Quang-Dieu Tran
Thanh-Toan Do
42
2
0
23 Sep 2020
Regularizing Attention Networks for Anomaly Detection in Visual Question Answering
Doyup Lee
Yeongjae Cheon
Wook-Shin Han
AAML OOD
47
16
0
21 Sep 2020
A Multimodal Memes Classification: A Survey and Open Research Issues
Tariq Habib Afridi
A. Alam
Muhammad Numan Khan
Jawad Khan
Young-Koo Lee
60
41
0
17 Sep 2020
Self-supervised pre-training and contrastive representation learning for multiple-choice video QA
Seonhoon Kim
Seohyeong Jeong
Eunbyul Kim
Inho Kang
Nojun Kwak
SSL
123
40
0
17 Sep 2020
Multi-Task Learning with Deep Neural Networks: A Survey
M. Crawshaw
CVBM
234
630
0
10 Sep 2020
Uncovering Hidden Challenges in Query-Based Video Moment Retrieval
Mayu Otani
Yuta Nakashima
Esa Rahtu
J. Heikkilä
149
76
0
01 Sep 2020
A Dataset and Baselines for Visual Question Answering on Art
Noa Garcia
Chentao Ye
Zihua Liu
Qingtao Hu
Mayu Otani
Chenhui Chu
Yuta Nakashima
Teruko Mitamura
CoGe
57
56
0
28 Aug 2020
Visual Question Answering on Image Sets
Ankan Bansal
Yuting Zhang
Rama Chellappa
CoGe
158
44
0
27 Aug 2020
Linguistically-aware Attention for Reducing the Semantic-Gap in Vision-Language Tasks
K. Gouthaman
Athira M. Nambiar
K. Srinivas
Anurag Mittal
VLM
63
13
0
18 Aug 2020
ConsNet: Learning Consistency Graph for Zero-Shot Human-Object Interaction Detection
Ye Liu
Junsong Yuan
Chang Wen Chen
269
83
0
14 Aug 2020
Location-aware Graph Convolutional Networks for Video Question Answering
Deng Huang
Peihao Chen
Runhao Zeng
Qing Du
Mingkui Tan
Chuang Gan
GNN BDL
111
175
0
07 Aug 2020
Learning Visual Representations with Caption Annotations
Mert Bulent Sariyildiz
J. Perez
Diane Larlus
VLM SSL
136
162
0
04 Aug 2020
AiR: Attention with Reasoning Capability
Shi Chen
Ming Jiang
Jinhui Yang
Qi Zhao
LRM
56
36
0
28 Jul 2020