arXiv: 1612.00837

Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering

2 December 2016
Yash Goyal
Tejas Khot
D. Summers-Stay
Dhruv Batra
Devi Parikh
    CoGe

Papers citing "Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering"

50 / 2,037 papers shown
REXUP: I REason, I EXtract, I UPdate with Structured Compositional Reasoning for Visual Question Answering
Siwen Luo
S. Han
Kaiyuan Sun
Josiah Poon
CoGe, LRM, ReLM
83
4
0
27 Jul 2020
Contrastive Visual-Linguistic Pretraining
Lei Shi
Kai Shuang
Shijie Geng
Peng Su
Zhengkai Jiang
Peng Gao
Zuohui Fu
Gerard de Melo
Sen Su
VLM, SSL, CLIP
105
29
0
26 Jul 2020
Dialog without Dialog Data: Learning Visual Dialog Agents from VQA Data
Michael Cogswell
Jiasen Lu
Rishabh Jain
Stefan Lee
Devi Parikh
Dhruv Batra
VLM, EgoV
78
15
0
24 Jul 2020
Fine-Grained Image Captioning with Global-Local Discriminative Objective
Jie Wu
Tianshui Chen
Hefeng Wu
Zhi Yang
Guangchun Luo
Liang Lin
70
59
0
21 Jul 2020
Semantic Equivalent Adversarial Data Augmentation for Visual Question Answering
Ruixue Tang
Chao Ma
W. Zhang
Qi Wu
Xiaokang Yang
OOD
72
49
0
19 Jul 2020
Reducing Language Biases in Visual Question Answering with Visually-Grounded Question Encoder
K. Gouthaman
Anurag Mittal
98
79
0
13 Jul 2020
IQ-VQA: Intelligent Visual Question Answering
Vatsal Goel
Mohit Chandak
A. Anand
Prithwijit Guha
64
5
0
08 Jul 2020
Targeting the Benchmark: On Methodology in Current Natural Language Processing Research
David Schlangen
74
58
0
07 Jul 2020
What Gives the Answer Away? Question Answering Bias Analysis on Video QA Datasets
Jianing Yang
Yuying Zhu
Yongxin Wang
Ruitao Yi
Amir Zadeh
Louis-Philippe Morency
69
12
0
07 Jul 2020
A Competence-aware Curriculum for Visual Concepts Learning via Question Answering
Qing Li
Siyuan Huang
Yining Hong
Song-Chun Zhu
119
29
0
03 Jul 2020
The Impact of Explanations on AI Competency Prediction in VQA
Kamran Alipour
Arijit Ray
Xiaoyu Lin
J. Schulze
Yi Yao
Giedrius Burachas
59
9
0
02 Jul 2020
DocVQA: A Dataset for VQA on Document Images
Minesh Mathew
Dimosthenis Karatzas
C. V. Jawahar
180
748
0
01 Jul 2020
Graph Optimal Transport for Cross-Domain Alignment
Liqun Chen
Zhe Gan
Yu Cheng
Linjie Li
Lawrence Carin
Jingjing Liu
OT
127
152
0
26 Jun 2020
Neuro-Symbolic Visual Reasoning: Disentangling "Visual" from "Reasoning"
Saeed Amizadeh
Hamid Palangi
Oleksandr Polozov
Yichen Huang
K. Koishida
NAI, LRM
121
60
0
20 Jun 2020
Overcoming Statistical Shortcuts for Open-ended Visual Counting
Corentin Dancette
Rémi Cadène
Xinlei Chen
Matthieu Cord
43
3
0
17 Jun 2020
Sparse and Continuous Attention Mechanisms
André F. T. Martins
António Farinhas
Marcos Vinícius Treviso
Vlad Niculae
P. Aguiar
Mário A. T. Figueiredo
85
42
0
12 Jun 2020
VirTex: Learning Visual Representations from Textual Annotations
Karan Desai
Justin Johnson
SSL, VLM
184
437
0
11 Jun 2020
Exploring Weaknesses of VQA Models through Attribution Driven Insights
Shaunak Halbe
47
2
0
11 Jun 2020
Large-Scale Adversarial Training for Vision-and-Language Representation Learning
Zhe Gan
Yen-Chun Chen
Linjie Li
Chen Zhu
Yu Cheng
Jingjing Liu
ObjD, VLM
171
501
0
11 Jun 2020
Estimating semantic structure for the VQA answer space
Corentin Kervadec
G. Antipov
M. Baccouche
Christian Wolf
47
4
0
10 Jun 2020
Roses Are Red, Violets Are Blue... but Should VQA Expect Them To?
Corentin Kervadec
G. Antipov
M. Baccouche
Christian Wolf
OOD
90
90
0
09 Jun 2020
Counterfactual VQA: A Cause-Effect Look at Language Bias
Yulei Niu
Kaihua Tang
Hanwang Zhang
Zhiwu Lu
Xiansheng Hua
Ji-Rong Wen
CML
151
403
0
08 Jun 2020
Structured Multimodal Attentions for TextVQA
Chenyu Gao
Qi Zhu
Peng Wang
Hui Li
Yuliang Liu
Anton Van Den Hengel
Qi Wu
110
60
0
01 Jun 2020
On the Value of Out-of-Distribution Testing: An Example of Goodhart's Law
Damien Teney
Kushal Kafle
Robik Shrestha
Ehsan Abbasnejad
Christopher Kanan
Anton Van Den Hengel
OODD, OOD
112
147
0
19 May 2020
Behind the Scene: Revealing the Secrets of Pre-trained Vision-and-Language Models
Jize Cao
Zhe Gan
Yu Cheng
Licheng Yu
Yen-Chun Chen
Jingjing Liu
VLM
133
130
0
15 May 2020
Dense-Caption Matching and Frame-Selection Gating for Temporal Localization in VideoQA
Hyounghun Kim
Zineng Tang
Mohit Bansal
80
31
0
13 May 2020
Machine Reading Comprehension: The Role of Contextualized Language Models and Beyond
Zhuosheng Zhang
Hai Zhao
Rui Wang
115
63
0
13 May 2020
Cross-Modality Relevance for Reasoning on Language and Vision
Chen Zheng
Quan Guo
Parisa Kordjamshidi
LRM
91
36
0
12 May 2020
The Hateful Memes Challenge: Detecting Hate Speech in Multimodal Memes
Douwe Kiela
Hamed Firooz
Aravind Mohan
Vedanuj Goswami
Amanpreet Singh
Pratik Ringshia
Davide Testuggine
113
613
0
10 May 2020
What is Learned in Visually Grounded Neural Syntax Acquisition
Noriyuki Kojima
Hadar Averbuch-Elor
Alexander M. Rush
Yoav Artzi
77
22
0
04 May 2020
Visual Question Answering with Prior Class Semantics
Violetta Shevchenko
Damien Teney
A. Dick
Anton Van Den Hengel
BDL
63
7
0
04 May 2020
Crisscrossed Captions: Extended Intramodal and Intermodal Semantic Similarity Judgments for MS-COCO
Zarana Parekh
Jason Baldridge
Daniel Cer
Austin Waters
Yinfei Yang
78
62
0
30 Apr 2020
Improving Vision-and-Language Navigation with Image-Text Pairs from the Web
Arjun Majumdar
Ayush Shrivastava
Stefan Lee
Peter Anderson
Devi Parikh
Dhruv Batra
LM&Ro
210
236
0
30 Apr 2020
STARC: Structured Annotations for Reading Comprehension
Yevgeni Berzak
J. Malmaud
R. Levy
65
24
0
30 Apr 2020
Dynamic Language Binding in Relational Visual Reasoning
T. Le
Vuong Le
Svetha Venkatesh
T. Tran
NAI
71
19
0
30 Apr 2020
Look at the First Sentence: Position Bias in Question Answering
Miyoung Ko
Jinhyuk Lee
Hyunjae Kim
Gangwoo Kim
Jaewoo Kang
FaML, OOD
80
100
0
30 Apr 2020
Explainable Deep Learning: A Field Guide for the Uninitiated
Gabrielle Ras
Ning Xie
Marcel van Gerven
Derek Doran
AAML, XAI
120
382
0
30 Apr 2020
Pragmatic Issue-Sensitive Image Captioning
Allen Nie
Reuben Cohn-Gordon
Christopher Potts
53
24
0
29 Apr 2020
The Curse of Performance Instability in Analysis Datasets: Consequences, Source, and Suggestions
Xiang Zhou
Yixin Nie
Hao Tan
Mohit Bansal
113
41
0
28 Apr 2020
A Novel Attention-based Aggregation Function to Combine Vision and Language
Matteo Stefanini
Marcella Cornia
Lorenzo Baraldi
Rita Cucchiara
VLM
55
9
0
27 Apr 2020
Deep Multimodal Neural Architecture Search
Zhou Yu
Yuhao Cui
Jun-chen Yu
Meng Wang
Dacheng Tao
Qi Tian
77
100
0
25 Apr 2020
MoVie: Revisiting Modulated Convolutions for Visual Counting and Beyond
Duy-Kien Nguyen
Vedanuj Goswami
Xinlei Chen
71
23
0
24 Apr 2020
Debiasing Skin Lesion Datasets and Models? Not So Fast
Alceu Bissoto
Eduardo Valle
Sandra Avila
102
55
0
23 Apr 2020
Visual Question Answering Using Semantic Information from Image Descriptions
Tasmia Tasrin
Md Sultan al Nahian
Brent Harrison
30
0
0
23 Apr 2020
Learning What Makes a Difference from Counterfactual Examples and Gradient Supervision
Damien Teney
Ehsan Abbasnejad
Anton Van Den Hengel
OOD, SSL, CML
99
119
0
20 Apr 2020
Are we pretraining it right? Digging deeper into visio-linguistic pretraining
Amanpreet Singh
Vedanuj Goswami
Devi Parikh
VLM
80
48
0
19 Apr 2020
Multiple Visual-Semantic Embedding for Video Retrieval from Query Sentence
Huy Manh Nguyen
Tomo Miyazaki
Yoshihiro Sugaya
S. Omachi
144
1
0
16 Apr 2020
Avoiding the Hypothesis-Only Bias in Natural Language Inference via Ensemble Adversarial Training
Joe Stacey
Pasquale Minervini
Haim Dubossarsky
Sebastian Riedel
Tim Rocktäschel
AI4CE
79
8
0
16 Apr 2020
Shortcut Learning in Deep Neural Networks
Robert Geirhos
J. Jacobsen
Claudio Michaelis
R. Zemel
Wieland Brendel
Matthias Bethge
Felix Wichmann
237
2,074
0
16 Apr 2020
Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks
Xiujun Li
Xi Yin
Chunyuan Li
Pengchuan Zhang
Xiaowei Hu
...
Houdong Hu
Li Dong
Furu Wei
Yejin Choi
Jianfeng Gao
VLM
259
1,955
0
13 Apr 2020