Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering

2 December 2016
Yash Goyal
Tejas Khot
D. Summers-Stay
Dhruv Batra
Devi Parikh
    CoGe

Papers citing "Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering"

50 / 1,966 papers shown
Scene Text Visual Question Answering
Ali Furkan Biten
Rubèn Pérez Tito
Andrés Mafla
Lluís Gómez
Marçal Rusiñol
Ernest Valveny
C. V. Jawahar
Dimosthenis Karatzas
36
343
0
31 May 2019
What Makes Training Multi-Modal Classification Networks Hard?
Weiyao Wang
Du Tran
Matt Feiszli
28
442
0
29 May 2019
Learning Dynamics of Attention: Human Prior for Interpretable Machine Reasoning
Wonjae Kim
Yoonho Lee
16
6
0
28 May 2019
Structure Learning for Neural Module Networks
Vardaan Pahuja
Jie Fu
Sarath Chandar
C. Pal
13
7
0
27 May 2019
Deep Reason: A Strong Baseline for Real-World Visual Reasoning
Chenfei Wu
Yanzhao Zhou
Gen Li
Nan Duan
Duyu Tang
Xiaojie Wang
LRM
NAI
ReLM
11
2
0
24 May 2019
Self-Critical Reasoning for Robust Visual Question Answering
Jialin Wu
Raymond J. Mooney
OOD
NAI
32
159
0
24 May 2019
AttentionRNN: A Structured Spatial Attention Mechanism
Siddhesh Khandelwal
Leonid Sigal
24
3
0
22 May 2019
Multimodal Transformer with Multi-View Visual Representation for Image Captioning
Jun-chen Yu
Jing Li
Zhou Yu
Qingming Huang
ViT
27
377
0
20 May 2019
SplitNet: Sim2Sim and Task2Task Transfer for Embodied Visual Navigation
Daniel Gordon
Abhishek Kadian
Devi Parikh
Judy Hoffman
Dhruv Batra
20
76
0
18 May 2019
Aligning Visual Regions and Textual Concepts for Semantic-Grounded Image Representations
Fenglin Liu
Yuanxin Liu
Xuancheng Ren
Xiaodong He
Xu Sun
VLM
34
81
0
15 May 2019
Misleading Failures of Partial-input Baselines
Shi Feng
Eric Wallace
Jordan L. Boyd-Graber
33
0
0
14 May 2019
Quantifying and Alleviating the Language Prior Problem in Visual Question Answering
Yangyang Guo
Zhiyong Cheng
Liqiang Nie
Y. Liu
Yinglong Wang
Mohan Kankanhalli
22
36
0
13 May 2019
Language-Conditioned Graph Networks for Relational Reasoning
Ronghang Hu
Anna Rohrbach
Trevor Darrell
Kate Saenko
31
171
0
10 May 2019
TVQA+: Spatio-Temporal Grounding for Video Question Answering
Jie Lei
Licheng Yu
Tamara L. Berg
Joey Tianyi Zhou
31
227
0
25 Apr 2019
Challenges and Prospects in Vision and Language Research
Kushal Kafle
Robik Shrestha
Christopher Kanan
22
41
0
19 Apr 2019
Integrating Text and Image: Determining Multimodal Document Intent in Instagram Posts
Julia Kruk
Jonah Lubin
Karan Sikka
Xiaoyu Lin
Dan Jurafsky
Ajay Divakaran
29
95
0
19 Apr 2019
Towards VQA Models That Can Read
Amanpreet Singh
Vivek Natarajan
Meet Shah
Yu Jiang
Xinlei Chen
Dhruv Batra
Devi Parikh
Marcus Rohrbach
EgoV
15
1,128
0
18 Apr 2019
Learning to Collocate Neural Modules for Image Captioning
Xu Yang
Hanwang Zhang
Jianfei Cai
25
77
0
18 Apr 2019
A Simple Baseline for Audio-Visual Scene-Aware Dialog
Idan Schwartz
A. Schwing
Tamir Hazan
27
69
0
11 Apr 2019
Reasoning Visual Dialogs with Structural and Partial Observations
Zilong Zheng
Wenguan Wang
Siyuan Qi
Song-Chun Zhu
39
117
0
11 Apr 2019
Quizbowl: The Case for Incremental Question Answering
Pedro Rodriguez
Shi Feng
Mohit Iyyer
He He
Jordan L. Boyd-Graber
20
50
0
09 Apr 2019
Revisiting EmbodiedQA: A Simple Baseline and Beyond
Yuehua Wu
Lu Jiang
Yi Yang
LM&Ro
42
30
0
08 Apr 2019
Actively Seeking and Learning from Live Data
Damien Teney
Anton Van Den Hengel
OOD
32
21
0
05 Apr 2019
VQD: Visual Query Detection in Natural Scenes
Manoj Acharya
Karan Jariwala
Christopher Kanan
ObjD
24
18
0
04 Apr 2019
Context and Attribute Grounded Dense Captioning
Guojun Yin
Lu Sheng
Bin Liu
Nenghai Yu
Xiaogang Wang
Jing Shao
16
75
0
02 Apr 2019
Relation-Aware Graph Attention Network for Visual Question Answering
Linjie Li
Zhe Gan
Yu Cheng
Jingjing Liu
GNN
54
341
0
29 Mar 2019
Visual Query Answering by Entity-Attribute Graph Matching and Reasoning
Peixi Xiong
Huayi Zhan
Xin Wang
Baivab Sinha
Ying Nian Wu
18
16
0
16 Mar 2019
Answer Them All! Toward Universal Visual Question Answering Models
Robik Shrestha
Kushal Kafle
Christopher Kanan
22
82
0
01 Mar 2019
GQA: A New Dataset for Real-World Visual Reasoning and Compositional Question Answering
Drew A. Hudson
Christopher D. Manning
CoGe
NAI
19
137
0
25 Feb 2019
MUREL: Multimodal Relational Reasoning for Visual Question Answering
Rémi Cadène
H. Ben-younes
Matthieu Cord
Nicolas Thome
LRM
19
271
0
25 Feb 2019
Cycle-Consistency for Robust Visual Question Answering
Meet Shah
Xinlei Chen
Marcus Rohrbach
Devi Parikh
OOD
25
185
0
15 Feb 2019
Can We Automate Diagrammatic Reasoning?
Sk. Arif Ahmed
D. P. Dogra
S. Kar
P. Roy
D. Prasad
8
4
0
13 Feb 2019
Taking a HINT: Leveraging Explanations to Make Vision and Language Models More Grounded
Ramprasaath R. Selvaraju
Stefan Lee
Yilin Shen
Hongxia Jin
Shalini Ghosh
Larry Heck
Dhruv Batra
Devi Parikh
FAtt
VLM
16
252
0
11 Feb 2019
EvalAI: Towards Better Evaluation Systems for AI Agents
Deshraj Yadav
Rishabh Jain
Harsh Agrawal
Prithvijit Chattopadhyay
Taranjeet Singh
Akash Jain
Shivkaran Singh
Stefan Lee
Dhruv Batra
ELM
11
56
0
10 Feb 2019
Explanation in Human-AI Systems: A Literature Meta-Review, Synopsis of Key Ideas and Publications, and Bibliography for Explainable AI
Shane T. Mueller
R. Hoffman
W. Clancey
Abigail Emrey
Gary Klein
XAI
18
285
0
05 Feb 2019
VrR-VG: Refocusing Visually-Relevant Relationships
Yuanzhi Liang
Yalong Bai
Wei Zhang
Xueming Qian
Li Zhu
Tao Mei
3DH
19
8
0
01 Feb 2019
BLOCK: Bilinear Superdiagonal Fusion for Visual Question Answering and Visual Relationship Detection
H. Ben-younes
Rémi Cadène
Nicolas Thome
Matthieu Cord
16
218
0
31 Jan 2019
Visual Entailment: A Novel Task for Fine-Grained Image Understanding
Ning Xie
Farley Lai
Derek Doran
Asim Kadav
CoGe
51
322
0
20 Jan 2019
Evaluating Text-to-Image Matching using Binary Image Selection (BISON)
Hexiang Hu
Ishan Misra
L. V. D. van der Maaten
24
22
0
19 Jan 2019
Response to "Visual Dialogue without Vision or Dialogue" (Massiceti et al., 2018)
Abhishek Das
Devi Parikh
Dhruv Batra
17
2
0
16 Jan 2019
CLEVR-Ref+: Diagnosing Visual Reasoning with Referring Expressions
Runtao Liu
Chenxi Liu
Yutong Bai
Alan Yuille
NAI
ObjD
22
122
0
03 Jan 2019
The meaning of "most" for visual question answering models
A. Kuhnle
Ann A. Copestake
8
4
0
31 Dec 2018
Scene Graph Reasoning with Prior Visual Relationship for Visual Question Answering
Zhuoqian Yang
Zengchang Qin
Jing Yu
Yue Hu
GNN
25
16
0
23 Dec 2018
From FiLM to Video: Multi-turn Question Answering with Multi-modal Context
T. Nguyen
Shikhar Sharma
Hannes Schulz
Layla El Asri
15
33
0
17 Dec 2018
Visual Social Relationship Recognition
Junnan Li
Yongkang Wong
Qi Zhao
Mohan Kankanhalli
35
27
0
13 Dec 2018
Dynamic Fusion with Intra- and Inter- Modality Attention Flow for Visual Question Answering
Peng Gao
Zhengkai Jiang
Haoxuan You
Pan Lu
Steven C. H. Hoi
Xiaogang Wang
Hongsheng Li
AIMat
24
363
0
13 Dec 2018
Learning Representations of Sets through Optimized Permutations
Yan Zhang
Jonathon S. Hare
Adam Prugel-Bennett
SSL
11
24
0
10 Dec 2018
Learning to Compose Dynamic Tree Structures for Visual Contexts
Kaihua Tang
Hanwang Zhang
Baoyuan Wu
Wenhan Luo
Wei Liu
20
491
0
05 Dec 2018
Explainable and Explicit Visual Reasoning over Scene Graphs
Jiaxin Shi
Hanwang Zhang
Juan-Zi Li
OCL
169
230
0
05 Dec 2018
Learning to Explain with Complemental Examples
Atsushi Kanehira
Tatsuya Harada
12
40
0
04 Dec 2018