ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 1612.00837
  4. Cited By
Making the V in VQA Matter: Elevating the Role of Image Understanding in
  Visual Question Answering
v1v2v3 (latest)

Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering

2 December 2016
Yash Goyal
Tejas Khot
D. Summers-Stay
Dhruv Batra
Devi Parikh
    CoGe
ArXiv (abs)PDFHTML

Papers citing "Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering"

50 / 2,037 papers shown
Title
Question-Agnostic Attention for Visual Question Answering
Question-Agnostic Attention for Visual Question Answering
M. Farazi
Salman H Khan
Nick Barnes
29
10
0
09 Aug 2019
CRIC: A VQA Dataset for Compositional Reasoning on Vision and
  Commonsense
CRIC: A VQA Dataset for Compositional Reasoning on Vision and Commonsense
Difei Gao
Ruiping Wang
Shiguang Shan
Xilin Chen
CoGeLRM
129
28
0
08 Aug 2019
SpatialSense: An Adversarially Crowdsourced Benchmark for Spatial
  Relation Recognition
SpatialSense: An Adversarially Crowdsourced Benchmark for Spatial Relation Recognition
Kaiyu Yang
Olga Russakovsky
Jia Deng
3DPC
97
63
0
07 Aug 2019
Finding Moments in Video Collections Using Natural Language
Finding Moments in Video Collections Using Natural Language
Victor Escorcia
Mattia Soldan
Josef Sivic
Guohao Li
Bryan C. Russell
62
7
0
30 Jul 2019
An Empirical Study on Leveraging Scene Graphs for Visual Question
  Answering
An Empirical Study on Leveraging Scene Graphs for Visual Question Answering
Cheng Zhang
Wei-Lun Chao
D. Xuan
77
51
0
28 Jul 2019
Bilinear Graph Networks for Visual Question Answering
Bilinear Graph Networks for Visual Question Answering
Dalu Guo
Chang Xu
Dacheng Tao
GNN
93
54
0
23 Jul 2019
Trends in Integration of Vision and Language Research: A Survey of
  Tasks, Datasets, and Methods
Trends in Integration of Vision and Language Research: A Survey of Tasks, Datasets, and Methods
Aditya Mogadala
M. Kalimuthu
Dietrich Klakow
VLM
141
136
0
22 Jul 2019
OmniNet: A unified architecture for multi-modal multi-task learning
OmniNet: A unified architecture for multi-modal multi-task learning
Subhojeet Pramanik
Priyanka Agrawal
A. Hussain
77
41
0
17 Jul 2019
2nd Place Solution to the GQA Challenge 2019
2nd Place Solution to the GQA Challenge 2019
Shijie Geng
Ji Zhang
Hang Zhang
Ahmed Elgammal
Dimitris N. Metaxas
ReLM
41
5
0
16 Jul 2019
Don't Take the Premise for Granted: Mitigating Artifacts in Natural
  Language Inference
Don't Take the Premise for Granted: Mitigating Artifacts in Natural Language Inference
Yonatan Belinkov
Adam Poliak
Stuart M. Shieber
Benjamin Van Durme
Alexander M. Rush
99
95
0
09 Jul 2019
Learning by Abstraction: The Neural State Machine
Learning by Abstraction: The Neural State Machine
Drew A. Hudson
Christopher D. Manning
NAIOCL
152
262
0
09 Jul 2019
Embodied Vision-and-Language Navigation with Dynamic Convolutional
  Filters
Embodied Vision-and-Language Navigation with Dynamic Convolutional Filters
Federico Landi
Lorenzo Baraldi
M. Corsini
Rita Cucchiara
LM&Ro
98
27
0
05 Jul 2019
Multimodal Transformer Networks for End-to-End Video-Grounded Dialogue
  Systems
Multimodal Transformer Networks for End-to-End Video-Grounded Dialogue Systems
Hung Le
Doyen Sahoo
Nancy F. Chen
Guosheng Lin
70
112
0
02 Jul 2019
ICDAR 2019 Competition on Scene Text Visual Question Answering
ICDAR 2019 Competition on Scene Text Visual Question Answering
Ali Furkan Biten
Rubèn Pérez Tito
Andrés Mafla
Lluís Gómez
Marçal Rusiñol
Minesh Mathew
C. V. Jawahar
Ernest Valveny
Dimosthenis Karatzas
76
76
0
30 Jun 2019
Deep Modular Co-Attention Networks for Visual Question Answering
Deep Modular Co-Attention Networks for Visual Question Answering
Zhou Yu
Jun Yu
Yuhao Cui
Dacheng Tao
Q. Tian
138
811
0
25 Jun 2019
RUBi: Reducing Unimodal Biases in Visual Question Answering
RUBi: Reducing Unimodal Biases in Visual Question Answering
Rémi Cadène
Corentin Dancette
H. Ben-younes
Matthieu Cord
Devi Parikh
CML
104
374
0
24 Jun 2019
Investigating Biases in Textual Entailment Datasets
Investigating Biases in Textual Entailment Datasets
Shawn Tan
Songlin Yang
Chin-Wei Huang
Aaron Courville
61
8
0
23 Jun 2019
Adversarial Regularization for Visual Question Answering: Strengths,
  Shortcomings, and Side Effects
Adversarial Regularization for Visual Question Answering: Strengths, Shortcomings, and Side Effects
Gabriel Grand
Yonatan Belinkov
114
68
0
20 Jun 2019
Improving Visual Question Answering by Referring to Generated Paragraph
  Captions
Improving Visual Question Answering by Referring to Generated Paragraph Captions
Hyounghun Kim
Joey Tianyi Zhou
CoGe
50
20
0
14 Jun 2019
Mimic and Fool: A Task Agnostic Adversarial Attack
Mimic and Fool: A Task Agnostic Adversarial Attack
Akshay Chaturvedi
Utpal Garain
AAML
57
27
0
11 Jun 2019
ActivityNet-QA: A Dataset for Understanding Complex Web Videos via
  Question Answering
ActivityNet-QA: A Dataset for Understanding Complex Web Videos via Question Answering
Zhou Yu
D. Xu
Jun-chen Yu
Ting Yu
Zhou Zhao
Yueting Zhuang
Dacheng Tao
154
478
0
06 Jun 2019
Generating Question Relevant Captions to Aid Visual Question Answering
Generating Question Relevant Captions to Aid Visual Question Answering
Jialin Wu
Zeyuan Hu
Raymond J. Mooney
123
43
0
03 Jun 2019
OK-VQA: A Visual Question Answering Benchmark Requiring External
  Knowledge
OK-VQA: A Visual Question Answering Benchmark Requiring External Knowledge
Kenneth Marino
Mohammad Rastegari
Ali Farhadi
Roozbeh Mottaghi
211
1,095
0
31 May 2019
Scene Text Visual Question Answering
Scene Text Visual Question Answering
Ali Furkan Biten
Rubèn Pérez Tito
Andrés Mafla
Lluís Gómez
Marçal Rusiñol
Ernest Valveny
C. V. Jawahar
Dimosthenis Karatzas
169
361
0
31 May 2019
What Makes Training Multi-Modal Classification Networks Hard?
What Makes Training Multi-Modal Classification Networks Hard?
Weiyao Wang
Du Tran
Matt Feiszli
185
453
0
29 May 2019
Learning Dynamics of Attention: Human Prior for Interpretable Machine
  Reasoning
Learning Dynamics of Attention: Human Prior for Interpretable Machine Reasoning
Wonjae Kim
Yoonho Lee
33
6
0
28 May 2019
Structure Learning for Neural Module Networks
Structure Learning for Neural Module Networks
Vardaan Pahuja
Jie Fu
Sarath Chandar
C. Pal
69
7
0
27 May 2019
Deep Reason: A Strong Baseline for Real-World Visual Reasoning
Deep Reason: A Strong Baseline for Real-World Visual Reasoning
Chenfei Wu
Yanzhao Zhou
Gen Li
Nan Duan
Duyu Tang
Xiaojie Wang
LRMNAIReLM
27
2
0
24 May 2019
Self-Critical Reasoning for Robust Visual Question Answering
Self-Critical Reasoning for Robust Visual Question Answering
Jialin Wu
Raymond J. Mooney
OODNAI
77
161
0
24 May 2019
AttentionRNN: A Structured Spatial Attention Mechanism
AttentionRNN: A Structured Spatial Attention Mechanism
Siddhesh Khandelwal
Leonid Sigal
69
3
0
22 May 2019
Multimodal Transformer with Multi-View Visual Representation for Image
  Captioning
Multimodal Transformer with Multi-View Visual Representation for Image Captioning
Jun-chen Yu
Jing Li
Zhou Yu
Qingming Huang
ViT
70
387
0
20 May 2019
SplitNet: Sim2Sim and Task2Task Transfer for Embodied Visual Navigation
SplitNet: Sim2Sim and Task2Task Transfer for Embodied Visual Navigation
Daniel Gordon
Abhishek Kadian
Devi Parikh
Judy Hoffman
Dhruv Batra
90
75
0
18 May 2019
Aligning Visual Regions and Textual Concepts for Semantic-Grounded Image
  Representations
Aligning Visual Regions and Textual Concepts for Semantic-Grounded Image Representations
Fenglin Liu
Yuanxin Liu
Xuancheng Ren
Xiaodong He
Xu Sun
VLM
80
82
0
15 May 2019
Misleading Failures of Partial-input Baselines
Misleading Failures of Partial-input Baselines
Shi Feng
Eric Wallace
Jordan L. Boyd-Graber
53
0
0
14 May 2019
Quantifying and Alleviating the Language Prior Problem in Visual
  Question Answering
Quantifying and Alleviating the Language Prior Problem in Visual Question Answering
Yangyang Guo
Zhiyong Cheng
Liqiang Nie
Yebin Liu
Yinglong Wang
Mohan Kankanhalli
57
37
0
13 May 2019
Language-Conditioned Graph Networks for Relational Reasoning
Language-Conditioned Graph Networks for Relational Reasoning
Ronghang Hu
Anna Rohrbach
Trevor Darrell
Kate Saenko
87
175
0
10 May 2019
TVQA+: Spatio-Temporal Grounding for Video Question Answering
TVQA+: Spatio-Temporal Grounding for Video Question Answering
Jie Lei
Licheng Yu
Tamara L. Berg
Joey Tianyi Zhou
83
230
0
25 Apr 2019
Challenges and Prospects in Vision and Language Research
Challenges and Prospects in Vision and Language Research
Kushal Kafle
Robik Shrestha
Christopher Kanan
76
41
0
19 Apr 2019
Integrating Text and Image: Determining Multimodal Document Intent in
  Instagram Posts
Integrating Text and Image: Determining Multimodal Document Intent in Instagram Posts
Julia Kruk
Jonah Lubin
Karan Sikka
Xiaoyu Lin
Dan Jurafsky
Ajay Divakaran
154
96
0
19 Apr 2019
Towards VQA Models That Can Read
Towards VQA Models That Can Read
Amanpreet Singh
Vivek Natarajan
Meet Shah
Yu Jiang
Xinlei Chen
Dhruv Batra
Devi Parikh
Marcus Rohrbach
EgoV
232
1,258
0
18 Apr 2019
Learning to Collocate Neural Modules for Image Captioning
Learning to Collocate Neural Modules for Image Captioning
Xu Yang
Hanwang Zhang
Jianfei Cai
71
78
0
18 Apr 2019
A Simple Baseline for Audio-Visual Scene-Aware Dialog
A Simple Baseline for Audio-Visual Scene-Aware Dialog
Idan Schwartz
Alex Schwing
Tamir Hazan
89
71
0
11 Apr 2019
Reasoning Visual Dialogs with Structural and Partial Observations
Reasoning Visual Dialogs with Structural and Partial Observations
Zilong Zheng
Wenguan Wang
Siyuan Qi
Song-Chun Zhu
128
117
0
11 Apr 2019
Quizbowl: The Case for Incremental Question Answering
Quizbowl: The Case for Incremental Question Answering
Pedro Rodriguez
Shi Feng
Mohit Iyyer
He He
Jordan L. Boyd-Graber
81
50
0
09 Apr 2019
Revisiting EmbodiedQA: A Simple Baseline and Beyond
Revisiting EmbodiedQA: A Simple Baseline and Beyond
Yuehua Wu
Lu Jiang
Yi Yang
LM&Ro
86
30
0
08 Apr 2019
Actively Seeking and Learning from Live Data
Actively Seeking and Learning from Live Data
Damien Teney
Anton Van Den Hengel
OOD
80
21
0
05 Apr 2019
VQD: Visual Query Detection in Natural Scenes
VQD: Visual Query Detection in Natural Scenes
Manoj Acharya
Karan Jariwala
Christopher Kanan
ObjD
60
18
0
04 Apr 2019
Context and Attribute Grounded Dense Captioning
Context and Attribute Grounded Dense Captioning
Guojun Yin
Lu Sheng
Bin Liu
Nenghai Yu
Xiaogang Wang
Jing Shao
71
76
0
02 Apr 2019
Relation-Aware Graph Attention Network for Visual Question Answering
Relation-Aware Graph Attention Network for Visual Question Answering
Linjie Li
Zhe Gan
Yu Cheng
Jingjing Liu
GNN
198
347
0
29 Mar 2019
Visual Query Answering by Entity-Attribute Graph Matching and Reasoning
Visual Query Answering by Entity-Attribute Graph Matching and Reasoning
Peixi Xiong
Huayi Zhan
Xin Eric Wang
Baivab Sinha
Ying Nian Wu
49
16
0
16 Mar 2019
Previous
123...3738394041
Next