ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 1612.00837
  4. Cited By
Making the V in VQA Matter: Elevating the Role of Image Understanding in
  Visual Question Answering
v1v2v3 (latest)

Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering

2 December 2016
Yash Goyal
Tejas Khot
D. Summers-Stay
Dhruv Batra
Devi Parikh
    CoGe
ArXiv (abs)PDFHTML

Papers citing "Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering"

50 / 2,037 papers shown
Title
Multimodal Attention Networks for Low-Level Vision-and-Language
  Navigation
Multimodal Attention Networks for Low-Level Vision-and-Language Navigation
Federico Landi
Lorenzo Baraldi
Marcella Cornia
M. Corsini
Rita Cucchiara
LM&Ro
96
29
0
27 Nov 2019
Learning to Learn Words from Visual Scenes
Learning to Learn Words from Visual Scenes
Dídac Surís
Dave Epstein
Heng Ji
Shih-Fu Chang
Carl Vondrick
VLMCLIPSSLOffRL
72
4
0
25 Nov 2019
Unsupervised Keyword Extraction for Full-sentence VQA
Unsupervised Keyword Extraction for Full-sentence VQA
Kohei Uehara
Tatsuya Harada
32
1
0
23 Nov 2019
Temporal Reasoning via Audio Question Answering
Temporal Reasoning via Audio Question Answering
Haytham M. Fayek
Justin Johnson
65
54
0
21 Nov 2019
Inspect Transfer Learning Architecture with Dilated Convolution
Inspect Transfer Learning Architecture with Dilated Convolution
Syeda Noor Jaha Azim
Md. Aminur Rab Ratul
31
0
0
20 Nov 2019
Explanation vs Attention: A Two-Player Game to Obtain Attention for VQA
Explanation vs Attention: A Two-Player Game to Obtain Attention for VQA
Badri N. Patro
Anupriy
Vinay P. Namboodiri
AAMLFAtt
85
26
0
19 Nov 2019
Question-Conditioned Counterfactual Image Generation for VQA
Question-Conditioned Counterfactual Image Generation for VQA
Jingjing Pan
Yash Goyal
Stefan Lee
EgoVOOD
90
19
0
14 Nov 2019
Iterative Answer Prediction with Pointer-Augmented Multimodal
  Transformers for TextVQA
Iterative Answer Prediction with Pointer-Augmented Multimodal Transformers for TextVQA
Ronghang Hu
Amanpreet Singh
Trevor Darrell
Marcus Rohrbach
100
197
0
14 Nov 2019
Open-Ended Visual Question Answering by Multi-Modal Domain Adaptation
Open-Ended Visual Question Answering by Multi-Modal Domain Adaptation
Yiming Xu
Lin Chen
Zhongwei Cheng
Lixin Duan
Jiebo Luo
OOD
86
24
0
11 Nov 2019
Multimodal Intelligence: Representation Learning, Information Fusion,
  and Applications
Multimodal Intelligence: Representation Learning, Information Fusion, and Applications
Chao Zhang
Zichao Yang
Xiaodong He
Li Deng
HAIAI4TS
126
338
0
10 Nov 2019
SIMMC: Situated Interactive Multi-Modal Conversational Data Collection
  And Evaluation Platform
SIMMC: Situated Interactive Multi-Modal Conversational Data Collection And Evaluation Platform
Paul A. Crook
Shivani Poddar
Ankita De
Semir Shafi
David Whitney
A. Geramifard
R. Subba
48
18
0
07 Nov 2019
TAB-VCR: Tags and Attributes based Visual Commonsense Reasoning
  Baselines
TAB-VCR: Tags and Attributes based Visual Commonsense Reasoning Baselines
Jingxiang Lin
Unnat Jain
Alex Schwing
LRMReLM
105
9
0
31 Oct 2019
Adversarial NLI: A New Benchmark for Natural Language Understanding
Adversarial NLI: A New Benchmark for Natural Language Understanding
Yixin Nie
Adina Williams
Emily Dinan
Joey Tianyi Zhou
Jason Weston
Douwe Kiela
243
1,014
0
31 Oct 2019
Assisting human experts in the interpretation of their visual process: A
  case study on assessing copper surface adhesive potency
Assisting human experts in the interpretation of their visual process: A case study on assessing copper surface adhesive potency
T. Hascoet
Xuejiao Deng
Daniela Mihai
Mari Sugiyama
Yuji Adachi
Sachiko Nakamura
Jonathon S. Hare
Tomoko Hayashi
T. Takiguchi
20
1
0
24 Oct 2019
KnowIT VQA: Answering Knowledge-Based Questions about Videos
KnowIT VQA: Answering Knowledge-Based Questions about Videos
Noa Garcia
Mayu Otani
Chenhui Chu
Yuta Nakashima
152
80
0
23 Oct 2019
Multi-modal Deep Analysis for Multimedia
Multi-modal Deep Analysis for Multimedia
Wenwu Zhu
Xin Eric Wang
Hongzhi Li
76
43
0
11 Oct 2019
Modulated Self-attention Convolutional Network for VQA
Modulated Self-attention Convolutional Network for VQA
Jean-Benoit Delbrouck
Antoine Maiorca
Nathan Hubens
Stéphane Dupont
29
1
0
08 Oct 2019
Meta Module Network for Compositional Visual Reasoning
Meta Module Network for Compositional Visual Reasoning
Wenhu Chen
Zhe Gan
Linjie Li
Yu Cheng
Wenjie Wang
Jingjing Liu
LRM
93
71
0
08 Oct 2019
Multi-Head Attention with Diversity for Learning Grounded Multilingual
  Multimodal Representations
Multi-Head Attention with Diversity for Learning Grounded Multilingual Multimodal Representations
Po-Yao (Bernie) Huang
Xiaojun Chang
Alexander G. Hauptmann
149
25
0
30 Sep 2019
On Incorporating Semantic Prior Knowledge in Deep Learning Through
  Embedding-Space Constraints
On Incorporating Semantic Prior Knowledge in Deep Learning Through Embedding-Space Constraints
Damien Teney
Ehsan Abbasnejad
Anton Van Den Hengel
NAI
105
9
0
30 Sep 2019
Compact Trilinear Interaction for Visual Question Answering
Compact Trilinear Interaction for Visual Question Answering
Tuong Khanh Long Do
Thanh-Toan Do
Huy Tran
Erman Tjiputra
Quang-Dieu Tran
62
59
0
26 Sep 2019
Synthetic Data for Deep Learning
Synthetic Data for Deep Learning
Sergey I. Nikolenko
161
358
0
25 Sep 2019
Unified Vision-Language Pre-Training for Image Captioning and VQA
Unified Vision-Language Pre-Training for Image Captioning and VQA
Luowei Zhou
Hamid Palangi
Lei Zhang
Houdong Hu
Jason J. Corso
Jianfeng Gao
MLLMVLM
367
948
0
24 Sep 2019
Non-monotonic Logical Reasoning Guiding Deep Learning for Explainable
  Visual Question Answering
Non-monotonic Logical Reasoning Guiding Deep Learning for Explainable Visual Question Answering
Heather Riley
Mohan Sridharan
NAI
54
0
0
23 Sep 2019
Explainable High-order Visual Question Reasoning: A New Benchmark and
  Knowledge-routed Network
Explainable High-order Visual Question Reasoning: A New Benchmark and Knowledge-routed Network
Qingxing Cao
Bailin Li
Xiaodan Liang
Liang Lin
59
13
0
23 Sep 2019
On Controlled DeEntanglement for Natural Language Processing
On Controlled DeEntanglement for Natural Language Processing
Sai Krishna Rallabandi
60
0
0
22 Sep 2019
Learning Sparse Mixture of Experts for Visual Question Answering
Learning Sparse Mixture of Experts for Visual Question Answering
Vardaan Pahuja
Jie Fu
C. Pal
43
3
0
19 Sep 2019
Pose-aware Multi-level Feature Network for Human Object Interaction
  Detection
Pose-aware Multi-level Feature Network for Human Object Interaction Detection
Bo Wan
Desen Zhou
Yongfei Liu
Rongjie Li
Xuming He
76
200
0
18 Sep 2019
Grounding learning of modifier dynamics: An application to color naming
Grounding learning of modifier dynamics: An application to color naming
Xudong Han
P. Schulz
Trevor Cohn
13
5
0
17 Sep 2019
Inverse Visual Question Answering with Multi-Level Attentions
Inverse Visual Question Answering with Multi-Level Attentions
Yaser Alwatter
Yuhong Guo
BDL
39
1
0
17 Sep 2019
Probabilistic framework for solving Visual Dialog
Probabilistic framework for solving Visual Dialog
Badri N. Patro
Anupriy
Vinay P. Namboodiri
BDL
145
13
0
11 Sep 2019
Sunny and Dark Outside?! Improving Answer Consistency in VQA through
  Entailed Question Generation
Sunny and Dark Outside?! Improving Answer Consistency in VQA through Entailed Question Generation
Arijit Ray
Karan Sikka
Ajay Divakaran
Stefan Lee
Giedrius Burachas
86
65
0
10 Sep 2019
Don't Take the Easy Way Out: Ensemble Based Methods for Avoiding Known
  Dataset Biases
Don't Take the Easy Way Out: Ensemble Based Methods for Avoiding Known Dataset Biases
Christopher Clark
Mark Yatskar
Luke Zettlemoyer
OOD
139
469
0
09 Sep 2019
MULE: Multimodal Universal Language Embedding
MULE: Multimodal Universal Language Embedding
Donghyun Kim
Kuniaki Saito
Kate Saenko
Stan Sclaroff
Bryan A. Plummer
VLM
85
40
0
08 Sep 2019
Decoupled Box Proposal and Featurization with Ultrafine-Grained Semantic
  Labels Improve Image Captioning and Visual Question Answering
Decoupled Box Proposal and Featurization with Ultrafine-Grained Semantic Labels Improve Image Captioning and Visual Question Answering
Soravit Changpinyo
Bo Pang
Piyush Sharma
Radu Soricut
ObjD
65
20
0
04 Sep 2019
PlotQA: Reasoning over Scientific Plots
PlotQA: Reasoning over Scientific Plots
Nitesh Methani
Pritha Ganguly
Mitesh M. Khapra
Pratyush Kumar
133
235
0
03 Sep 2019
Out the Window: A Crowd-Sourced Dataset for Activity Classification in
  Security Video
Out the Window: A Crowd-Sourced Dataset for Activity Classification in Security Video
Greg Castañón
N. Shnidman
Tim Anderson
J. Byrne
46
2
0
28 Aug 2019
Language Tasks and Language Games: On Methodology in Current Natural
  Language Processing Research
Language Tasks and Language Games: On Methodology in Current Natural Language Processing Research
David Schlangen
78
19
0
28 Aug 2019
Neural Text Summarization: A Critical Evaluation
Neural Text Summarization: A Critical Evaluation
Wojciech Kry'sciñski
N. Keskar
Bryan McCann
Caiming Xiong
R. Socher
129
368
0
23 Aug 2019
VL-BERT: Pre-training of Generic Visual-Linguistic Representations
VL-BERT: Pre-training of Generic Visual-Linguistic Representations
Weijie Su
Xizhou Zhu
Yue Cao
Bin Li
Lewei Lu
Furu Wei
Jifeng Dai
VLMMLLMSSL
343
1,672
0
22 Aug 2019
ViCo: Word Embeddings from Visual Co-occurrences
ViCo: Word Embeddings from Visual Co-occurrences
Tanmay Gupta
Alex Schwing
Derek Hoiem
65
25
0
22 Aug 2019
LXMERT: Learning Cross-Modality Encoder Representations from
  Transformers
LXMERT: Learning Cross-Modality Encoder Representations from Transformers
Hao Hao Tan
Joey Tianyi Zhou
VLMMLLM
254
2,499
0
20 Aug 2019
Message Passing for Complex Question Answering over Knowledge Graphs
Message Passing for Complex Question Answering over Knowledge Graphs
Svitlana Vakulenko
J. D. F. Garcia
A. Polleres
Maarten de Rijke
Michael Cochez
91
73
0
19 Aug 2019
Language Features Matter: Effective Language Representations for
  Vision-Language Tasks
Language Features Matter: Effective Language Representations for Vision-Language Tasks
Andrea Burns
Reuben Tan
Kate Saenko
Stan Sclaroff
Bryan A. Plummer
VLM
58
27
0
17 Aug 2019
U-CAM: Visual Explanation using Uncertainty based Class Activation Maps
U-CAM: Visual Explanation using Uncertainty based Class Activation Maps
Badri N. Patro
Mayank Lunayach
Shivansh Patel
Vinay P. Namboodiri
FAttUQCV
124
76
0
17 Aug 2019
Fusion of Detected Objects in Text for Visual Question Answering
Fusion of Detected Objects in Text for Visual Question Answering
Chris Alberti
Jeffrey Ling
Michael Collins
David Reitter
97
173
0
14 Aug 2019
Why Does a Visual Question Have Different Answers?
Why Does a Visual Question Have Different Answers?
Nilavra Bhattacharya
Qing Li
Danna Gurari
66
66
0
12 Aug 2019
Multimodal Unified Attention Networks for Vision-and-Language
  Interactions
Multimodal Unified Attention Networks for Vision-and-Language Interactions
Zhou Yu
Yuhao Cui
Jun Yu
Dacheng Tao
Q. Tian
109
38
0
12 Aug 2019
Multi-modality Latent Interaction Network for Visual Question Answering
Multi-modality Latent Interaction Network for Visual Question Answering
Peng Gao
Haoxuan You
Zhanpeng Zhang
Xiaogang Wang
Hongsheng Li
69
82
0
10 Aug 2019
VisualBERT: A Simple and Performant Baseline for Vision and Language
VisualBERT: A Simple and Performant Baseline for Vision and Language
Liunian Harold Li
Mark Yatskar
Da Yin
Cho-Jui Hsieh
Kai-Wei Chang
VLM
278
1,975
0
09 Aug 2019
Previous
123...363738394041
Next