Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering

2 December 2016
Yash Goyal
Tejas Khot
D. Summers-Stay
Dhruv Batra
Devi Parikh
    CoGe

Papers citing "Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering"

50 / 1,968 papers shown
Title
Synthetic Data for Deep Learning
Sergey I. Nikolenko
46
348
0
25 Sep 2019
Unified Vision-Language Pre-Training for Image Captioning and VQA
Luowei Zhou
Hamid Palangi
Lei Zhang
Houdong Hu
Jason J. Corso
Jianfeng Gao
MLLM
VLM
252
927
0
24 Sep 2019
Non-monotonic Logical Reasoning Guiding Deep Learning for Explainable Visual Question Answering
Heather Riley
Mohan Sridharan
NAI
36
0
0
23 Sep 2019
Explainable High-order Visual Question Reasoning: A New Benchmark and Knowledge-routed Network
Qingxing Cao
Bailin Li
Xiaodan Liang
Liang Lin
33
13
0
23 Sep 2019
On Controlled DeEntanglement for Natural Language Processing
Sai Krishna Rallabandi
19
0
0
22 Sep 2019
Learning Sparse Mixture of Experts for Visual Question Answering
Vardaan Pahuja
Jie Fu
C. Pal
21
2
0
19 Sep 2019
Pose-aware Multi-level Feature Network for Human Object Interaction Detection
Bo Wan
Desen Zhou
Yongfei Liu
Rongjie Li
Xuming He
26
197
0
18 Sep 2019
Grounding learning of modifier dynamics: An application to color naming
Xudong Han
P. Schulz
Trevor Cohn
6
5
0
17 Sep 2019
Inverse Visual Question Answering with Multi-Level Attentions
Yaser Alwatter
Yuhong Guo
BDL
24
1
0
17 Sep 2019
Probabilistic framework for solving Visual Dialog
Badri N. Patro
Anupriy
Vinay P. Namboodiri
BDL
30
13
0
11 Sep 2019
Sunny and Dark Outside?! Improving Answer Consistency in VQA through Entailed Question Generation
Arijit Ray
Karan Sikka
Ajay Divakaran
Stefan Lee
Giedrius Burachas
27
65
0
10 Sep 2019
Don't Take the Easy Way Out: Ensemble Based Methods for Avoiding Known Dataset Biases
Christopher Clark
Mark Yatskar
Luke Zettlemoyer
OOD
25
460
0
09 Sep 2019
MULE: Multimodal Universal Language Embedding
Donghyun Kim
Kuniaki Saito
Kate Saenko
Stan Sclaroff
Bryan A. Plummer
VLM
32
40
0
08 Sep 2019
Decoupled Box Proposal and Featurization with Ultrafine-Grained Semantic Labels Improve Image Captioning and Visual Question Answering
Soravit Changpinyo
Bo Pang
Piyush Sharma
Radu Soricut
ObjD
17
20
0
04 Sep 2019
PlotQA: Reasoning over Scientific Plots
Nitesh Methani
Pritha Ganguly
Mitesh M. Khapra
Pratyush Kumar
43
2
0
03 Sep 2019
Out the Window: A Crowd-Sourced Dataset for Activity Classification in Security Video
Greg Castañón
N. Shnidman
Tim Anderson
J. Byrne
19
1
0
28 Aug 2019
Language Tasks and Language Games: On Methodology in Current Natural Language Processing Research
David Schlangen
27
18
0
28 Aug 2019
Neural Text Summarization: A Critical Evaluation
Wojciech Kryściński
N. Keskar
Bryan McCann
Caiming Xiong
R. Socher
22
361
0
23 Aug 2019
VL-BERT: Pre-training of Generic Visual-Linguistic Representations
Weijie Su
Xizhou Zhu
Yue Cao
Bin Li
Lewei Lu
Furu Wei
Jifeng Dai
VLM
MLLM
SSL
76
1,650
0
22 Aug 2019
ViCo: Word Embeddings from Visual Co-occurrences
Tanmay Gupta
A. Schwing
Derek Hoiem
15
24
0
22 Aug 2019
LXMERT: Learning Cross-Modality Encoder Representations from Transformers
Hao Tan
Mohit Bansal
VLM
MLLM
93
2,456
0
20 Aug 2019
Message Passing for Complex Question Answering over Knowledge Graphs
Svitlana Vakulenko
J. D. F. Garcia
A. Polleres
Maarten de Rijke
Michael Cochez
8
72
0
19 Aug 2019
Language Features Matter: Effective Language Representations for Vision-Language Tasks
Andrea Burns
Reuben Tan
Kate Saenko
Stan Sclaroff
Bryan A. Plummer
VLM
27
27
0
17 Aug 2019
U-CAM: Visual Explanation using Uncertainty based Class Activation Maps
Badri N. Patro
Mayank Lunayach
Shivansh Patel
Vinay P. Namboodiri
FAtt
UQCV
27
76
0
17 Aug 2019
Fusion of Detected Objects in Text for Visual Question Answering
Chris Alberti
Jeffrey Ling
Michael Collins
David Reitter
17
173
0
14 Aug 2019
Why Does a Visual Question Have Different Answers?
Nilavra Bhattacharya
Qing Li
Danna Gurari
31
65
0
12 Aug 2019
Multimodal Unified Attention Networks for Vision-and-Language Interactions
Zhou Yu
Yuhao Cui
Jun Yu
Dacheng Tao
Q. Tian
27
38
0
12 Aug 2019
Multi-modality Latent Interaction Network for Visual Question Answering
Peng Gao
Haoxuan You
Zhanpeng Zhang
Xiaogang Wang
Hongsheng Li
25
82
0
10 Aug 2019
VisualBERT: A Simple and Performant Baseline for Vision and Language
Liunian Harold Li
Mark Yatskar
Da Yin
Cho-Jui Hsieh
Kai-Wei Chang
VLM
82
1,919
0
09 Aug 2019
Question-Agnostic Attention for Visual Question Answering
M. Farazi
Salman H Khan
Nick Barnes
13
10
0
09 Aug 2019
CRIC: A VQA Dataset for Compositional Reasoning on Vision and Commonsense
Difei Gao
Ruiping Wang
Shiguang Shan
Xilin Chen
CoGe
LRM
20
27
0
08 Aug 2019
SpatialSense: An Adversarially Crowdsourced Benchmark for Spatial Relation Recognition
Kaiyu Yang
Olga Russakovsky
Jia Deng
3DPC
26
60
0
07 Aug 2019
Finding Moments in Video Collections Using Natural Language
Victor Escorcia
Mattia Soldan
Josef Sivic
Guohao Li
Bryan C. Russell
31
6
0
30 Jul 2019
An Empirical Study on Leveraging Scene Graphs for Visual Question Answering
Cheng Zhang
Wei-Lun Chao
D. Xuan
23
50
0
28 Jul 2019
Bilinear Graph Networks for Visual Question Answering
Dalu Guo
Chang Xu
Dacheng Tao
GNN
27
50
0
23 Jul 2019
Trends in Integration of Vision and Language Research: A Survey of Tasks, Datasets, and Methods
Aditya Mogadala
M. Kalimuthu
Dietrich Klakow
VLM
25
132
0
22 Jul 2019
OmniNet: A unified architecture for multi-modal multi-task learning
Subhojeet Pramanik
Priyanka Agrawal
A. Hussain
27
41
0
17 Jul 2019
2nd Place Solution to the GQA Challenge 2019
Shijie Geng
Ji Zhang
Hang Zhang
Ahmed Elgammal
Dimitris N. Metaxas
ReLM
16
5
0
16 Jul 2019
Don't Take the Premise for Granted: Mitigating Artifacts in Natural Language Inference
Yonatan Belinkov
Adam Poliak
Stuart M. Shieber
Benjamin Van Durme
Alexander M. Rush
27
94
0
09 Jul 2019
Learning by Abstraction: The Neural State Machine
Drew A. Hudson
Christopher D. Manning
NAI
OCL
16
258
0
09 Jul 2019
Embodied Vision-and-Language Navigation with Dynamic Convolutional Filters
Federico Landi
Lorenzo Baraldi
M. Corsini
Rita Cucchiara
LM&Ro
28
26
0
05 Jul 2019
Multimodal Transformer Networks for End-to-End Video-Grounded Dialogue Systems
Hung Le
Doyen Sahoo
Nancy F. Chen
Guosheng Lin
19
111
0
02 Jul 2019
ICDAR 2019 Competition on Scene Text Visual Question Answering
Ali Furkan Biten
Rubèn Pérez Tito
Andrés Mafla
Lluís Gómez
Marçal Rusiñol
Minesh Mathew
C. V. Jawahar
Ernest Valveny
Dimosthenis Karatzas
8
75
0
30 Jun 2019
Deep Modular Co-Attention Networks for Visual Question Answering
Zhou Yu
Jun Yu
Yuhao Cui
Dacheng Tao
Q. Tian
36
797
0
25 Jun 2019
RUBi: Reducing Unimodal Biases in Visual Question Answering
Rémi Cadène
Corentin Dancette
H. Ben-younes
Matthieu Cord
Devi Parikh
CML
19
369
0
24 Jun 2019
Investigating Biases in Textual Entailment Datasets
Shawn Tan
Songlin Yang
Chin-Wei Huang
Aaron Courville
25
8
0
23 Jun 2019
Adversarial Regularization for Visual Question Answering: Strengths, Shortcomings, and Side Effects
Gabriel Grand
Yonatan Belinkov
19
68
0
20 Jun 2019
Improving Visual Question Answering by Referring to Generated Paragraph Captions
Hyounghun Kim
Mohit Bansal
CoGe
19
20
0
14 Jun 2019
Mimic and Fool: A Task Agnostic Adversarial Attack
Akshay Chaturvedi
Utpal Garain
AAML
11
26
0
11 Jun 2019
ActivityNet-QA: A Dataset for Understanding Complex Web Videos via Question Answering
Zhou Yu
D. Xu
Jun-chen Yu
Ting Yu
Zhou Zhao
Yueting Zhuang
Dacheng Tao
24
439
0
06 Jun 2019