ResearchTrend.AI
arXiv: 1505.00468
VQA: Visual Question Answering

3 May 2015
Aishwarya Agrawal
Jiasen Lu
Stanislaw Antol
Margaret Mitchell
C. L. Zitnick
Dhruv Batra
Devi Parikh
    CoGe

Papers citing "VQA: Visual Question Answering"

50 / 2,957 papers shown
Title
CLEVRER: CoLlision Events for Video REpresentation and Reasoning
Kexin Yi
Chuang Gan
Yunzhu Li
Pushmeet Kohli
Jiajun Wu
Antonio Torralba
J. Tenenbaum
NAI
170
475
0
03 Oct 2019
Score-CAM: Score-Weighted Visual Explanations for Convolutional Neural Networks
Haofan Wang
Zifan Wang
Mengnan Du
Fan Yang
Zijian Zhang
Sirui Ding
Piotr Mardziel
Xia Hu
FAtt
114
1,078
0
03 Oct 2019
Embodied Language Grounding with 3D Visual Feature Representations
Mihir Prabhudesai
H. Tung
Syed Ashar Javed
Maximilian Sieb
Adam W. Harley
Katerina Fragkiadaki
113
21
0
02 Oct 2019
Multi-Head Attention with Diversity for Learning Grounded Multilingual Multimodal Representations
Po-Yao (Bernie) Huang
Xiaojun Chang
Alexander G. Hauptmann
138
25
0
30 Sep 2019
On Incorporating Semantic Prior Knowledge in Deep Learning Through Embedding-Space Constraints
Damien Teney
Ehsan Abbasnejad
Anton Van Den Hengel
NAI
105
9
0
30 Sep 2019
Compact Trilinear Interaction for Visual Question Answering
Tuong Khanh Long Do
Thanh-Toan Do
Huy Tran
Erman Tjiputra
Quang-Dieu Tran
62
59
0
26 Sep 2019
UNITER: UNiversal Image-TExt Representation Learning
Yen-Chun Chen
Linjie Li
Licheng Yu
Ahmed El Kholy
Faisal Ahmed
Zhe Gan
Yu Cheng
Jingjing Liu
VLM OT
134
449
0
25 Sep 2019
Synthetic Data for Deep Learning
Sergey I. Nikolenko
149
358
0
25 Sep 2019
Question Answering is a Format; When is it Useful?
Matt Gardner
Jonathan Berant
Hannaneh Hajishirzi
Alon Talmor
Sewon Min
63
52
0
25 Sep 2019
Unified Vision-Language Pre-Training for Image Captioning and VQA
Luowei Zhou
Hamid Palangi
Lei Zhang
Houdong Hu
Jason J. Corso
Jianfeng Gao
MLLM VLM
365
948
0
24 Sep 2019
Explainable High-order Visual Question Reasoning: A New Benchmark and Knowledge-routed Network
Qingxing Cao
Bailin Li
Xiaodan Liang
Liang Lin
57
13
0
23 Sep 2019
Look, Read and Enrich. Learning from Scientific Figures and their Captions
José Manuél Gómez-Pérez
Raúl Ortega
34
11
0
19 Sep 2019
Deep Latent Space Learning for Cross-modal Mapping of Audio and Visual Signals
Shah Nawaz
Muhammad Kamran Janjua
I. Gallo
Arif Mahmood
Alessandro Calefati
67
33
0
18 Sep 2019
Inverse Visual Question Answering with Multi-Level Attentions
Yaser Alwatter
Yuhong Guo
BDL
35
1
0
17 Sep 2019
Scene Graph Parsing by Attention Graph
Martin Andrews
Yew Ken Chia
Sam Witteveen
GNN
48
12
0
13 Sep 2019
Probabilistic framework for solving Visual Dialog
Badri N. Patro
Anupriy
Vinay P. Namboodiri
BDL
141
13
0
11 Sep 2019
Reasoning About Human-Object Interactions Through Dual Attention Networks
Tete Xiao
Quanfu Fan
Dan Gutfreund
Mathew Monfort
A. Oliva
Bolei Zhou
56
35
0
10 Sep 2019
Sunny and Dark Outside?! Improving Answer Consistency in VQA through Entailed Question Generation
Arijit Ray
Karan Sikka
Ajay Divakaran
Stefan Lee
Giedrius Burachas
83
65
0
10 Sep 2019
Don't Take the Easy Way Out: Ensemble Based Methods for Avoiding Known Dataset Biases
Christopher Clark
Mark Yatskar
Luke Zettlemoyer
OOD
125
468
0
09 Sep 2019
MULE: Multimodal Universal Language Embedding
Donghyun Kim
Kuniaki Saito
Kate Saenko
Stan Sclaroff
Bryan A. Plummer
VLM
82
40
0
08 Sep 2019
Conditional Text Generation for Harmonious Human-Machine Interaction
Bin Guo
Hao Wang
Yasan Ding
Wei Wu
Shaoyang Hao
Yueqi Sun
Zhiwen Yu
103
4
0
08 Sep 2019
Abductive Reasoning as Self-Supervision for Common Sense Question Answering
Sathyanarayanan N. Aakur
Sudeep Sarkar
LRM SSL OOD
58
4
0
06 Sep 2019
Supervised Multimodal Bitransformers for Classifying Images and Text
Douwe Kiela
Suvrat Bhooshan
Hamed Firooz
Ethan Perez
Davide Testuggine
156
248
0
06 Sep 2019
A Better Way to Attend: Attention with Trees for Video Question Answering
Hongyang Xue
Wenqing Chu
Zhou Zhao
Deng Cai
57
33
0
05 Sep 2019
Decoupled Box Proposal and Featurization with Ultrafine-Grained Semantic Labels Improve Image Captioning and Visual Question Answering
Soravit Changpinyo
Bo Pang
Piyush Sharma
Radu Soricut
ObjD
60
20
0
04 Sep 2019
Do Cross Modal Systems Leverage Semantic Relationships?
Shah Nawaz
Muhammad Kamran Janjua
I. Gallo
Arif Mahmood
Alessandro Calefati
Faisal Shafait
56
8
0
03 Sep 2019
PlotQA: Reasoning over Scientific Plots
Nitesh Methani
Pritha Ganguly
Mitesh M. Khapra
Pratyush Kumar
131
235
0
03 Sep 2019
What You See is What You Get: Visual Pronoun Coreference Resolution in Dialogues
Xintong Yu
Hongming Zhang
Yangqiu Song
Yan Song
Changshui Zhang
42
28
0
01 Sep 2019
Phrase Grounding by Soft-Label Chain Conditional Random Field
Jiacheng Liu
Julia Hockenmaier
45
10
0
01 Sep 2019
Generating Personalized Recipes from Historical User Preferences
Bodhisattwa Prasad Majumder
Shuyang Li
Jianmo Ni
Julian McAuley
76
116
0
31 Aug 2019
Aesthetic Image Captioning From Weakly-Labelled Photographs
Koustav Ghosal
A. Rana
A. Smolic
67
25
0
29 Aug 2019
Grounded Agreement Games: Emphasizing Conversational Grounding in Visual Dialogue Settings
David Schlangen
63
16
0
29 Aug 2019
Language Tasks and Language Games: On Methodology in Current Natural Language Processing Research
David Schlangen
73
19
0
28 Aug 2019
Adversarial Representation Learning for Text-to-Image Matching
N. Sarafianos
Xiang Xu
I. Kakadiaris
GAN
122
188
0
28 Aug 2019
Is the Red Square Big? MALeViC: Modeling Adjectives Leveraging Visual Contexts
Sandro Pezzelle
Raquel Fernández
VLM
58
18
0
27 Aug 2019
Visual Question Answering using Deep Learning: A Survey and Performance Analysis
Yash Srivastava
Vaishnav Murali
S. Dubey
Snehasis Mukherjee
87
49
0
27 Aug 2019
Towards Unsupervised Image Captioning with Shared Multimodal Embeddings
Iro Laina
Christian Rupprecht
Nassir Navab
SSL
76
103
0
25 Aug 2019
VL-BERT: Pre-training of Generic Visual-Linguistic Representations
Weijie Su
Xizhou Zhu
Yue Cao
Bin Li
Lewei Lu
Furu Wei
Jifeng Dai
VLM MLLM SSL
318
1,672
0
22 Aug 2019
ViCo: Word Embeddings from Visual Co-occurrences
Tanmay Gupta
Alex Schwing
Derek Hoiem
65
25
0
22 Aug 2019
Entropy-Enhanced Multimodal Attention Model for Scene-Aware Dialogue Generation
Kuan-Yen Lin
Chao-Chun Hsu
Yun-Nung Chen
Lun-Wei Ku
VGen
60
20
0
22 Aug 2019
LXMERT: Learning Cross-Modality Encoder Representations from Transformers
Hao Tan
Mohit Bansal
VLM MLLM
254
2,498
0
20 Aug 2019
Zero-Shot Grounding of Objects from Natural Language Queries
Arka Sadhu
Kan Chen
Ram Nevatia
ObjD
102
159
0
20 Aug 2019
What is needed for simple spatial language capabilities in VQA?
A. Kuhnle
Ann A. Copestake
CoGe
35
1
0
17 Aug 2019
Language Features Matter: Effective Language Representations for Vision-Language Tasks
Andrea Burns
Reuben Tan
Kate Saenko
Stan Sclaroff
Bryan A. Plummer
VLM
58
27
0
17 Aug 2019
U-CAM: Visual Explanation using Uncertainty based Class Activation Maps
Badri N. Patro
Mayank Lunayach
Shivansh Patel
Vinay P. Namboodiri
FAtt UQCV
116
76
0
17 Aug 2019
Unicoder-VL: A Universal Encoder for Vision and Language by Cross-modal Pre-training
Gen Li
Nan Duan
Yuejian Fang
Ming Gong
Daxin Jiang
Ming Zhou
SSL VLM MLLM
304
907
0
16 Aug 2019
PHYRE: A New Benchmark for Physical Reasoning
A. Bakhtin
Laurens van der Maaten
Justin Johnson
Laura Gustafson
Ross B. Girshick
LRM
71
129
0
15 Aug 2019
MemeFaceGenerator: Adversarial Synthesis of Chinese Meme-face from Natural Sentences
Yifu Chen
Zongsheng Wang
Bowen Wu
Mengyuan Li
Huan Zhang
Lin Ma
Feng Liu
Qihang Feng
Baoxun Wang
CVBM
25
3
0
14 Aug 2019
Reactive Multi-Stage Feature Fusion for Multimodal Dialogue Modeling
Yi-Ting Yeh
Tzu-Chuan Lin
Hsiao-Hua Cheng
Yuanyuan Deng
Shang-Yu Su
Yun-Nung Chen
74
16
0
14 Aug 2019
Fusion of Detected Objects in Text for Visual Question Answering
Chris Alberti
Jeffrey Ling
Michael Collins
David Reitter
95
173
0
14 Aug 2019