VQA: Visual Question Answering

3 May 2015
Aishwarya Agrawal
Jiasen Lu
Stanislaw Antol
Margaret Mitchell
C. L. Zitnick
Dhruv Batra
Devi Parikh
    CoGe

Papers citing "VQA: Visual Question Answering"

50 / 2,957 papers shown
Title
Tensor Fusion Network for Multimodal Sentiment Analysis
Amir Zadeh
Minghai Chen
Soujanya Poria
Min Zhang
Louis-Philippe Morency
92
1,238
0
23 Jul 2017
Inspiring Computer Vision System Solutions
J. Zilly
A. Boyarski
Micael Carvalho
Amir Atapour-Abarghouei
Konstantinos Amplianitis
...
Massimiliano Mancini
Hernán Gonzalez
Riccardo Spezialetti
Carlos Sampedro Pérez
Hao Li
18
1
0
22 Jul 2017
Video Question Answering via Attribute-Augmented Attention Network Learning
Yunan Ye
Zhou Zhao
Yimeng Li
Long Chen
Jun Xiao
Yueting Zhuang
80
109
0
20 Jul 2017
Visual Question Answering with Memory-Augmented Networks
Chao Ma
Chunhua Shen
A. Dick
Qi Wu
Peng Wang
Anton Van Den Hengel
Ian Reid
90
100
0
17 Jul 2017
Query-Focused Video Summarization: Dataset, Evaluation, and A Memory Network Based Approach
Aidean Sharghi
Jacob S. Laurel
Boqing Gong
EgoV
122
137
0
16 Jul 2017
Automatic Understanding of Image and Video Advertisements
Zaeem Hussain
Ruotong Wang
Xiaozhong Zhang
Keren Ye
Christopher Thomas
Zuha Agha
Nathan Ong
Adriana Kovashka
DiffM
69
166
0
10 Jul 2017
Learning Visual Reasoning Without Strong Priors
Ethan Perez
H. D. Vries
Florian Strub
Vincent Dumoulin
Aaron Courville
OOD NAI
108
62
0
10 Jul 2017
DeepStory: Video Story QA by Deep Embedded Memory Networks
Kyung-Min Kim
Min-Oh Heo
Seongho Choi
Byoung-Tak Zhang
97
175
0
04 Jul 2017
Modulating early visual processing by language
H. D. Vries
Florian Strub
Jérémie Mary
Hugo Larochelle
Olivier Pietquin
Aaron Courville
192
489
0
02 Jul 2017
Compact Tensor Pooling for Visual Question Answering
Yang Shi
Tommaso Furlanello
Anima Anandkumar
16
0
0
20 Jun 2017
Identifying Spatial Relations in Images using Convolutional Neural Networks
Mandar Haldekar
Ashwinkumar Ganesan
Tim Oates
44
39
0
13 Jun 2017
Learning to Extract Semantic Structure from Documents Using Multimodal Fully Convolutional Neural Network
Xiao Yang
Ersin Yumer
P. Asente
Mike Kraley
Daniel Kifer
C. Lee Giles
78
230
0
07 Jun 2017
Best of Both Worlds: Transferring Knowledge from Discriminative Learning to a Generative Visual Dialog Model
Jiasen Lu
A. Kannan
Jianwei Yang
Devi Parikh
Dhruv Batra
BDL
102
137
0
05 Jun 2017
A simple neural network module for relational reasoning
Adam Santoro
David Raposo
David Barrett
Mateusz Malinowski
Razvan Pascanu
Peter W. Battaglia
Timothy Lillicrap
GNN NAI
191
1,617
0
05 Jun 2017
Deep learning evaluation using deep linguistic processing
A. Kuhnle
Ann A. Copestake
ELM
59
11
0
05 Jun 2017
Listen, Interact and Talk: Learning to Speak via Interaction
Haichao Zhang
Haonan Yu
Wenyuan Xu
77
13
0
28 May 2017
Multimodal Machine Learning: A Survey and Taxonomy
T. Baltrušaitis
Chaitanya Ahuja
Louis-Philippe Morency
175
2,963
0
26 May 2017
Bidirectional Beam Search: Forward-Backward Inference in Neural Sequence Models for Fill-in-the-Blank Image Captioning
Q. Sun
Stefan Lee
Dhruv Batra
BDL
84
44
0
24 May 2017
Learning Convolutional Text Representations for Visual Question Answering
Zhengyang Wang
Shuiwang Ji
FAtt
71
15
0
18 May 2017
MUTAN: Multimodal Tucker Fusion for Visual Question Answering
H. Ben-younes
Rémi Cadène
Matthieu Cord
Nicolas Thome
171
584
0
18 May 2017
ParlAI: A Dialog Research Software Platform
Alexander H. Miller
Will Feng
Adam Fisch
Jiasen Lu
Dhruv Batra
Antoine Bordes
Devi Parikh
Jason Weston
128
376
0
18 May 2017
Object-Level Context Modeling For Scene Classification with Context-CNN
Syed Ashar Javed
A. Nelakanti
VLM
81
10
0
11 May 2017
Survey of Visual Question Answering: Datasets and Techniques
A. Gupta
50
38
0
10 May 2017
Inferring and Executing Programs for Visual Reasoning
Justin Johnson
B. Hariharan
Laurens van der Maaten
Judy Hoffman
Li Fei-Fei
C. L. Zitnick
Ross B. Girshick
NAI
122
545
0
10 May 2017
Combating Human Trafficking with Deep Multimodal Models
Edmund Tong
Amir Zadeh
Cara Jones
Louis-Philippe Morency
82
51
0
08 May 2017
Supervised Learning of Universal Sentence Representations from Natural Language Inference Data
Alexis Conneau
Douwe Kiela
Holger Schwenk
Loïc Barrault
Antoine Bordes
AI4TS SSL
254
2,106
0
05 May 2017
ChestX-ray8: Hospital-scale Chest X-ray Database and Benchmarks on Weakly-Supervised Classification and Localization of Common Thorax Diseases
Xiaosong Wang
Yifan Peng
Le Lu
Zhiyong Lu
M. Bagheri
Ronald M. Summers
LM&MA
261
2,558
0
05 May 2017
Weakly-supervised Visual Grounding of Phrases with Linguistic Structures
Fanyi Xiao
Leonid Sigal
Yong Jae Lee
87
139
0
03 May 2017
FOIL it! Find One mismatch between Image and Language caption
Ravi Shekhar
Sandro Pezzelle
Yauhen Klimovich
Aurélie Herbelot
Moin Nabi
E. Sangineto
Raffaella Bernardi
65
141
0
03 May 2017
The Forgettable-Watcher Model for Video Question Answering
Hongyang Xue
Zhou Zhao
Deng Cai
43
9
0
03 May 2017
Show, Adapt and Tell: Adversarial Training of Cross-domain Image Captioner
Tseng-Hung Chen
Yuan-Hong Liao
Ching-Yao Chuang
W. Hsu
Jianlong Fu
Min Sun
105
142
0
02 May 2017
The Promise of Premise: Harnessing Question Premises in Visual Question Answering
Aroma Mahendru
Viraj Prabhu
Akrit Mohapatra
Dhruv Batra
Stefan Lee
NAI
108
38
0
01 May 2017
Speech-Based Visual Question Answering
Ted Zhang
Dengxin Dai
Tinne Tuytelaars
Marie-Francine Moens
Luc Van Gool
85
25
0
01 May 2017
Mapping Instructions and Visual Observations to Actions with Reinforcement Learning
Dipendra Kumar Misra
John Langford
Yoav Artzi
86
247
0
28 Apr 2017
C-VQA: A Compositional Split of the Visual Question Answering (VQA) v1.0 Dataset
Aishwarya Agrawal
Aniruddha Kembhavi
Dhruv Batra
Devi Parikh
CoGe
70
80
0
26 Apr 2017
Paying Attention to Descriptions Generated by Image Captioning Models
Hamed R. Tavakoli
Rakshith Shetty
Ali Borji
Jorma T. Laaksonen
80
79
0
24 Apr 2017
Towards Instance Segmentation with Object Priority: Prominent Object Detection and Recognition
Hamed R. Tavakoli
Jorma T. Laaksonen
40
1
0
24 Apr 2017
An Analysis of Action Recognition Datasets for Language and Vision Tasks
Spandana Gella
Frank Keller
ObjD
48
11
0
24 Apr 2017
Being Negative but Constructively: Lessons Learnt from Creating Better Visual Question Answering Datasets
Wei-Lun Chao
Hexiang Hu
Fei Sha
89
37
0
24 Apr 2017
Learning to Reason: End-to-End Module Networks for Visual Question Answering
Ronghang Hu
Jacob Andreas
Marcus Rohrbach
Trevor Darrell
Kate Saenko
KELM GNN ReLM LRM
142
581
0
18 Apr 2017
AnchorNet: A Weakly Supervised Network to Learn Geometry-sensitive Features For Semantic Matching
David Novotny
Diane Larlus
Andrea Vedaldi
3DPC
98
66
0
16 Apr 2017
Video Fill In the Blank using LR/RL LSTMs with Spatial-Temporal Attentions
Amir Mazaheri
Dong Zhang
M. Shah
56
12
0
15 Apr 2017
ShapeWorld - A new test methodology for multimodal language understanding
A. Kuhnle
Ann A. Copestake
65
69
0
14 Apr 2017
TGIF-QA: Toward Spatio-Temporal Reasoning in Visual Question Answering
Y. Jang
Yale Song
Youngjae Yu
Youngjin Kim
Gunhee Kim
93
562
0
14 Apr 2017
Spatial Memory for Context Reasoning in Object Detection
Xinlei Chen
Abhinav Gupta
ObjD
101
166
0
13 Apr 2017
Explaining the Unexplained: A CLass-Enhanced Attentive Response (CLEAR) Approach to Understanding Deep Neural Networks
Devinder Kumar
Alexander Wong
Graham W. Taylor
82
61
0
13 Apr 2017
Discriminative Bimodal Networks for Visual Localization and Detection with Natural Language Queries
Y. Zhang
Luyao Yuan
Yijie Guo
Zhiyuan He
I-An Huang
Honglak Lee
ObjD
92
57
0
12 Apr 2017
What's in a Question: Using Visual Questions as a Form of Supervision
Siddha Ganju
Olga Russakovsky
Abhinav Gupta
78
16
0
12 Apr 2017
Creativity: Generating Diverse Questions using Variational Autoencoders
Unnat Jain
Ziyu Zhang
Alex Schwing
72
152
0
11 Apr 2017
Learning Two-Branch Neural Networks for Image-Text Matching Tasks
Liwei Wang
Yin Li
Jing-ling Huang
Svetlana Lazebnik
VLM
110
498
0
11 Apr 2017