ResearchTrend.AI
arXiv: 1505.00468
VQA: Visual Question Answering

3 May 2015
Aishwarya Agrawal
Jiasen Lu
Stanislaw Antol
Margaret Mitchell
C. L. Zitnick
Dhruv Batra
Devi Parikh
    CoGe

Papers citing "VQA: Visual Question Answering"

50 / 2,890 papers shown
Title
Visual Translation Embedding Network for Visual Relation Detection
Hanwang Zhang
Zawlin Kyaw
Shih-Fu Chang
Tat-Seng Chua
ViT
154
560
0
27 Feb 2017
ViP-CNN: Visual Phrase Guided Convolutional Neural Network
Yikang Li
Wanli Ouyang
Xiaogang Wang
Xiaoou Tang
ObjD
30
48
0
23 Feb 2017
Task-driven Visual Saliency and Attention-based Visual Question Answering
Yuetan Lin
Zhangyang Pang
Donghui Wang
Yueting Zhuang
35
26
0
22 Feb 2017
Person Search with Natural Language Description
Shuang Li
Tong Xiao
Hongsheng Li
Bolei Zhou
Dayu Yue
Xiaogang Wang
30
386
0
19 Feb 2017
Gated Multimodal Units for Information Fusion
John Arevalo
Thamar Solorio
Manuel Montes-y-Gómez
Fabio Gonzalez
33
371
0
07 Feb 2017
Living a discrete life in a continuous world: Reference with distributed representations
Gemma Boleda
Sebastian Padó
N. Pham
Marco Baroni
8
0
0
06 Feb 2017
Image-Grounded Conversations: Multimodal Context for Natural Question and Response Generation
N. Mostafazadeh
Chris Brockett
W. Dolan
Michel Galley
Jianfeng Gao
Georgios P. Spithourakis
Lucy Vanderwende
26
181
0
28 Jan 2017
Context-aware Captions from Context-agnostic Supervision
Ramakrishna Vedantam
Samy Bengio
Kevin Patrick Murphy
Devi Parikh
Gal Chechik
22
152
0
11 Jan 2017
A Joint Speaker-Listener-Reinforcer Model for Referring Expressions
Licheng Yu
Hao Tan
Mohit Bansal
Tamara L. Berg
ObjD
46
273
0
30 Dec 2016
Learning Visual N-Grams from Web Data
Ang Li
Allan Jabri
Armand Joulin
Laurens van der Maaten
VLM
20
136
0
29 Dec 2016
Understanding Image and Text Simultaneously: a Dual Vision-Language Machine Comprehension Task
Nan Ding
Sebastian Goodman
Fei Sha
Radu Soricut
VLM
27
9
0
22 Dec 2016
CLEVR: A Diagnostic Dataset for Compositional Language and Elementary Visual Reasoning
Justin Johnson
B. Hariharan
Laurens van der Maaten
Li Fei-Fei
C. L. Zitnick
Ross B. Girshick
CoGe
53
2,322
0
20 Dec 2016
Automatic Generation of Grounded Visual Questions
Shijie Zhang
Lizhen Qu
Shaodi You
Zhenglu Yang
Jiawan Zhang
OOD
27
79
0
20 Dec 2016
The VQA-Machine: Learning How to Use Existing Vision Algorithms to Answer New Questions
Peng Wang
Qi Wu
Chunhua Shen
Anton Van Den Hengel
OOD
39
86
0
16 Dec 2016
Attentive Explanations: Justifying Decisions and Pointing to the Evidence
Dong Huk Park
Lisa Anne Hendricks
Zeynep Akata
Bernt Schiele
Trevor Darrell
Marcus Rohrbach
AAML
24
79
0
14 Dec 2016
Learning to Hash-tag Videos with Tag2Vec
A. Singh
Saurabh Saini
R. Shah
P. J. Narayanan
27
1
0
13 Dec 2016
VIBIKNet: Visual Bidirectional Kernelized Network for Visual Question Answering
Marc Bolaños
Álvaro Peris
F. Casacuberta
Petia Radeva
32
6
0
12 Dec 2016
MarioQA: Answering Questions by Watching Gameplay Videos
Jonghwan Mun
Paul Hongsuck Seo
Ilchae Jung
Bohyung Han
50
108
0
06 Dec 2016
ImageNet pre-trained models with batch normalization
Marcel Simon
E. Rodner
Joachim Denzler
VLM
SSeg
44
165
0
05 Dec 2016
Deep Multi-Modal Image Correspondence Learning
Chen Liu
Jiajun Wu
Pushmeet Kohli
Yasutaka Furukawa
13
5
0
05 Dec 2016
Who is Mistaken?
Benjamin Eysenbach
Carl Vondrick
Antonio Torralba
35
15
0
04 Dec 2016
Commonly Uncommon: Semantic Sparsity in Situation Recognition
Mark Yatskar
Vicente Ordonez
Luke Zettlemoyer
Ali Farhadi
VLM
17
42
0
03 Dec 2016
Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering
Yash Goyal
Tejas Khot
D. Summers-Stay
Dhruv Batra
Devi Parikh
CoGe
146
3,130
0
02 Dec 2016
Visual Dialog
Abhishek Das
Satwik Kottur
Khushi Gupta
Avi Singh
Deshraj Yadav
José M. F. Moura
Devi Parikh
Dhruv Batra
69
990
0
26 Nov 2016
GuessWhat?! Visual object discovery through multi-modal dialogue
H. D. Vries
Florian Strub
A. Chandar
Olivier Pietquin
Hugo Larochelle
Aaron Courville
VLM
50
427
0
23 Nov 2016
A dataset and exploration of models for understanding video data through fill-in-the-blank question-answering
Tegan Maharaj
Nicolas Ballas
Anna Rohrbach
Aaron Courville
C. Pal
VGen
15
107
0
23 Nov 2016
Dense Captioning with Joint Inference and Visual Context
L. Yang
K. Tang
Jianchao Yang
Li-Jia Li
VLM
30
169
0
21 Nov 2016
Phrase Localization and Visual Relationship Detection with Comprehensive Image-Language Cues
Bryan A. Plummer
Arun Mallya
Christopher M. Cervantes
J. Hockenmaier
Svetlana Lazebnik
33
189
0
21 Nov 2016
Recurrent Memory Addressing for describing videos
A. Jain
Abhinav Agarwalla
Kumar Krishna Agrawal
Pabitra Mitra
38
10
0
20 Nov 2016
Answering Image Riddles using Vision and Reasoning through Probabilistic Soft Logic
Somak Aditya
Yezhou Yang
Chitta Baral
Yiannis Aloimonos
ReLM
14
4
0
17 Nov 2016
Nothing Else Matters: Model-Agnostic Explanations By Identifying Prediction Invariance
Marco Tulio Ribeiro
Sameer Singh
Carlos Guestrin
FAtt
17
63
0
17 Nov 2016
SCA-CNN: Spatial and Channel-wise Attention in Convolutional Networks for Image Captioning
Long Chen
Hanwang Zhang
Jun Xiao
Liqiang Nie
Jian Shao
Wei Liu
Tat-Seng Chua
27
1,650
0
17 Nov 2016
Zero-Shot Visual Question Answering
Damien Teney
Anton Van Den Hengel
29
73
0
17 Nov 2016
The Amazing Mysteries of the Gutter: Drawing Inferences Between Panels in Comic Book Narratives
Mohit Iyyer
Varun Manjunatha
Anupam Guha
Yogarshi Vyas
Jordan L. Boyd-Graber
Hal Daumé
L. Davis
30
95
0
16 Nov 2016
Leveraging Video Descriptions to Learn Video Question Answering
Kuo-Hao Zeng
Tseng-Hung Chen
Ching-Yao Chuang
Yuan-Hong Liao
Juan Carlos Niebles
Min Sun
32
175
0
12 Nov 2016
Crowdsourcing in Computer Vision
Adriana Kovashka
Olga Russakovsky
Li Fei-Fei
Kristen Grauman
HAI
VLM
3DV
49
149
0
07 Nov 2016
Dynamic Coattention Networks For Question Answering
Caiming Xiong
Victor Zhong
R. Socher
AIMat
40
684
0
05 Nov 2016
Bidirectional Attention Flow for Machine Comprehension
Minjoon Seo
Aniruddha Kembhavi
Ali Farhadi
Hannaneh Hajishirzi
65
2,087
0
05 Nov 2016
Dual Attention Networks for Multimodal Reasoning and Matching
Hyeonseob Nam
Jung-Woo Ha
Jeonghee Kim
39
664
0
02 Nov 2016
End-to-end Learning of Deep Visual Representations for Image Retrieval
Albert Gordo
Jon Almazán
Jérôme Revaud
Diane Larlus
VLM
30
536
0
25 Oct 2016
Proposing Plausible Answers for Open-ended Visual Question Answering
Omid Bakhshandeh
Trung Bui
Zhe-nan Lin
W. Chang
29
1
0
20 Oct 2016
Deep Identity-aware Transfer of Facial Attributes
Mu Li
W. Zuo
David C. Zhang
CVBM
35
149
0
18 Oct 2016
Video Fill in the Blank with Merging LSTMs
Amir Mazaheri
Dong-Ming Zhang
M. Shah
32
18
0
13 Oct 2016
Open-Ended Visual Question-Answering
Issey Masuda
Santiago Pascual de la Puente
Xavier Giró-i-Nieto
28
9
0
09 Oct 2016
Diverse Beam Search: Decoding Diverse Solutions from Neural Sequence Models
Ashwin K. Vijayakumar
Michael Cogswell
Ramprasaath R. Selvaraju
Q. Sun
Stefan Lee
David J. Crandall
Dhruv Batra
28
542
0
07 Oct 2016
Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization
Ramprasaath R. Selvaraju
Michael Cogswell
Abhishek Das
Ramakrishna Vedantam
Devi Parikh
Dhruv Batra
FAtt
68
19,607
0
07 Oct 2016
Visual Question Answering: Datasets, Algorithms, and Future Challenges
Kushal Kafle
Christopher Kanan
OOD
33
235
0
05 Oct 2016
A Survey of Multi-View Representation Learning
Yingming Li
Ming Yang
Zhongfei Zhang
AI4TS
3DV
37
509
0
03 Oct 2016
Contextual RNN-GANs for Abstract Reasoning Diagram Generation
Arnab Ghosh
Viveka Kulharia
A. Mukerjee
Vinay P. Namboodiri
Mohit Bansal
GAN
33
37
0
29 Sep 2016
Learning Language-Visual Embedding for Movie Understanding with Natural-Language
Atousa Torabi
Niket Tandon
Leonid Sigal
22
97
0
26 Sep 2016