Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
1707.07998
Cited By
v1
v2
v3 (latest)
Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering
25 July 2017
Peter Anderson
Xiaodong He
Chris Buehler
Damien Teney
Mark Johnson
Stephen Gould
Lei Zhang
AIMat
Re-assign community
ArXiv (abs)
PDF
HTML
Papers citing
"Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering"
18 / 1,868 papers shown
Title
Interactive Grounded Language Acquisition and Generalization in a 2D World
Haonan Yu
Haichao Zhang
Wenyuan Xu
LLMAG
LM&Ro
194
79
0
31 Jan 2018
Object-based reasoning in VQA
Mikyas T. Desta
Larry Chen
Tomasz Kornuta
67
33
0
29 Jan 2018
Tell-and-Answer: Towards Explainable Visual Question Answering using Attributes and Captions
Qing Li
Jianlong Fu
D. Yu
Tao Mei
Jiebo Luo
FAtt
XAI
CoGe
97
60
0
27 Jan 2018
Consensus-based Sequence Training for Video Captioning
Sang Phan Le
G. Henter
Yusuke Miyao
Shiníchi Satoh
3DV
27
22
0
27 Dec 2017
Incorporating External Knowledge to Answer Open-Domain Visual Questions with Dynamic Memory Networks
Guohao Li
Hang Su
Wenwu Zhu
100
46
0
03 Dec 2017
Embodied Question Answering
Abhishek Das
Samyak Datta
Georgia Gkioxari
Stefan Lee
Devi Parikh
Dhruv Batra
LM&Ro
111
652
0
30 Nov 2017
Multimodal Attribute Extraction
Robert L Logan IV
Samuel Humeau
Sameer Singh
64
27
0
29 Nov 2017
Convolutional Image Captioning
J. Aneja
Aditya Deshpande
Alex Schwing
VLM
137
361
0
24 Nov 2017
Visual Question Answering as a Meta Learning Task
Damien Teney
Anton Van Den Hengel
OOD
78
42
0
22 Nov 2017
Vision-and-Language Navigation: Interpreting visually-grounded navigation instructions in real environments
Peter Anderson
Qi Wu
Damien Teney
Jake Bruce
Mark Johnson
Niko Sünderhauf
Ian Reid
Stephen Gould
Anton Van Den Hengel
LM&Ro
160
1,325
0
20 Nov 2017
ADVISE: Symbolism and External Knowledge for Decoding Advertisements
Keren Ye
Adriana Kovashka
79
51
0
17 Nov 2017
Grounded Objects and Interactions for Video Captioning
Chih-Yao Ma
Asim Kadav
I. Melvin
Z. Kira
G. Al-Regib
H. Graf
49
6
0
16 Nov 2017
Attend and Interact: Higher-Order Object Interactions for Video Understanding
Chih-Yao Ma
Asim Kadav
I. Melvin
Z. Kira
G. Al-Regib
H. Graf
80
145
0
16 Nov 2017
Fooling Vision and Language Models Despite Localization and Attention Mechanism
Xiaojun Xu
Xinyun Chen
Chang-rui Liu
Anna Rohrbach
Trevor Darrell
Basel Alomair
AAML
99
41
0
25 Sep 2017
Stack-Captioning: Coarse-to-Fine Learning for Image Captioning
Jiuxiang Gu
Jianfei Cai
G. Wang
Tsuhan Chen
107
181
0
11 Sep 2017
Beyond Bilinear: Generalized Multimodal Factorized High-order Pooling for Visual Question Answering
Zhou Yu
Jun-chen Yu
Chenchao Xiang
Jianping Fan
Dacheng Tao
98
462
0
10 Aug 2017
Tips and Tricks for Visual Question Answering: Learnings from the 2017 Challenge
Damien Teney
Peter Anderson
Xiaodong He
Anton Van Den Hengel
123
383
0
09 Aug 2017
Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering
Yash Goyal
Tejas Khot
D. Summers-Stay
Dhruv Batra
Devi Parikh
CoGe
381
3,275
0
02 Dec 2016
Previous
1
2
3
...
36
37
38