1707.07998
v3 (latest)
Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering
25 July 2017
Peter Anderson
Xiaodong He
Chris Buehler
Damien Teney
Mark Johnson
Stephen Gould
Lei Zhang
AIMat
Papers citing
"Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering"
50 / 1,868 papers shown
Title
Injecting Prior Knowledge into Image Caption Generation
A. Goel
Basura Fernando
Thanh-Son Nguyen
Hakan Bilen
33
0
0
22 Nov 2019
Reinforcing an Image Caption Generator Using Off-Line Human Feedback
Paul Hongsuck Seo
Piyush Sharma
Tomer Levinboim
Bohyung Han
Radu Soricut
OffRL
72
22
0
21 Nov 2019
Temporal Reasoning via Audio Question Answering
Haytham M. Fayek
Justin Johnson
65
54
0
21 Nov 2019
Learning Cross-modal Context Graph for Visual Grounding
Yongfei Liu
Bo Wan
Xiao-Dan Zhu
Xuming He
94
91
0
20 Nov 2019
Explanation vs Attention: A Two-Player Game to Obtain Attention for VQA
Badri N. Patro
Anupriy
Vinay P. Namboodiri
AAML
FAtt
85
26
0
19 Nov 2019
Vision-Language Navigation with Self-Supervised Auxiliary Reasoning Tasks
Fengda Zhu
Yi Zhu
Xiaojun Chang
Xiaodan Liang
LRM
115
244
0
18 Nov 2019
DualVD: An Adaptive Dual Encoding Model for Deep Visual Understanding in Visual Dialogue
X. Jiang
Jiahao Yu
Zengchang Qin
Yingying Zhuang
Xingxing Zhang
Yue Hu
Qi Wu
90
70
0
17 Nov 2019
Iterative Answer Prediction with Pointer-Augmented Multimodal Transformers for TextVQA
Ronghang Hu
Amanpreet Singh
Trevor Darrell
Marcus Rohrbach
96
197
0
14 Nov 2019
Visual Dialogue State Tracking for Question Generation
Wei Pang
Xiaojie Wang
75
33
0
12 Nov 2019
Conditionally Learn to Pay Attention for Sequential Visual Task
Jun He
Quan-Jie Cao
Lei Zhang
44
0
0
11 Nov 2019
Open-Ended Visual Question Answering by Multi-Modal Domain Adaptation
Yiming Xu
Lin Chen
Zhongwei Cheng
Lixin Duan
Jiebo Luo
OOD
86
24
0
11 Nov 2019
Multimodal Intelligence: Representation Learning, Information Fusion, and Applications
Chao Zhang
Zichao Yang
Xiaodong He
Li Deng
HAI
AI4TS
122
338
0
10 Nov 2019
Drill-down: Interactive Retrieval of Complex Scenes using Natural Language Queries
Fuwen Tan
Paola Cascante-Bonilla
Xiaoxiao Guo
Hui Wu
Song Feng
Vicente Ordonez
66
30
0
10 Nov 2019
Contextual Grounding of Natural Language Entities in Images
Farley Lai
Ning Xie
Derek Doran
Asim Kadav
ObjD
55
6
0
05 Nov 2019
Predicting the Politics of an Image Using Webly Supervised Data
Christopher Thomas
Adriana Kovashka
SSL
84
21
0
31 Oct 2019
TAB-VCR: Tags and Attributes based Visual Commonsense Reasoning Baselines
Jingxiang Lin
Unnat Jain
Alex Schwing
LRM
ReLM
102
9
0
31 Oct 2019
Learning Rich Image Region Representation for Visual Question Answering
Bei Liu
Zhicheng Huang
Zhaoyang Zeng
Zheyu Chen
Jianlong Fu
60
9
0
29 Oct 2019
Heterogeneous Graph Learning for Visual Commonsense Reasoning
Weijiang Yu
Jingwen Zhou
Weihao Yu
Xiaodan Liang
Nong Xiao
LRM
79
47
0
25 Oct 2019
KnowIT VQA: Answering Knowledge-Based Questions about Videos
Noa Garcia
Mayu Otani
Chenhui Chu
Yuta Nakashima
152
80
0
23 Oct 2019
Enforcing Reasoning in Visual Commonsense Reasoning
Hammad A. Ayyubi
Md. Mehrab Tanjim
D. Kriegman
ReLM
OOD
57
2
0
21 Oct 2019
Vatex Video Captioning Challenge 2020: Multi-View Features and Hybrid Reward Strategies for Video Captioning
Xinxin Zhu
A. Gorban
V. A. Makarov
Shichen Lu
I. Tyukin
Hanqing Lu
35
2
0
17 Oct 2019
Integrating Temporal and Spatial Attentions for VATEX Video Captioning Challenge 2019
Shizhe Chen
Yida Zhao
Yuqing Song
Qin Jin
Qi Wu
30
0
0
15 Oct 2019
Understanding Misclassifications by Attributes
Sadaf Gulshad
Zeynep Akata
J. H. Metzen
A. Smeulders
AAML
95
0
0
15 Oct 2019
Exploring Overall Contextual Information for Image Captioning in Human-Like Cognitive Style
Hongwei Ge
Zehang Yan
Kai Zhang
Mingde Zhao
Liang Sun
54
25
0
15 Oct 2019
VATEX Captioning Challenge 2019: Multi-modal Information Fusion and Multi-stage Training Strategy for Video Captioning
Ziqi Zhang
Yaya Shi
Jiutong Wei
Chunfen Yuan
Bing Li
Weiming Hu
47
0
0
13 Oct 2019
Granular Multimodal Attention Networks for Visual Dialog
Badri N. Patro
Shivansh Patel
Vinay P. Namboodiri
114
1
0
13 Oct 2019
Cross-modal Scene Graph Matching for Relationship-aware Image-Text Retrieval
Sijin Wang
Ruiping Wang
Ziwei Yao
Shiguang Shan
Xilin Chen
3DV
88
213
0
11 Oct 2019
Multi-modal Deep Analysis for Multimedia
Wenwu Zhu
Xin Eric Wang
Hongzhi Li
74
43
0
11 Oct 2019
Semantic-aware Image Deblurring
Fuhai Chen
Rongrong Ji
Chengpeng Dai
Xiaoshuai Sun
Chia-Wen Lin
Jiayi Ji
Baochang Zhang
Feiyue Huang
Liujuan Cao
BDL
VLM
111
6
0
09 Oct 2019
Modulated Self-attention Convolutional Network for VQA
Jean-Benoit Delbrouck
Antoine Maiorca
Nathan Hubens
Stéphane Dupont
25
1
0
08 Oct 2019
Meta Module Network for Compositional Visual Reasoning
Wenhu Chen
Zhe Gan
Linjie Li
Yu Cheng
Wenjie Wang
Jingjing Liu
LRM
93
71
0
08 Oct 2019
SMArT: Training Shallow Memory-aware Transformers for Robotic Explainability
Marcella Cornia
Lorenzo Baraldi
Rita Cucchiara
162
29
0
07 Oct 2019
REMIND Your Neural Network to Prevent Catastrophic Forgetting
Tyler L. Hayes
Kushal Kafle
Robik Shrestha
Manoj Acharya
Christopher Kanan
CLL
155
303
0
06 Oct 2019
Talk2Nav: Long-Range Vision-and-Language Navigation with Dual Attention and Spatial Memory
A. Vasudevan
Ahmed K. Farahat
Chetan Gupta
LM&Ro
67
2
0
04 Oct 2019
Multi-Head Attention with Diversity for Learning Grounded Multilingual Multimodal Representations
Po-Yao (Bernie) Huang
Xiaojun Chang
Alexander G. Hauptmann
138
25
0
30 Sep 2019
UNITER: UNiversal Image-TExt Representation Learning
Yen-Chun Chen
Linjie Li
Licheng Yu
Ahmed El Kholy
Faisal Ahmed
Zhe Gan
Yu Cheng
Jingjing Liu
VLM
OT
134
449
0
25 Sep 2019
Unified Vision-Language Pre-Training for Image Captioning and VQA
Luowei Zhou
Hamid Palangi
Lei Zhang
Houdong Hu
Jason J. Corso
Jianfeng Gao
MLLM
VLM
365
948
0
24 Sep 2019
6D Pose Estimation with Correlation Fusion
Yi Cheng
Erik Cambria
Ying Sun
C. Acar
Wei Jing
Yan Wu
Liyuan Li
Cheston Tan
Joo-Hwee Lim
87
15
0
24 Sep 2019
Non-monotonic Logical Reasoning Guiding Deep Learning for Explainable Visual Question Answering
Heather Riley
Mohan Sridharan
NAI
54
0
0
23 Sep 2019
Explainable High-order Visual Question Reasoning: A New Benchmark and Knowledge-routed Network
Qingxing Cao
Bailin Li
Xiaodan Liang
Liang Lin
57
13
0
23 Sep 2019
Adaptively Aligned Image Captioning via Adaptive Attention Time
Lun Huang
Wenmin Wang
Yaxian Xia
Jie Chen
74
63
0
19 Sep 2019
Large-scale representation learning from visually grounded untranscribed speech
Gabriel Ilharco
Yuan Zhang
Jason Baldridge
SSL
87
61
0
19 Sep 2019
Graph Neural Networks for Image Understanding Based on Multiple Cues: Group Emotion Recognition and Event Recognition as Use Cases
Xin Guo
Luisa F. Polanía
Bin Zhu
C. Boncelet
Kenneth Barner
83
28
0
19 Sep 2019
Inverse Visual Question Answering with Multi-Level Attentions
Yaser Alwatter
Yuhong Guo
BDL
35
1
0
17 Sep 2019
Part-Guided Attention Learning for Vehicle Instance Retrieval
Xinyu Zhang
Rufeng Zhang
Jiewei Cao
Dong Gong
Mingyu You
Chunhua Shen
97
37
0
13 Sep 2019
CAMP: Cross-Modal Adaptive Message Passing for Text-Image Retrieval
Zihao Wang
Xihui Liu
Hongsheng Li
Lu Sheng
Junjie Yan
Xiaogang Wang
Jing Shao
VLM
110
310
0
12 Sep 2019
PDANet: Polarity-consistent Deep Attention Network for Fine-grained Visual Emotion Regression
Sicheng Zhao
Zizhou Jia
Hui Chen
Leida Li
Guiguang Ding
Kurt Keutzer
89
62
0
11 Sep 2019
Probabilistic framework for solving Visual Dialog
Badri N. Patro
Anupriy
Vinay P. Namboodiri
BDL
141
13
0
11 Sep 2019
Sunny and Dark Outside?! Improving Answer Consistency in VQA through Entailed Question Generation
Arijit Ray
Karan Sikka
Ajay Divakaran
Stefan Lee
Giedrius Burachas
83
65
0
10 Sep 2019
Compositional Generalization in Image Captioning
Mitja Nikolaus
Mostafa Abdou
Matthew Lamm
Rahul Aralikatte
Desmond Elliott
CoGe
89
49
0
10 Sep 2019