arXiv: 1707.07998
Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering
25 July 2017
Peter Anderson
Xiaodong He
Chris Buehler
Damien Teney
Mark Johnson
Stephen Gould
Lei Zhang
AIMat
Papers citing "Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering"
50 / 1,868 papers shown
Semantic Equivalent Adversarial Data Augmentation for Visual Question Answering
Ruixue Tang
Chao Ma
W. Zhang
Qi Wu
Xiaokang Yang
OOD
72
49
0
19 Jul 2020
Length-Controllable Image Captioning
Chaorui Deng
Ning Ding
Mingkui Tan
Qi Wu
VLM
81
57
0
19 Jul 2020
Learning to Discretely Compose Reasoning Module Networks for Video Captioning
Ganchao Tan
Daqing Liu
Meng Wang
Zhengjun Zha
LRM
86
74
0
17 Jul 2020
Preserving Semantic Neighborhoods for Robust Cross-modal Retrieval
Christopher Thomas
Adriana Kovashka
128
41
0
16 Jul 2020
Explore and Explain: Self-supervised Navigation and Recounting
Roberto Bigazzi
Federico Landi
Marcella Cornia
S. Cascianelli
Lorenzo Baraldi
Rita Cucchiara
EgoV
LM&Ro
78
17
0
14 Jul 2020
Compare and Reweight: Distinctive Image Captioning Using Similar Images Sets
Jiuniu Wang
Wenjia Xu
Qingzhong Wang
Antoni B. Chan
70
45
0
14 Jul 2020
RATT: Recurrent Attention to Transient Tasks for Continual Image Captioning
Riccardo Del Chiaro
Bartlomiej Twardowski
Andrew D. Bagdanov
Joost van de Weijer
CLL
VLM
77
41
0
13 Jul 2020
Reducing Language Biases in Visual Question Answering with Visually-Grounded Question Encoder
K. Gouthaman
Anurag Mittal
98
79
0
13 Jul 2020
Image Captioning with Compositional Neural Module Networks
Junjiao Tian
Jean Oh
44
11
0
10 Jul 2020
DCANet: Learning Connected Attentions for Convolutional Neural Networks
Xu Ma
Jingda Guo
Sihai Tang
Zhinan Qiao
Qi Chen
Qing Yang
Song Fu
41
15
0
09 Jul 2020
Learning to Reweight with Deep Interactions
Yang Fan
Yingce Xia
Lijun Wu
Shufang Xie
Weiqing Liu
Jiang Bian
Tao Qin
Xiang-Yang Li
76
9
0
09 Jul 2020
IQ-VQA: Intelligent Visual Question Answering
Vatsal Goel
Mohit Chandak
A. Anand
Prithwijit Guha
64
5
0
08 Jul 2020
SmaAt-UNet: Precipitation Nowcasting using a Small Attention-UNet Architecture
Kevin Trebing
Tomasz Stanczyk
S. Mehrkanoon
109
341
0
08 Jul 2020
Dynamic Graph Representation Learning for Video Dialog via Multi-Modal Shuffled Transformers
Shijie Geng
Peng Gao
Moitreya Chatterjee
Chiori Hori
Jonathan Le Roux
Yongfeng Zhang
Hongsheng Li
A. Cherian
101
11
0
08 Jul 2020
Diverse and Styled Image Captioning Using SVD-Based Mixture of Recurrent Experts
Marzi Heidari
M. Ghatee
A. Nickabadi
Arash Pourhasan Nezhad
DiffM
MoE
84
1
0
07 Jul 2020
EDSL: An Encoder-Decoder Architecture with Symbol-Level Features for Printed Mathematical Expression Recognition
Yingnan Fu
Tingting Liu
Ming Gao
Aoying Zhou
100
7
0
06 Jul 2020
Auto-captions on GIF: A Large-scale Video-sentence Dataset for Vision-language Pre-training
Yingwei Pan
Yehao Li
Jianjie Luo
Jun Xu
Ting Yao
Tao Mei
100
59
0
05 Jul 2020
Modality Shifting Attention Network for Multi-modal Video Question Answering
Junyeong Kim
Minuk Ma
T. Pham
Kyungsu Kim
Chang D. Yoo
84
72
0
04 Jul 2020
Learning to Discover Multi-Class Attentional Regions for Multi-Label Image Recognition
Bin-Bin Gao
Hong-Yu Zhou
71
115
0
03 Jul 2020
A Competence-aware Curriculum for Visual Concepts Learning via Question Answering
Qing Li
Siyuan Huang
Yining Hong
Song-Chun Zhu
119
29
0
03 Jul 2020
Scene Graph Reasoning for Visual Question Answering
Marcel Hildebrandt
Hang Li
Rajat Koner
Volker Tresp
Stephan Günnemann
GNN
79
64
0
02 Jul 2020
The Impact of Explanations on AI Competency Prediction in VQA
Kamran Alipour
Arijit Ray
Xiaoyu Lin
J. Schulze
Yi Yao
Giedrius Burachas
51
9
0
02 Jul 2020
DocVQA: A Dataset for VQA on Document Images
Minesh Mathew
Dimosthenis Karatzas
C. V. Jawahar
169
748
0
01 Jul 2020
Latent Compositional Representations Improve Systematic Generalization in Grounded Question Answering
Ben Bogin
Sanjay Subramanian
Matt Gardner
Jonathan Berant
ReLM
OOD
BDL
LRM
57
28
0
01 Jul 2020
A Transformer-based Audio Captioning Model with Keyword Estimation
Yuma Koizumi
Ryo Masumura
Kyosuke Nishida
Masahiro Yasuda
Shoichiro Saito
116
54
0
01 Jul 2020
Modality-Agnostic Attention Fusion for visual search with text feedback
Eric Dodds
Jack Culpepper
Simão Herdade
Yang Zhang
K. Boakye
EgoV
100
74
0
30 Jun 2020
ERNIE-ViL: Knowledge Enhanced Vision-Language Representations Through Scene Graph
Fei Yu
Jiji Tang
Weichong Yin
Yu Sun
Hao Tian
Hua Wu
Haifeng Wang
128
382
0
30 Jun 2020
Graph Optimal Transport for Cross-Domain Alignment
Liqun Chen
Zhe Gan
Yu Cheng
Linjie Li
Lawrence Carin
Jingjing Liu
OT
115
152
0
26 Jun 2020
Self-Segregating and Coordinated-Segregating Transformer for Focused Deep Multi-Modular Network for Visual Question Answering
C. Sur
21
9
0
25 Jun 2020
Improving Image Captioning with Better Use of Captions
Zhan Shi
Xu Zhou
Xipeng Qiu
Xiao-Dan Zhu
66
128
0
21 Jun 2020
Off-Policy Self-Critical Training for Transformer in Visual Paragraph Generation
Shiyang Yan
Yang Hua
N. Robertson
OffRL
42
0
0
21 Jun 2020
Neuro-Symbolic Visual Reasoning: Disentangling "Visual" from "Reasoning"
Saeed Amizadeh
Hamid Palangi
Oleksandr Polozov
Yichen Huang
K. Koishida
NAI
LRM
121
60
0
20 Jun 2020
Neural Parameter Allocation Search
Bryan A. Plummer
Nikoli Dryden
Julius Frost
Torsten Hoefler
Kate Saenko
122
16
0
18 Jun 2020
Overcoming Statistical Shortcuts for Open-ended Visual Counting
Corentin Dancette
Rémi Cadène
Xinlei Chen
Matthieu Cord
36
3
0
17 Jun 2020
Contrastive Learning for Weakly Supervised Phrase Grounding
Tanmay Gupta
Arash Vahdat
Gal Chechik
Xiaodong Yang
Jan Kautz
Derek Hoiem
ObjD
SSL
168
144
0
17 Jun 2020
Foreground-Background Imbalance Problem in Deep Object Detectors: A Review
Joya Chen
Qi Wu
Dong Liu
Tong Xu
ObjD
52
25
0
16 Jun 2020
Mucko: Multi-Layer Cross-Modal Knowledge Reasoning for Fact-based Visual Question Answering
Zihao Zhu
Jiahao Yu
Yujing Wang
Yajing Sun
Yue Hu
Qi Wu
107
129
0
16 Jun 2020
Exploiting Visual Semantic Reasoning for Video-Text Retrieval
Zerun Feng
Zhimin Zeng
Caili Guo
Zheng Li
79
36
0
16 Jun 2020
ORD: Object Relationship Discovery for Visual Dialogue Generation
Ziwei Wang
Zi Huang
Yadan Luo
Huimin Lu
49
4
0
15 Jun 2020
Mitigating Gender Bias in Captioning Systems
Ruixiang Tang
Mengnan Du
Yuening Li
Zirui Liu
Na Zou
Helen Zhou
FaML
124
66
0
15 Jun 2020
AMENet: Attentive Maps Encoder Network for Trajectory Prediction
Hao Cheng
Wentong Liao
M. Yang
Bodo Rosenhahn
Monika Sester
88
46
0
15 Jun 2020
Sparse and Continuous Attention Mechanisms
André F. T. Martins
António Farinhas
Marcos Vinícius Treviso
Vlad Niculae
P. Aguiar
Mário A. T. Figueiredo
77
41
0
12 Jun 2020
VirTex: Learning Visual Representations from Textual Annotations
Karan Desai
Justin Johnson
SSL
VLM
173
437
0
11 Jun 2020
Large-Scale Adversarial Training for Vision-and-Language Representation Learning
Zhe Gan
Yen-Chun Chen
Linjie Li
Chen Zhu
Yu Cheng
Jingjing Liu
ObjD
VLM
133
501
0
11 Jun 2020
Estimating semantic structure for the VQA answer space
Corentin Kervadec
G. Antipov
M. Baccouche
Christian Wolf
47
4
0
10 Jun 2020
Roses Are Red, Violets Are Blue... But Should VQA Expect Them To?
Corentin Kervadec
G. Antipov
M. Baccouche
Christian Wolf
OOD
90
90
0
09 Jun 2020
Counterfactual VQA: A Cause-Effect Look at Language Bias
Yulei Niu
Kaihua Tang
Hanwang Zhang
Zhiwu Lu
Xiansheng Hua
Ji-Rong Wen
CML
147
403
0
08 Jun 2020
Auxiliary Signal-Guided Knowledge Encoder-Decoder for Medical Report Generation
Mingjie Li
Fuyu Wang
Xiaojun Chang
Xiaodan Liang
MedIm
86
107
0
06 Jun 2020
A Dataset and Benchmarks for Multimedia Social Analysis
Bofan Xue
David M. Chan
John F. Canny
VGen
44
0
0
05 Jun 2020
Explaining Autonomous Driving by Learning End-to-End Visual Attention
Luca Cultrera
Lorenzo Seidenari
Federico Becattini
P. Pala
A. Bimbo
65
49
0
05 Jun 2020