1707.07998
Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering
25 July 2017
Peter Anderson
Xiaodong He
Chris Buehler
Damien Teney
Mark Johnson
Stephen Gould
Lei Zhang
AIMat
Papers citing
"Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering"
50 / 1,868 papers shown
Title
An Entropy Clustering Approach for Assessing Visual Question Difficulty
K. Terao
Toru Tamaki
B. Raytchev
K. Kaneda
Shin'ichi Satoh
OOD
AAML
58
1
0
12 Apr 2020
YouMakeup VQA Challenge: Towards Fine-grained Action Understanding in Domain-Specific Videos
Shizhe Chen
Weiying Wang
Ludan Ruan
Linli Yao
Qin Jin
37
3
0
12 Apr 2020
Rephrasing visual questions by specifying the entropy of the answer distribution
K. Terao
Toru Tamaki
B. Raytchev
K. Kaneda
S. Satoh
OOD
44
2
0
10 Apr 2020
Learning to Scale Multilingual Representations for Vision-Language Tasks
Andrea Burns
Donghyun Kim
Derry Wijaya
Kate Saenko
Bryan A. Plummer
50
35
0
09 Apr 2020
e-SNLI-VE: Corrected Visual-Textual Entailment with Natural Language Explanations
Virginie Do
Oana-Maria Camburu
Zeynep Akata
Thomas Lukasiewicz
LRM
99
30
0
07 Apr 2020
Context-Aware Group Captioning via Self-Attention and Contrastive Features
Zhuowan Li
Quan Hung Tran
Long Mai
Zhe Lin
Alan Yuille
VLM
81
44
0
07 Apr 2020
Sub-Instruction Aware Vision-and-Language Navigation
Yicong Hong
Cristian Rodriguez-Opazo
Qi Wu
Stephen Gould
129
72
0
06 Apr 2020
B-SCST: Bayesian Self-Critical Sequence Training for Image Captioning
Shashank Bujimalla
Mahesh Subedar
Omesh Tickoo
BDL
UQCV
25
10
0
06 Apr 2020
Iterative Context-Aware Graph Inference for Visual Dialog
Dan Guo
Haibo Wang
Hanwang Zhang
Zhengjun Zha
Meng Wang
79
49
0
05 Apr 2020
Pixel-BERT: Aligning Image Pixels with Text by Deep Multi-Modal Transformers
Zhicheng Huang
Zhaoyang Zeng
Bei Liu
Dongmei Fu
Jianlong Fu
ViT
197
440
0
02 Apr 2020
Consistent Multiple Sequence Decoding
Bicheng Xu
Leonid Sigal
57
0
0
02 Apr 2020
More Grounded Image Captioning by Distilling Image-Text Matching Model
Yuanen Zhou
Meng Wang
Daqing Liu
Zhenzhen Hu
Hanwang Zhang
90
126
0
01 Apr 2020
X-Linear Attention Networks for Image Captioning
Yingwei Pan
Ting Yao
Yehao Li
Tao Mei
134
519
0
31 Mar 2020
Modulating Bottom-Up and Top-Down Visual Processing via Language-Conditional Filters
İlker Kesen
Ozan Arkan Can
Erkut Erdem
Aykut Erdem
Deniz Yuret
VLM
53
1
0
28 Mar 2020
Assessing Image Quality Issues for Real-World Problems
Tai-Yin Chiu
Yinan Zhao
Danna Gurari
137
54
0
27 Mar 2020
Grounded Situation Recognition
Sarah M Pratt
Mark Yatskar
Luca Weihs
Ali Farhadi
Aniruddha Kembhavi
99
112
0
26 Mar 2020
P ≈ NP, at least in Visual Question Answering
Shailza Jolly
Sebastián M. Palacio
Joachim Folz
Federico Raue
Jörn Hees
Andreas Dengel
24
0
0
26 Mar 2020
Neural encoding and interpretation for high-level visual cortices based on fMRI using image caption features
Kai Qiao
Chi Zhang
Jian Chen
Linyuan Wang
Li Tong
Bin Yan
21
3
0
26 Mar 2020
Egoshots, an ego-vision life-logging dataset and semantic fidelity metric to evaluate diversity in image captioning models
Pranav Agarwal
Alejandro Betancourt
V. Panagiotou
Natalia Díaz Rodríguez
EGVM
82
10
0
26 Mar 2020
Learning Compact Reward for Image Captioning
Nannan Li
Zhenzhong Chen
66
3
0
24 Mar 2020
TextCaps: a Dataset for Image Captioning with Reading Comprehension
Oleksii Sidorov
Ronghang Hu
Marcus Rohrbach
Amanpreet Singh
103
418
0
24 Mar 2020
Video Object Grounding using Semantic Roles in Language Description
Arka Sadhu
Kan Chen
Ram Nevatia
143
48
0
24 Mar 2020
Linguistically Driven Graph Capsule Network for Visual Question Reasoning
Qingxing Cao
Xiaodan Liang
Keze Wang
Liang Lin
GNN
47
3
0
23 Mar 2020
A Better Variant of Self-Critical Sequence Training
Ruotian Luo
BDL
73
37
0
22 Mar 2020
Visual Question Answering for Cultural Heritage
P. Bongini
Federico Becattini
Andrew D. Bagdanov
A. Bimbo
479
24
0
22 Mar 2020
Exploring Bottom-up and Top-down Cues with Attentive Learning for Webly Supervised Object Detection
Zhonghua Wu
Qingyi Tao
Guosheng Lin
Jianfei Cai
ObjD
70
14
0
22 Mar 2020
Normalized and Geometry-Aware Self-Attention Network for Image Captioning
Longteng Guo
Jing Liu
Xinxin Zhu
Peng Yao
Shichen Lu
Hanqing Lu
ViT
201
192
0
19 Mar 2020
RSVQA: Visual Question Answering for Remote Sensing Data
Sylvain Lobry
Diego Marcos
J. Murray
D. Tuia
126
223
0
16 Mar 2020
Vision-Dialog Navigation by Exploring Cross-modal Memory
Yi Zhu
Fengda Zhu
Zhaohuan Zhan
Bingqian Lin
Jianbin Jiao
Xiaojun Chang
Xiaodan Liang
VLM
91
49
0
15 Mar 2020
Counterfactual Samples Synthesizing for Robust Visual Question Answering
Long Chen
Xin Yan
Jun Xiao
Hanwang Zhang
Shiliang Pu
Yueting Zhuang
OOD
AAML
219
294
0
14 Mar 2020
Analyzing Visual Representations in Embodied Navigation Tasks
Erik Wijmans
Julian Straub
Dhruv Batra
Irfan Essa
Judy Hoffman
Ari S. Morcos
75
2
0
12 Mar 2020
MQA: Answering the Question via Robotic Manipulation
Yuhong Deng
Di Guo
F. Sun
Naifu Zhang
Huaping Liu
Chen Pang
76
22
0
10 Mar 2020
Deconfounded Image Captioning: A Causal Retrospect
Xu Yang
Hanwang Zhang
Jianfei Cai
CML
79
127
0
09 Mar 2020
Better Captioning with Sequence-Level Exploration
Jia Chen
Qin Jin
61
12
0
08 Mar 2020
OVC-Net: Object-Oriented Video Captioning with Temporal Graph and Detail Enhancement
Fangyi Zhu
Lei Li
Zhanyu Ma
Guang Chen
Jun Guo
36
1
0
08 Mar 2020
Adaptive Offline Quintuplet Loss for Image-Text Matching
Tianlang Chen
Jiajun Deng
Jiebo Luo
234
70
0
07 Mar 2020
Captioning Images with Novel Objects via Online Vocabulary Expansion
Mikihiro Tanaka
Tatsuya Harada
3DV
77
2
0
06 Mar 2020
Show, Edit and Tell: A Framework for Editing Image Captions
Fawaz Sammani
Luke Melas-Kyriazi
KELM
DiffM
108
59
0
06 Mar 2020
XGPT: Cross-modal Generative Pre-Training for Image Captioning
Qiaolin Xia
Haoyang Huang
Nan Duan
Dongdong Zhang
Lei Ji
Zhifang Sui
Edward Cui
Taroon Bharti
Xin Liu
Ming Zhou
MLLM
VLM
103
76
0
03 Mar 2020
Toward Interpretability of Dual-Encoder Models for Dialogue Response Suggestions
Yitong Li
Dianqi Li
Sushant Prakash
Peng Wang
66
2
0
02 Mar 2020
Say As You Wish: Fine-grained Control of Image Caption Generation with Abstract Scene Graphs
Shizhe Chen
Qin Jin
Peng Wang
Qi Wu
DiffM
131
219
0
01 Mar 2020
Visual Commonsense R-CNN
Tan Wang
Jianqiang Huang
Hanwang Zhang
Qianru Sun
SSL
ObjD
CML
86
252
0
27 Feb 2020
Domain Decluttering: Simplifying Images to Mitigate Synthetic-Real Domain Shift and Improve Depth Estimation
Yunhan Zhao
Shu Kong
Daeyun Shin
Charless C. Fowlkes
MDE
76
44
0
27 Feb 2020
GATCluster: Self-Supervised Gaussian-Attention Network for Image Clustering
Chuang Niu
Jun Zhang
Ge Wang
Jimin Liang
SSL
90
70
0
27 Feb 2020
Analysis of diversity-accuracy tradeoff in image captioning
Ruotian Luo
Gregory Shakhnarovich
65
13
0
27 Feb 2020
What BERT Sees: Cross-Modal Transfer for Visual Question Generation
Thomas Scialom
Patrick Bordes
Paul-Alexis Dray
Jacopo Staiano
Patrick Gallinari
59
6
0
25 Feb 2020
Towards Learning a Generic Agent for Vision-and-Language Navigation via Pre-training
Weituo Hao
Chunyuan Li
Xiujun Li
Lawrence Carin
Jianfeng Gao
LM&Ro
122
283
0
25 Feb 2020
Exploiting the Full Capacity of Deep Neural Networks while Avoiding Overfitting by Targeted Sparsity Regularization
Karim Huesmann
Soeren Klemm
Lars Linsen
Benjamin Risse
27
2
0
21 Feb 2020
A Convolutional Baseline for Person Re-Identification Using Vision and Language Descriptions
Ammarah Farooq
Muhammad Awais
F. Yan
J. Kittler
A. Akbari
S. S. Khalid
114
8
0
20 Feb 2020
Captioning Images Taken by People Who Are Blind
Danna Gurari
Yinan Zhao
Meng Zhang
Nilavra Bhattacharya
105
184
0
20 Feb 2020