1707.07998
Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering
25 July 2017
Peter Anderson
Xiaodong He
Chris Buehler
Damien Teney
Mark Johnson
Stephen Gould
Lei Zhang
AIMat
Papers citing
"Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering"
50 / 1,868 papers shown
Title
FindIt: Generalized Localization with Natural Language Queries
Weicheng Kuo
Fred Bertsch
Wei Li
A. Piergiovanni
M. Saffar
A. Angelova
ObjD
88
17
0
31 Mar 2022
SimVQA: Exploring Simulated Environments for Visual Question Answering
Paola Cascante-Bonilla
Hui Wu
Letao Wang
Rogerio Feris
Vicente Ordonez
84
7
0
31 Mar 2022
NICGSlowDown: Evaluating the Efficiency Robustness of Neural Image Caption Generation Models
Simin Chen
Zihe Song
Mirazul Haque
Cong Liu
Wei Yang
66
42
0
29 Mar 2022
Shifting More Attention to Visual Backbone: Query-modulated Refinement Networks for End-to-End Visual Grounding
Jiabo Ye
Junfeng Tian
Ming Yan
Xiaoshan Yang
Xuwu Wang
Ji Zhang
Liang He
Xin Lin
ObjD
88
66
0
29 Mar 2022
Quantifying Societal Bias Amplification in Image Captioning
Yusuke Hirota
Yuta Nakashima
Noa Garcia
76
48
0
29 Mar 2022
End-to-End Transformer Based Model for Image Captioning
Yiyu Wang
Jungang Xu
Yingfei Sun
VLM
ViT
64
125
0
29 Mar 2022
NOC-REK: Novel Object Captioning with Retrieved Vocabulary from External Knowledge
D. Vo
Hong Chen
Akihiro Sugimoto
Hideki Nakayama
132
14
0
28 Mar 2022
A General Survey on Attention Mechanisms in Deep Learning
Gianni Brauwers
Flavius Frasincar
104
334
0
27 Mar 2022
Towards Escaping from Language Bias and OCR Error: Semantics-Centered Text Visual Question Answering
Chengyang Fang
Gangyan Zeng
Yu Zhou
Daiqing Wu
Can Ma
Dayong Hu
Weiping Wang
58
8
0
24 Mar 2022
Bilaterally Slimmable Transformer for Elastic and Efficient Visual Question Answering
Zhou Yu
Zitian Jin
Jun Yu
Mingliang Xu
Hongbo Wang
Jianping Fan
60
4
0
24 Mar 2022
PACS: A Dataset for Physical Audiovisual CommonSense Reasoning
Samuel Yu
Peter Wu
Paul Pu Liang
Ruslan Salakhutdinov
Louis-Philippe Morency
LRM
117
16
0
21 Mar 2022
MuKEA: Multimodal Knowledge Extraction and Accumulation for Knowledge-based Visual Question Answering
Yang Ding
Jing Yu
Bangchang Liu
Yue Hu
Mingxin Cui
Qi Wu
58
64
0
17 Mar 2022
UNIMO-2: End-to-End Unified Vision-Language Grounded Learning
Wei Li
Can Gao
Guocheng Niu
Xinyan Xiao
Hao Liu
Jiachen Liu
Hua Wu
Haifeng Wang
MLLM
51
22
0
17 Mar 2022
Pseudo-Q: Generating Pseudo Language Queries for Visual Grounding
Haojun Jiang
Yuanze Lin
Dongchen Han
Shiji Song
Gao Huang
ObjD
107
54
0
16 Mar 2022
Spot the Difference: A Cooperative Object-Referring Game in Non-Perfectly Co-Observable Scene
Duo Zheng
Fandong Meng
Q. Si
Hairun Fan
Zipeng Xu
Jie Zhou
Fangxiang Feng
Xiaojie Wang
73
0
0
16 Mar 2022
K-VQG: Knowledge-aware Visual Question Generation for Common-sense Acquisition
Kohei Uehara
Tatsuya Harada
98
10
0
15 Mar 2022
Revitalize Region Feature for Democratizing Video-Language Pre-training of Retrieval
Guanyu Cai
Yixiao Ge
Binjie Zhang
Alex Jinpeng Wang
Rui Yan
...
Ying Shan
Lianghua He
Xiaohu Qie
Jianping Wu
Mike Zheng Shou
VLM
51
6
0
15 Mar 2022
Can you even tell left from right? Presenting a new challenge for VQA
Sairaam Venkatraman
Rishi Rao
S. Balasubramanian
C. Vorugunti
R. R. Sarma
CoGe
81
0
0
15 Mar 2022
Extracting associations and meanings of objects depicted in artworks through bi-modal deep networks
Gregory Kell
Ryan-Rhys Griffiths
Anthony Bourached
D. Stork
42
3
0
14 Mar 2022
Pruned Graph Neural Network for Short Story Ordering
Melika Golestani
Zeinab Borhanifard
Farnaz Tahmasebian
Heshaam Faili
41
0
0
13 Mar 2022
Global2Local: A Joint-Hierarchical Attention for Video Captioning
Chengpeng Dai
Fuhai Chen
Xiaoshuai Sun
Rongrong Ji
QiXiang Ye
Yongjian Wu
79
1
0
13 Mar 2022
Enabling Multimodal Generation on CLIP via Vision-Language Knowledge Distillation
Wenliang Dai
Lu Hou
Lifeng Shang
Xin Jiang
Qun Liu
Pascale Fung
VLM
92
94
0
12 Mar 2022
REX: Reasoning-aware and Grounded Explanation
Shi Chen
Qi Zhao
89
18
0
11 Mar 2022
Two-stream Hierarchical Similarity Reasoning for Image-text Matching
Ran Chen
Hanli Wang
Lei Wang
Sam Kwong
57
9
0
10 Mar 2022
Knowledge-enriched Attention Network with Group-wise Semantic for Visual Storytelling
Tengpeng Li
Hanli Wang
Bin He
Changan Chen
DiffM
88
10
0
10 Mar 2022
MORE: Multi-Order RElation Mining for Dense Captioning in 3D Scenes
Yang Jiao
Shaoxiang Chen
Zequn Jie
Wenke Huang
Lin Ma
Yu-Gang Jiang
3DPC
115
48
0
10 Mar 2022
NLX-GPT: A Model for Natural Language Explanations in Vision and Vision-Language Tasks
Fawaz Sammani
Tanmoy Mukherjee
Nikos Deligiannis
MILM
ELM
LRM
138
68
0
09 Mar 2022
Towards Inadequately Pre-trained Models in Transfer Learning
Andong Deng
Xingjian Li
Di Hu
Tianyang Wang
Haoyi Xiong
Chengzhong Xu
23
6
0
09 Mar 2022
AssistQ: Affordance-centric Question-driven Task Completion for Egocentric Assistant
B. Wong
Joya Chen
You Wu
Stan Weixian Lei
Dongxing Mao
Difei Gao
Mike Zheng Shou
EgoV
74
29
0
08 Mar 2022
Where Does the Performance Improvement Come From? -- A Reproducibility Concern about Image-Text Retrieval
Jun Rao
Fei Wang
Liang Ding
Shuhan Qi
Yibing Zhan
Weifeng Liu
Dacheng Tao
OOD
89
30
0
08 Mar 2022
GlideNet: Global, Local and Intrinsic based Dense Embedding NETwork for Multi-category Attributes Prediction
Kareem M. Metwaly
Aerin Kim
E. Branson
V. Monga
86
7
0
07 Mar 2022
Modeling Coreference Relations in Visual Dialog
Mingxiao Li
Marie-Francine Moens
51
10
0
06 Mar 2022
Dynamic Key-value Memory Enhanced Multi-step Graph Reasoning for Knowledge-based Visual Question Answering
Mingxiao Li
Marie-Francine Moens
82
13
0
06 Mar 2022
Important Object Identification with Semi-Supervised Learning for Autonomous Driving
Jiachen Li
Haiming Gang
Hengbo Ma
Masayoshi Tomizuka
Chiho Choi
93
12
0
05 Mar 2022
Vision-Language Intelligence: Tasks, Representation Learning, and Large Models
Feng Li
Hao Zhang
Yi-Fan Zhang
Shixuan Liu
Jian Guo
L. Ni
Pengchuan Zhang
Lei Zhang
AI4TS
VLM
79
37
0
03 Mar 2022
A Deep Neural Framework for Image Caption Generation Using GRU-Based Attention Mechanism
Rashid Khan
Shujah Islam
Khadija Kanwal
Mansoor Iqbal
Md. Imran Hossain
Z. Ye
3DV
32
18
0
03 Mar 2022
Video Question Answering: Datasets, Algorithms and Challenges
Yaoyao Zhong
Junbin Xiao
Wei Ji
Yicong Li
Wei Deng
Tat-Seng Chua
124
92
0
02 Mar 2022
Unsupervised Vision-and-Language Pre-training via Retrieval-based Multi-Granular Alignment
Mingyang Zhou
Licheng Yu
Amanpreet Singh
Mengjiao MJ Wang
Zhou Yu
Ning Zhang
VLM
82
31
0
01 Mar 2022
Interactive Machine Learning for Image Captioning
Mareike Hartmann
Aliki Anagnostopoulou
Daniel Sonntag
VLM
45
4
0
28 Feb 2022
SGL: Symbolic Goal Learning in a Hybrid, Modular Framework for Human Instruction Following
Ruinian Xu
Hongyi Chen
Yunzhi Lin
Patricio A. Vela
66
6
0
25 Feb 2022
On Modality Bias Recognition and Reduction
Yangyang Guo
Liqiang Nie
Harry Cheng
Zhiyong Cheng
Mohan S. Kankanhalli
A. Bimbo
75
28
0
25 Feb 2022
Joint Answering and Explanation for Visual Commonsense Reasoning
Zhenyang Li
Yangyang Guo
Ke-Jyun Wang
Yin-wei Wei
Liqiang Nie
Mohan S. Kankanhalli
67
17
0
25 Feb 2022
Think Global, Act Local: Dual-scale Graph Transformer for Vision-and-Language Navigation
Shizhe Chen
Pierre-Louis Guhur
Makarand Tapaswi
Cordelia Schmid
Ivan Laptev
LM&Ro
92
149
0
23 Feb 2022
Relation Regularized Scene Graph Generation
Yuyu Guo
Lianli Gao
Jingkuan Song
Peng Wang
N. Sebe
Heng Tao Shen
Xuelong Li
68
15
0
22 Feb 2022
CaMEL: Mean Teacher Learning for Image Captioning
Manuele Barraco
Matteo Stefanini
Marcella Cornia
S. Cascianelli
Lorenzo Baraldi
Rita Cucchiara
ViT
VLM
78
30
0
21 Feb 2022
(2.5+1)D Spatio-Temporal Scene Graphs for Video Question Answering
A. Cherian
Chiori Hori
Tim K. Marks
Jonathan Le Roux
108
38
0
18 Feb 2022
A Review on Methods and Applications in Multimodal Deep Learning
Summaira Jabeen
Xi Li
Muhammad Shoib Amin
Abdul Jabbar
VLM
HAI
73
101
0
18 Feb 2022
VLP: A Survey on Vision-Language Pre-training
Feilong Chen
Duzhen Zhang
Minglun Han
Xiuyi Chen
Jing Shi
Shuang Xu
Bo Xu
VLM
181
227
0
18 Feb 2022
When Did It Happen? Duration-informed Temporal Localization of Narrated Actions in Vlogs
Oana Ignat
Santiago Castro
Yuhang Zhou
Jiajun Bao
Dandan Shan
Rada Mihalcea
55
3
0
16 Feb 2022
XFBoost: Improving Text Generation with Controllable Decoders
Xiangyu Peng
Michael Sollami
70
1
0
16 Feb 2022