Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
1502.03044
Cited By
Show, Attend and Tell: Neural Image Caption Generation with Visual Attention
10 February 2015
Ke Xu
Jimmy Ba
Ryan Kiros
Kyunghyun Cho
Aaron Courville
Ruslan Salakhutdinov
R. Zemel
Yoshua Bengio
DiffM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Show, Attend and Tell: Neural Image Caption Generation with Visual Attention"
50 / 3,510 papers shown
Title
Sparse Attentive Memory Network for Click-through Rate Prediction with Long Sequences
Qianying Lin
Wen-Ji Zhou
Yanshi Wang
Qing Da
Qingguo Chen
Bing Wang
VLM
23
9
0
08 Aug 2022
Fine-Grained Semantically Aligned Vision-Language Pre-Training
Juncheng Li
Xin He
Longhui Wei
Long Qian
Linchao Zhu
Lingxi Xie
Yueting Zhuang
Qi Tian
Siliang Tang
VLM
41
79
0
04 Aug 2022
Benchmarking Visual-Inertial Deep Multimodal Fusion for Relative Pose Regression and Odometry-aided Absolute Pose Regression
Felix Ott
N. Raichur
David Rügamer
Tobias Feigl
Heiko Neumann
Bernd Bischl
Christopher Mutschler
38
1
0
01 Aug 2022
Object-ABN: Learning to Generate Sharp Attention Maps for Action Recognition
Tomoya Nitta
Tsubasa Hirakawa
H. Fujiyoshi
Toru Tamaki
58
0
0
27 Jul 2022
Multi-Attention Network for Compressed Video Referring Object Segmentation
Weidong Chen
Dexiang Hong
Yuankai Qi
Zhenjun Han
Shuhui Wang
Laiyun Qing
Qingming Huang
Guorong Li
VOS
20
36
0
26 Jul 2022
Innovations in Neural Data-to-text Generation: A Survey
Mandar Sharma
Ajay K. Gogineni
Naren Ramakrishnan
34
10
0
25 Jul 2022
Improved Super Resolution of MR Images Using CNNs and Vision Transformers
Dwarikanath Mahapatra
SupR
ViT
MedIm
29
5
0
24 Jul 2022
When Counting Meets HMER: Counting-Aware Network for Handwritten Mathematical Expression Recognition
Bohan Li
Ye Yuan
Dingkang Liang
Xiao-Chang Liu
Zhilong Ji
Jinfeng Bai
Wenyu Liu
Xiang Bai
39
47
0
23 Jul 2022
Rethinking the Reference-based Distinctive Image Captioning
Yangjun Mao
Long Chen
Zhihong Jiang
Dong Zhang
Zhimeng Zhang
Jian Shao
Jun Xiao
DiffM
30
22
0
22 Jul 2022
Zero-Shot Video Captioning with Evolving Pseudo-Tokens
Yoad Tewel
Yoav Shalev
Roy Nadler
Idan Schwartz
Lior Wolf
37
27
0
22 Jul 2022
Efficient Modeling of Future Context for Image Captioning
Zhengcong Fei
Junshi Huang
Xiaoming Wei
Xiaolin K. Wei
42
14
0
22 Jul 2022
EleGANt: Exquisite and Locally Editable GAN for Makeup Transfer
Chenyu Yang
W. He
Yingqing Xu
Yang Gao
DiffM
24
26
0
20 Jul 2022
GRIT: Faster and Better Image captioning Transformer Using Dual Visual Features
Van-Quang Nguyen
Masanori Suganuma
Takayuki Okatani
ViT
36
106
0
20 Jul 2022
Explicit Image Caption Editing
Zhen Wang
Long Chen
Wenbo Ma
G. Han
Yulei Niu
Jian Shao
Jun Xiao
25
12
0
20 Jul 2022
Deep Analysis of Visual Product Reviews
Chandranath Adak
S. Chattopadhyay
Muhammad Saqib
19
2
0
19 Jul 2022
Relational Future Captioning Model for Explaining Likely Collisions in Daily Tasks
Motonari Kambara
K. Sugiura
32
6
0
19 Jul 2022
Superficial White Matter Analysis: An Efficient Point-cloud-based Deep Learning Framework with Supervised Contrastive Learning for Consistent Tractography Parcellation across Populations and dMRI Acquisitions
Tengfei Xue
Fan Zhang
Chaoyi Zhang
Yuqian Chen
Yang Song
A. Golby
N. Makris
Yogesh Rathi
Weidong (Tom) Cai
L. O’Donnell
18
32
0
18 Jul 2022
Zero-Shot Temporal Action Detection via Vision-Language Prompting
Sauradip Nag
Xiatian Zhu
Yi-Zhe Song
Tao Xiang
VLM
33
65
0
17 Jul 2022
Eliminating Gradient Conflict in Reference-based Line-Art Colorization
Zekun Li
Zhengyang Geng
Zhao Kang
Wenyu Chen
Yibo Yang
21
35
0
13 Jul 2022
Skeletal Human Action Recognition using Hybrid Attention based Graph Convolutional Network
Hao Xing
Darius Burschka
GNN
3DH
27
7
0
12 Jul 2022
Trusted Multi-Scale Classification Framework for Whole Slide Image
Ming Feng
Kele Xu
Na Wu
Weiquan Huang
Yan Bai
Changjian Wang
Huaimin Wang
45
5
0
12 Jul 2022
CoMER: Modeling Coverage for Transformer-based Handwritten Mathematical Expression Recognition
Wenqi Zhao
Liang Gao
ViT
19
27
0
10 Jul 2022
Horizontal and Vertical Attention in Transformers
Litao Yu
Jing Zhang
ViT
25
1
0
10 Jul 2022
Towards Multimodal Vision-Language Models Generating Non-Generic Text
Wes Robbins
Zanyar Zohourianshahzadi
Jugal Kalita
14
1
0
09 Jul 2022
Seasonal Encoder-Decoder Architecture for Forecasting
Avinash Achar
Soumen Pachal
BDL
AI4TS
19
0
0
08 Jul 2022
Exploring the sequence length bottleneck in the Transformer for Image Captioning
Jiapeng Hu
Roberto Cavicchioli
Alessandro Capotondi
ViT
38
3
0
07 Jul 2022
Scene-Aware Prompt for Multi-modal Dialogue Understanding and Generation
Bin Li
Yixuan Weng
Ziyu Ma
Bin Sun
Shutao Li
VLM
15
2
0
05 Jul 2022
Vision-and-Language Pretraining
Thong Nguyen
Cong-Duy Nguyen
Xiaobao Wu
See-Kiong Ng
A. Luu
VLM
CLIP
27
2
0
05 Jul 2022
Are metrics measuring what they should? An evaluation of image captioning task metrics
Othón González-Chávez
Guillermo Ruiz
Daniela Moctezuma
Tania A. Ramirez-delreal
21
9
0
04 Jul 2022
TM2T: Stochastic and Tokenized Modeling for the Reciprocal Generation of 3D Human Motions and Texts
Chuan Guo
Xinxin Xuo
Sen Wang
Li Cheng
VGen
87
230
0
04 Jul 2022
Multi-scale alignment and Spatial ROI Module for COVID-19 Diagnosis
Hongyan Xu
Dadong Wang
Arcot Sowmya
21
1
0
04 Jul 2022
Attributed Abnormality Graph Embedding for Clinically Accurate X-Ray Report Generation
Sixing Yan
William K. Cheung
Keith W H Chiu
Terence M. Tong
Charles K. Cheung
Simon See
MedIm
33
14
0
04 Jul 2022
PhilaeX: Explaining the Failure and Success of AI Models in Malware Detection
Zhi Lu
V. Thing
AAML
11
5
0
02 Jul 2022
Rethinking Query-Key Pairwise Interactions in Vision Transformers
Cheng-rong Li
Yangxin Liu
39
0
0
01 Jul 2022
Personalized Showcases: Generating Multi-Modal Explanations for Recommendations
An Yan
Zhankui He
Jiacheng Li
Tianyang Zhang
Julian McAuley
33
36
0
30 Jun 2022
Overview of Deep Learning-based CSI Feedback in Massive MIMO Systems
Jiajia Guo
Chao-Kai Wen
Shi Jin
Geoffrey Ye Li
42
146
0
29 Jun 2022
ZoDIAC: Zoneout Dropout Injection Attention Calculation
Zanyar Zohourianshahzadi
Jugal Kalita
36
0
0
28 Jun 2022
Consistency-preserving Visual Question Answering in Medical Imaging
Sergio Tascon-Morales
Pablo Márquez-Neila
Raphael Sznitman
MedIm
27
12
0
27 Jun 2022
Theoretical analysis of Adam using hyperparameters close to one without Lipschitz smoothness
Hideaki Iiduka
22
5
0
27 Jun 2022
Representative Teacher Keys for Knowledge Distillation Model Compression Based on Attention Mechanism for Image Classification
Jun-Teng Yang
Sheng-Che Kao
S. Huang
10
0
0
26 Jun 2022
Competence-based Multimodal Curriculum Learning for Medical Report Generation
Fenglin Liu
Shen Ge
Yuexian Zou
Xian Wu
MedIm
35
132
0
24 Jun 2022
CLAMP: Prompt-based Contrastive Learning for Connecting Language and Animal Pose
Xu Zhang
Wen Wang
Zhe Chen
Yufei Xu
Jing Zhang
Dacheng Tao
CLIP
VLM
24
25
0
23 Jun 2022
Bypass Network for Semantics Driven Image Paragraph Captioning
Qinjie Zheng
Chaoyue Wang
Dadong Wang
32
1
0
21 Jun 2022
A Self-Guided Framework for Radiology Report Generation
Jun Li
Shibo Li
Ying Hu
Huiren Tao
MedIm
27
21
0
19 Jun 2022
What is Where by Looking: Weakly-Supervised Open-World Phrase-Grounding without Text Inputs
Tal Shaharabany
Yoad Tewel
Lior Wolf
ObjD
38
16
0
19 Jun 2022
Cross-task Attention Mechanism for Dense Multi-task Learning
Ivan Lopes
Tuan-Hung Vu
Raoul de Charette
22
26
0
17 Jun 2022
Image Captioning based on Feature Refinement and Reflective Decoding
G. Alabduljabbar
Hafida Benhidour
Said Kerrache
3DV
22
3
0
16 Jun 2022
Human Eyes Inspired Recurrent Neural Networks are More Robust Against Adversarial Noises
Minkyu Choi
Yizhen Zhang
Kuan Han
Xiaokai Wang
Zhongming Liu
AAML
GAN
41
4
0
15 Jun 2022
Comprehending and Ordering Semantics for Image Captioning
Yehao Li
Yingwei Pan
Ting Yao
Tao Mei
26
88
0
14 Jun 2022
Challenges in Applying Explainability Methods to Improve the Fairness of NLP Models
Esma Balkir
S. Kiritchenko
I. Nejadgholi
Kathleen C. Fraser
21
36
0
08 Jun 2022
Previous
1
2
3
...
12
13
14
...
69
70
71
Next