1502.03044
Show, Attend and Tell: Neural Image Caption Generation with Visual Attention
10 February 2015
Kelvin Xu
Jimmy Ba
Ryan Kiros
Kyunghyun Cho
Aaron Courville
Ruslan Salakhutdinov
R. Zemel
Yoshua Bengio
DiffM
Papers citing
"Show, Attend and Tell: Neural Image Caption Generation with Visual Attention"
50 / 3,520 papers shown
Title
ScanEnts3D: Exploiting Phrase-to-3D-Object Correspondences for Improved Visio-Linguistic Models in 3D Scenes
Ahmed Abdelreheem
Kyle Olszewski
Hsin-Ying Lee
Peter Wonka
Panos Achlioptas
3DPC
111
28
0
12 Dec 2022
CLIP-TSA: CLIP-Assisted Temporal Self-Attention for Weakly-Supervised Video Anomaly Detection
Kevin Hyekang Joo
Khoa T. Vo
Kashu Yamazaki
Ngan Le
71
51
0
09 Dec 2022
SLAM for Visually Impaired People: a Survey
Banafshe Marziyeh Bamdad
Davide Scaramuzza
Alireza Darvishy
60
8
0
09 Dec 2022
Modularity through Attention: Efficient Training and Transfer of Language-Conditioned Policies for Robot Manipulation
Yifan Zhou
Shubham D. Sonawani
Mariano Phielipp
Simon Stepputtis
H. B. Amor
LM&Ro
85
28
0
08 Dec 2022
A Flexible Nadaraya-Watson Head Can Offer Explainable and Calibrated Classification
Alan Q. Wang
M. Sabuncu
74
5
0
07 Dec 2022
Switching to Discriminative Image Captioning by Relieving a Bottleneck of Reinforcement Learning
Ukyo Honda
Taro Watanabe
Yuji Matsumoto
63
9
0
06 Dec 2022
Semantic-Conditional Diffusion Networks for Image Captioning
Jianjie Luo
Yehao Li
Yingwei Pan
Ting Yao
Jianlin Feng
Hongyang Chao
Tao Mei
DiffM
94
74
0
06 Dec 2022
Document-Level Abstractive Summarization
Gonçalo Raposo
Afonso Raposo
Ana Sofia Carmo
48
2
0
06 Dec 2022
Generalizing Multiple Object Tracking to Unseen Domains by Introducing Natural Language Representation
En Yu
Songtao Liu
Zhuoling Li
Jinrong Yang
Zeming Li
Shoudong Han
Wenbing Tao
110
13
0
03 Dec 2022
Focus! Relevant and Sufficient Context Selection for News Image Captioning
Mingyang Zhou
Grace Luo
Anna Rohrbach
Zhou Yu
CLIP
75
13
0
01 Dec 2022
Convolution, aggregation and attention based deep neural networks for accelerating simulations in mechanics
Saurabh Deshpande
Raúl I. Sosa
Stéphane P. A. Bordas
J. Lengiewicz
AI4CE
79
20
0
01 Dec 2022
Multilingual Communication System with Deaf Individuals Utilizing Natural and Visual Languages
Tuan-Luc Huynh
Khoi-Nguyen Nguyen-Ngoc
Chi-Bien Chu
Minh-Triet Tran
Trung-Nghia Le
SLR
58
0
0
01 Dec 2022
Uncertainty-Aware Image Captioning
Zhengcong Fei
Mingyuan Fan
Li Zhu
Junshi Huang
Xiaoming Wei
Xiaolin K. Wei
UQLM
69
13
0
30 Nov 2022
Progressive Knowledge Transfer Based on Human Visual Perception Mechanism for Perceptual Quality Assessment of Point Clouds
Qi Liu
Yiyun Liu
Honglei Su
Hui Yuan
R. Hamzaoui
55
11
0
30 Nov 2022
An Extreme-Adaptive Time Series Prediction Model Based on Probability-Enhanced LSTM Neural Networks
Yanhong Li
Jack L. Xu
D. Anastasiu
AI4TS
31
14
0
29 Nov 2022
CLIP2GAN: Towards Bridging Text with the Latent Space of GANs
Yixuan Wang
Wen-gang Zhou
Jianmin Bao
Weilun Wang
Li Li
Houqiang Li
GAN
CLIP
64
6
0
28 Nov 2022
CLID: Controlled-Length Image Descriptions with Limited Data
Elad Hirsch
A. Tal
VLM
3DV
60
4
0
27 Nov 2022
Conditioning Covert Geo-Location (CGL) Detection on Semantic Class Information
Binoy Saha
Sukhendu Das
123
0
0
27 Nov 2022
ComCLIP: Training-Free Compositional Image and Text Matching
Kenan Jiang
Xuehai He
Ruize Xu
Xinze Wang
VLM
CLIP
CoGe
110
20
0
25 Nov 2022
Aligning Source Visual and Target Language Domains for Unpaired Video Captioning
Fenglin Liu
Xian Wu
Chenyu You
Shen Ge
Yuexian Zou
Xu Sun
95
25
0
22 Nov 2022
A Short Survey of Systematic Generalization
Yuanpeng Li
AI4CE
102
1
0
22 Nov 2022
Exploring Discrete Diffusion Models for Image Captioning
Zixin Zhu
Yixuan Wei
Jianfeng Wang
Zhe Gan
Zheng Zhang
Le Wang
G. Hua
Lijuan Wang
Zicheng Liu
Han Hu
DiffM
VLM
102
24
0
21 Nov 2022
ClipCrop: Conditioned Cropping Driven by Vision-Language Model
Zhihang Zhong
Mingxi Cheng
Zhirong Wu
Yuhui Yuan
Yinqiang Zheng
Ji Li
Han Hu
Stephen Lin
Yoichi Sato
Imari Sato
VLM
CLIP
70
4
0
21 Nov 2022
How to Describe Images in a More Funny Way? Towards a Modular Approach to Cross-Modal Sarcasm Generation
Jie Ruan
Yue Wu
Xiaojun Wan
Yuesheng Zhu
69
1
0
20 Nov 2022
ArtELingo: A Million Emotion Annotations of WikiArt with Emphasis on Diversity over Language and Culture
Youssef Mohamed
Mohamed AbdelFattah
Shyma Alhuwaider
Feifan Li
Xiangliang Zhang
Kenneth Church
Mohamed Elhoseiny
VLM
97
15
0
19 Nov 2022
Vision Transformers in Medical Imaging: A Review
Emerald U. Henry
Onyeka Emebob
C. Omonhinmin
ViT
MedIm
96
36
0
18 Nov 2022
Progressive Tree-Structured Prototype Network for End-to-End Image Captioning
Pengpeng Zeng
Jinkuan Zhu
Jingkuan Song
Lianli Gao
VLM
63
30
0
17 Nov 2022
CapEnrich: Enriching Caption Semantics for Web Images via Cross-modal Pre-trained Knowledge
Linli Yao
Wei Chen
Qin Jin
VLM
121
11
0
17 Nov 2022
AdaMAE: Adaptive Masking for Efficient Spatiotemporal Learning with Masked Autoencoders
W. G. C. Bandara
Naman Patel
A. Gholami
Mehdi Nikkhah
M. Agrawal
Vishal M. Patel
67
44
0
16 Nov 2022
SelfOdom: Self-supervised Egomotion and Depth Learning via Bi-directional Coarse-to-Fine Scale Recovery
Hao Qu
Lilian Zhang
Xiaoping Hu
Xiaofeng He
Xianfei Pan
Changhao Chen
MDE
58
4
0
16 Nov 2022
AdaTriplet-RA: Domain Matching via Adaptive Triplet and Reinforced Attention for Unsupervised Domain Adaptation
Xinyao Shu
Shiyang Yan
Zhenyu Lu
Xinshao Wang
Yuan Xie
77
2
0
16 Nov 2022
MapQA: A Dataset for Question Answering on Choropleth Maps
Shuaichen Chang
David Palzer
Jialin Li
Eric Fosler-Lussier
N. Xiao
59
48
0
15 Nov 2022
Zero-shot Image Captioning by Anchor-augmented Vision-Language Space Alignment
Junyan Wang
Yi Zhang
Ming Yan
Ji Zhang
Jitao Sang
VLM
64
9
0
14 Nov 2022
DeltaNet: Conditional Medical Report Generation for COVID-19 Diagnosis
Xian Wu
Shuxin Yang
Zhaopeng Qiu
Shen Ge
Yangtian Yan
Xingwang Wu
Yefeng Zheng
S. Kevin Zhou
Li Xiao
MedIm
81
21
0
12 Nov 2022
VieCap4H-VLSP 2021: ObjectAoA-Enhancing performance of Object Relation Transformer with Attention on Attention for Vietnamese image captioning
Nghia Hieu Nguyen
Duong T.D. Vo
Minh-Quan Ha
ViT
50
1
0
10 Nov 2022
Interpretable Deep Reinforcement Learning for Green Security Games with Real-Time Information
V. Sharma
John P. Dickerson
Pratap Tokekar
AI4CE
52
0
0
09 Nov 2022
Bridging Speech and Textual Pre-trained Models with Unsupervised ASR
Jiatong Shi
Chan-Jan Hsu
Ho-Lam Chung
Dongji Gao
Leibny Paola García-Perera
Shinji Watanabe
Ann Lee
Hung-yi Lee
82
12
0
06 Nov 2022
On learning history based policies for controlling Markov decision processes
Gandharv Patil
Aditya Mahajan
Doina Precup
OffRL
94
5
0
06 Nov 2022
Fair Visual Recognition via Intervention with Proxy Features
Yi Zhang
Jitao Sang
Junyan Wang
74
1
0
02 Nov 2022
Processing Long Legal Documents with Pre-trained Transformers: Modding LegalBERT and Longformer
Dimitris Mamakas
Petros Tsotsi
Ion Androutsopoulos
Ilias Chalkidis
VLM
AILaw
67
29
0
02 Nov 2022
Revisiting Attention Weights as Explanations from an Information Theoretic Perspective
Bingyang Wen
K. P. Subbalakshmi
Fan Yang
FAtt
34
6
0
31 Oct 2022
DiMBERT: Learning Vision-Language Grounded Representations with Disentangled Multimodal-Attention
Fenglin Liu
Xian Wu
Shen Ge
Xuancheng Ren
Wei Fan
Xu Sun
Yuexian Zou
VLM
108
13
0
28 Oct 2022
A Generic Shared Attention Mechanism for Various Backbone Neural Networks
Zhongzhan Huang
Senwei Liang
Mingfu Liang
Liang Lin
111
6
0
27 Oct 2022
Explaining the Explainers in Graph Neural Networks: a Comparative Study
Antonio Longa
Steve Azzolin
G. Santin
G. Cencetti
Pietro Lio
Bruno Lepri
Andrea Passerini
113
31
0
27 Oct 2022
Masked Vision-Language Transformer in Fashion
Ge-Peng Ji
Mingchen Zhuge
D. Gao
Deng-Ping Fan
Daniel Gehrig
Luc Van Gool
90
25
0
27 Oct 2022
Bloom Library: Multimodal Datasets in 300+ Languages for a Variety of Downstream Tasks
Colin Leong
Joshua Nemecek
Jacob Mansdorfer
Anna Filighera
A. Owodunni
Daniel Whitenack
VLM
AI4CE
169
29
0
26 Oct 2022
GliTr: Glimpse Transformers with Spatiotemporal Consistency for Online Action Prediction
Samrudhdhi B. Rangrej
Kevin J. Liang
Tal Hassner
James J. Clark
94
4
0
24 Oct 2022
Tools for Extracting Spatio-Temporal Patterns in Meteorological Image Sequences: From Feature Engineering to Attention-Based Neural Networks
A. S. Bansal
Yoonjin Lee
Kyle Hilburn
I. Ebert-Uphoff
AI4TS
96
2
0
22 Oct 2022
Describing Sets of Images with Textual-PCA
Oded Hupert
Idan Schwartz
Lior Wolf
CoGe
60
1
0
21 Oct 2022
Prophet Attention: Predicting Attention with Future Attention for Image Captioning
Fenglin Liu
Xuancheng Ren
Xian Wu
Wei Fan
Yuexian Zou
Xu Sun
114
48
0
19 Oct 2022