1502.03044
Cited By
Show, Attend and Tell: Neural Image Caption Generation with Visual Attention
10 February 2015
Ke Xu
Jimmy Ba
Ryan Kiros
Kyunghyun Cho
Aaron Courville
Ruslan Salakhutdinov
R. Zemel
Yoshua Bengio
DiffM
Papers citing
"Show, Attend and Tell: Neural Image Caption Generation with Visual Attention"
50 / 3,509 papers shown
Title
Multilingual Communication System with Deaf Individuals Utilizing Natural and Visual Languages
Tuan-Luc Huynh
Khoi-Nguyen Nguyen-Ngoc
Chi-Bien Chu
Minh-Triet Tran
Trung-Nghia Le
SLR
15
0
0
01 Dec 2022
Uncertainty-Aware Image Captioning
Zhengcong Fei
Mingyuan Fan
Li Zhu
Junshi Huang
Xiaoming Wei
Xiaolin K. Wei
UQLM
18
10
0
30 Nov 2022
Progressive Knowledge Transfer Based on Human Visual Perception Mechanism for Perceptual Quality Assessment of Point Clouds
Qi Liu
Yiyun Liu
Honglei Su
Hui Yuan
R. Hamzaoui
19
9
0
30 Nov 2022
An Extreme-Adaptive Time Series Prediction Model Based on Probability-Enhanced LSTM Neural Networks
Yanhong Li
Jack L. Xu
D. Anastasiu
AI4TS
8
13
0
29 Nov 2022
CLIP2GAN: Towards Bridging Text with the Latent Space of GANs
Yixuan Wang
Wen-gang Zhou
Jianmin Bao
Weilun Wang
Li Li
Houqiang Li
GAN
CLIP
33
5
0
28 Nov 2022
CLID: Controlled-Length Image Descriptions with Limited Data
Elad Hirsch
A. Tal
VLM
3DV
22
4
0
27 Nov 2022
Conditioning Covert Geo-Location (CGL) Detection on Semantic Class Information
Binoy Saha
Sukhendu Das
27
0
0
27 Nov 2022
ComCLIP: Training-Free Compositional Image and Text Matching
Kenan Jiang
Xuehai He
Ruize Xu
Qing Guo
VLM
CLIP
CoGe
19
20
0
25 Nov 2022
Aligning Source Visual and Target Language Domains for Unpaired Video Captioning
Fenglin Liu
Xian Wu
Chenyu You
Shen Ge
Yuexian Zou
Xu Sun
19
23
0
22 Nov 2022
A Short Survey of Systematic Generalization
Yuanpeng Li
AI4CE
43
1
0
22 Nov 2022
Exploring Discrete Diffusion Models for Image Captioning
Zixin Zhu
Yixuan Wei
Jianfeng Wang
Zhe Gan
Zheng-Wei Zhang
Le Wang
G. Hua
Lijuan Wang
Zicheng Liu
Han Hu
DiffM
VLM
31
17
0
21 Nov 2022
ClipCrop: Conditioned Cropping Driven by Vision-Language Model
Zhihang Zhong
Mingxi Cheng
Zhirong Wu
Yuhui Yuan
Yinqiang Zheng
Ji Li
Han Hu
Stephen Lin
Yoichi Sato
Imari Sato
VLM
CLIP
35
3
0
21 Nov 2022
How to Describe Images in a More Funny Way? Towards a Modular Approach to Cross-Modal Sarcasm Generation
Jie Ruan
Yue Wu
Xiaojun Wan
Yuesheng Zhu
29
1
0
20 Nov 2022
ArtELingo: A Million Emotion Annotations of WikiArt with Emphasis on Diversity over Language and Culture
Youssef Mohamed
Mohamed AbdelFattah
Shyma Alhuwaider
Feifan Li
Xiangliang Zhang
Kenneth Church
Mohamed Elhoseiny
VLM
22
14
0
19 Nov 2022
Vision Transformers in Medical Imaging: A Review
Emerald U. Henry
Onyeka Emebob
C. Omonhinmin
ViT
MedIm
40
34
0
18 Nov 2022
Progressive Tree-Structured Prototype Network for End-to-End Image Captioning
Pengpeng Zeng
Jinkuan Zhu
Jingkuan Song
Lianli Gao
VLM
24
27
0
17 Nov 2022
CapEnrich: Enriching Caption Semantics for Web Images via Cross-modal Pre-trained Knowledge
Linli Yao
Wei Chen
Qin Jin
VLM
30
10
0
17 Nov 2022
AdaMAE: Adaptive Masking for Efficient Spatiotemporal Learning with Masked Autoencoders
W. G. C. Bandara
Naman Patel
A. Gholami
Mehdi Nikkhah
M. Agrawal
Vishal M. Patel
25
39
0
16 Nov 2022
SelfOdom: Self-supervised Egomotion and Depth Learning via Bi-directional Coarse-to-Fine Scale Recovery
Hao Qu
Lilian Zhang
Xiaoping Hu
Xiaofeng He
Xianfei Pan
Changhao Chen
MDE
27
3
0
16 Nov 2022
AdaTriplet-RA: Domain Matching via Adaptive Triplet and Reinforced Attention for Unsupervised Domain Adaptation
Xinyao Shu
Shiyang Yan
Zhenyu Lu
Xinshao Wang
Yuan Xie
22
2
0
16 Nov 2022
MapQA: A Dataset for Question Answering on Choropleth Maps
Shuaichen Chang
David Palzer
Jialin Li
Eric Fosler-Lussier
N. Xiao
19
40
0
15 Nov 2022
Zero-shot Image Captioning by Anchor-augmented Vision-Language Space Alignment
Junyan Wang
Yi Zhang
Ming Yan
Ji Zhang
Jitao Sang
VLM
31
9
0
14 Nov 2022
DeltaNet: Conditional Medical Report Generation for COVID-19 Diagnosis
Xian Wu
Shuxin Yang
Zhaopeng Qiu
Shen Ge
Yangtian Yan
Xingwang Wu
Yefeng Zheng
S. Kevin Zhou
Li Xiao
MedIm
15
20
0
12 Nov 2022
VieCap4H-VLSP 2021: ObjectAoA-Enhancing performance of Object Relation Transformer with Attention on Attention for Vietnamese image captioning
Nghia Hieu Nguyen
Duong T.D. Vo
Minh-Quan Ha
ViT
35
1
0
10 Nov 2022
Interpretable Deep Reinforcement Learning for Green Security Games with Real-Time Information
V. Sharma
John P. Dickerson
Pratap Tokekar
AI4CE
13
0
0
09 Nov 2022
Bridging Speech and Textual Pre-trained Models with Unsupervised ASR
Jiatong Shi
Chan-Jan Hsu
Ho-Lam Chung
Dongji Gao
Leibny Paola García-Perera
Shinji Watanabe
Ann Lee
Hung-yi Lee
32
12
0
06 Nov 2022
On learning history based policies for controlling Markov decision processes
Gandharv Patil
Aditya Mahajan
Doina Precup
OffRL
21
5
0
06 Nov 2022
Fair Visual Recognition via Intervention with Proxy Features
Yi Zhang
Jitao Sang
Junyan Wang
23
1
0
02 Nov 2022
Processing Long Legal Documents with Pre-trained Transformers: Modding LegalBERT and Longformer
Dimitris Mamakas
Petros Tsotsi
Ion Androutsopoulos
Ilias Chalkidis
VLM
AILaw
34
27
0
02 Nov 2022
Revisiting Attention Weights as Explanations from an Information Theoretic Perspective
Bingyang Wen
K. P. Subbalakshmi
Fan Yang
FAtt
27
6
0
31 Oct 2022
DiMBERT: Learning Vision-Language Grounded Representations with Disentangled Multimodal-Attention
Fenglin Liu
Xian Wu
Shen Ge
Xuancheng Ren
Wei Fan
Xu Sun
Yuexian Zou
VLM
75
12
0
28 Oct 2022
A Generic Shared Attention Mechanism for Various Backbone Neural Networks
Zhongzhan Huang
Senwei Liang
Mingfu Liang
Liang Lin
39
6
0
27 Oct 2022
Explaining the Explainers in Graph Neural Networks: a Comparative Study
Antonio Longa
Steve Azzolin
G. Santin
G. Cencetti
Pietro Lio
Bruno Lepri
Andrea Passerini
46
28
0
27 Oct 2022
Masked Vision-Language Transformer in Fashion
Ge-Peng Ji
Mingchen Zhuge
D. Gao
Deng-Ping Fan
Daniel Gehrig
Luc Van Gool
21
25
0
27 Oct 2022
Bloom Library: Multimodal Datasets in 300+ Languages for a Variety of Downstream Tasks
Colin Leong
Joshua Nemecek
Jacob Mansdorfer
Anna Filighera
A. Owodunni
Daniel Whitenack
VLM
AI4CE
51
24
0
26 Oct 2022
GliTr: Glimpse Transformers with Spatiotemporal Consistency for Online Action Prediction
Samrudhdhi B. Rangrej
Kevin J Liang
Tal Hassner
James J. Clark
27
3
0
24 Oct 2022
Tools for Extracting Spatio-Temporal Patterns in Meteorological Image Sequences: From Feature Engineering to Attention-Based Neural Networks
A. S. Bansal
Yoonjin Lee
Kyle Hilburn
I. Ebert‐Uphoff
AI4TS
38
2
0
22 Oct 2022
Describing Sets of Images with Textual-PCA
Oded Hupert
Idan Schwartz
Lior Wolf
CoGe
31
1
0
21 Oct 2022
Prophet Attention: Predicting Attention with Future Attention for Image Captioning
Fenglin Liu
Xuancheng Ren
Xian Wu
Wei Fan
Yuexian Zou
Xu Sun
24
46
0
19 Oct 2022
Hierarchical Multi-Interest Co-Network For Coarse-Grained Ranking
Xu Yuan
Chengjun Xu
Qiwei Chen
Tao Zhuang
Hongjie Chen
Chong Li
Junfeng Ge
AI4TS
25
0
0
19 Oct 2022
Probing Cross-modal Semantics Alignment Capability from the Textual Perspective
Zheng Ma
Shi Zong
Mianzhi Pan
Jianbing Zhang
Shujian Huang
Xinyu Dai
Jiajun Chen
30
4
0
18 Oct 2022
Weakly Supervised Face Naming with Symmetry-Enhanced Contrastive Loss
Tingyu Qu
Tinne Tuytelaars
Marie-Francine Moens
CVBM
21
4
0
17 Oct 2022
LAION-5B: An open large-scale dataset for training next generation image-text models
Christoph Schuhmann
Romain Beaumont
Richard Vencu
Cade Gordon
Ross Wightman
...
Srivatsa Kundurthy
Katherine Crowson
Ludwig Schmidt
R. Kaczmarczyk
J. Jitsev
VLM
MLLM
CLIP
81
3,276
0
16 Oct 2022
Not All Neighbors Are Worth Attending to: Graph Selective Attention Networks for Semi-supervised Learning
Tiantian He
Haicang Zhou
Yew-Soon Ong
Gao Cong
GNN
80
4
0
14 Oct 2022
Few-Shot Visual Question Generation: A Novel Task and Benchmark Datasets
Anurag Roy
David Johnson Ekka
Saptarshi Ghosh
Abir Das
23
1
0
13 Oct 2022
Multi-Granularity Cross-modal Alignment for Generalized Medical Visual Representation Learning
Fuying Wang
Yuyin Zhou
Shujun Wang
V. Vardhanabhuti
Lequan Yu
31
137
0
12 Oct 2022
APSNet: Attention Based Point Cloud Sampling
Yang Ye
Xiulong Yang
Shihao Ji
3DPC
37
6
0
11 Oct 2022
Like a bilingual baby: The advantage of visually grounding a bilingual language model
Khai-Nguyen Nguyen
Zixin Tang
A. Mali
Mary Alexandria Kelly
VLM
20
0
0
11 Oct 2022
Generating image captions with external encyclopedic knowledge
S. Nikiforova
Tejaswini Deoskar
Denis Paperno
Yoad Winter
30
1
0
10 Oct 2022
CLIP-Diffusion-LM: Apply Diffusion Model on Image Captioning
Shi-You Xu
VLM
DiffM
32
11
0
10 Oct 2022