ResearchTrend.AI

Show, Attend and Tell: Neural Image Caption Generation with Visual Attention

10 February 2015
Kelvin Xu
Jimmy Ba
Ryan Kiros
Kyunghyun Cho
Aaron Courville
Ruslan Salakhutdinov
R. Zemel
Yoshua Bengio
    DiffM

Papers citing "Show, Attend and Tell: Neural Image Caption Generation with Visual Attention"

50 / 3,509 papers shown
Title
Multilingual Communication System with Deaf Individuals Utilizing Natural and Visual Languages
Tuan-Luc Huynh
Khoi-Nguyen Nguyen-Ngoc
Chi-Bien Chu
Minh-Triet Tran
Trung-Nghia Le
SLR
15
0
0
01 Dec 2022
Uncertainty-Aware Image Captioning
Zhengcong Fei
Mingyuan Fan
Li Zhu
Junshi Huang
Xiaoming Wei
Xiaolin K. Wei
UQLM
18
10
0
30 Nov 2022
Progressive Knowledge Transfer Based on Human Visual Perception Mechanism for Perceptual Quality Assessment of Point Clouds
Qi Liu
Yiyun Liu
Honglei Su
Hui Yuan
R. Hamzaoui
19
9
0
30 Nov 2022
An Extreme-Adaptive Time Series Prediction Model Based on Probability-Enhanced LSTM Neural Networks
Yanhong Li
Jack L. Xu
D. Anastasiu
AI4TS
8
13
0
29 Nov 2022
CLIP2GAN: Towards Bridging Text with the Latent Space of GANs
Yixuan Wang
Wen-gang Zhou
Jianmin Bao
Weilun Wang
Li Li
Houqiang Li
GAN
CLIP
33
5
0
28 Nov 2022
CLID: Controlled-Length Image Descriptions with Limited Data
Elad Hirsch
A. Tal
VLM
3DV
22
4
0
27 Nov 2022
Conditioning Covert Geo-Location (CGL) Detection on Semantic Class Information
Binoy Saha
Sukhendu Das
27
0
0
27 Nov 2022
ComCLIP: Training-Free Compositional Image and Text Matching
Kenan Jiang
Xuehai He
Ruize Xu
Qing Guo
VLM
CLIP
CoGe
19
20
0
25 Nov 2022
Aligning Source Visual and Target Language Domains for Unpaired Video Captioning
Fenglin Liu
Xian Wu
Chenyu You
Shen Ge
Yuexian Zou
Xu Sun
19
23
0
22 Nov 2022
A Short Survey of Systematic Generalization
Yuanpeng Li
AI4CE
43
1
0
22 Nov 2022
Exploring Discrete Diffusion Models for Image Captioning
Zixin Zhu
Yixuan Wei
Jianfeng Wang
Zhe Gan
Zheng-Wei Zhang
Le Wang
G. Hua
Lijuan Wang
Zicheng Liu
Han Hu
DiffM
VLM
31
17
0
21 Nov 2022
ClipCrop: Conditioned Cropping Driven by Vision-Language Model
Zhihang Zhong
Mingxi Cheng
Zhirong Wu
Yuhui Yuan
Yinqiang Zheng
Ji Li
Han Hu
Stephen Lin
Yoichi Sato
Imari Sato
VLM
CLIP
35
3
0
21 Nov 2022
How to Describe Images in a More Funny Way? Towards a Modular Approach to Cross-Modal Sarcasm Generation
Jie Ruan
Yue Wu
Xiaojun Wan
Yuesheng Zhu
29
1
0
20 Nov 2022
ArtELingo: A Million Emotion Annotations of WikiArt with Emphasis on Diversity over Language and Culture
Youssef Mohamed
Mohamed AbdelFattah
Shyma Alhuwaider
Feifan Li
Xiangliang Zhang
Kenneth Church
Mohamed Elhoseiny
VLM
22
14
0
19 Nov 2022
Vision Transformers in Medical Imaging: A Review
Emerald U. Henry
Onyeka Emebob
C. Omonhinmin
ViT
MedIm
40
34
0
18 Nov 2022
Progressive Tree-Structured Prototype Network for End-to-End Image Captioning
Pengpeng Zeng
Jinkuan Zhu
Jingkuan Song
Lianli Gao
VLM
24
27
0
17 Nov 2022
CapEnrich: Enriching Caption Semantics for Web Images via Cross-modal Pre-trained Knowledge
Linli Yao
Wei Chen
Qin Jin
VLM
30
10
0
17 Nov 2022
AdaMAE: Adaptive Masking for Efficient Spatiotemporal Learning with Masked Autoencoders
W. G. C. Bandara
Naman Patel
A. Gholami
Mehdi Nikkhah
M. Agrawal
Vishal M. Patel
25
39
0
16 Nov 2022
SelfOdom: Self-supervised Egomotion and Depth Learning via Bi-directional Coarse-to-Fine Scale Recovery
Hao Qu
Lilian Zhang
Xiaoping Hu
Xiaofeng He
Xianfei Pan
Changhao Chen
MDE
27
3
0
16 Nov 2022
AdaTriplet-RA: Domain Matching via Adaptive Triplet and Reinforced Attention for Unsupervised Domain Adaptation
Xinyao Shu
Shiyang Yan
Zhenyu Lu
Xinshao Wang
Yuan Xie
22
2
0
16 Nov 2022
MapQA: A Dataset for Question Answering on Choropleth Maps
Shuaichen Chang
David Palzer
Jialin Li
Eric Fosler-Lussier
N. Xiao
19
40
0
15 Nov 2022
Zero-shot Image Captioning by Anchor-augmented Vision-Language Space Alignment
Junyan Wang
Yi Zhang
Ming Yan
Ji Zhang
Jitao Sang
VLM
31
9
0
14 Nov 2022
DeltaNet: Conditional Medical Report Generation for COVID-19 Diagnosis
Xian Wu
Shuxin Yang
Zhaopeng Qiu
Shen Ge
Yangtian Yan
Xingwang Wu
Yefeng Zheng
S. Kevin Zhou
Li Xiao
MedIm
15
20
0
12 Nov 2022
VieCap4H-VLSP 2021: ObjectAoA-Enhancing performance of Object Relation Transformer with Attention on Attention for Vietnamese image captioning
Nghia Hieu Nguyen
Duong T.D. Vo
Minh-Quan Ha
ViT
35
1
0
10 Nov 2022
Interpretable Deep Reinforcement Learning for Green Security Games with Real-Time Information
V. Sharma
John P. Dickerson
Pratap Tokekar
AI4CE
13
0
0
09 Nov 2022
Bridging Speech and Textual Pre-trained Models with Unsupervised ASR
Jiatong Shi
Chan-Jan Hsu
Ho-Lam Chung
Dongji Gao
Leibny Paola García-Perera
Shinji Watanabe
Ann Lee
Hung-yi Lee
32
12
0
06 Nov 2022
On learning history based policies for controlling Markov decision processes
Gandharv Patil
Aditya Mahajan
Doina Precup
OffRL
21
5
0
06 Nov 2022
Fair Visual Recognition via Intervention with Proxy Features
Yi Zhang
Jitao Sang
Junyan Wang
23
1
0
02 Nov 2022
Processing Long Legal Documents with Pre-trained Transformers: Modding LegalBERT and Longformer
Dimitris Mamakas
Petros Tsotsi
Ion Androutsopoulos
Ilias Chalkidis
VLM
AILaw
34
27
0
02 Nov 2022
Revisiting Attention Weights as Explanations from an Information Theoretic Perspective
Bingyang Wen
K. P. Subbalakshmi
Fan Yang
FAtt
27
6
0
31 Oct 2022
DiMBERT: Learning Vision-Language Grounded Representations with Disentangled Multimodal-Attention
Fenglin Liu
Xian Wu
Shen Ge
Xuancheng Ren
Wei Fan
Xu Sun
Yuexian Zou
VLM
75
12
0
28 Oct 2022
A Generic Shared Attention Mechanism for Various Backbone Neural Networks
Zhongzhan Huang
Senwei Liang
Mingfu Liang
Liang Lin
39
6
0
27 Oct 2022
Explaining the Explainers in Graph Neural Networks: a Comparative Study
Antonio Longa
Steve Azzolin
G. Santin
G. Cencetti
Pietro Lio
Bruno Lepri
Andrea Passerini
46
28
0
27 Oct 2022
Masked Vision-Language Transformer in Fashion
Ge-Peng Ji
Mingchen Zhuge
D. Gao
Deng-Ping Fan
Daniel Gehrig
Luc Van Gool
21
25
0
27 Oct 2022
Bloom Library: Multimodal Datasets in 300+ Languages for a Variety of Downstream Tasks
Colin Leong
Joshua Nemecek
Jacob Mansdorfer
Anna Filighera
A. Owodunni
Daniel Whitenack
VLM
AI4CE
51
24
0
26 Oct 2022
GliTr: Glimpse Transformers with Spatiotemporal Consistency for Online Action Prediction
Samrudhdhi B. Rangrej
Kevin J Liang
Tal Hassner
James J. Clark
27
3
0
24 Oct 2022
Tools for Extracting Spatio-Temporal Patterns in Meteorological Image Sequences: From Feature Engineering to Attention-Based Neural Networks
A. S. Bansal
Yoonjin Lee
Kyle Hilburn
I. Ebert‐Uphoff
AI4TS
38
2
0
22 Oct 2022
Describing Sets of Images with Textual-PCA
Oded Hupert
Idan Schwartz
Lior Wolf
CoGe
31
1
0
21 Oct 2022
Prophet Attention: Predicting Attention with Future Attention for Image Captioning
Fenglin Liu
Xuancheng Ren
Xian Wu
Wei Fan
Yuexian Zou
Xu Sun
24
46
0
19 Oct 2022
Hierarchical Multi-Interest Co-Network For Coarse-Grained Ranking
Xu Yuan
Chengjun Xu
Qiwei Chen
Tao Zhuang
Hongjie Chen
Chong Li
Junfeng Ge
AI4TS
25
0
0
19 Oct 2022
Probing Cross-modal Semantics Alignment Capability from the Textual Perspective
Zheng Ma
Shi Zong
Mianzhi Pan
Jianbing Zhang
Shujian Huang
Xinyu Dai
Jiajun Chen
30
4
0
18 Oct 2022
Weakly Supervised Face Naming with Symmetry-Enhanced Contrastive Loss
Tingyu Qu
Tinne Tuytelaars
Marie-Francine Moens
CVBM
21
4
0
17 Oct 2022
LAION-5B: An open large-scale dataset for training next generation image-text models
Christoph Schuhmann
Romain Beaumont
Richard Vencu
Cade Gordon
Ross Wightman
...
Srivatsa Kundurthy
Katherine Crowson
Ludwig Schmidt
R. Kaczmarczyk
J. Jitsev
VLM
MLLM
CLIP
81
3,276
0
16 Oct 2022
Not All Neighbors Are Worth Attending to: Graph Selective Attention Networks for Semi-supervised Learning
Tiantian He
Haicang Zhou
Yew-Soon Ong
Gao Cong
GNN
80
4
0
14 Oct 2022
Few-Shot Visual Question Generation: A Novel Task and Benchmark Datasets
Anurag Roy
David Johnson Ekka
Saptarshi Ghosh
Abir Das
23
1
0
13 Oct 2022
Multi-Granularity Cross-modal Alignment for Generalized Medical Visual Representation Learning
Fuying Wang
Yuyin Zhou
Shujun Wang
V. Vardhanabhuti
Lequan Yu
31
137
0
12 Oct 2022
APSNet: Attention Based Point Cloud Sampling
Yang Ye
Xiulong Yang
Shihao Ji
3DPC
37
6
0
11 Oct 2022
Like a bilingual baby: The advantage of visually grounding a bilingual language model
Khai-Nguyen Nguyen
Zixin Tang
A. Mali
Mary Alexandria Kelly
VLM
20
0
0
11 Oct 2022
Generating image captions with external encyclopedic knowledge
S. Nikiforova
Tejaswini Deoskar
Denis Paperno
Yoad Winter
30
1
0
10 Oct 2022
CLIP-Diffusion-LM: Apply Diffusion Model on Image Captioning
Shi-You Xu
VLM
DiffM
32
11
0
10 Oct 2022