ResearchTrend.AI
Show, Attend and Tell: Neural Image Caption Generation with Visual Attention

10 February 2015
Kelvin Xu
Jimmy Ba
Ryan Kiros
Kyunghyun Cho
Aaron Courville
Ruslan Salakhutdinov
R. Zemel
Yoshua Bengio
    DiffM

Papers citing "Show, Attend and Tell: Neural Image Caption Generation with Visual Attention"

50 / 3,520 papers shown
ScanEnts3D: Exploiting Phrase-to-3D-Object Correspondences for Improved Visio-Linguistic Models in 3D Scenes
Ahmed Abdelreheem
Kyle Olszewski
Hsin-Ying Lee
Peter Wonka
Panos Achlioptas
3DPC
111
28
0
12 Dec 2022
CLIP-TSA: CLIP-Assisted Temporal Self-Attention for Weakly-Supervised Video Anomaly Detection
Kevin Hyekang Joo
Khoa T. Vo
Kashu Yamazaki
Ngan Le
71
51
0
09 Dec 2022
SLAM for Visually Impaired People: a Survey
Banafshe Marziyeh Bamdad
Davide Scaramuzza
Alireza Darvishy
60
8
0
09 Dec 2022
Modularity through Attention: Efficient Training and Transfer of Language-Conditioned Policies for Robot Manipulation
Yifan Zhou
Shubham D. Sonawani
Mariano Phielipp
Simon Stepputtis
H. B. Amor
LM&Ro
85
28
0
08 Dec 2022
A Flexible Nadaraya-Watson Head Can Offer Explainable and Calibrated Classification
Alan Q. Wang
M. Sabuncu
74
5
0
07 Dec 2022
Switching to Discriminative Image Captioning by Relieving a Bottleneck of Reinforcement Learning
Ukyo Honda
Taro Watanabe
Yuji Matsumoto
63
9
0
06 Dec 2022
Semantic-Conditional Diffusion Networks for Image Captioning
Jianjie Luo
Yehao Li
Yingwei Pan
Ting Yao
Jianlin Feng
Hongyang Chao
Tao Mei
DiffM
94
74
0
06 Dec 2022
Document-Level Abstractive Summarization
Gonçalo Raposo
Afonso Raposo
Ana Sofia Carmo
48
2
0
06 Dec 2022
Generalizing Multiple Object Tracking to Unseen Domains by Introducing Natural Language Representation
En Yu
Songtao Liu
Zhuoling Li
Jinrong Yang
Zeming Li
Shoudong Han
Wenbing Tao
110
13
0
03 Dec 2022
Focus! Relevant and Sufficient Context Selection for News Image Captioning
Mingyang Zhou
Grace Luo
Anna Rohrbach
Zhou Yu
CLIP
75
13
0
01 Dec 2022
Convolution, aggregation and attention based deep neural networks for accelerating simulations in mechanics
Saurabh Deshpande
Raúl I. Sosa
Stéphane P. A. Bordas
J. Lengiewicz
AI4CE
79
20
0
01 Dec 2022
Multilingual Communication System with Deaf Individuals Utilizing Natural and Visual Languages
Tuan-Luc Huynh
Khoi-Nguyen Nguyen-Ngoc
Chi-Bien Chu
Minh-Triet Tran
Trung-Nghia Le
SLR
58
0
0
01 Dec 2022
Uncertainty-Aware Image Captioning
Zhengcong Fei
Mingyuan Fan
Li Zhu
Junshi Huang
Xiaoming Wei
Xiaolin K. Wei
UQLM
69
13
0
30 Nov 2022
Progressive Knowledge Transfer Based on Human Visual Perception Mechanism for Perceptual Quality Assessment of Point Clouds
Qi Liu
Yiyun Liu
Honglei Su
Hui Yuan
R. Hamzaoui
55
11
0
30 Nov 2022
An Extreme-Adaptive Time Series Prediction Model Based on Probability-Enhanced LSTM Neural Networks
Yanhong Li
Jack L. Xu
D. Anastasiu
AI4TS
31
14
0
29 Nov 2022
CLIP2GAN: Towards Bridging Text with the Latent Space of GANs
Yixuan Wang
Wen-gang Zhou
Jianmin Bao
Weilun Wang
Li Li
Houqiang Li
GAN CLIP
64
6
0
28 Nov 2022
CLID: Controlled-Length Image Descriptions with Limited Data
Elad Hirsch
A. Tal
VLM 3DV
60
4
0
27 Nov 2022
Conditioning Covert Geo-Location (CGL) Detection on Semantic Class Information
Binoy Saha
Sukhendu Das
123
0
0
27 Nov 2022
ComCLIP: Training-Free Compositional Image and Text Matching
Kenan Jiang
Xuehai He
Ruize Xu
Xinze Wang
VLM CLIP CoGe
110
20
0
25 Nov 2022
Aligning Source Visual and Target Language Domains for Unpaired Video Captioning
Fenglin Liu
Xian Wu
Chenyu You
Shen Ge
Yuexian Zou
Xu Sun
95
25
0
22 Nov 2022
A Short Survey of Systematic Generalization
Yuanpeng Li
AI4CE
102
1
0
22 Nov 2022
Exploring Discrete Diffusion Models for Image Captioning
Zixin Zhu
Yixuan Wei
Jianfeng Wang
Zhe Gan
Zheng Zhang
Le Wang
G. Hua
Lijuan Wang
Zicheng Liu
Han Hu
DiffM VLM
102
24
0
21 Nov 2022
ClipCrop: Conditioned Cropping Driven by Vision-Language Model
Zhihang Zhong
Mingxi Cheng
Zhirong Wu
Yuhui Yuan
Yinqiang Zheng
Ji Li
Han Hu
Stephen Lin
Yoichi Sato
Imari Sato
VLM CLIP
70
4
0
21 Nov 2022
How to Describe Images in a More Funny Way? Towards a Modular Approach to Cross-Modal Sarcasm Generation
Jie Ruan
Yue Wu
Xiaojun Wan
Yuesheng Zhu
69
1
0
20 Nov 2022
ArtELingo: A Million Emotion Annotations of WikiArt with Emphasis on Diversity over Language and Culture
Youssef Mohamed
Mohamed AbdelFattah
Shyma Alhuwaider
Feifan Li
Xiangliang Zhang
Kenneth Church
Mohamed Elhoseiny
VLM
97
15
0
19 Nov 2022
Vision Transformers in Medical Imaging: A Review
Emerald U. Henry
Onyeka Emebob
C. Omonhinmin
ViT MedIm
96
36
0
18 Nov 2022
Progressive Tree-Structured Prototype Network for End-to-End Image Captioning
Pengpeng Zeng
Jinkuan Zhu
Jingkuan Song
Lianli Gao
VLM
63
30
0
17 Nov 2022
CapEnrich: Enriching Caption Semantics for Web Images via Cross-modal Pre-trained Knowledge
Linli Yao
Wei Chen
Qin Jin
VLM
121
11
0
17 Nov 2022
AdaMAE: Adaptive Masking for Efficient Spatiotemporal Learning with Masked Autoencoders
W. G. C. Bandara
Naman Patel
A. Gholami
Mehdi Nikkhah
M. Agrawal
Vishal M. Patel
67
44
0
16 Nov 2022
SelfOdom: Self-supervised Egomotion and Depth Learning via Bi-directional Coarse-to-Fine Scale Recovery
Hao Qu
Lilian Zhang
Xiaoping Hu
Xiaofeng He
Xianfei Pan
Changhao Chen
MDE
58
4
0
16 Nov 2022
AdaTriplet-RA: Domain Matching via Adaptive Triplet and Reinforced Attention for Unsupervised Domain Adaptation
Xinyao Shu
Shiyang Yan
Zhenyu Lu
Xinshao Wang
Yuan Xie
77
2
0
16 Nov 2022
MapQA: A Dataset for Question Answering on Choropleth Maps
Shuaichen Chang
David Palzer
Jialin Li
Eric Fosler-Lussier
N. Xiao
59
48
0
15 Nov 2022
Zero-shot Image Captioning by Anchor-augmented Vision-Language Space Alignment
Junyan Wang
Yi Zhang
Ming Yan
Ji Zhang
Jitao Sang
VLM
64
9
0
14 Nov 2022
DeltaNet:Conditional Medical Report Generation for COVID-19 Diagnosis
Xian Wu
Shuxin Yang
Zhaopeng Qiu
Shen Ge
Yangtian Yan
Xingwang Wu
Yefeng Zheng
S. Kevin Zhou
Li Xiao
MedIm
81
21
0
12 Nov 2022
VieCap4H-VLSP 2021: ObjectAoA-Enhancing performance of Object Relation Transformer with Attention on Attention for Vietnamese image captioning
Nghia Hieu Nguyen
Duong T.D. Vo
Minh-Quan Ha
ViT
50
1
0
10 Nov 2022
Interpretable Deep Reinforcement Learning for Green Security Games with Real-Time Information
V. Sharma
John P. Dickerson
Pratap Tokekar
AI4CE
52
0
0
09 Nov 2022
Bridging Speech and Textual Pre-trained Models with Unsupervised ASR
Jiatong Shi
Chan-Jan Hsu
Ho-Lam Chung
Dongji Gao
Leibny Paola García-Perera
Shinji Watanabe
Ann Lee
Hung-yi Lee
82
12
0
06 Nov 2022
On learning history based policies for controlling Markov decision processes
Gandharv Patil
Aditya Mahajan
Doina Precup
OffRL
94
5
0
06 Nov 2022
Fair Visual Recognition via Intervention with Proxy Features
Yi Zhang
Jitao Sang
Junyan Wang
74
1
0
02 Nov 2022
Processing Long Legal Documents with Pre-trained Transformers: Modding LegalBERT and Longformer
Dimitris Mamakas
Petros Tsotsi
Ion Androutsopoulos
Ilias Chalkidis
VLM AILaw
67
29
0
02 Nov 2022
Revisiting Attention Weights as Explanations from an Information Theoretic Perspective
Bingyang Wen
K. P. Subbalakshmi
Fan Yang
FAtt
34
6
0
31 Oct 2022
DiMBERT: Learning Vision-Language Grounded Representations with Disentangled Multimodal-Attention
Fenglin Liu
Xian Wu
Shen Ge
Xuancheng Ren
Wei Fan
Xu Sun
Yuexian Zou
VLM
108
13
0
28 Oct 2022
A Generic Shared Attention Mechanism for Various Backbone Neural Networks
Zhongzhan Huang
Senwei Liang
Mingfu Liang
Liang Lin
111
6
0
27 Oct 2022
Explaining the Explainers in Graph Neural Networks: a Comparative Study
Antonio Longa
Steve Azzolin
G. Santin
G. Cencetti
Pietro Lio
Bruno Lepri
Andrea Passerini
113
31
0
27 Oct 2022
Masked Vision-Language Transformer in Fashion
Ge-Peng Ji
Mingchen Zhuge
D. Gao
Deng-Ping Fan
Daniel Gehrig
Luc Van Gool
90
25
0
27 Oct 2022
Bloom Library: Multimodal Datasets in 300+ Languages for a Variety of Downstream Tasks
Colin Leong
Joshua Nemecek
Jacob Mansdorfer
Anna Filighera
A. Owodunni
Daniel Whitenack
VLM AI4CE
169
29
0
26 Oct 2022
GliTr: Glimpse Transformers with Spatiotemporal Consistency for Online Action Prediction
Samrudhdhi B. Rangrej
Kevin J. Liang
Tal Hassner
James J. Clark
94
4
0
24 Oct 2022
Tools for Extracting Spatio-Temporal Patterns in Meteorological Image Sequences: From Feature Engineering to Attention-Based Neural Networks
A. S. Bansal
Yoonjin Lee
Kyle Hilburn
I. Ebert‐Uphoff
AI4TS
96
2
0
22 Oct 2022
Describing Sets of Images with Textual-PCA
Oded Hupert
Idan Schwartz
Lior Wolf
CoGe
60
1
0
21 Oct 2022
Prophet Attention: Predicting Attention with Future Attention for Image Captioning
Fenglin Liu
Xuancheng Ren
Xian Wu
Wei Fan
Yuexian Zou
Xu Sun
114
48
0
19 Oct 2022