ResearchTrend.AI
Show, Attend and Tell: Neural Image Caption Generation with Visual Attention

10 February 2015
Ke Xu
Jimmy Ba
Ryan Kiros
Kyunghyun Cho
Aaron Courville
Ruslan Salakhutdinov
R. Zemel
Yoshua Bengio
    DiffM

Papers citing "Show, Attend and Tell: Neural Image Caption Generation with Visual Attention"

50 / 3,520 papers shown
Hierarchical Multi-Interest Co-Network For Coarse-Grained Ranking
Xu Yuan
Chengjun Xu
Qiwei Chen
Tao Zhuang
Hongjie Chen
Chong Li
Junfeng Ge
AI4TS
64
0
0
19 Oct 2022
Probing Cross-modal Semantics Alignment Capability from the Textual Perspective
Zheng Ma
Shi Zong
Mianzhi Pan
Jianbing Zhang
Shujian Huang
Xinyu Dai
Jiajun Chen
61
4
0
18 Oct 2022
Weakly Supervised Face Naming with Symmetry-Enhanced Contrastive Loss
Tingyu Qu
Tinne Tuytelaars
Marie-Francine Moens
CVBM
54
4
0
17 Oct 2022
LAION-5B: An open large-scale dataset for training next generation image-text models
Christoph Schuhmann
Romain Beaumont
Richard Vencu
Cade Gordon
Ross Wightman
...
Srivatsa Kundurthy
Katherine Crowson
Ludwig Schmidt
R. Kaczmarczyk
J. Jitsev
VLM, MLLM, CLIP
240
3,522
0
16 Oct 2022
Not All Neighbors Are Worth Attending to: Graph Selective Attention Networks for Semi-supervised Learning
Tiantian He
Haicang Zhou
Yew-Soon Ong
Gao Cong
GNN
135
4
0
14 Oct 2022
Few-Shot Visual Question Generation: A Novel Task and Benchmark Datasets
Anurag Roy
David Johnson Ekka
Saptarshi Ghosh
Abir Das
62
1
0
13 Oct 2022
Multi-Granularity Cross-modal Alignment for Generalized Medical Visual Representation Learning
Fuying Wang
Yuyin Zhou
Shujun Wang
V. Vardhanabhuti
Lequan Yu
117
149
0
12 Oct 2022
APSNet: Attention Based Point Cloud Sampling
Yang Ye
Xiulong Yang
Shihao Ji
3DPC
63
7
0
11 Oct 2022
Like a bilingual baby: The advantage of visually grounding a bilingual language model
Khai-Nguyen Nguyen
Zixin Tang
A. Mali
Mary Alexandria Kelly
VLM
45
0
0
11 Oct 2022
Generating image captions with external encyclopedic knowledge
S. Nikiforova
Tejaswini Deoskar
Denis Paperno
Yoad Winter
72
2
0
10 Oct 2022
CLIP-Diffusion-LM: Apply Diffusion Model on Image Captioning
Shi-You Xu
VLM, DiffM
90
14
0
10 Oct 2022
Fine-grained Anomaly Detection in Sequential Data via Counterfactual Explanations
He Cheng
Depeng Xu
Shuhan Yuan
Xintao Wu
AI4TS
59
3
0
09 Oct 2022
Learning Fine-Grained Visual Understanding for Video Question Answering via Decoupling Spatial-Temporal Modeling
Hsin-Ying Lee
Hung-Ting Su
Bing-Chen Tsai
Tsung-Han Wu
Jia-Fong Yeh
Winston H. Hsu
95
2
0
08 Oct 2022
Contextual Modeling for 3D Dense Captioning on Point Clouds
Yufeng Zhong
Longdao Xu
Jiebo Luo
Lin Ma
89
15
0
08 Oct 2022
LOCL: Learning Object-Attribute Composition using Localization
Satish Kumar
A S M Iftekhar
Ekta Prashnani
B. S. Manjunath
96
3
0
07 Oct 2022
Quantitative Metrics for Evaluating Explanations of Video DeepFake Detectors
Federico Baldassarre
Quentin Debard
Gonzalo Fiz Pontiveros
Tri Kurniawan Wijaya
82
4
0
07 Oct 2022
CLEAR: Causal Explanations from Attention in Neural Recommenders
Shami Nisimov
R. Y. Rohekar
Yaniv Gurwicz
G. Koren
Gal Novik
CML
38
6
0
07 Oct 2022
AOE-Net: Entities Interactions Modeling with Adaptive Attention Mechanism for Temporal Action Proposals Generation
Khoa T. Vo
Sang Truong
Kashu Yamazaki
Bhiksha Raj
Minh-Triet Tran
Ngan Le
158
30
0
05 Oct 2022
Improved Anomaly Detection by Using the Attention-Based Isolation Forest
Lev V. Utkin
A. Ageev
A. Konstantinov
82
8
0
05 Oct 2022
Vision+X: A Survey on Multimodal Learning in the Light of Data
Ye Zhu
Yuehua Wu
N. Sebe
Yan Yan
119
19
0
05 Oct 2022
Affection: Learning Affective Explanations for Real-World Visual Data
Panos Achlioptas
M. Ovsjanikov
Leonidas Guibas
Sergey Tulyakov
109
12
0
04 Oct 2022
Learning to Collocate Visual-Linguistic Neural Modules for Image Captioning
Xu Yang
Hanwang Zhang
Chongyang Gao
Jianfei Cai
MLLM
87
10
0
04 Oct 2022
Music-to-Text Synaesthesia: Generating Descriptive Text from Music Recordings
Zhihuan Kuang
Shi Zong
Jianbing Zhang
Jiajun Chen
Hongfu Liu
71
5
0
02 Oct 2022
MaskTune: Mitigating Spurious Correlations by Forcing to Explore
Saeid Asgari Taghanaki
Aliasghar Khani
Fereshte Khani
A. Gholami
Linh-Tam Tran
Ali Mahdavi-Amiri
Ghassan Hamarneh
AAML
100
48
0
30 Sep 2022
Multimodality Multi-Lead ECG Arrhythmia Classification using Self-Supervised Learning
Thi-Thu-Hong Phan
Duc Le
Brijesh Patel
Donald Adjeroh
Jingxian Wu
M. Jensen
Ngan Le
77
12
0
30 Sep 2022
SmallCap: Lightweight Image Captioning Prompted with Retrieval Augmentation
R. Ramos
Bruno Martins
Desmond Elliott
Yova Kementchedjhieva
VLM
92
89
0
30 Sep 2022
Medical Image Captioning via Generative Pretrained Transformers
Alexander Selivanov
Oleg Y. Rogov
Daniil Chesakov
Artem Shelmanov
Irina Fedulova
Dmitry V. Dylov
MedIm
102
64
0
28 Sep 2022
InFi: End-to-End Learning to Filter Input for Resource-Efficiency in Mobile-Centric Inference
Mu Yuan
Lan Zhang
Fengxiang He
Xueting Tong
Miao-Hui Song
Zhengyuan Xu
Xiang-Yang Li
60
2
0
28 Sep 2022
RepsNet: Combining Vision with Language for Automated Medical Reports
A. Tanwani
Joelle Barral
Daniel Freedman
MedIm
93
23
0
27 Sep 2022
STING: Self-attention based Time-series Imputation Networks using GAN
Eunkyu Oh
Taehun Kim
Yunhu Ji
Sushil Khyalia
AI4TS
92
25
0
22 Sep 2022
DRAMA: Joint Risk Localization and Captioning in Driving
Srikanth Malla
Chiho Choi
Isht Dwivedi
Joonhyang Choi
Jiachen Li
183
100
0
22 Sep 2022
Show, Interpret and Tell: Entity-aware Contextualised Image Captioning in Wikipedia
K. Nguyen
Ali Furkan Biten
Andrés Mafla
Lluís Gómez
Dimosthenis Karatzas
70
11
0
21 Sep 2022
Active Particle Filter Networks: Efficient Active Localization in Continuous Action Spaces and Large Maps
Daniel Honerkamp
Suresh Guttikonda
Abhinav Valada
71
2
0
20 Sep 2022
Accelerating Neural Network Inference with Processing-in-DRAM: From the Edge to the Cloud
Geraldo F. Oliveira
Juan Gómez Luna
Saugata Ghose
Amirali Boroumand
O. Mutlu
75
26
0
19 Sep 2022
Learning Distinct and Representative Styles for Image Captioning
Qi Chen
Chaorui Deng
Qi Wu
VLM
79
24
0
17 Sep 2022
Belief Revision based Caption Re-ranker with Visual Semantic Information
Ahmed Sabir
Francesc Moreno-Noguer
Pranava Madhyastha
Lluís Padró
BDL
74
2
0
16 Sep 2022
M^4I: Multi-modal Models Membership Inference
Pingyi Hu
Zihan Wang
Ruoxi Sun
Hu Wang
Minhui Xue
99
27
0
15 Sep 2022
Analysis of Self-Attention Head Diversity for Conformer-based Automatic Speech Recognition
Kartik Audhkhasi
Yinghui Huang
Bhuvana Ramabhadran
Pedro J. Moreno
62
3
0
13 Sep 2022
Vision Transformers for Action Recognition: A Survey
Anwaar Ulhaq
Naveed Akhtar
Ganna Pogrebna
Ajmal Mian
ViT
89
45
0
13 Sep 2022
Evaluation of Question Answering Systems: Complexity of judging a natural language
Amer Farea
Zhen Yang
Kien Duong
Nadeesha Perera
F. Emmert-Streib
ELM
62
3
0
10 Sep 2022
Foundations and Trends in Multimodal Machine Learning: Principles, Challenges, and Open Questions
Paul Pu Liang
Amir Zadeh
Louis-Philippe Morency
114
90
0
07 Sep 2022
RF Fingerprinting Needs Attention: Multi-task Approach for Real-World WiFi and Bluetooth
Anu Jagannath
Zackary Kane
Jithin Jagannath
76
11
0
07 Sep 2022
Parallel and Streaming Wavelet Neural Networks for Classification and Regression under Apache Spark
E Venkatesh
Yelleti Vivek
V. Ravi
Shiva Shankar Orsu
63
6
0
07 Sep 2022
A Weakly Supervised Learning Framework for Salient Object Detection via Hybrid Labels
Runmin Cong
Qi Qin
Chen Zhang
Qiuping Jiang
Shi Wang
Yao-Min Zhao
Sam Kwong
122
54
0
07 Sep 2022
Bridging Music and Text with Crowdsourced Music Comments: A Sequence-to-Sequence Framework for Thematic Music Comments Generation
Peining Zhang
Junliang Guo
Linli Xu
Mu You
Junming Yin
55
0
0
05 Sep 2022
MMKGR: Multi-hop Multi-modal Knowledge Graph Reasoning
Shangfei Zheng
Weiqing Wang
Jianfeng Qu
Hongzhi Yin
Wei Chen
Lei Zhao
LRM
82
24
0
03 Sep 2022
vieCap4H-VLSP 2021: Vietnamese Image Captioning for Healthcare Domain using Swin Transformer and Attention-based LSTM
Thanh Van Nguyen
Long H. Nguyen
Nhat Truong Pham
Liu Tai Nguyen
Van Huong Do
Hai Nguyen
Ngoc Duy Nguyen
VLM, ViT
50
1
0
03 Sep 2022
EGFR Mutation Prediction of Lung Biopsy Images using Deep Learning
R. Gupta
Shivani Nandgaonkar
Nikhil Cherian Kurian
S. Rane
A. Sethi
MedIm
56
8
0
26 Aug 2022
gSwin: Gated MLP Vision Model with Hierarchical Structure of Shifted Window
Mocho Go
Hideyuki Tachibana
ViT
68
9
0
24 Aug 2022
Large-Scale Traffic Congestion Prediction based on Multimodal Fusion and Representation Mapping
Bo Zhou
Jiahui Liu
Songyi Cui
Yaping Zhao
45
5
0
23 Aug 2022