ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 1502.03044
  4. Cited By
Show, Attend and Tell: Neural Image Caption Generation with Visual
  Attention
v1v2v3 (latest)

Show, Attend and Tell: Neural Image Caption Generation with Visual Attention

10 February 2015
Ke Xu
Jimmy Ba
Ryan Kiros
Kyunghyun Cho
Aaron Courville
Ruslan Salakhutdinov
R. Zemel
Yoshua Bengio
    DiffM
ArXiv (abs)PDFHTML

Papers citing "Show, Attend and Tell: Neural Image Caption Generation with Visual Attention"

50 / 3,520 papers shown
Title
A Data-Driven Guided Decoding Mechanism for Diagnostic Captioning
A Data-Driven Guided Decoding Mechanism for Diagnostic Captioning
Panagiotis Kaliosis
John Pavlopoulos
Foivos Charalampakos
Georgios Moschovis
Ion Androutsopoulos
MedIm
64
2
0
20 Jun 2024
Using Multimodal Large Language Models for Automated Detection of
  Traffic Safety Critical Events
Using Multimodal Large Language Models for Automated Detection of Traffic Safety Critical Events
M. Tami
Huthaifa I. Ashqar
Mohammed Elhenawy
90
5
0
19 Jun 2024
DDLNet: Boosting Remote Sensing Change Detection with Dual-Domain
  Learning
DDLNet: Boosting Remote Sensing Change Detection with Dual-Domain Learning
Xiaowen Ma
Jiawei Yang
Rui Che
Huanting Zhang
Wei Zhang
57
5
0
19 Jun 2024
M3T: Multi-Modal Medical Transformer to bridge Clinical Context with
  Visual Insights for Retinal Image Medical Description Generation
M3T: Multi-Modal Medical Transformer to bridge Clinical Context with Visual Insights for Retinal Image Medical Description Generation
Nagur Shareef Shaik
T. Cherukuri
Dong Hye Ye
MedIm
100
0
0
19 Jun 2024
Improving Large Models with Small models: Lower Costs and Better
  Performance
Improving Large Models with Small models: Lower Costs and Better Performance
Dong Chen
Shuo Zhang
Yueting Zhuang
Siliang Tang
Qidong Liu
Hua Wang
Mingliang Xu
96
6
0
15 Jun 2024
Large Language Models Meet Text-Centric Multimodal Sentiment Analysis: A
  Survey
Large Language Models Meet Text-Centric Multimodal Sentiment Analysis: A Survey
Hao Yang
Yanyan Zhao
Yang Wu
Shilong Wang
Tian Zheng
Hongbo Zhang
Zongyang Ma
Wanxiang Che
Bing Qin
133
14
0
12 Jun 2024
Stealthy Targeted Backdoor Attacks against Image Captioning
Stealthy Targeted Backdoor Attacks against Image Captioning
Wenshu Fan
Hongwei Li
Wenbo Jiang
Meng Hao
Shui Yu
Xiao Zhang
DiffM
71
6
0
09 Jun 2024
Story Generation from Visual Inputs: Techniques, Related Tasks, and
  Challenges
Story Generation from Visual Inputs: Techniques, Related Tasks, and Challenges
Daniel A. P. Oliveira
Eugénio Ribeiro
David Martins de Matos
VGen
55
3
0
04 Jun 2024
Understanding Retrieval Robustness for Retrieval-Augmented Image
  Captioning
Understanding Retrieval Robustness for Retrieval-Augmented Image Captioning
Wenyan Li
Jiaang Li
R. Ramos
Raphael Tang
Desmond Elliott
VLM
125
3
0
04 Jun 2024
CODE: Contrasting Self-generated Description to Combat Hallucination in
  Large Multi-modal Models
CODE: Contrasting Self-generated Description to Combat Hallucination in Large Multi-modal Models
Junho Kim
Hyunjun Kim
Yeonju Kim
Yong Man Ro
MLLM
117
16
0
04 Jun 2024
Ultrasound Report Generation with Cross-Modality Feature Alignment via
  Unsupervised Guidance
Ultrasound Report Generation with Cross-Modality Feature Alignment via Unsupervised Guidance
Jun Li
Tongkun Su
Baoliang Zhao
Faqin Lv
Qiong Wang
Nassir Navab
Yin Hu
Zhongliang Jiang
MedIm
77
6
0
02 Jun 2024
Image Captioning via Dynamic Path Customization
Image Captioning via Dynamic Path Customization
Yiwei Ma
Jiayi Ji
Xiaoshuai Sun
Yiyi Zhou
Xiaopeng Hong
Yongjian Wu
Rongrong Ji
81
1
0
01 Jun 2024
DeCo: Decoupling Token Compression from Semantic Abstraction in
  Multimodal Large Language Models
DeCo: Decoupling Token Compression from Semantic Abstraction in Multimodal Large Language Models
Linli Yao
Lei Li
Shuhuai Ren
Lean Wang
Yuanxin Liu
Xu Sun
Lu Hou
76
34
0
31 May 2024
CoSy: Evaluating Textual Explanations of Neurons
CoSy: Evaluating Textual Explanations of Neurons
Laura Kopf
P. Bommer
Anna Hedström
Sebastian Lapuschkin
Marina M.-C. Höhne
Kirill Bykov
70
13
0
30 May 2024
Source Code Foundation Models are Transferable Binary Analysis Knowledge
  Bases
Source Code Foundation Models are Transferable Binary Analysis Knowledge Bases
Zian Su
Xiangzhe Xu
Ziyang Huang
Kaiyuan Zhang
Xiangyu Zhang
86
8
0
30 May 2024
SIG: Efficient Self-Interpretable Graph Neural Network for
  Continuous-time Dynamic Graphs
SIG: Efficient Self-Interpretable Graph Neural Network for Continuous-time Dynamic Graphs
Lanting Fang
Yulian Yang
Kai Wang
Shanshan Feng
Kaiyu Feng
Jie Gui
Shuliang Wang
Yew-Soon Ong
104
1
0
29 May 2024
BRACTIVE: A Brain Activation Approach to Human Visual Brain Learning
BRACTIVE: A Brain Activation Approach to Human Visual Brain Learning
Xuan-Bac Nguyen
Hojin Jang
Xin Li
Samee U. Khan
Pawan Sinha
Khoa Luu
106
3
0
29 May 2024
mTREE: Multi-Level Text-Guided Representation End-to-End Learning for
  Whole Slide Image Analysis
mTREE: Multi-Level Text-Guided Representation End-to-End Learning for Whole Slide Image Analysis
Quan Liu
Ruining Deng
Can Cui
Tianyuan Yao
V. Nath
Yucheng Tang
Yuankai Huo
82
0
0
28 May 2024
Do Vision-Language Transformers Exhibit Visual Commonsense? An Empirical
  Study of VCR
Do Vision-Language Transformers Exhibit Visual Commonsense? An Empirical Study of VCR
Zhenyang Li
Yangyang Guo
Ke-Jyun Wang
Xiaolin Chen
Liqiang Nie
Mohan S. Kankanhalli
LRM
54
8
0
27 May 2024
Think Before You Act: A Two-Stage Framework for Mitigating Gender Bias
  Towards Vision-Language Tasks
Think Before You Act: A Two-Stage Framework for Mitigating Gender Bias Towards Vision-Language Tasks
Yunqi Zhang
Songda Li
Chunyuan Deng
Luyi Wang
Hui Zhao
128
0
0
27 May 2024
Prompt-Aware Adapter: Towards Learning Adaptive Visual Tokens for
  Multimodal Large Language Models
Prompt-Aware Adapter: Towards Learning Adaptive Visual Tokens for Multimodal Large Language Models
Yue Zhang
Hehe Fan
Yi Yang
96
3
0
24 May 2024
AlignGPT: Multi-modal Large Language Models with Adaptive Alignment
  Capability
AlignGPT: Multi-modal Large Language Models with Adaptive Alignment Capability
Fei Zhao
Taotian Pang
Chunhui Li
Zhen Wu
Junjie Guo
Shangyu Xing
Xinyu Dai
83
7
0
23 May 2024
Towards Retrieval-Augmented Architectures for Image Captioning
Towards Retrieval-Augmented Architectures for Image Captioning
Sara Sarto
Marcella Cornia
Lorenzo Baraldi
Alessandro Nicolosi
Rita Cucchiara
VLM
84
12
0
21 May 2024
Like Humans to Few-Shot Learning through Knowledge Permeation of Vision
  and Text
Like Humans to Few-Shot Learning through Knowledge Permeation of Vision and Text
Yuyu Jia
Qing Zhou
Wei Huang
Junyu Gao
Qi. Wang
VLM
87
1
0
21 May 2024
Predicting and Explaining Hearing Aid Usage Using Encoder-Decoder with
  Attention Mechanism and SHAP
Predicting and Explaining Hearing Aid Usage Using Encoder-Decoder with Attention Mechanism and SHAP
Qiqi Su
Eleftheria Iliadou
48
1
0
18 May 2024
Automated Radiology Report Generation: A Review of Recent Advances
Automated Radiology Report Generation: A Review of Recent Advances
Phillip Sloan
Philip Clatworthy
Edwin Simpson
Majid Mirmehdi
90
21
0
17 May 2024
Faithful Attention Explainer: Verbalizing Decisions Based on
  Discriminative Features
Faithful Attention Explainer: Verbalizing Decisions Based on Discriminative Features
Yao Rong
David Scheerer
Enkelejda Kasneci
82
0
0
16 May 2024
Spatial Semantic Recurrent Mining for Referring Image Segmentation
Spatial Semantic Recurrent Mining for Referring Image Segmentation
Jiaxing Yang
Lihe Zhang
Jiayu Sun
Huchuan Lu
96
0
0
15 May 2024
CSA-Net: Channel-wise Spatially Autocorrelated Attention Networks
CSA-Net: Channel-wise Spatially Autocorrelated Attention Networks
Nick Nikzad
Yongsheng Gao
Jun Zhou
66
1
0
09 May 2024
Temporal and Heterogeneous Graph Neural Network for Remaining Useful
  Life Prediction
Temporal and Heterogeneous Graph Neural Network for Remaining Useful Life Prediction
Zhihao Wen
Yuan Fang
Pengcheng Wei
Fayao Liu
Zhenghua Chen
Min-man Wu
AI4CE
75
2
0
07 May 2024
DVMSR: Distillated Vision Mamba for Efficient Super-Resolution
DVMSR: Distillated Vision Mamba for Efficient Super-Resolution
Xiaoyan Lei
Wenlong Zhang
Weifeng Cao
102
16
0
05 May 2024
SalFAU-Net: Saliency Fusion Attention U-Net for Salient Object Detection
SalFAU-Net: Saliency Fusion Attention U-Net for Salient Object Detection
Kassaw Abraham Mulat
Zhengyong Feng
Tegegne Solomon Eshetie
Ahmed Endris Hasen
54
0
0
05 May 2024
Explainable Interface for Human-Autonomy Teaming: A Survey
Explainable Interface for Human-Autonomy Teaming: A Survey
Xiangqi Kong
Yang Xing
Antonios Tsourdos
Ziyue Wang
Weisi Guo
Adolfo Perrusquía
Andreas Wikander
76
4
0
04 May 2024
FITA: Fine-grained Image-Text Aligner for Radiology Report Generation
FITA: Fine-grained Image-Text Aligner for Radiology Report Generation
Honglong Yang
Hui Tang
Xiaomeng Li
MedIm
72
1
0
02 May 2024
Semi-supervised Text-based Person Search
Semi-supervised Text-based Person Search
Daming Gao
Yang Bai
Min Cao
Hao Dou
Mang Ye
Min Zhang
90
2
0
28 Apr 2024
Pre-training on High Definition X-ray Images: An Experimental Study
Pre-training on High Definition X-ray Images: An Experimental Study
Tianlin Li
Yuehang Li
Wentao Wu
Jiandong Jin
Yao Rong
Bowei Jiang
Chuanfu Li
Jin Tang
MedImViTLM&MA
129
3
0
27 Apr 2024
SERPENT-VLM : Self-Refining Radiology Report Generation Using Vision
  Language Models
SERPENT-VLM : Self-Refining Radiology Report Generation Using Vision Language Models
M. Kapadnis
Sohan Patnaik
Abhilash Nandy
Sourjyadip Ray
Pawan Goyal
Debdoot Sheet
VLM
75
5
0
27 Apr 2024
From Cognition to Computation: A Comparative Review of Human Attention
  and Transformer Architectures
From Cognition to Computation: A Comparative Review of Human Attention and Transformer Architectures
Minglu Zhao
Dehong Xu
Tao Gao
64
4
0
25 Apr 2024
Understanding attention-based encoder-decoder networks: a case study
  with chess scoresheet recognition
Understanding attention-based encoder-decoder networks: a case study with chess scoresheet recognition
Sergio Y. Hayashi
N. Hirata
94
0
0
23 Apr 2024
Sentiment-oriented Transformer-based Variational Autoencoder Network for
  Live Video Commenting
Sentiment-oriented Transformer-based Variational Autoencoder Network for Live Video Commenting
Fengyi Fu
Shancheng Fang
Weidong Chen
Zhendong Mao
ViTVGen
61
4
0
19 Apr 2024
Resilience through Scene Context in Visual Referring Expression
  Generation
Resilience through Scene Context in Visual Referring Expression Generation
Simeon Junker
Sina Zarrieß
51
1
0
18 Apr 2024
Towards a Foundation Model for Partial Differential Equations: Multi-Operator Learning and Extrapolation
Towards a Foundation Model for Partial Differential Equations: Multi-Operator Learning and Extrapolation
Jingmin Sun
Yuxuan Liu
Zecheng Zhang
Hayden Schaeffer
AI4CE
158
20
0
18 Apr 2024
HANet: A Hierarchical Attention Network for Change Detection With
  Bitemporal Very-High-Resolution Remote Sensing Images
HANet: A Hierarchical Attention Network for Change Detection With Bitemporal Very-High-Resolution Remote Sensing Images
Chengxi Han
Chen Wu
Haonan Guo
Meiqi Hu
Hongruixuan Chen
75
105
0
14 Apr 2024
StreakNet-Arch: An Anti-scattering Network-based Architecture for Underwater Carrier LiDAR-Radar Imaging
StreakNet-Arch: An Anti-scattering Network-based Architecture for Underwater Carrier LiDAR-Radar Imaging
Xuelong Li
Hongjun An
Haofei Zhao
Guangying Li
Bo Liu
Xing Wang
Guanghua Cheng
Guojun Wu
Zhe Sun
89
0
0
14 Apr 2024
Enhancing Visual Question Answering through Question-Driven Image
  Captions as Prompts
Enhancing Visual Question Answering through Question-Driven Image Captions as Prompts
Övgü Özdemir
Erdem Akagündüz
108
11
0
12 Apr 2024
A Mutual Inclusion Mechanism for Precise Boundary Segmentation in
  Medical Images
A Mutual Inclusion Mechanism for Precise Boundary Segmentation in Medical Images
Yizhi Pan
Junyi Xin
Tianhua Yang
Teeradaj Racharak
Le-Minh Nguyen
Guanqun Sun
57
4
0
12 Apr 2024
FLoRA: Enhancing Vision-Language Models with Parameter-Efficient
  Federated Learning
FLoRA: Enhancing Vision-Language Models with Parameter-Efficient Federated Learning
Duy Phuong Nguyen
J. P. Muñoz
Ali Jannesari
VLM
77
9
0
12 Apr 2024
Exploring the Necessity of Visual Modality in Multimodal Machine
  Translation using Authentic Datasets
Exploring the Necessity of Visual Modality in Multimodal Machine Translation using Authentic Datasets
Zi Long
Zhenhao Tang
Xianghua Fu
Jian Chen
Shilong Hou
Jinze Lyu
75
2
0
09 Apr 2024
Panoptic Perception: A Novel Task and Fine-grained Dataset for Universal
  Remote Sensing Image Interpretation
Panoptic Perception: A Novel Task and Fine-grained Dataset for Universal Remote Sensing Image Interpretation
Danpei Zhao
Bo Yuan
Ziqiang Chen
Tian Li
Zhuoran Liu
Wentao Li
Yue Gao
143
10
0
06 Apr 2024
A Bi-consolidating Model for Joint Relational Triple Extraction
A Bi-consolidating Model for Joint Relational Triple Extraction
Xiaocheng Luo
Yanping Chen
Ruixue Tang
Caiwei Yang
Ruizhang Huang
Yongbin Qin
95
0
0
05 Apr 2024
Previous
12345...697071
Next