1502.03044
v3 (latest)
Show, Attend and Tell: Neural Image Caption Generation with Visual Attention
10 February 2015
Kelvin Xu
Jimmy Ba
Ryan Kiros
Kyunghyun Cho
Aaron Courville
Ruslan Salakhutdinov
R. Zemel
Yoshua Bengio
DiffM
Papers citing
"Show, Attend and Tell: Neural Image Caption Generation with Visual Attention"
50 / 3,520 papers shown
Title
A Data-Driven Guided Decoding Mechanism for Diagnostic Captioning
Panagiotis Kaliosis
John Pavlopoulos
Foivos Charalampakos
Georgios Moschovis
Ion Androutsopoulos
MedIm
64
2
0
20 Jun 2024
Using Multimodal Large Language Models for Automated Detection of Traffic Safety Critical Events
M. Tami
Huthaifa I. Ashqar
Mohammed Elhenawy
90
5
0
19 Jun 2024
DDLNet: Boosting Remote Sensing Change Detection with Dual-Domain Learning
Xiaowen Ma
Jiawei Yang
Rui Che
Huanting Zhang
Wei Zhang
57
5
0
19 Jun 2024
M3T: Multi-Modal Medical Transformer to bridge Clinical Context with Visual Insights for Retinal Image Medical Description Generation
Nagur Shareef Shaik
T. Cherukuri
Dong Hye Ye
MedIm
100
0
0
19 Jun 2024
Improving Large Models with Small models: Lower Costs and Better Performance
Dong Chen
Shuo Zhang
Yueting Zhuang
Siliang Tang
Qidong Liu
Hua Wang
Mingliang Xu
96
6
0
15 Jun 2024
Large Language Models Meet Text-Centric Multimodal Sentiment Analysis: A Survey
Hao Yang
Yanyan Zhao
Yang Wu
Shilong Wang
Tian Zheng
Hongbo Zhang
Zongyang Ma
Wanxiang Che
Bing Qin
133
14
0
12 Jun 2024
Stealthy Targeted Backdoor Attacks against Image Captioning
Wenshu Fan
Hongwei Li
Wenbo Jiang
Meng Hao
Shui Yu
Xiao Zhang
DiffM
71
6
0
09 Jun 2024
Story Generation from Visual Inputs: Techniques, Related Tasks, and Challenges
Daniel A. P. Oliveira
Eugénio Ribeiro
David Martins de Matos
VGen
55
3
0
04 Jun 2024
Understanding Retrieval Robustness for Retrieval-Augmented Image Captioning
Wenyan Li
Jiaang Li
R. Ramos
Raphael Tang
Desmond Elliott
VLM
125
3
0
04 Jun 2024
CODE: Contrasting Self-generated Description to Combat Hallucination in Large Multi-modal Models
Junho Kim
Hyunjun Kim
Yeonju Kim
Yong Man Ro
MLLM
117
16
0
04 Jun 2024
Ultrasound Report Generation with Cross-Modality Feature Alignment via Unsupervised Guidance
Jun Li
Tongkun Su
Baoliang Zhao
Faqin Lv
Qiong Wang
Nassir Navab
Yin Hu
Zhongliang Jiang
MedIm
77
6
0
02 Jun 2024
Image Captioning via Dynamic Path Customization
Yiwei Ma
Jiayi Ji
Xiaoshuai Sun
Yiyi Zhou
Xiaopeng Hong
Yongjian Wu
Rongrong Ji
81
1
0
01 Jun 2024
DeCo: Decoupling Token Compression from Semantic Abstraction in Multimodal Large Language Models
Linli Yao
Lei Li
Shuhuai Ren
Lean Wang
Yuanxin Liu
Xu Sun
Lu Hou
76
34
0
31 May 2024
CoSy: Evaluating Textual Explanations of Neurons
Laura Kopf
P. Bommer
Anna Hedström
Sebastian Lapuschkin
Marina M.-C. Höhne
Kirill Bykov
70
13
0
30 May 2024
Source Code Foundation Models are Transferable Binary Analysis Knowledge Bases
Zian Su
Xiangzhe Xu
Ziyang Huang
Kaiyuan Zhang
Xiangyu Zhang
86
8
0
30 May 2024
SIG: Efficient Self-Interpretable Graph Neural Network for Continuous-time Dynamic Graphs
Lanting Fang
Yulian Yang
Kai Wang
Shanshan Feng
Kaiyu Feng
Jie Gui
Shuliang Wang
Yew-Soon Ong
104
1
0
29 May 2024
BRACTIVE: A Brain Activation Approach to Human Visual Brain Learning
Xuan-Bac Nguyen
Hojin Jang
Xin Li
Samee U. Khan
Pawan Sinha
Khoa Luu
106
3
0
29 May 2024
mTREE: Multi-Level Text-Guided Representation End-to-End Learning for Whole Slide Image Analysis
Quan Liu
Ruining Deng
Can Cui
Tianyuan Yao
V. Nath
Yucheng Tang
Yuankai Huo
82
0
0
28 May 2024
Do Vision-Language Transformers Exhibit Visual Commonsense? An Empirical Study of VCR
Zhenyang Li
Yangyang Guo
Ke-Jyun Wang
Xiaolin Chen
Liqiang Nie
Mohan S. Kankanhalli
LRM
54
8
0
27 May 2024
Think Before You Act: A Two-Stage Framework for Mitigating Gender Bias Towards Vision-Language Tasks
Yunqi Zhang
Songda Li
Chunyuan Deng
Luyi Wang
Hui Zhao
128
0
0
27 May 2024
Prompt-Aware Adapter: Towards Learning Adaptive Visual Tokens for Multimodal Large Language Models
Yue Zhang
Hehe Fan
Yi Yang
96
3
0
24 May 2024
AlignGPT: Multi-modal Large Language Models with Adaptive Alignment Capability
Fei Zhao
Taotian Pang
Chunhui Li
Zhen Wu
Junjie Guo
Shangyu Xing
Xinyu Dai
83
7
0
23 May 2024
Towards Retrieval-Augmented Architectures for Image Captioning
Sara Sarto
Marcella Cornia
Lorenzo Baraldi
Alessandro Nicolosi
Rita Cucchiara
VLM
84
12
0
21 May 2024
Like Humans to Few-Shot Learning through Knowledge Permeation of Vision and Text
Yuyu Jia
Qing Zhou
Wei Huang
Junyu Gao
Qi Wang
VLM
87
1
0
21 May 2024
Predicting and Explaining Hearing Aid Usage Using Encoder-Decoder with Attention Mechanism and SHAP
Qiqi Su
Eleftheria Iliadou
48
1
0
18 May 2024
Automated Radiology Report Generation: A Review of Recent Advances
Phillip Sloan
Philip Clatworthy
Edwin Simpson
Majid Mirmehdi
90
21
0
17 May 2024
Faithful Attention Explainer: Verbalizing Decisions Based on Discriminative Features
Yao Rong
David Scheerer
Enkelejda Kasneci
82
0
0
16 May 2024
Spatial Semantic Recurrent Mining for Referring Image Segmentation
Jiaxing Yang
Lihe Zhang
Jiayu Sun
Huchuan Lu
96
0
0
15 May 2024
CSA-Net: Channel-wise Spatially Autocorrelated Attention Networks
Nick Nikzad
Yongsheng Gao
Jun Zhou
66
1
0
09 May 2024
Temporal and Heterogeneous Graph Neural Network for Remaining Useful Life Prediction
Zhihao Wen
Yuan Fang
Pengcheng Wei
Fayao Liu
Zhenghua Chen
Min-man Wu
AI4CE
75
2
0
07 May 2024
DVMSR: Distillated Vision Mamba for Efficient Super-Resolution
Xiaoyan Lei
Wenlong Zhang
Weifeng Cao
102
16
0
05 May 2024
SalFAU-Net: Saliency Fusion Attention U-Net for Salient Object Detection
Kassaw Abraham Mulat
Zhengyong Feng
Tegegne Solomon Eshetie
Ahmed Endris Hasen
54
0
0
05 May 2024
Explainable Interface for Human-Autonomy Teaming: A Survey
Xiangqi Kong
Yang Xing
Antonios Tsourdos
Ziyue Wang
Weisi Guo
Adolfo Perrusquía
Andreas Wikander
76
4
0
04 May 2024
FITA: Fine-grained Image-Text Aligner for Radiology Report Generation
Honglong Yang
Hui Tang
Xiaomeng Li
MedIm
72
1
0
02 May 2024
Semi-supervised Text-based Person Search
Daming Gao
Yang Bai
Min Cao
Hao Dou
Mang Ye
Min Zhang
90
2
0
28 Apr 2024
Pre-training on High Definition X-ray Images: An Experimental Study
Tianlin Li
Yuehang Li
Wentao Wu
Jiandong Jin
Yao Rong
Bowei Jiang
Chuanfu Li
Jin Tang
MedIm
ViT
LM&MA
129
3
0
27 Apr 2024
SERPENT-VLM: Self-Refining Radiology Report Generation Using Vision Language Models
M. Kapadnis
Sohan Patnaik
Abhilash Nandy
Sourjyadip Ray
Pawan Goyal
Debdoot Sheet
VLM
75
5
0
27 Apr 2024
From Cognition to Computation: A Comparative Review of Human Attention and Transformer Architectures
Minglu Zhao
Dehong Xu
Tao Gao
64
4
0
25 Apr 2024
Understanding attention-based encoder-decoder networks: a case study with chess scoresheet recognition
Sergio Y. Hayashi
N. Hirata
94
0
0
23 Apr 2024
Sentiment-oriented Transformer-based Variational Autoencoder Network for Live Video Commenting
Fengyi Fu
Shancheng Fang
Weidong Chen
Zhendong Mao
ViT
VGen
61
4
0
19 Apr 2024
Resilience through Scene Context in Visual Referring Expression Generation
Simeon Junker
Sina Zarrieß
51
1
0
18 Apr 2024
Towards a Foundation Model for Partial Differential Equations: Multi-Operator Learning and Extrapolation
Jingmin Sun
Yuxuan Liu
Zecheng Zhang
Hayden Schaeffer
AI4CE
158
20
0
18 Apr 2024
HANet: A Hierarchical Attention Network for Change Detection With Bitemporal Very-High-Resolution Remote Sensing Images
Chengxi Han
Chen Wu
Haonan Guo
Meiqi Hu
Hongruixuan Chen
75
105
0
14 Apr 2024
StreakNet-Arch: An Anti-scattering Network-based Architecture for Underwater Carrier LiDAR-Radar Imaging
Xuelong Li
Hongjun An
Haofei Zhao
Guangying Li
Bo Liu
Xing Wang
Guanghua Cheng
Guojun Wu
Zhe Sun
89
0
0
14 Apr 2024
Enhancing Visual Question Answering through Question-Driven Image Captions as Prompts
Övgü Özdemir
Erdem Akagündüz
108
11
0
12 Apr 2024
A Mutual Inclusion Mechanism for Precise Boundary Segmentation in Medical Images
Yizhi Pan
Junyi Xin
Tianhua Yang
Teeradaj Racharak
Le-Minh Nguyen
Guanqun Sun
57
4
0
12 Apr 2024
FLoRA: Enhancing Vision-Language Models with Parameter-Efficient Federated Learning
Duy Phuong Nguyen
J. P. Muñoz
Ali Jannesari
VLM
77
9
0
12 Apr 2024
Exploring the Necessity of Visual Modality in Multimodal Machine Translation using Authentic Datasets
Zi Long
Zhenhao Tang
Xianghua Fu
Jian Chen
Shilong Hou
Jinze Lyu
75
2
0
09 Apr 2024
Panoptic Perception: A Novel Task and Fine-grained Dataset for Universal Remote Sensing Image Interpretation
Danpei Zhao
Bo Yuan
Ziqiang Chen
Tian Li
Zhuoran Liu
Wentao Li
Yue Gao
143
10
0
06 Apr 2024
A Bi-consolidating Model for Joint Relational Triple Extraction
Xiaocheng Luo
Yanping Chen
Ruixue Tang
Caiwei Yang
Ruizhang Huang
Yongbin Qin
95
0
0
05 Apr 2024