ResearchTrend.AI
Show, Attend and Tell: Neural Image Caption Generation with Visual Attention

10 February 2015
Kelvin Xu
Jimmy Ba
Ryan Kiros
Kyunghyun Cho
Aaron Courville
Ruslan Salakhutdinov
R. Zemel
Yoshua Bengio
    DiffM

Papers citing "Show, Attend and Tell: Neural Image Caption Generation with Visual Attention"

50 / 3,508 papers shown
Attention Sorting Combats Recency Bias In Long Context Language Models
A. Peysakhovich
Adam Lerer
LRM
RALM
42
42
0
28 Sep 2023
AnyMAL: An Efficient and Scalable Any-Modality Augmented Language Model
Avamarie Brueggeman
Andrea Madotto
Zhaojiang Lin
Tushar Nagarajan
Matt Smith
...
Peyman Heidari
Yue Liu
Kavya Srinet
Babak Damavandi
Anuj Kumar
MLLM
34
93
0
27 Sep 2023
CauDR: A Causality-inspired Domain Generalization Framework for Fundus-based Diabetic Retinopathy Grading
Hao Wei
Peilun Shi
Juzheng Miao
Minqing Zhang
Guitao Bai
Jianing Qiu
Furui Liu
Wu Yuan
MedIm
OOD
20
3
0
27 Sep 2023
FaceGemma: Enhancing Image Captioning with Facial Attributes for Portrait Images
Naimul Haque
Iffat Labiba
Sadia Akter
3DH
CVBM
18
1
0
24 Sep 2023
A Survey on Image-text Multimodal Models
Ruifeng Guo
Jingxuan Wei
Linzhuang Sun
Khai Le-Duc
Guiyong Chang
Dawei Liu
Sibo Zhang
Zhengbing Yao
Mingjun Xu
Liping Bu
VLM
31
5
0
23 Sep 2023
An Empirical Study of Attention Networks for Semantic Segmentation
Hao Guo
Hongbiao Si
Guilin Jiang
Wei Zhang
Zhiyan Liu
Xuanyi Zhu
Xulong Zhang
Yang Liu
19
1
0
19 Sep 2023
R2GenGPT: Radiology Report Generation with Frozen LLMs
Zhanyu Wang
Lingqiao Liu
Lei Wang
Luping Zhou
MedIm
LM&MA
VLM
22
64
0
18 Sep 2023
A Novel Method of Fuzzy Topic Modeling based on Transformer Processing
Ching-Hsun Tseng
Shin-Jye Lee
Po-Wei Cheng
Chien Lee
Chih-Chieh Hung
24
0
0
18 Sep 2023
Holistic Geometric Feature Learning for Structured Reconstruction
Ziqiong Lu
Linxi Huan
Qiyuan Ma
Xianwei Zheng
17
1
0
18 Sep 2023
Viewpoint Integration and Registration with Vision Language Foundation Model for Image Change Understanding
Xiaonan Lu
Jianlong Yuan
Ruigang Niu
Yuan Hu
Fan Wang
21
1
0
15 Sep 2023
Towards Practical and Efficient Image-to-Speech Captioning with Vision-Language Pre-training and Multi-modal Tokens
Minsu Kim
J. Choi
Soumi Maiti
Jeong Hun Yeo
Shinji Watanabe
Y. Ro
VLM
26
6
0
15 Sep 2023
PatFig: Generating Short and Long Captions for Patent Figures
Dana Aubakirova
Kim Gerdes
Lufei Liu
17
9
0
15 Sep 2023
A Fast Optimization View: Reformulating Single Layer Attention in LLM Based on Tensor and SVM Trick, and Solving It in Matrix Multiplication Time
Yeqi Gao
Zhao Song
Weixin Wang
Junze Yin
22
25
0
14 Sep 2023
Rank2Tell: A Multimodal Driving Dataset for Joint Importance Ranking and Reasoning
Enna Sachdeva
Nakul Agarwal
Suhas Chundi
Sean Roelofs
Jiachen Li
Mykel Kochenderfer
Chiho Choi
Behzad Dariush
33
47
0
12 Sep 2023
SparseSwin: Swin Transformer with Sparse Transformer Block
Krisna Pinasthika
Blessius Sheldo Putra Laksono
Riyandi Banovbi Putera Irsal
Syifa’ Hukma Shabiyya
N. Yudistira
ViT
31
15
0
11 Sep 2023
C-CLIP: Contrastive Image-Text Encoders to Close the Descriptive-Commentative Gap
William Theisen
Walter J. Scheirer
CLIP
VLM
35
2
0
06 Sep 2023
A Joint Study of Phrase Grounding and Task Performance in Vision and Language Models
Noriyuki Kojima
Hadar Averbuch-Elor
Yoav Artzi
26
2
0
06 Sep 2023
Exchanging-based Multimodal Fusion with Transformer
Renyu Zhu
Chengcheng Han
Yong Qian
Qiushi Sun
Xiang Li
Ming Gao
Xuezhi Cao
Yunsen Xian
40
2
0
05 Sep 2023
Distraction-free Embeddings for Robust VQA
Atharvan Dogra
Deeksha Varshney
Ashwin Kalyan
Ameet Deshpande
Neeraj Kumar
22
0
0
31 Aug 2023
FIRE: Food Image to REcipe generation
P. Chhikara
Dhiraj Chaurasia
Yifan Jiang
Omkar Masur
Filip Ilievski
29
23
0
28 Aug 2023
Goodhart's Law Applies to NLP's Explanation Benchmarks
Jennifer Hsia
Danish Pruthi
Aarti Singh
Zachary Chase Lipton
30
6
0
28 Aug 2023
MultiCapCLIP: Auto-Encoding Prompts for Zero-Shot Multilingual Visual Captioning
Bang-ju Yang
Fenglin Liu
X. Wu
Yaowei Wang
Xu Sun
Yuexian Zou
VLM
CLIP
44
13
0
25 Aug 2023
PromptMRG: Diagnosis-Driven Prompts for Medical Report Generation
Haibo Jin
Haoxuan Che
Yi-Mou Lin
Haoxing Chen
MedIm
40
57
0
24 Aug 2023
With a Little Help from your own Past: Prototypical Memory Networks for Image Captioning
Manuele Barraco
Sara Sarto
Marcella Cornia
Lorenzo Baraldi
Rita Cucchiara
VLM
55
19
0
23 Aug 2023
CgT-GAN: CLIP-guided Text GAN for Image Captioning
Jiarui Yu
Haoran Li
Y. Hao
B. Zhu
Tong Xu
Xiangnan He
VLM
CLIP
21
13
0
23 Aug 2023
ROSGPT_Vision: Commanding Robots Using Only Language Models' Prompts
Bilel Benjdira
Anis Koubaa
Anas M. Ali
LM&Ro
32
3
0
22 Aug 2023
Explore and Tell: Embodied Visual Captioning in 3D Environments
Anwen Hu
Shizhe Chen
Liang Zhang
Qin Jin
LM&Ro
38
2
0
21 Aug 2023
Lip Reading for Low-resource Languages by Learning and Combining General Speech Knowledge and Language-specific Knowledge
Minsu Kim
Jeong Hun Yeo
J. Choi
Y. Ro
34
16
0
18 Aug 2023
Learning the meanings of function words from grounded language using a visual question answering model
Eva Portelance
Michael C. Frank
Dan Jurafsky
NAI
33
7
0
16 Aug 2023
Visually-Aware Context Modeling for News Image Captioning
Tingyu Qu
Tinne Tuytelaars
Marie-Francine Moens
VLM
19
8
0
16 Aug 2023
Improving Face Recognition from Caption Supervision with Multi-Granular Contextual Feature Aggregation
Md Golam Moula Mehedi Hasan
Nasser M. Nasrabadi
CVBM
21
2
0
13 Aug 2023
Benign Shortcut for Debiasing: Fair Visual Recognition via Intervention with Shortcut Features
Yi Zhang
Jitao Sang
Junyan Wang
D. Jiang
Yaowei Wang
21
5
0
13 Aug 2023
Fine-tuning Multimodal LLMs to Follow Zero-shot Demonstrative Instructions
Juncheng Li
Kaihang Pan
Zhiqi Ge
Minghe Gao
Wei Ji
Wenqiao Zhang
Tat-Seng Chua
Siliang Tang
Hanwang Zhang
Yueting Zhuang
MLLM
35
68
0
08 Aug 2023
D-Score: A Synapse-Inspired Approach for Filter Pruning
Doyoung Park
Jinsoo Kim
Ji-Min Nam
Jooyoung Chang
S. Park
22
0
0
08 Aug 2023
Asynchronous Evolution of Deep Neural Network Architectures
J. Liang
H. Shahrzad
Risto Miikkulainen
28
0
0
08 Aug 2023
A Comprehensive Analysis of Real-World Image Captioning and Scene Identification
Sai Suprabhanu Nallapaneni
Subrahmanyam Konakanchi
30
2
0
05 Aug 2023
Frustratingly Easy Model Generalization by Dummy Risk Minimization
Juncheng Wang
Jindong Wang
Xixu Hu
Shujun Wang
Xingxu Xie
16
1
0
04 Aug 2023
Reverse Stable Diffusion: What prompt was used to generate this image?
Florinel-Alin Croitoru
Vlad Hondru
Radu Tudor Ionescu
M. Shah
VLM
DiffM
39
6
0
02 Aug 2023
Beyond Generic: Enhancing Image Captioning with Real-World Knowledge using Vision-Language Pre-Training Model
Ka Leong Cheng
Wenpo Song
Zheng Ma
Wenhao Zhu
Zi-Yue Zhu
Jianbing Zhang
CLIP
VLM
27
10
0
02 Aug 2023
EEG-based Cognitive Load Classification using Feature Masked Autoencoding and Emotion Transfer Learning
Dustin Pulver
Prithila Angkan
Paul Hungler
Ali Etemad
35
5
0
01 Aug 2023
Transferable Decoding with Visual Entities for Zero-Shot Image Captioning
Junjie Fei
Teng Wang
Jinrui Zhang
Zhenyu He
Chengjie Wang
Feng Zheng
VLM
28
34
0
31 Jul 2023
Triple Correlations-Guided Label Supplementation for Unbiased Video Scene Graph Generation
Wenqing Wang
Kaifeng Gao
Yawei Luo
Tao Jiang
Fei Gao
Jian Shao
Jianwen Sun
Jun Xiao
32
3
0
30 Jul 2023
DRL4Route: A Deep Reinforcement Learning Framework for Pick-up and Delivery Route Prediction
Xiaowei Mao
Haomin Wen
Hengrui Zhang
Huaiyu Wan
Lixia Wu
Jianbin Zheng
Haoyuan Hu
Youfang Lin
AI4TS
54
12
0
30 Jul 2023
Synaptic Plasticity Models and Bio-Inspired Unsupervised Deep Learning: A Survey
Gabriele Lagani
Fabrizio Falchi
Claudio Gennaro
Giuseppe Amato
AAML
43
6
0
30 Jul 2023
RSGPT: A Remote Sensing Vision Language Model and Benchmark
Yuan Hu
Jianlong Yuan
Congcong Wen
Xiaonan Lu
Xiang Li
VLM
28
101
0
28 Jul 2023
Fact-Checking of AI-Generated Reports
Razi Mahmood
Ge Wang
Mannudeep Kalra
Pingkun Yan
MedIm
24
7
0
27 Jul 2023
Learning Multi-modal Representations by Watching Hundreds of Surgical Video Lectures
Kun Yuan
V. Srivastav
Tong Yu
Joël L. Lavanchy
Pietro Mascagni
Nicolas Padoy
35
20
0
27 Jul 2023
On the Learning Dynamics of Attention Networks
Rahul Vashisht
H. G. Ramaswamy
11
0
0
25 Jul 2023
Enhancing image captioning with depth information using a Transformer-based framework
Aya Mahmoud Ahmed
Mohamed Yousef
K. Hussain
Yousef B. Mahdy
ViT
24
4
0
24 Jul 2023
Actor-agnostic Multi-label Action Recognition with Multi-modal Query
Anindya Mondal
Sauradip Nag
J. Prada
Xiatian Zhu
Anjan Dutta
23
9
0
20 Jul 2023