ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 1502.03044
  4. Cited By
Show, Attend and Tell: Neural Image Caption Generation with Visual
  Attention
v1v2v3 (latest)

Show, Attend and Tell: Neural Image Caption Generation with Visual Attention

10 February 2015
Ke Xu
Jimmy Ba
Ryan Kiros
Kyunghyun Cho
Aaron Courville
Ruslan Salakhutdinov
R. Zemel
Yoshua Bengio
    DiffM
ArXiv (abs)PDFHTML

Papers citing "Show, Attend and Tell: Neural Image Caption Generation with Visual Attention"

50 / 3,520 papers shown
Title
Few-shot Action Recognition with Captioning Foundation Models
Few-shot Action Recognition with Captioning Foundation Models
Xiang Wang
Shiwei Zhang
Hangjie Yuan
Yingya Zhang
Changxin Gao
Deli Zhao
Nong Sang
VLM
126
7
0
16 Oct 2023
Visual Question Generation in Bengali
Visual Question Generation in Bengali
Mahmud Hasan
Labiba Islam
J. Ruma
T. Mayeesha
Rashedur Rahman
77
1
0
12 Oct 2023
CLIP for Lightweight Semantic Segmentation
CLIP for Lightweight Semantic Segmentation
Ke Jin
Wankou Yang
VLM
91
1
0
11 Oct 2023
A Comparative Study of Pre-trained CNNs and GRU-Based Attention for
  Image Caption Generation
A Comparative Study of Pre-trained CNNs and GRU-Based Attention for Image Caption Generation
Rashid Khan
Bingding Huang
Haseeb Hassan
Asim Zaman
Z. Ye
44
2
0
11 Oct 2023
A Lightweight Video Anomaly Detection Model with Weak Supervision and
  Adaptive Instance Selection
A Lightweight Video Anomaly Detection Model with Weak Supervision and Adaptive Instance Selection
Yang Wang
Jiaogen Zhou
Jihong Guan
88
4
0
09 Oct 2023
Driving with LLMs: Fusing Object-Level Vector Modality for Explainable
  Autonomous Driving
Driving with LLMs: Fusing Object-Level Vector Modality for Explainable Autonomous Driving
Long Chen
Oleg Sinavski
Jan Hünermann
Alice Karnsund
Andrew James Willmott
Danny Birch
Daniel Maund
Jamie Shotton
MLLM
127
211
0
03 Oct 2023
Constructing Image-Text Pair Dataset from Books
Constructing Image-Text Pair Dataset from Books
Yamato Okamoto
Haruto Toyonaga
Yoshihisa Ijiri
Hirokatsu Kataoka
79
3
0
03 Oct 2023
Application of frozen large-scale models to multimodal task-oriented
  dialogue
Application of frozen large-scale models to multimodal task-oriented dialogue
Tatsuki Kawamoto
Takuma Suzuki
Ko Miyama
Takumi Meguro
Tomohiro Takagi
63
1
0
02 Oct 2023
YOLOR-Based Multi-Task Learning
YOLOR-Based Multi-Task Learning
Hung-Shuo Chang
Chien-Yao Wang
Hang Yan
Yukun Zhu
Hongpeng Liao
MoEVLM
59
16
0
29 Sep 2023
PROSE: Predicting Operators and Symbolic Expressions using Multimodal
  Transformers
PROSE: Predicting Operators and Symbolic Expressions using Multimodal Transformers
Yuxuan Liu
Zecheng Zhang
Hayden Schaeffer
90
18
0
28 Sep 2023
XVO: Generalized Visual Odometry via Cross-Modal Self-Training
XVO: Generalized Visual Odometry via Cross-Modal Self-Training
Tohida Rehman
Ronit Mandal
Jimuyang Zhang
Debarshi Kumar Sanyal
SSL
134
21
0
28 Sep 2023
Social Media Fashion Knowledge Extraction as Captioning
Social Media Fashion Knowledge Extraction as Captioning
Yifei Yuan
Wenxuan Zhang
Yang Deng
Wai Lam
54
1
0
28 Sep 2023
Attention Sorting Combats Recency Bias In Long Context Language Models
Attention Sorting Combats Recency Bias In Long Context Language Models
A. Peysakhovich
Adam Lerer
LRMRALM
123
52
0
28 Sep 2023
AnyMAL: An Efficient and Scalable Any-Modality Augmented Language Model
AnyMAL: An Efficient and Scalable Any-Modality Augmented Language Model
Avamarie Brueggeman
Andrea Madotto
Zhaojiang Lin
Tushar Nagarajan
Matt Smith
...
Peyman Heidari
Yue Liu
Kavya Srinet
Babak Damavandi
Anuj Kumar
MLLM
89
94
0
27 Sep 2023
CauDR: A Causality-inspired Domain Generalization Framework for
  Fundus-based Diabetic Retinopathy Grading
CauDR: A Causality-inspired Domain Generalization Framework for Fundus-based Diabetic Retinopathy Grading
Hao Wei
Peilun Shi
Juzheng Miao
Minqing Zhang
Guitao Bai
Jianing Qiu
Furui Liu
Wu Yuan
MedImOOD
50
3
0
27 Sep 2023
FaceGemma: Enhancing Image Captioning with Facial Attributes for
  Portrait Images
FaceGemma: Enhancing Image Captioning with Facial Attributes for Portrait Images
Naimul Haque
Iffat Labiba
Sadia Akter
3DHCVBM
43
1
0
24 Sep 2023
A Survey on Image-text Multimodal Models
A Survey on Image-text Multimodal Models
Ruifeng Guo
Jingxuan Wei
Linzhuang Sun
Khai-Nguyen Nguyen
Guiyong Chang
Dawei Liu
Sibo Zhang
Zhengbing Yao
Mingjun Xu
Liping Bu
VLM
130
7
0
23 Sep 2023
An Empirical Study of Attention Networks for Semantic Segmentation
An Empirical Study of Attention Networks for Semantic Segmentation
Hao Guo
Hongbiao Si
Guilin Jiang
Wei Zhang
Zhiyan Liu
Xuanyi Zhu
Xulong Zhang
Yang Liu
65
1
0
19 Sep 2023
R2GenGPT: Radiology Report Generation with Frozen LLMs
R2GenGPT: Radiology Report Generation with Frozen LLMs
Zhanyu Wang
Lingqiao Liu
Lei Wang
Luping Zhou
MedImLM&MAVLM
93
75
0
18 Sep 2023
A Novel Method of Fuzzy Topic Modeling based on Transformer Processing
A Novel Method of Fuzzy Topic Modeling based on Transformer Processing
Ching-Hsun Tseng
Shin-Jye Lee
Po-Wei Cheng
Chien Lee
Chih-Chieh Hung
31
0
0
18 Sep 2023
Holistic Geometric Feature Learning for Structured Reconstruction
Holistic Geometric Feature Learning for Structured Reconstruction
Ziqiong Lu
Linxi Huan
Qiyuan Ma
Xianwei Zheng
76
1
0
18 Sep 2023
Viewpoint Integration and Registration with Vision Language Foundation
  Model for Image Change Understanding
Viewpoint Integration and Registration with Vision Language Foundation Model for Image Change Understanding
Xiaonan Lu
Jianlong Yuan
Ruigang Niu
Yuan Hu
Fan Wang
50
2
0
15 Sep 2023
Towards Practical and Efficient Image-to-Speech Captioning with
  Vision-Language Pre-training and Multi-modal Tokens
Towards Practical and Efficient Image-to-Speech Captioning with Vision-Language Pre-training and Multi-modal Tokens
Minsu Kim
J. Choi
Soumi Maiti
Jeong Hun Yeo
Shinji Watanabe
Y. Ro
VLM
83
6
0
15 Sep 2023
PatFig: Generating Short and Long Captions for Patent Figures
PatFig: Generating Short and Long Captions for Patent Figures
Dana Aubakirova
Kim Gerdes
Lufei Liu
48
11
0
15 Sep 2023
A Fast Optimization View: Reformulating Single Layer Attention in LLM
  Based on Tensor and SVM Trick, and Solving It in Matrix Multiplication Time
A Fast Optimization View: Reformulating Single Layer Attention in LLM Based on Tensor and SVM Trick, and Solving It in Matrix Multiplication Time
Yeqi Gao
Zhao Song
Weixin Wang
Junze Yin
114
29
0
14 Sep 2023
Rank2Tell: A Multimodal Driving Dataset for Joint Importance Ranking and
  Reasoning
Rank2Tell: A Multimodal Driving Dataset for Joint Importance Ranking and Reasoning
Enna Sachdeva
Nakul Agarwal
Suhas Chundi
Sean Roelofs
Jiachen Li
Mykel Kochenderfer
Chiho Choi
Behzad Dariush
92
51
0
12 Sep 2023
SparseSwin: Swin Transformer with Sparse Transformer Block
SparseSwin: Swin Transformer with Sparse Transformer Block
Krisna Pinasthika
Blessius Sheldo Putra Laksono
Riyandi Banovbi Putera Irsal
Syifa’ Hukma Shabiyya
N. Yudistira
ViT
85
19
0
11 Sep 2023
C-CLIP: Contrastive Image-Text Encoders to Close the
  Descriptive-Commentative Gap
C-CLIP: Contrastive Image-Text Encoders to Close the Descriptive-Commentative Gap
William Theisen
Walter J. Scheirer
CLIPVLM
73
2
0
06 Sep 2023
A Joint Study of Phrase Grounding and Task Performance in Vision and
  Language Models
A Joint Study of Phrase Grounding and Task Performance in Vision and Language Models
Noriyuki Kojima
Hadar Averbuch-Elor
Yoav Artzi
76
2
0
06 Sep 2023
Exchanging-based Multimodal Fusion with Transformer
Exchanging-based Multimodal Fusion with Transformer
Renyu Zhu
Chengcheng Han
Yong Qian
Qiushi Sun
Xiang Li
Ming Gao
Xuezhi Cao
Yunsen Xian
71
2
0
05 Sep 2023
Distraction-free Embeddings for Robust VQA
Distraction-free Embeddings for Robust VQA
Atharvan Dogra
Deeksha Varshney
Ashwin Kalyan
Ameet Deshpande
Neeraj Kumar
102
0
0
31 Aug 2023
FIRE: Food Image to REcipe generation
FIRE: Food Image to REcipe generation
P. Chhikara
Dhiraj Chaurasia
Yifan Jiang
Omkar Masur
Filip Ilievski
81
23
0
28 Aug 2023
Goodhart's Law Applies to NLP's Explanation Benchmarks
Goodhart's Law Applies to NLP's Explanation Benchmarks
Jennifer Hsia
Danish Pruthi
Aarti Singh
Zachary Chase Lipton
79
6
0
28 Aug 2023
MultiCapCLIP: Auto-Encoding Prompts for Zero-Shot Multilingual Visual
  Captioning
MultiCapCLIP: Auto-Encoding Prompts for Zero-Shot Multilingual Visual Captioning
Bang-ju Yang
Fenglin Liu
X. Wu
Yaowei Wang
Xu Sun
Yuexian Zou
VLMCLIP
80
13
0
25 Aug 2023
PromptMRG: Diagnosis-Driven Prompts for Medical Report Generation
PromptMRG: Diagnosis-Driven Prompts for Medical Report Generation
Haibo Jin
Haoxuan Che
Yi Lin
Haoxing Chen
MedIm
113
67
0
24 Aug 2023
With a Little Help from your own Past: Prototypical Memory Networks for
  Image Captioning
With a Little Help from your own Past: Prototypical Memory Networks for Image Captioning
Manuele Barraco
Sara Sarto
Marcella Cornia
Lorenzo Baraldi
Rita Cucchiara
VLM
92
20
0
23 Aug 2023
CgT-GAN: CLIP-guided Text GAN for Image Captioning
CgT-GAN: CLIP-guided Text GAN for Image Captioning
Jiarui Yu
Haoran Li
Y. Hao
B. Zhu
Tong Xu
Xiangnan He
VLMCLIP
72
13
0
23 Aug 2023
ROSGPT_Vision: Commanding Robots Using Only Language Models' Prompts
ROSGPT_Vision: Commanding Robots Using Only Language Models' Prompts
Bilel Benjdira
Anis Koubaa
Anas M. Ali
LM&Ro
60
4
0
22 Aug 2023
Explore and Tell: Embodied Visual Captioning in 3D Environments
Explore and Tell: Embodied Visual Captioning in 3D Environments
Anwen Hu
Shizhe Chen
Liang Zhang
Qin Jin
LM&Ro
85
2
0
21 Aug 2023
Lip Reading for Low-resource Languages by Learning and Combining General
  Speech Knowledge and Language-specific Knowledge
Lip Reading for Low-resource Languages by Learning and Combining General Speech Knowledge and Language-specific Knowledge
Minsu Kim
Jeong Hun Yeo
J. Choi
Y. Ro
79
17
0
18 Aug 2023
Learning the meanings of function words from grounded language using a
  visual question answering model
Learning the meanings of function words from grounded language using a visual question answering model
Eva Portelance
Michael C. Frank
Dan Jurafsky
NAI
90
7
0
16 Aug 2023
Visually-Aware Context Modeling for News Image Captioning
Visually-Aware Context Modeling for News Image Captioning
Tingyu Qu
Tinne Tuytelaars
Marie-Francine Moens
VLM
62
9
0
16 Aug 2023
Improving Face Recognition from Caption Supervision with Multi-Granular
  Contextual Feature Aggregation
Improving Face Recognition from Caption Supervision with Multi-Granular Contextual Feature Aggregation
Md Golam Moula Mehedi Hasan
Nasser M. Nasrabadi
CVBM
47
2
0
13 Aug 2023
Benign Shortcut for Debiasing: Fair Visual Recognition via Intervention
  with Shortcut Features
Benign Shortcut for Debiasing: Fair Visual Recognition via Intervention with Shortcut Features
Yi Zhang
Jitao Sang
Junyan Wang
D. Jiang
Yaowei Wang
76
5
0
13 Aug 2023
Fine-tuning Multimodal LLMs to Follow Zero-shot Demonstrative
  Instructions
Fine-tuning Multimodal LLMs to Follow Zero-shot Demonstrative Instructions
Juncheng Li
Kaihang Pan
Zhiqi Ge
Minghe Gao
Wei Ji
Wenqiao Zhang
Tat-Seng Chua
Siliang Tang
Hanwang Zhang
Yueting Zhuang
MLLM
121
74
0
08 Aug 2023
D-Score: A Synapse-Inspired Approach for Filter Pruning
D-Score: A Synapse-Inspired Approach for Filter Pruning
Doyoung Park
Jinsoo Kim
Ji-Min Nam
Jooyoung Chang
S. Park
59
0
0
08 Aug 2023
Asynchronous Evolution of Deep Neural Network Architectures
Asynchronous Evolution of Deep Neural Network Architectures
J. Liang
Hormoz Shahrzad
Risto Miikkulainen
56
0
0
08 Aug 2023
A Comprehensive Analysis of Real-World Image Captioning and Scene
  Identification
A Comprehensive Analysis of Real-World Image Captioning and Scene Identification
Sai Suprabhanu Nallapaneni
Subrahmanyam Konakanchi
70
2
0
05 Aug 2023
Frustratingly Easy Model Generalization by Dummy Risk Minimization
Frustratingly Easy Model Generalization by Dummy Risk Minimization
Juncheng Wang
Jindong Wang
Xixu Hu
Shujun Wang
Xingxu Xie
58
2
0
04 Aug 2023
Reverse Stable Diffusion: What prompt was used to generate this image?
Reverse Stable Diffusion: What prompt was used to generate this image?
Florinel-Alin Croitoru
Vlad Hondru
Radu Tudor Ionescu
M. Shah
VLMDiffM
94
6
0
02 Aug 2023
Previous
123...678...697071
Next