Show, Attend and Tell: Neural Image Caption Generation with Visual Attention

10 February 2015
Ke Xu
Jimmy Ba
Ryan Kiros
Kyunghyun Cho
Aaron Courville
Ruslan Salakhutdinov
R. Zemel
Yoshua Bengio
    DiffM

Papers citing "Show, Attend and Tell: Neural Image Caption Generation with Visual Attention"

50 / 3,520 papers shown
AdaGlimpse: Active Visual Exploration with Arbitrary Glimpse Position and Scale
Adam Pardyl
Michal Wronka
Maciej Wolczyk
Kamil Adamczewski
Tomasz Trzciński
Bartosz Zieliński
85
2
0
04 Apr 2024
Memory-based Cross-modal Semantic Alignment Network for Radiology Report Generation
Yitian Tao
Liyan Ma
Jing Yu
Han Zhang
MedIm
93
8
0
31 Mar 2024
Enhancing Efficiency in Vision Transformer Networks: Design Techniques and Insights
Moein Heidari
Reza Azad
Sina Ghorbani Kolahi
René Arimond
Leon Niggemeier
...
Afshin Bozorgpour
Ehsan Khodapanah Aghdam
Amirhossein Kazerouni
Ilker Hacihaliloglu
Dorit Merhof
97
7
0
28 Mar 2024
De-confounded Data-free Knowledge Distillation for Handling Distribution Shifts
Yuzheng Wang
Dingkang Yang
Zhaoyu Chen
Yang Liu
Siao Liu
Wenqiang Zhang
Lihua Zhang
Lizhe Qi
73
9
0
28 Mar 2024
Text Data-Centric Image Captioning with Interactive Prompts
Yiyu Wang
Hao Luo
Jungang Xu
Yingfei Sun
Fan Wang
VLM
80
0
0
28 Mar 2024
Semi-Supervised Image Captioning Considering Wasserstein Graph Matching
Yang Yang
96
0
0
26 Mar 2024
Selectively Informative Description can Reduce Undesired Embedding Entanglements in Text-to-Image Personalization
Jimyeong Kim
Jungwon Park
Wonjong Rhee
DiffM
91
5
0
22 Mar 2024
TiBiX: Leveraging Temporal Information for Bidirectional X-ray and Report Generation
Santosh Sanjeev
F. Maani
Arsen Abzhanov
Vijay Ram Papineni
Ibrahim Almakky
Bartłomiej W. Papież
Mohammad Yaqub
MedIm
87
0
0
20 Mar 2024
HyperFusion: A Hypernetwork Approach to Multimodal Integration of Tabular and Medical Imaging Data for Predictive Modeling
Daniel Duenias
Brennan Nichyporuk
Tal Arbel
Tammy Riklin-Raviv
95
7
0
20 Mar 2024
Training A Small Emotional Vision Language Model for Visual Art Comprehension
Jing Zhang
Liang Zheng
Meng Wang
Dan Guo
VLM
71
4
0
17 Mar 2024
LuoJiaHOG: A Hierarchy Oriented Geo-aware Image Caption Dataset for Remote Sensing Image-Text Retrival
Yuanxin Zhao
Mi Zhang
Bingnan Yang
Zhan Zhang
Jiaju Kang
Jianya Gong
62
2
0
16 Mar 2024
Select and Distill: Selective Dual-Teacher Knowledge Transfer for Continual Learning on Vision-Language Models
Yu-Chu Yu
Chi-Pin Huang
Jr-Jen Chen
Kai-Po Chang
Yung-Hsuan Lai
Fu-En Yang
Yu-Chiang Frank Wang
CLL, VLM
97
9
0
14 Mar 2024
Rethinking Referring Object Removal
Xiangtian Xue
Jiasong Wu
Youyong Kong
L. Senhadji
Huazhong Shu
DiffM
79
0
0
14 Mar 2024
TINA: Think, Interaction, and Action Framework for Zero-Shot Vision Language Navigation
Dingbang Li
Wenzhou Chen
Xin Lin
LLMAG, LM&Ro
77
4
0
13 Mar 2024
A Comprehensive Survey of 3D Dense Captioning: Localizing and Describing Objects in 3D Scenes
Ting Yu
Xiaojun Lin
Shuhui Wang
Weiguo Sheng
Qingming Huang
Jun-chen Yu
3DV
92
10
0
12 Mar 2024
Enhancing Image Caption Generation Using Reinforcement Learning with Human Feedback
L. AdarshN
V. ArunP
L. AravindhN
39
3
0
11 Mar 2024
How to Understand Named Entities: Using Common Sense for News Captioning
Ning Xu
Yanhui Wang
Tingting Zhang
Hongshuo Tian
Mohan Kankanhalli
An-An Liu
63
0
0
11 Mar 2024
Transformer based Multitask Learning for Image Captioning and Object Detection
Debolena Basak
P. K. Srijith
M. Desarkar
74
2
0
10 Mar 2024
Sora as an AGI World Model? A Complete Survey on Text-to-Video Generation
Joseph Cho
Fachrina Dewi Puspitasari
Sheng Zheng
Jingyao Zheng
Lik-Hang Lee
Tae-Ho Kim
Choong Seon Hong
Chaoning Zhang
EGVM, VGen
106
43
0
08 Mar 2024
Rule-driven News Captioning
Ning Xu
Tingting Zhang
Hongshuo Tian
An-An Liu
104
0
0
08 Mar 2024
Towards Multimodal Human Intention Understanding Debiasing via Subject-Deconfounding
Dingkang Yang
Dongling Xiao
Ke Li
Yuzheng Wang
Zhaoyu Chen
Jinjie Wei
Lihua Zhang
69
8
0
08 Mar 2024
MeaCap: Memory-Augmented Zero-shot Image Captioning
Zequn Zeng
Yan Xie
Hao Zhang
Chiyu Chen
Zhengjue Wang
Boli Chen
VLM
86
15
0
06 Mar 2024
Best of Both Worlds: A Pliable and Generalizable Neuro-Symbolic Approach for Relation Classification
Robert Vacareanu
F. Alam
M. Islam
Haris Riaz
Mihai Surdeanu
NAI
81
2
0
05 Mar 2024
Causal Prompting: Debiasing Large Language Model Prompting based on Front-Door Adjustment
Congzhi Zhang
Linhai Zhang
Jialong Wu
Deyu Zhou
Guoqiang Xu
CML, AI4CE, LRM
107
21
0
05 Mar 2024
Attention Guidance Mechanism for Handwritten Mathematical Expression Recognition
Yutian Liu
Wenjun Ke
Jianguo Wei
111
0
0
04 Mar 2024
DINER: Debiasing Aspect-based Sentiment Analysis with Multi-variable Causal Inference
Jialong Wu
Linhai Zhang
Deyu Zhou
Guoqiang Xu
CML
69
3
0
02 Mar 2024
ELA: Efficient Local Attention for Deep Convolutional Neural Networks
Wei Xu
Yi Wan
85
43
0
02 Mar 2024
How to Understand "Support"? An Implicit-enhanced Causal Inference Approach for Weakly-supervised Phrase Grounding
Jiamin Luo
Jianing Zhao
Jingjing Wang
Guodong Zhou
64
0
0
29 Feb 2024
SNE-RoadSegV2: Advancing Heterogeneous Feature Fusion and Fallibility Awareness for Freespace Detection
Yi Feng
Yu Ma
Qijun Chen
Ioannis Pitas
Rui Fan
87
6
0
29 Feb 2024
Polos: Multimodal Metric Learning from Human Feedback for Image Captioning
Yuiga Wada
Kanta Kaneda
Daichi Saito
Komei Sugiura
91
30
0
28 Feb 2024
Vision Language Model-based Caption Evaluation Method Leveraging Visual Context Extraction
Koki Maeda
Shuhei Kurita
Taiki Miyanishi
Naoaki Okazaki
59
2
0
28 Feb 2024
On the Challenges and Opportunities in Generative AI
Laura Manduchi
Kushagra Pandey
Robert Bamler
Ryan Cotterell
Sina Daubener
...
F. Wenzel
Frank Wood
Stephan Mandt
Vincent Fortuin
301
22
0
28 Feb 2024
TMT: Tri-Modal Translation between Speech, Image, and Text by Processing Different Modalities as Different Languages
Minsu Kim
Jee-weon Jung
Hyeongseop Rha
Soumi Maiti
Siddhant Arora
Xuankai Chang
Shinji Watanabe
Y. Ro
106
7
0
25 Feb 2024
ConVQG: Contrastive Visual Question Generation with Multimodal Guidance
Li Mi
Syrielle Montariol
J. Castillo-Navarro
Xianjie Dai
Antoine Bosselut
D. Tuia
54
4
0
20 Feb 2024
Heterogeneity-aware Cross-school Electives Recommendation: a Hybrid Federated Approach
Chengyi Ju
Jiannong Cao
Yu Yang
Zhen-Qun Yang
Ho Man Lee
54
1
0
19 Feb 2024
AICAttack: Adversarial Image Captioning Attack with Attention-Based Optimization
Jiyao Li
Mingze Ni
Yifei Dong
Tianqing Zhu
Wei Liu
AAML
43
3
0
19 Feb 2024
Align before Attend: Aligning Visual and Textual Features for Multimodal Hateful Content Detection
E. Hossain
Omar Sharif
M. M. Hoque
S. Preum
71
4
0
15 Feb 2024
On the Resurgence of Recurrent Models for Long Sequences -- Survey and Research Opportunities in the Transformer Era
Matteo Tiezzi
Michele Casoni
Alessandro Betti
Tommaso Guidi
Marco Gori
S. Melacci
81
11
0
12 Feb 2024
Savvy: Trustworthy Autonomous Vehicles Architecture
Ali Shoker
Rehana Yasmin
Paulo Esteves-Verissimo
79
0
0
08 Feb 2024
Intensive Vision-guided Network for Radiology Report Generation
Fudan Zheng
Mengfei Li
Ying Wang
Weijiang Yu
Ruixuan Wang
Zhiguang Chen
Nong Xiao
Yutong Lu
160
1
0
06 Feb 2024
Revisiting Generative Adversarial Networks for Binary Semantic Segmentation on Imbalanced Datasets
Lei Xu
Moncef Gabbouj
GAN
69
2
0
03 Feb 2024
Image Fusion via Vision-Language Model
Zixiang Zhao
Lilun Deng
Haowen Bai
Yukun Cui
Zhipeng Zhang
...
Haotong Qin
Dongdong Chen
Jiangshe Zhang
Peng Wang
Luc Van Gool
VLM
110
27
0
03 Feb 2024
MLIP: Enhancing Medical Visual Representation with Divergence Encoder and Knowledge-guided Contrastive Learning
Zhe Li
Laurence T. Yang
Bocheng Ren
Xin Nie
Zhangyang Gao
Cheng Tan
Stan Z. Li
VLM
79
16
0
03 Feb 2024
Streaming Sequence Transduction through Dynamic Compression
Weiting Tan
Yunmo Chen
Tongfei Chen
Guanghui Qin
Haoran Xu
Heidi C. Zhang
Benjamin Van Durme
Philipp Koehn
169
2
0
02 Feb 2024
Attention-based Dynamic Multilayer Graph Neural Networks for Loan Default Prediction
Sahab Zandi
Kamesh Korangi
María Óskarsdóttir
Christophe Mues
Cristián Bravo
58
6
0
01 Feb 2024
GQHAN: A Grover-inspired Quantum Hard Attention Network
Ren-Xin Zhao
Jinjing Shi
Xuelong Li
67
3
0
25 Jan 2024
MAST: Video Polyp Segmentation with a Mixture-Attention Siamese Transformer
Geng Chen
Junqing Yang
Xiaozhou Pu
Ge-Peng Ji
Huan Xiong
Yongsheng Pan
Hengfei Cui
Yong-quan Xia
MedIm, ViT
97
2
0
23 Jan 2024
Unsupervised Learning of Graph from Recipes
Aissatou Diallo
Antonis Bikakis
Luke Dickens
Anthony Hunter
Rob Miller
SSL
65
0
0
22 Jan 2024
Collaborative Position Reasoning Network for Referring Image Segmentation
Jianjian Cao
Beiya Dai
Yulin Li
Xiameng Qin
Jingdong Wang
102
0
0
22 Jan 2024
Spatial-temporal Forecasting for Regions without Observations
Xinyu Su
Jianzhong Qi
E. Tanin
Yanchuan Chang
Majid Sarvi
AI4TS
81
3
0
19 Jan 2024