ResearchTrend.AI

Show, Attend and Tell: Neural Image Caption Generation with Visual Attention

10 February 2015
Kelvin Xu
Jimmy Ba
Ryan Kiros
Kyunghyun Cho
Aaron Courville
Ruslan Salakhutdinov
R. Zemel
Yoshua Bengio
    DiffM

Papers citing "Show, Attend and Tell: Neural Image Caption Generation with Visual Attention"

50 / 3,520 papers shown
See Detail Say Clear: Towards Brain CT Report Generation via Pathological Clue-driven Representation Learning
Chengxin Zheng
Junzhong Ji
Yanzhao Shi
Xiaodan Zhang
Liangqiong Qu
3DV, MedIm
68
3
0
29 Sep 2024
DENEB: A Hallucination-Robust Automatic Evaluation Metric for Image Captioning
Kazuki Matsuda
Yuiga Wada
Komei Sugiura
65
1
0
28 Sep 2024
IFCap: Image-like Retrieval and Frequency-based Entity Filtering for Zero-shot Captioning
Soeun Lee
Si-Woo Kim
Taewhan Kim
Dong-Jin Kim
CLIP, VLM
61
0
0
26 Sep 2024
From Linguistic Giants to Sensory Maestros: A Survey on Cross-Modal Reasoning with Large Language Models
Shengsheng Qian
Zuyi Zhou
Dizhan Xue
Bing Wang
Changsheng Xu
LRM
152
2
0
19 Sep 2024
OneEncoder: A Lightweight Framework for Progressive Alignment of Modalities
Bilal Faye
Hanane Azzag
M. Lebbah
ObjD
107
0
0
17 Sep 2024
PROSE-FD: A Multimodal PDE Foundation Model for Learning Multiple Operators for Forecasting Fluid Dynamics
Yuxuan Liu
Jingmin Sun
Xinjie He
Griffin Pinney
Zecheng Zhang
Hayden Schaeffer
AI4CE
99
9
0
15 Sep 2024
KARGEN: Knowledge-enhanced Automated Radiology Report Generation Using Large Language Models
Yingshu Li
Zhanyu Wang
Yunyi Liu
Lei Wang
Lingqiao Liu
Luping Zhou
70
3
0
09 Sep 2024
FODA-PG for Enhanced Medical Imaging Narrative Generation: Adaptive Differentiation of Normal and Abnormal Attributes
Kai Shu
Yuzhuo Jia
Ziyang Zhang
Jiechao Gao
MedIm
89
0
0
06 Sep 2024
TempMe: Video Temporal Token Merging for Efficient Text-Video Retrieval
Leqi Shen
Tianxiang Hao
Tao He
Sicheng Zhao
Pengzhang Liu
Yongjun Bao
Guiguang Ding
264
15
0
02 Sep 2024
See or Guess: Counterfactually Regularized Image Captioning
Qian Cao
Xu Chen
Ruihua Song
Xiting Wang
Xinting Huang
Yuchen Ren
CML
94
1
0
29 Aug 2024
Pixels to Prose: Understanding the art of Image Captioning
Hrishikesh Singh
Aarti Sharma
Millie Pant
3DV, VLM
90
1
0
28 Aug 2024
Graph Attention Inference of Network Topology in Multi-Agent Systems
Akshay Kolli
Reza Azadeh
Kshitj Jerath
GNN
53
1
0
27 Aug 2024
Revisiting Image Captioning Training Paradigm via Direct CLIP-based Optimization
Nicholas Moratelli
Davide Caffagni
Marcella Cornia
Lorenzo Baraldi
Rita Cucchiara
CLIP
97
3
0
26 Aug 2024
A New Era in Computational Pathology: A Survey on Foundation and Vision-Language Models
Dibaloke Chanda
Milan Aryal
Nasim Yahya Soltani
Masoud Ganji
AI4CE, VLM
139
7
0
23 Aug 2024
VALE: A Multimodal Visual and Language Explanation Framework for Image Classifiers using eXplainable AI and Language Models
Purushothaman Natarajan
Athira Nambiar
AAML
43
3
0
23 Aug 2024
EAGLE: Elevating Geometric Reasoning through LLM-empowered Visual Instruction Tuning
Zhihao Li
Yao Du
Yang Liu
Yan Zhang
Yufang Liu
Hao Fei
Xunliang Cai
LRM
106
7
0
21 Aug 2024
TraDiffusion: Trajectory-Based Training-Free Image Generation
Mingrui Wu
Oucheng Huang
Jiayi Ji
Jiale Li
Xinyue Cai
Huafeng Kuang
Jianzhuang Liu
Xiaoshuai Sun
Rongrong Ji
85
3
0
19 Aug 2024
Ask, Attend, Attack: A Effective Decision-Based Black-Box Targeted Attack for Image-to-Text Models
Qingyuan Zeng
Zhenzhong Wang
Yiu-ming Cheung
Min Jiang
AAML
88
2
0
16 Aug 2024
The Dawn of KAN in Image-to-Image (I2I) Translation: Integrating Kolmogorov-Arnold Networks with GANs for Unpaired I2I Translation
Arpan Mahara
N. Rishe
Liangdong Deng
VLM, GAN
67
2
0
15 Aug 2024
LLMI3D: MLLM-based 3D Perception from a Single 2D Image
Fan Yang
Sicheng Zhao
Yanhao Zhang
Haoxiang Chen
Hui Chen
Wenbo Tang
Guiguang Ding
89
3
0
14 Aug 2024
Bi-directional Contextual Attention for 3D Dense Captioning
Minjung Kim
Hyung Suk Lim
Soonyoung Lee
Bumsoo Kim
Gunhee Kim
85
3
0
13 Aug 2024
Surveying the Landscape of Image Captioning Evaluation: A Comprehensive Taxonomy, Trends and Metrics Analysis
Uri Berger
Gabriel Stanovsky
Omri Abend
Lea Frermann
75
0
0
09 Aug 2024
ArtVLM: Attribute Recognition Through Vision-Based Prefix Language Modeling
William Y. Zhu
Keren Ye
Junjie Ke
Jiahui Yu
Leonidas Guibas
P. Milanfar
Feng Yang
98
2
0
07 Aug 2024
GazeXplain: Learning to Predict Natural Language Explanations of Visual Scanpaths
Xianyu Chen
Ming Jiang
Qi Zhao
72
3
0
05 Aug 2024
User-in-the-loop Evaluation of Multimodal LLMs for Activity Assistance
Mrinal Verghese
Brian Chen
H. Eghbalzadeh
Tushar Nagarajan
Ruta Desai
LRM
87
1
0
04 Aug 2024
ST-SACLF: Style Transfer Informed Self-Attention Classifier for Bias-Aware Painting Classification
Mridula Vijendran
Frederick W. B. Li
Jingjing Deng
Hubert P. H. Shum
92
0
0
03 Aug 2024
Review of Cloud Service Composition for Intelligent Manufacturing
Cuixia Li
Liqiang Liu
Li Shi
38
0
0
03 Aug 2024
Towards End-to-End Explainable Facial Action Unit Recognition via Vision-Language Joint Learning
Yaming Yang
Zhe Wang
Fuhai Chen
Wei Zhao
Weigang Lu
Joemon M. Jose
CVBM
73
3
0
01 Aug 2024
Block-Operations: Using Modular Routing to Improve Compositional Generalization
Florian Dietz
Dietrich Klakow
AI4CE
57
0
0
01 Aug 2024
GEGA: Graph Convolutional Networks and Evidence Retrieval Guided Attention for Enhanced Document-level Relation Extraction
Yanxu Mao
Peipei Liu
Tiehan Cui
67
1
0
31 Jul 2024
Faithful and Plausible Natural Language Explanations for Image Classification: A Pipeline Approach
Adam Wojciechowski
Mateusz Lango
Ondrej Dusek
FAtt
88
1
0
30 Jul 2024
BRIDGE: Bridging Gaps in Image Captioning Evaluation with Stronger Visual Cues
Sara Sarto
Marcella Cornia
Lorenzo Baraldi
Rita Cucchiara
80
7
0
29 Jul 2024
HICEScore: A Hierarchical Metric for Image Captioning Evaluation
Zequn Zeng
Jianqiao Sun
Hao Zhang
Tiansheng Wen
Yudi Su
Yan Xie
Zhengjue Wang
Boli Chen
101
3
0
26 Jul 2024
Attention Beats Linear for Fast Implicit Neural Representation Generation
Shuyi Zhang
Ke Liu
Jingjun Gu
Xiaoxu Cai
Zhihua Wang
Jiajun Bu
Haishuai Wang
122
2
0
22 Jul 2024
HERGen: Elevating Radiology Report Generation with Longitudinal Data
Fuying Wang
Shenghui Du
Lequan Yu
MedIm
81
6
0
21 Jul 2024
Braille-to-Speech Generator: Audio Generation Based on Joint Fine-Tuning of CLIP and Fastspeech2
Chun Xu
En-Wei Sun
77
0
0
19 Jul 2024
Impact of Model Size on Fine-tuned LLM Performance in Data-to-Text Generation: A State-of-the-Art Investigation
Joy Mahapatra
Utpal Garain
92
10
0
19 Jul 2024
Nearest Neighbor Future Captioning: Generating Descriptions for Possible Collisions in Object Placement Tasks
Takumi Komatsu
Motonari Kambara
Shumpei Hatanaka
Haruka Matsuo
Tsubasa Hirakawa
Takayoshi Yamashita
H. Fujiyoshi
Komei Sugiura
71
0
0
18 Jul 2024
XEdgeAI: A Human-centered Industrial Inspection Framework with Data-centric Explainable Edge AI Approach
Truong Thanh Hung Nguyen
Phuc Truong Loc Nguyen
Hung Cao
72
6
0
16 Jul 2024
Backdoor Attacks against Image-to-Image Networks
Wenbo Jiang
Hongwei Li
Jiaming He
Rui Zhang
Guowen Xu
Tianwei Zhang
Rongxing Lu
AAML
73
5
0
15 Jul 2024
Predicting Winning Captions for Weekly New Yorker Comics
Stanley Cao
Sonny Young
ViT, VLM
64
1
0
12 Jul 2024
LEMoN: Label Error Detection using Multimodal Neighbors
Haoran Zhang
Aparna Balagopalan
Nassim Oufattole
Hyewon Jeong
Yan Wu
Jiacheng Zhu
Marzyeh Ghassemi
134
0
0
10 Jul 2024
Exploring Phrase-Level Grounding with Text-to-Image Diffusion Model
Danni Yang
Ruohan Dong
Jiayi Ji
Yiwei Ma
Haowei Wang
Xiaoshuai Sun
Rongrong Ji
87
3
0
07 Jul 2024
Ask Questions with Double Hints: Visual Question Generation with Answer-awareness and Region-reference
Kai Shen
Lingfei Wu
Siliang Tang
Fangli Xu
Bo Long
Yueting Zhuang
Jian Pei
70
0
0
06 Jul 2024
Towards Context-Aware Emotion Recognition Debiasing from a Causal Demystification Perspective via De-confounded Training
Dingkang Yang
Kun Yang
Haopeng Kuang
Zhaoyu Chen
Yuzheng Wang
Lihua Zhang
CML
83
4
0
06 Jul 2024
Explainable Image Captioning using CNN- CNN architecture and Hierarchical Attention
Rishi Mohan
Sanjay Sureshkumar
Vignesh Sivasubramaniam
43
2
0
28 Jun 2024
Analyzing Quality, Bias, and Performance in Text-to-Image Generative Models
Nila Masrourisaadat
Nazanin Sedaghatkish
Fatemeh Sarshartehrani
Edward A. Fox
120
9
0
28 Jun 2024
Brain Tumor Classification using Vision Transformer with Selective Cross-Attention Mechanism and Feature Calibration
M. Khaniki
Alireza Golkarieh
Mohammad Manthouri
MedIm
61
4
0
25 Jun 2024
Enhancing Scientific Figure Captioning Through Cross-modal Learning
Mateo Alejandro Rojas
Rafael Carranza
75
0
0
24 Jun 2024
Reading Is Believing: Revisiting Language Bottleneck Models for Image Classification
Honori Udo
Takafumi Koshinaka
VLM
71
0
0
22 Jun 2024