Show, Attend and Tell: Neural Image Caption Generation with Visual Attention

10 February 2015
Ke Xu
Jimmy Ba
Ryan Kiros
Kyunghyun Cho
Aaron Courville
Ruslan Salakhutdinov
R. Zemel
Yoshua Bengio
    DiffM
arXiv: 1502.03044

Papers citing "Show, Attend and Tell: Neural Image Caption Generation with Visual Attention"

50 / 3,520 papers shown
Bayesian Attention Modules
Xinjie Fan
Shujian Zhang
Bo Chen
Mingyuan Zhou
183
62
0
20 Oct 2020
Interpreting convolutional networks trained on textual data
Reza Marzban
Christopher Crick
FAtt
39
3
0
20 Oct 2020
A Survey on Deep Learning and Explainability for Automatic Report Generation from Medical Images
Pablo Messina
Pablo Pino
Denis Parra
Alvaro Soto
Cecilia Besa
S. Uribe
Marcelo Andía
C. Tejos
Claudia Prieto
Daniel Capurro
MedIm
135
65
0
20 Oct 2020
BiST: Bi-directional Spatio-Temporal Reasoning for Video-Grounded Dialogues
Hung Le
Doyen Sahoo
Nancy F. Chen
Guosheng Lin
117
31
0
20 Oct 2020
Improving Factual Completeness and Consistency of Image-to-Text Radiology Report Generation
Yasuhide Miura
Yuhao Zhang
Emily Bao Tsai
C. Langlotz
Dan Jurafsky
MedIm
250
159
0
20 Oct 2020
Learning to Reconstruct and Segment 3D Objects
Bo Yang
3DPC
65
1
0
19 Oct 2020
Multimodal Research in Vision and Language: A Review of Current and Emerging Trends
Shagun Uppal
Sarthak Bhagat
Devamanyu Hazarika
Navonil Majumdar
Soujanya Poria
Roger Zimmermann
Amir Zadeh
101
6
0
19 Oct 2020
Image Captioning with Visual Object Representations Grounded in the Textual Modality
Dušan Variš
Katsuhito Sudoh
Satoshi Nakamura
39
1
0
19 Oct 2020
Language and Visual Entity Relationship Graph for Agent Navigation
Yicong Hong
Cristian Rodriguez-Opazo
Yuankai Qi
Qi Wu
Stephen Gould
LM&Ro
229
135
0
19 Oct 2020
TextMage: The Automated Bangla Caption Generator Based On Deep Learning
Abrar Hasin Kamal
Md Asifuzzaman Jishan
N. Mansoor
VLM
46
21
0
15 Oct 2020
Improving Natural Language Processing Tasks with Human Gaze-Guided Neural Attention
Ekta Sood
Simon Tannert
Philipp Mueller
Andreas Bulling
94
74
0
15 Oct 2020
Interpreting Deep Learning Model Using Rule-based Method
Xiaojian Wang
Jingyuan Wang
Ke Tang
23
3
0
15 Oct 2020
Interpreting Attention Models with Human Visual Attention in Machine Reading Comprehension
Ekta Sood
Simon Tannert
Diego Frassinelli
Andreas Bulling
Ngoc Thang Vu
HAI
75
57
0
13 Oct 2020
MAF: Multimodal Alignment Framework for Weakly-Supervised Phrase Grounding
Qinxin Wang
Hao Tan
Sheng Shen
Michael W. Mahoney
Z. Yao
ObjD
154
11
0
12 Oct 2020
SMYRF: Efficient Attention using Asymmetric Clustering
Giannis Daras
Nikita Kitaev
Augustus Odena
A. Dimakis
106
46
0
11 Oct 2020
Glance and Focus: a Dynamic Approach to Reducing Spatial Redundancy in Image Classification
Yulin Wang
Kangchen Lv
Rui Huang
Shiji Song
Le Yang
Gao Huang
3DH
65
151
0
11 Oct 2020
Boosted EfficientNet: Detection of Lymph Node Metastases in Breast Cancer Using Convolutional Neural Network
Jun Wang
Qianying Liu
Haotian Xie
Zhaogang Yang
Hefeng Zhou
MedIm
70
79
0
10 Oct 2020
Beyond Language: Learning Commonsense from Images for Reasoning
Wanqing Cui
Yanyan Lan
Liang Pang
Jiafeng Guo
Xueqi Cheng
LRM
71
5
0
10 Oct 2020
Widget Captioning: Generating Natural Language Description for Mobile User Interface Elements
Yongqian Li
Gang Li
Luheng He
Jingjie Zheng
Hong Li
Zhiwei Guan
71
110
0
08 Oct 2020
Dense Relational Image Captioning via Multi-task Triple-Stream Networks
Dong-Jin Kim
Tae-Hyun Oh
Jinsoo Choi
In So Kweon
115
27
0
08 Oct 2020
Visual News: Benchmark and Challenges in News Image Captioning
Fuxiao Liu
Yinghan Wang
Tianlu Wang
Vicente Ordonez
VLM
88
116
0
08 Oct 2020
Towards Understanding Sample Variance in Visually Grounded Language Generation: Evaluations and Observations
Wanrong Zhu
Xinze Wang
P. Narayana
Kazoo Sone
Sugato Basu
William Yang Wang
44
8
0
07 Oct 2020
Narrative Text Generation with a Latent Discrete Plan
Harsh Jhamtani
Taylor Berg-Kirkpatrick
53
17
0
07 Oct 2020
Learning to Represent Image and Text with Denotation Graph
Bowen Zhang
Hexiang Hu
Vihan Jain
Eugene Ie
Fei Sha
78
22
0
06 Oct 2020
Visualizing Color-wise Saliency of Black-Box Image Classification Models
Yuhki Hatakeyama
Hiroki Sakuma
Yoshinori Konishi
Kohei Suenaga
FAtt
72
3
0
06 Oct 2020
Fine-Grained Grounding for Multimodal Speech Recognition
Tejas Srinivasan
Ramon Sanabria
Florian Metze
Desmond Elliott
78
11
0
05 Oct 2020
A Novel Actor Dual-Critic Model for Remote Sensing Image Captioning
Ruchika Chavhan
Biplab Banerjee
Xiaoxiang Zhu
S. Chaudhuri
32
8
0
05 Oct 2020
AFN: Attentional Feedback Network based 3D Terrain Super-Resolution
A. Kubade
D. Patel
Avinash Sharma
K. Rajan
SupR
48
10
0
04 Oct 2020
Explaining Deep Neural Networks
Oana-Maria Camburu
XAI
FAtt
110
26
0
04 Oct 2020
Goal-GAN: Multimodal Trajectory Prediction Based on Goal Position Estimation
Patrick Dendorfer
Aljosa Osep
Laura Leal-Taixé
100
110
0
02 Oct 2020
Multi-Modal Open-Domain Dialogue
Kurt Shuster
Eric Michael Smith
Da Ju
Jason Weston
AI4CE
141
44
0
02 Oct 2020
MGD-GAN: Text-to-Pedestrian generation through Multi-Grained Discrimination
Shengyu Zhang
Donghui Wang
Zhou Zhao
Siliang Tang
Di Xie
Leilei Gan
32
0
0
02 Oct 2020
Contrastive Learning of Medical Visual Representations from Paired Images and Text
Yuhao Zhang
Hang Jiang
Yasuhide Miura
Christopher D. Manning
C. Langlotz
MedIm
238
774
0
02 Oct 2020
MQTransformer: Multi-Horizon Forecasts with Context Dependent and Feedback-Aware Attention
Carson Eisenach
Yagna Patel
Dhruv Madeka
AI4TS
107
37
0
30 Sep 2020
Teacher-Critical Training Strategies for Image Captioning
Yiqing Huang
Jiansheng Chen
VLM
63
9
0
30 Sep 2020
Finding It at Another Side: A Viewpoint-Adapted Matching Encoder for Change Captioning
Xiangxi Shi
Xu Yang
Jiuxiang Gu
Shafiq Joty
Jianfei Cai
71
53
0
30 Sep 2020
Attention-Driven Body Pose Encoding for Human Activity Recognition
B Debnath
Mary O'Brien
Swagat Kumar
Ardhendu Behera
3DH
CVBM
90
5
0
29 Sep 2020
Spatial Attention as an Interface for Image Captioning Models
P. Sadler
60
0
0
29 Sep 2020
Knowledge Fusion Transformers for Video Action Recognition
Ganesh Samarth
Sheetal Ojha
Nikhil Pareek
ViT
63
1
0
29 Sep 2020
Distillation of Weighted Automata from Recurrent Neural Networks using a Spectral Approach
Rémi Eyraud
Stéphane Ayache
63
16
0
28 Sep 2020
Interventional Few-Shot Learning
Zhongqi Yue
Hanwang Zhang
Qianru Sun
Xiansheng Hua
120
235
0
28 Sep 2020
Causal Intervention for Weakly-Supervised Semantic Segmentation
Dong Zhang
Hanwang Zhang
Jinhui Tang
Xiansheng Hua
Qianru Sun
CML
ISeg
134
455
0
26 Sep 2020
Tied Block Convolution: Leaner and Better CNNs with Shared Thinner Filters
Xudong Wang
Stella X. Yu
62
38
0
25 Sep 2020
An embedded deep learning system for augmented reality in firefighting applications
Manish Bhattarai
Aura Rose Jensen-Curtis
Manel Martínez-Ramón
40
29
0
22 Sep 2020
Multi-Modal Reasoning Graph for Scene-Text Based Fine-Grained Image Classification and Retrieval
Andrés Mafla
S. Dey
Ali Furkan Biten
Lluís Gómez
Dimosthenis Karatzas
80
25
0
21 Sep 2020
Reinforcement Learning Approaches in Social Robotics
Neziha Akalin
Amy Loutfi
OffRL
98
105
0
21 Sep 2020
SSCR: Iterative Language-Based Image Editing via Self-Supervised Counterfactual Reasoning
Tsu-Jui Fu
Xinze Wang
Scott T. Grafton
Miguel P. Eckstein
William Yang Wang
100
41
0
21 Sep 2020
An Interpretable and Uncertainty Aware Multi-Task Framework for Multi-Aspect Sentiment Analysis
Tian Shi
Ping Wang
Chandan K. Reddy
40
0
0
18 Sep 2020
Image Captioning with Attention for Smart Local Tourism using EfficientNet
D. H. Fudholi
Yurio Windiatmoko
Nurdi Afrianto
Prastyo Eko Susanto
Magfirah Suyuti
A. Hidayatullah
R. Rahmadi
3DH
33
11
0
18 Sep 2020
Commands 4 Autonomous Vehicles (C4AV) Workshop Summary
Thierry Deruyttere
Simon Vandenhende
Dusan Grujicic
Yu Liu
Luc Van Gool
Matthew Blaschko
Tinne Tuytelaars
Marie-Francine Moens
70
6
0
18 Sep 2020