1502.03044
Show, Attend and Tell: Neural Image Caption Generation with Visual Attention
10 February 2015
Kelvin Xu
Jimmy Ba
Ryan Kiros
Kyunghyun Cho
Aaron Courville
Ruslan Salakhutdinov
R. Zemel
Yoshua Bengio
DiffM
Papers citing
"Show, Attend and Tell: Neural Image Caption Generation with Visual Attention"
50 / 3,520 papers shown
Title
Hierarchical Multi-Interest Co-Network For Coarse-Grained Ranking
Xu Yuan
Chengjun Xu
Qiwei Chen
Tao Zhuang
Hongjie Chen
Chong Li
Junfeng Ge
AI4TS
64
0
0
19 Oct 2022
Probing Cross-modal Semantics Alignment Capability from the Textual Perspective
Zheng Ma
Shi Zong
Mianzhi Pan
Jianbing Zhang
Shujian Huang
Xinyu Dai
Jiajun Chen
61
4
0
18 Oct 2022
Weakly Supervised Face Naming with Symmetry-Enhanced Contrastive Loss
Tingyu Qu
Tinne Tuytelaars
Marie-Francine Moens
CVBM
54
4
0
17 Oct 2022
LAION-5B: An open large-scale dataset for training next generation image-text models
Christoph Schuhmann
Romain Beaumont
Richard Vencu
Cade Gordon
Ross Wightman
...
Srivatsa Kundurthy
Katherine Crowson
Ludwig Schmidt
R. Kaczmarczyk
J. Jitsev
VLM
MLLM
CLIP
240
3,522
0
16 Oct 2022
Not All Neighbors Are Worth Attending to: Graph Selective Attention Networks for Semi-supervised Learning
Tiantian He
Haicang Zhou
Yew-Soon Ong
Gao Cong
GNN
135
4
0
14 Oct 2022
Few-Shot Visual Question Generation: A Novel Task and Benchmark Datasets
Anurag Roy
David Johnson Ekka
Saptarshi Ghosh
Abir Das
62
1
0
13 Oct 2022
Multi-Granularity Cross-modal Alignment for Generalized Medical Visual Representation Learning
Fuying Wang
Yuyin Zhou
Shujun Wang
V. Vardhanabhuti
Lequan Yu
117
149
0
12 Oct 2022
APSNet: Attention Based Point Cloud Sampling
Yang Ye
Xiulong Yang
Shihao Ji
3DPC
63
7
0
11 Oct 2022
Like a bilingual baby: The advantage of visually grounding a bilingual language model
Khai-Nguyen Nguyen
Zixin Tang
A. Mali
Mary Alexandria Kelly
VLM
45
0
0
11 Oct 2022
Generating image captions with external encyclopedic knowledge
S. Nikiforova
Tejaswini Deoskar
Denis Paperno
Yoad Winter
72
2
0
10 Oct 2022
CLIP-Diffusion-LM: Apply Diffusion Model on Image Captioning
Shi-You Xu
VLM
DiffM
90
14
0
10 Oct 2022
Fine-grained Anomaly Detection in Sequential Data via Counterfactual Explanations
He Cheng
Depeng Xu
Shuhan Yuan
Xintao Wu
AI4TS
59
3
0
09 Oct 2022
Learning Fine-Grained Visual Understanding for Video Question Answering via Decoupling Spatial-Temporal Modeling
Hsin-Ying Lee
Hung-Ting Su
Bing-Chen Tsai
Tsung-Han Wu
Jia-Fong Yeh
Winston H. Hsu
95
2
0
08 Oct 2022
Contextual Modeling for 3D Dense Captioning on Point Clouds
Yufeng Zhong
Longdao Xu
Jiebo Luo
Lin Ma
89
15
0
08 Oct 2022
LOCL: Learning Object-Attribute Composition using Localization
Satish Kumar
A S M Iftekhar
Ekta Prashnani
B.S. Manjunath
96
3
0
07 Oct 2022
Quantitative Metrics for Evaluating Explanations of Video DeepFake Detectors
Federico Baldassarre
Quentin Debard
Gonzalo Fiz Pontiveros
Tri Kurniawan Wijaya
82
4
0
07 Oct 2022
CLEAR: Causal Explanations from Attention in Neural Recommenders
Shami Nisimov
R. Y. Rohekar
Yaniv Gurwicz
G. Koren
Gal Novik
CML
38
6
0
07 Oct 2022
AOE-Net: Entities Interactions Modeling with Adaptive Attention Mechanism for Temporal Action Proposals Generation
Khoa T. Vo
Sang Truong
Kashu Yamazaki
Bhiksha Raj
Minh-Triet Tran
Ngan Le
158
30
0
05 Oct 2022
Improved Anomaly Detection by Using the Attention-Based Isolation Forest
Lev V. Utkin
A. Ageev
A. Konstantinov
82
8
0
05 Oct 2022
Vision+X: A Survey on Multimodal Learning in the Light of Data
Ye Zhu
Yuehua Wu
N. Sebe
Yan Yan
119
19
0
05 Oct 2022
Affection: Learning Affective Explanations for Real-World Visual Data
Panos Achlioptas
M. Ovsjanikov
Leonidas Guibas
Sergey Tulyakov
109
12
0
04 Oct 2022
Learning to Collocate Visual-Linguistic Neural Modules for Image Captioning
Xu Yang
Hanwang Zhang
Chongyang Gao
Jianfei Cai
MLLM
87
10
0
04 Oct 2022
Music-to-Text Synaesthesia: Generating Descriptive Text from Music Recordings
Zhihuan Kuang
Shi Zong
Jianbing Zhang
Jiajun Chen
Hongfu Liu
71
5
0
02 Oct 2022
MaskTune: Mitigating Spurious Correlations by Forcing to Explore
Saeid Asgari Taghanaki
Aliasghar Khani
Fereshte Khani
A. Gholami
Linh-Tam Tran
Ali Mahdavi-Amiri
Ghassan Hamarneh
AAML
100
48
0
30 Sep 2022
Multimodality Multi-Lead ECG Arrhythmia Classification using Self-Supervised Learning
Thi-Thu-Hong Phan
Duc Le
Brijesh Patel
Donald Adjeroh
Jingxian Wu
M. Jensen
Ngan Le
77
12
0
30 Sep 2022
SmallCap: Lightweight Image Captioning Prompted with Retrieval Augmentation
R. Ramos
Bruno Martins
Desmond Elliott
Yova Kementchedjhieva
VLM
92
89
0
30 Sep 2022
Medical Image Captioning via Generative Pretrained Transformers
Alexander Selivanov
Oleg Y. Rogov
Daniil Chesakov
Artem Shelmanov
Irina Fedulova
Dmitry V. Dylov
MedIm
102
64
0
28 Sep 2022
InFi: End-to-End Learning to Filter Input for Resource-Efficiency in Mobile-Centric Inference
Mu Yuan
Lan Zhang
Fengxiang He
Xueting Tong
Miao-Hui Song
Zhengyuan Xu
Xiang-Yang Li
60
2
0
28 Sep 2022
RepsNet: Combining Vision with Language for Automated Medical Reports
A. Tanwani
Joelle Barral
Daniel Freedman
MedIm
93
23
0
27 Sep 2022
STING: Self-attention based Time-series Imputation Networks using GAN
Eunkyu Oh
Taehun Kim
Yunhu Ji
Sushil Khyalia
AI4TS
92
25
0
22 Sep 2022
DRAMA: Joint Risk Localization and Captioning in Driving
Srikanth Malla
Chiho Choi
Isht Dwivedi
Joonhyang Choi
Jiachen Li
183
100
0
22 Sep 2022
Show, Interpret and Tell: Entity-aware Contextualised Image Captioning in Wikipedia
K. Nguyen
Ali Furkan Biten
Andrés Mafla
Lluís Gómez
Dimosthenis Karatzas
70
11
0
21 Sep 2022
Active Particle Filter Networks: Efficient Active Localization in Continuous Action Spaces and Large Maps
Daniel Honerkamp
Suresh Guttikonda
Abhinav Valada
71
2
0
20 Sep 2022
Accelerating Neural Network Inference with Processing-in-DRAM: From the Edge to the Cloud
Geraldo F. Oliveira
Juan Gómez Luna
Saugata Ghose
Amirali Boroumand
O. Mutlu
75
26
0
19 Sep 2022
Learning Distinct and Representative Styles for Image Captioning
Qi Chen
Chaorui Deng
Qi Wu
VLM
79
24
0
17 Sep 2022
Belief Revision based Caption Re-ranker with Visual Semantic Information
Ahmed Sabir
Francesc Moreno-Noguer
Pranava Madhyastha
Lluís Padró
BDL
74
2
0
16 Sep 2022
M^4I: Multi-modal Models Membership Inference
Pingyi Hu
Zihan Wang
Ruoxi Sun
Hu Wang
Minhui Xue
99
27
0
15 Sep 2022
Analysis of Self-Attention Head Diversity for Conformer-based Automatic Speech Recognition
Kartik Audhkhasi
Yinghui Huang
Bhuvana Ramabhadran
Pedro J. Moreno
62
3
0
13 Sep 2022
Vision Transformers for Action Recognition: A Survey
Anwaar Ulhaq
Naveed Akhtar
Ganna Pogrebna
Ajmal Mian
ViT
89
45
0
13 Sep 2022
Evaluation of Question Answering Systems: Complexity of judging a natural language
Amer Farea
Zhen Yang
Kien Duong
Nadeesha Perera
F. Emmert-Streib
ELM
62
3
0
10 Sep 2022
Foundations and Trends in Multimodal Machine Learning: Principles, Challenges, and Open Questions
Paul Pu Liang
Amir Zadeh
Louis-Philippe Morency
114
90
0
07 Sep 2022
RF Fingerprinting Needs Attention: Multi-task Approach for Real-World WiFi and Bluetooth
Anu Jagannath
Zackary Kane
Jithin Jagannath
76
11
0
07 Sep 2022
Parallel and Streaming Wavelet Neural Networks for Classification and Regression under Apache Spark
E Venkatesh
Yelleti Vivek
V. Ravi
Shiva Shankar Orsu
63
6
0
07 Sep 2022
A Weakly Supervised Learning Framework for Salient Object Detection via Hybrid Labels
Runmin Cong
Qi Qin
Chen Zhang
Qiuping Jiang
Shi Wang
Yao-Min Zhao
Sam Kwong
122
54
0
07 Sep 2022
Bridging Music and Text with Crowdsourced Music Comments: A Sequence-to-Sequence Framework for Thematic Music Comments Generation
Peining Zhang
Junliang Guo
Linli Xu
Mu You
Junming Yin
55
0
0
05 Sep 2022
MMKGR: Multi-hop Multi-modal Knowledge Graph Reasoning
Shangfei Zheng
Weiqing Wang
Jianfeng Qu
Hongzhi Yin
Wei Chen
Lei Zhao
LRM
82
24
0
03 Sep 2022
vieCap4H-VLSP 2021: Vietnamese Image Captioning for Healthcare Domain using Swin Transformer and Attention-based LSTM
Thanh Van Nguyen
Long H. Nguyen
Nhat Truong Pham
Liu Tai Nguyen
Van Huong Do
Hai Nguyen
Ngoc Duy Nguyen
VLM
ViT
50
1
0
03 Sep 2022
EGFR Mutation Prediction of Lung Biopsy Images using Deep Learning
R. Gupta
Shivani Nandgaonkar
Nikhil Cherian Kurian
S. Rane
A. Sethi
MedIm
56
8
0
26 Aug 2022
gSwin: Gated MLP Vision Model with Hierarchical Structure of Shifted Window
Mocho Go
Hideyuki Tachibana
ViT
68
9
0
24 Aug 2022
Large-Scale Traffic Congestion Prediction based on Multimodal Fusion and Representation Mapping
Bo Zhou
Jiahui Liu
Songyi Cui
Yaping Zhao
45
5
0
23 Aug 2022