Show, Attend and Tell: Neural Image Caption Generation with Visual Attention

10 February 2015

Jimmy Ba

Aaron Courville

Papers citing "Show, Attend and Tell: Neural Image Caption Generation with Visual Attention"

50 / 3,520 papers shown

Title | Authors | Tags | Counts | Date
Beyond Generic: Enhancing Image Captioning with Real-World Knowledge using Vision-Language Pre-Training Model | Ka Leong Cheng, Wenpo Song, Zheng Ma, Wenhao Zhu, Zi-Yue Zhu, Jianbing Zhang | CLIP, VLM | 65 11 0 | 02 Aug 2023
EEG-based Cognitive Load Classification using Feature Masked Autoencoding and Emotion Transfer Learning | Dustin Pulver, Prithila Angkan, Paul Hungler, Ali Etemad | - | 84 5 0 | 01 Aug 2023
Transferable Decoding with Visual Entities for Zero-Shot Image Captioning | Junjie Fei, Teng Wang, Jinrui Zhang, Zhenyu He, Chengjie Wang, Feng Zheng | VLM | 84 36 0 | 31 Jul 2023
Triple Correlations-Guided Label Supplementation for Unbiased Video Scene Graph Generation | Wenqing Wang, Kaifeng Gao, Yawei Luo, Tao Jiang, Fei Gao, Jian Shao, Jianwen Sun, Jun Xiao | - | 106 3 0 | 30 Jul 2023
DRL4Route: A Deep Reinforcement Learning Framework for Pick-up and Delivery Route Prediction | Xiaowei Mao, Haomin Wen, Hengrui Zhang, Huaiyu Wan, Lixia Wu, Jianbin Zheng, Haoyuan Hu, Youfang Lin | AI4TS | 159 14 0 | 30 Jul 2023
Synaptic Plasticity Models and Bio-Inspired Unsupervised Deep Learning: A Survey | Gabriele Lagani, Fabrizio Falchi, Claudio Gennaro, Giuseppe Amato | AAML | 111 7 0 | 30 Jul 2023
RSGPT: A Remote Sensing Vision Language Model and Benchmark | Yuan Hu, Jianlong Yuan, Congcong Wen, Xiaonan Lu, Xiang Li | VLM | 97 116 0 | 28 Jul 2023
Fact-Checking of AI-Generated Reports | Razi Mahmood, Diego Machado Reyes, Ge Wang, Mannudeep Kalra, Pingkun Yan | MedIm | 73 6 0 | 27 Jul 2023
Learning Multi-modal Representations by Watching Hundreds of Surgical Video Lectures | Kun Yuan, V. Srivastav, Tong Yu, Joël L. Lavanchy, J. Marescaux, Pietro Mascagni, Nassir Navab, N. Padoy | - | 201 23 0 | 27 Jul 2023
On the Learning Dynamics of Attention Networks | Rahul Vashisht, H. G. Ramaswamy | - | 52 1 0 | 25 Jul 2023
Enhancing image captioning with depth information using a Transformer-based framework | Aya Mahmoud Ahmed, Mohamed Yousef, K. Hussain, Yousef B. Mahdy | ViT | 71 4 0 | 24 Jul 2023
Actor-agnostic Multi-label Action Recognition with Multi-modal Query | Anindya Mondal, Sauradip Nag, J. Prada, Xiatian Zhu, Anjan Dutta | - | 67 11 0 | 20 Jul 2023
Class Attention to Regions of Lesion for Imbalanced Medical Image Recognition | Jia-Xin Zhuang, Jiabin Cai, Jianguo Zhang, Wei-Shi Zheng, Ruixuan Wang | - | 48 11 0 | 19 Jul 2023
Embedded Heterogeneous Attention Transformer for Cross-lingual Image Captioning | Zijie Song, Zhenzhen Hu, Yuanen Zhou, Ye Zhao, Richang Hong, Meng Wang | - | 64 3 0 | 19 Jul 2023
A Survey on Open-Vocabulary Detection and Segmentation: Past, Present, and Future | Chaoyang Zhu, Long Chen | ObjD, VLM | 148 40 0 | 18 Jul 2023
Human Action Recognition in Still Images Using ConViT | Seyed Rohollah Hosseyni, Sanaz Seyedin, Hasan Taheri | ViT | 50 0 0 | 18 Jul 2023
GenAssist: Making Image Generation Accessible | Mina Huh, Yi-Hao Peng, Amy Pavel | DiffM | 64 34 0 | 14 Jul 2023
AIC-AB NET: A Neural Network for Image Captioning with Spatial Attention and Text Attributes | Guoyun Tu, Ying Liu, Vladimir Vlassov | - | 155 1 0 | 14 Jul 2023
Bootstrapping Vision-Language Learning with Decoupled Language Pre-training | Yiren Jian, Chongyang Gao, Soroush Vosoughi | VLM, MLLM | 106 31 0 | 13 Jul 2023
Is Task-Agnostic Explainable AI a Myth? | Alicja Chaszczewicz | - | 57 2 0 | 13 Jul 2023
Reading Radiology Imaging Like The Radiologist | Yuhao Wang | MedIm | 86 0 0 | 12 Jul 2023
DyCL: Dynamic Neural Network Compilation Via Program Rewriting and Graph Optimization | Simin Chen, Shiyi Wei, Cong Liu, Wei Yang | - | 70 6 0 | 11 Jul 2023
Undecimated Wavelet Transform for Word Embedded Semantic Marginal Autoencoder in Security improvement and Denoising different Languages | S. Shreyanth | - | 30 0 0 | 06 Jul 2023
Multimodal Prompt Learning for Product Title Generation with Extremely Limited Labels | Bang-ju Yang, Fenglin Liu, Zheng Li, Qingyu Yin, Chenyu You, Bing Yin, Yuexian Zou | VLM | 104 5 0 | 05 Jul 2023
Seeing in Words: Learning to Classify through Language Bottlenecks | Khalid Saifullah, Yuxin Wen, Jonas Geiping, Micah Goldblum, Tom Goldstein | VLM | 53 2 0 | 29 Jun 2023
Variational latent discrete representation for time series modelling | Max H. Cohen, M. Charbit, Sylvain Le Corff | - | 125 1 0 | 27 Jun 2023
Self-Supervised Image Captioning with CLIP | Chuanyang Jin | VLM, SSL | 88 2 0 | 26 Jun 2023
Improving Reference-based Distinctive Image Captioning with Contrastive Rewards | Yangjun Mao, Jun Xiao, Dong Zhang, Meng Cao, Jian Shao, Yueting Zhuang, Long Chen | EGVM | 76 9 0 | 25 Jun 2023
Learning Descriptive Image Captioning via Semipermeable Maximum Likelihood Estimation | Zihao Yue, Anwen Hu, Liang Zhang, Qin Jin | - | 101 2 0 | 23 Jun 2023
Dense Video Object Captioning from Disjoint Supervision | Xingyi Zhou, Anurag Arnab, Chen Sun, Cordelia Schmid | - | 105 3 0 | 20 Jun 2023
KiUT: Knowledge-injected U-Transformer for Radiology Report Generation | Zhongzhen Huang, Xiaofan Zhang, Shaoting Zhang | MedIm | 95 52 0 | 20 Jun 2023
GraphGLOW: Universal and Generalizable Structure Learning for Graph Neural Networks | Wentao Zhao, Qitian Wu, Chenxiao Yang, Junchi Yan | - | 72 14 0 | 20 Jun 2023
Multi-Label Meta Weighting for Long-Tailed Dynamic Scene Graph Generation | Shuo Chen, Yingjun Du, Pascal Mettes, Cees G. M. Snoek | OffRL | 135 4 0 | 16 Jun 2023
Towards AGI in Computer Vision: Lessons Learned from GPT and Large Language Models | Lingxi Xie, Longhui Wei, Xiaopeng Zhang, Kaifeng Bi, Xiaotao Gu, Jianlong Chang, Qi Tian | - | 88 7 0 | 14 Jun 2023
Top-Down Framework for Weakly-supervised Grounded Image Captioning | Chen Cai, Suchen Wang, Kim-Hui Yap, Yi Wang | ObjD | 64 3 0 | 13 Jun 2023
Multimodal Explainable Artificial Intelligence: A Comprehensive Review of Methodological Advances and Future Research Directions | N. Rodis, Christos Sardianos, Panagiotis I. Radoglou-Grammatikis, Panagiotis G. Sarigiannidis, Iraklis Varlamis, Georgios Th. Papadopoulos | - | 111 24 0 | 09 Jun 2023
Customizing General-Purpose Foundation Models for Medical Report Generation | Bang-ju Yang, Asif Raza, Yuexian Zou, Tong Zhang | MedIm | 87 11 0 | 09 Jun 2023
Object Detection with Transformers: A Review | Tahira Shehzadi, K. Hashmi, D. Stricker, Muhammad Zeshan Afzal | ViT, MU | 104 29 0 | 07 Jun 2023
Towards Adaptable and Interactive Image Captioning with Data Augmentation and Episodic Memory | Aliki Anagnostopoulou, Mareike Hartmann, Daniel Sonntag | CLL, VLM | 65 0 0 | 06 Jun 2023
Putting Humans in the Image Captioning Loop | Aliki Anagnostopoulou, Mareike Hartmann, Daniel Sonntag | VLM | 53 1 0 | 06 Jun 2023
On the Role of Attention in Prompt-tuning | Samet Oymak, A. S. Rawat, Mahdi Soltanolkotabi, Christos Thrampoulidis | MLT, LRM | 88 47 0 | 06 Jun 2023
MoviePuzzle: Visual Narrative Reasoning through Multimodal Order Learning | Jianghui Wang, Yuxuan Wang, Dongyan Zhao, Zilong Zheng | - | 98 1 0 | 04 Jun 2023
Table and Image Generation for Investigating Knowledge of Entities in Pre-trained Vision and Language Models | Hidetaka Kamigaito, Katsuhiko Hayashi, Taro Watanabe | VLM | 65 1 0 | 03 Jun 2023
Recent Advances of Local Mechanisms in Computer Vision: A Survey and Outlook of Recent Work | Qiangchang Wang, Yilong Yin | - | 102 0 0 | 02 Jun 2023
"Let's not Quote out of Context": Unified Vision-Language Pretraining for Context Assisted Image Captioning | Abisek Rajakumar Kalarani, P. Bhattacharyya, Niyati Chhaya, Sumit Shekhar | CoGe, VLM | 116 9 0 | 01 Jun 2023
Cross-Domain Car Detection Model with Integrated Convolutional Block Attention Mechanism | Haoxuan Xu, Songning Lai, Xianyang Li, Y. Yang | ViT | 79 15 0 | 31 May 2023
HGT: A Hierarchical GCN-Based Transformer for Multimodal Periprosthetic Joint Infection Diagnosis Using CT Images and Text | Ruiyang Li, Fujun Yang, Xianjie Liu, Hon-Yi Shi | - | 75 0 0 | 29 May 2023
GBG++: A Fast and Stable Granular Ball Generation Method for Classification | Qin Xie, Qinghua Zhang, Shuyin Xia, Fan Zhao, Chengying Wu, Guoyin Wang, Weiping Ding | - | 83 18 0 | 29 May 2023
FuseCap: Leveraging Large Language Models for Enriched Fused Image Captions | Noam Rotstein, David Bensaid, Shaked Brody, Roy Ganz, Ron Kimmel | VLM | 83 31 0 | 28 May 2023
S4M: Generating Radiology Reports by A Single Model for Multiple Body Parts | Qi Chen, Yutong Xie, Biao Wu, Minh-Son To, James Ang, Qi Wu | - | 44 3 0 | 26 May 2023