Show, Attend and Tell: Neural Image Caption Generation with Visual Attention

10 February 2015

Jimmy Ba

Aaron Courville

Papers citing "Show, Attend and Tell: Neural Image Caption Generation with Visual Attention"

50 / 3,520 papers shown

Title
Hierarchical Multi-Interest Co-Network For Coarse-Grained Ranking Xu Yuan Chengjun Xu Qiwei Chen Tao Zhuang Hongjie Chen Chong Li Junfeng Ge AI4TS 64 0 0 19 Oct 2022
Probing Cross-modal Semantics Alignment Capability from the Textual Perspective Zheng Ma Shi Zong Mianzhi Pan Jianbing Zhang Shujian Huang Xinyu Dai Jiajun Chen 61 4 0 18 Oct 2022
Weakly Supervised Face Naming with Symmetry-Enhanced Contrastive Loss Tingyu Qu Tinne Tuytelaars Marie-Francine Moens CVBM 54 4 0 17 Oct 2022
LAION-5B: An open large-scale dataset for training next generation image-text models Christoph Schuhmann Romain Beaumont Richard Vencu Cade Gordon Ross Wightman ... Srivatsa Kundurthy Katherine Crowson Ludwig Schmidt R. Kaczmarczyk J. Jitsev VLM MLLM CLIP 240 3,522 0 16 Oct 2022
Not All Neighbors Are Worth Attending to: Graph Selective Attention Networks for Semi-supervised Learning Tiantian He Haicang Zhou Yew-Soon Ong Gao Cong GNN 135 4 0 14 Oct 2022
Few-Shot Visual Question Generation: A Novel Task and Benchmark Datasets Anurag Roy David Johnson Ekka Saptarshi Ghosh Abir Das 62 1 0 13 Oct 2022
Multi-Granularity Cross-modal Alignment for Generalized Medical Visual Representation Learning Fuying Wang Yuyin Zhou Shujun Wang V. Vardhanabhuti Lequan Yu 117 149 0 12 Oct 2022
APSNet: Attention Based Point Cloud Sampling Yang Ye Xiulong Yang Shihao Ji 3DPC 63 7 0 11 Oct 2022
Like a bilingual baby: The advantage of visually grounding a bilingual language model Khai-Nguyen Nguyen Zixin Tang A. Mali Mary Alexandria Kelly VLM 45 0 0 11 Oct 2022
Generating image captions with external encyclopedic knowledge S. Nikiforova Tejaswini Deoskar Denis Paperno Yoad Winter 72 2 0 10 Oct 2022
CLIP-Diffusion-LM: Apply Diffusion Model on Image Captioning Shi-You Xu VLM DiffM 90 14 0 10 Oct 2022
Fine-grained Anomaly Detection in Sequential Data via Counterfactual Explanations He Cheng Depeng Xu Shuhan Yuan Xintao Wu AI4TS 59 3 0 09 Oct 2022
Learning Fine-Grained Visual Understanding for Video Question Answering via Decoupling Spatial-Temporal Modeling Hsin-Ying Lee Hung-Ting Su Bing-Chen Tsai Tsung-Han Wu Jia-Fong Yeh Winston H. Hsu 95 2 0 08 Oct 2022
Contextual Modeling for 3D Dense Captioning on Point Clouds Yufeng Zhong Longdao Xu Jiebo Luo Lin Ma 89 15 0 08 Oct 2022
LOCL: Learning Object-Attribute Composition using Localization Satish Kumar A S M Iftekhar Ekta Prashnani B. S. Manjunath 96 3 0 07 Oct 2022
Quantitative Metrics for Evaluating Explanations of Video DeepFake Detectors Federico Baldassarre Quentin Debard Gonzalo Fiz Pontiveros Tri Kurniawan Wijaya 82 4 0 07 Oct 2022
CLEAR: Causal Explanations from Attention in Neural Recommenders Shami Nisimov R. Y. Rohekar Yaniv Gurwicz G. Koren Gal Novik CML 38 6 0 07 Oct 2022
AOE-Net: Entities Interactions Modeling with Adaptive Attention Mechanism for Temporal Action Proposals Generation Khoa T. Vo Sang Truong Kashu Yamazaki Bhiksha Raj Minh-Triet Tran Ngan Le 158 30 0 05 Oct 2022
Improved Anomaly Detection by Using the Attention-Based Isolation Forest Lev V. Utkin A. Ageev A. Konstantinov 82 8 0 05 Oct 2022
Vision+X: A Survey on Multimodal Learning in the Light of Data Ye Zhu Yuehua Wu N. Sebe Yan Yan 119 19 0 05 Oct 2022
Affection: Learning Affective Explanations for Real-World Visual Data Panos Achlioptas M. Ovsjanikov Leonidas Guibas Sergey Tulyakov 109 12 0 04 Oct 2022
Learning to Collocate Visual-Linguistic Neural Modules for Image Captioning Xu Yang Hanwang Zhang Chongyang Gao Jianfei Cai MLLM 87 10 0 04 Oct 2022
Music-to-Text Synaesthesia: Generating Descriptive Text from Music Recordings Zhihuan Kuang Shi Zong Jianbing Zhang Jiajun Chen Hongfu Liu 71 5 0 02 Oct 2022
MaskTune: Mitigating Spurious Correlations by Forcing to Explore Saeid Asgari Taghanaki Aliasghar Khani Fereshte Khani A. Gholami Linh-Tam Tran Ali Mahdavi-Amiri Ghassan Hamarneh AAML 100 48 0 30 Sep 2022
Multimodality Multi-Lead ECG Arrhythmia Classification using Self-Supervised Learning Thi-Thu-Hong Phan Duc Le Brijesh Patel Donald Adjeroh Jingxian Wu M. Jensen Ngan Le 77 12 0 30 Sep 2022
SmallCap: Lightweight Image Captioning Prompted with Retrieval Augmentation R. Ramos Bruno Martins Desmond Elliott Yova Kementchedjhieva VLM 92 89 0 30 Sep 2022
Medical Image Captioning via Generative Pretrained Transformers Alexander Selivanov Oleg Y. Rogov Daniil Chesakov Artem Shelmanov Irina Fedulova Dmitry V. Dylov MedIm 102 64 0 28 Sep 2022
InFi: End-to-End Learning to Filter Input for Resource-Efficiency in Mobile-Centric Inference Mu Yuan Lan Zhang Fengxiang He Xueting Tong Miao-Hui Song Zhengyuan Xu Xiang-Yang Li 60 2 0 28 Sep 2022
RepsNet: Combining Vision with Language for Automated Medical Reports A. Tanwani Joelle Barral Daniel Freedman MedIm 93 23 0 27 Sep 2022
STING: Self-attention based Time-series Imputation Networks using GAN Eunkyu Oh Taehun Kim Yunhu Ji Sushil Khyalia AI4TS 92 25 0 22 Sep 2022
DRAMA: Joint Risk Localization and Captioning in Driving Srikanth Malla Chiho Choi Isht Dwivedi Joonhyang Choi Jiachen Li 183 100 0 22 Sep 2022
Show, Interpret and Tell: Entity-aware Contextualised Image Captioning in Wikipedia K. Nguyen Ali Furkan Biten Andrés Mafla Lluís Gómez Dimosthenis Karatzas 70 11 0 21 Sep 2022
Active Particle Filter Networks: Efficient Active Localization in Continuous Action Spaces and Large Maps Daniel Honerkamp Suresh Guttikonda Abhinav Valada 71 2 0 20 Sep 2022
Accelerating Neural Network Inference with Processing-in-DRAM: From the Edge to the Cloud Geraldo F. Oliveira Juan Gómez Luna Saugata Ghose Amirali Boroumand O. Mutlu 75 26 0 19 Sep 2022
Learning Distinct and Representative Styles for Image Captioning Qi Chen Chaorui Deng Qi Wu VLM 79 24 0 17 Sep 2022
Belief Revision based Caption Re-ranker with Visual Semantic Information Ahmed Sabir Francesc Moreno-Noguer Pranava Madhyastha Lluís Padró BDL 74 2 0 16 Sep 2022
M^4I: Multi-modal Models Membership Inference Pingyi Hu Zihan Wang Ruoxi Sun Hu Wang Minhui Xue 99 27 0 15 Sep 2022
Analysis of Self-Attention Head Diversity for Conformer-based Automatic Speech Recognition Kartik Audhkhasi Yinghui Huang Bhuvana Ramabhadran Pedro J. Moreno 62 3 0 13 Sep 2022
Vision Transformers for Action Recognition: A Survey Anwaar Ulhaq Naveed Akhtar Ganna Pogrebna Ajmal Mian ViT 89 45 0 13 Sep 2022
Evaluation of Question Answering Systems: Complexity of judging a natural language Amer Farea Zhen Yang Kien Duong Nadeesha Perera F. Emmert-Streib ELM 62 3 0 10 Sep 2022
Foundations and Trends in Multimodal Machine Learning: Principles, Challenges, and Open Questions Paul Pu Liang Amir Zadeh Louis-Philippe Morency 114 90 0 07 Sep 2022
RF Fingerprinting Needs Attention: Multi-task Approach for Real-World WiFi and Bluetooth Anu Jagannath Zackary Kane Jithin Jagannath 76 11 0 07 Sep 2022
Parallel and Streaming Wavelet Neural Networks for Classification and Regression under Apache Spark E Venkatesh Yelleti Vivek V. Ravi Shiva Shankar Orsu 63 6 0 07 Sep 2022
A Weakly Supervised Learning Framework for Salient Object Detection via Hybrid Labels Runmin Cong Qi Qin Chen Zhang Qiuping Jiang Shi Wang Yao-Min Zhao Sam Kwong 122 54 0 07 Sep 2022
Bridging Music and Text with Crowdsourced Music Comments: A Sequence-to-Sequence Framework for Thematic Music Comments Generation Peining Zhang Junliang Guo Linli Xu Mu You Junming Yin 55 0 0 05 Sep 2022
MMKGR: Multi-hop Multi-modal Knowledge Graph Reasoning Shangfei Zheng Weiqing Wang Jianfeng Qu Hongzhi Yin Wei Chen Lei Zhao LRM 82 24 0 03 Sep 2022
vieCap4H-VLSP 2021: Vietnamese Image Captioning for Healthcare Domain using Swin Transformer and Attention-based LSTM Thanh Van Nguyen Long H. Nguyen Nhat Truong Pham Liu Tai Nguyen Van Huong Do Hai Nguyen Ngoc Duy Nguyen VLM ViT 50 1 0 03 Sep 2022
EGFR Mutation Prediction of Lung Biopsy Images using Deep Learning R. Gupta Shivani Nandgaonkar Nikhil Cherian Kurian S. Rane A. Sethi MedIm 56 8 0 26 Aug 2022
gSwin: Gated MLP Vision Model with Hierarchical Structure of Shifted Window Mocho Go Hideyuki Tachibana ViT 68 9 0 24 Aug 2022
Large-Scale Traffic Congestion Prediction based on Multimodal Fusion and Representation Mapping Bo Zhou Jiahui Liu Songyi Cui Yaping Zhao 45 5 0 23 Aug 2022