
Show, Attend and Tell: Neural Image Caption Generation with Visual Attention

10 February 2015

Jimmy Ba

Aaron Courville

Papers citing "Show, Attend and Tell: Neural Image Caption Generation with Visual Attention"

50 / 3,520 papers shown

Title
Improving the Performance of Automated Audio Captioning via Integrating the Acoustic and Semantic Information Zhongjie Ye Helin Wang Dongchao Yang Yuexian Zou 106 28 0 12 Oct 2021
Multi-Modal Interaction Graph Convolutional Network for Temporal Language Localization in Videos Zongmeng Zhang Xianjing Han Xuemeng Song Yan Yan Liqiang Nie 120 37 0 12 Oct 2021
Topic Scene Graph Generation by Attention Distillation from Caption Wenbin Wang R. Wang X. Chen DiffM 94 14 0 12 Oct 2021
Reason induced visual attention for explainable autonomous driving Sikai Chen Jiqian Dong Runjia Du Yujie Li Samuel Labi 68 1 0 11 Oct 2021
Semi-Autoregressive Image Captioning Xu Yan Zhengcong Fei Zekang Li Shuhui Wang Qingming Huang Qi Tian 91 25 0 11 Oct 2021
Supervision Exists Everywhere: A Data Efficient Contrastive Language-Image Pre-training Paradigm Yangguang Li Feng Liang Lichen Zhao Yufeng Cui Wanli Ouyang Jing Shao F. Yu Junjie Yan VLM CLIP 167 458 0 11 Oct 2021
Recurrent Attention Models with Object-centric Capsule Representation for Multi-object Recognition Hossein Adeli Seoyoung Ahn G. Zelinsky OCL 58 3 0 11 Oct 2021
Accessible Visualization via Natural Language Descriptions: A Four-Level Model of Semantic Content Alan Lundgard Arvind Satyanarayan 59 136 0 08 Oct 2021
End-to-End Supermask Pruning: Learning to Prune Image Captioning Models J. Tan C. Chan Joon Huang Chuah VLM 132 16 0 07 Oct 2021
Attentive Walk-Aggregating Graph Neural Networks M. F. Demirel Shengchao Liu Siddhant Garg Zhenmei Shi Yingyu Liang 133 10 0 06 Oct 2021
Let there be a clock on the beach: Reducing Object Hallucination in Image Captioning Ali Furkan Biten L. G. I. Bigorda Dimosthenis Karatzas 168 63 0 04 Oct 2021
Trustworthy AI: From Principles to Practices Yue Liu Peng Qi Bo Liu Shuai Di Jingen Liu Jiquan Pei Jinfeng Yi Bowen Zhou 213 384 0 04 Oct 2021
Calibrating Concepts and Operations: Towards Symbolic Reasoning on Real Images Zhuowan Li Elias Stengel-Eskin Yixiao Zhang Cihang Xie Q. Tran Benjamin Van Durme Alan Yuille VLM 73 15 0 01 Oct 2021
Geometry Attention Transformer with Position-aware LSTMs for Image Captioning Chi-Yin Wang Yulin Shen Luping Ji ViT 113 53 0 01 Oct 2021
Multi-granular Legal Topic Classification on Greek Legislation C. Papaloukas Ilias Chalkidis Konstantinos Athinaios D. Pantazi Manolis Koubarakis AILaw 80 25 0 30 Sep 2021
Google Neural Network Models for Edge Devices: Analyzing and Mitigating Machine Learning Inference Bottlenecks Amirali Boroumand Saugata Ghose Berkin Akin Ravi Narayanaswami Geraldo F. Oliveira Xiaoyu Ma Eric Shiu O. Mutlu 80 86 0 29 Sep 2021
Geometry-Entangled Visual Semantic Transformer for Image Captioning Ling Cheng Wei Wei Feida Zhu Yong Liu Chunyan Miao ViT 47 3 0 29 Sep 2021
VQA-MHUG: A Gaze Dataset to Study Multimodal Neural Attention in Visual Question Answering Ekta Sood Fabian Kögel Florian Strohm Prajit Dhar Andreas Bulling 67 19 0 27 Sep 2021
Optimising for Interpretability: Convolutional Dynamic Alignment Networks Moritz D Boehle Mario Fritz Bernt Schiele 39 3 0 27 Sep 2021
The JDDC 2.0 Corpus: A Large-Scale Multimodal Multi-Turn Chinese Dialogue Dataset for E-commerce Customer Service Nan Zhao Haoran Li Youzheng Wu Xiaodong He Bowen Zhou 50 9 0 27 Sep 2021
Weakly Supervised Contrastive Learning for Chest X-Ray Report Generation An Yan Zexue He Xing Lu Jingfeng Du E. Chang Amilcare Gentili Julian McAuley Chun-Nan Hsu MedIm 183 65 0 25 Sep 2021
Scene Graph Generation for Better Image Captioning? Maximilian Mozes Martin Schmitt Vladimir Golkov Hinrich Schütze Zorah Lähner GNN 76 3 0 23 Sep 2021
Cross-Modal Coherence for Text-to-Image Retrieval Malihe Alikhani Fangda Han Hareesh Ravi Mubbasir Kapadia Vladimir Pavlovic Matthew Stone 72 9 0 22 Sep 2021
Pix2seq: A Language Modeling Framework for Object Detection Ting-Li Chen Saurabh Saxena Lala Li David J. Fleet Geoffrey E. Hinton MLLM ViT VLM 307 351 0 22 Sep 2021
Caption Enriched Samples for Improving Hateful Memes Detection Efrat Blaier Itzik Malkiel Lior Wolf VLM 96 24 0 22 Sep 2021
Latexify Math: Mathematical Formula Markup Revision to Assist Collaborative Editing in Math Q&A Sites Suyu Ma Chunyang Chen Hourieh Khalajzadeh J. Grundy HAI AIMat 36 5 0 20 Sep 2021
Multimodal Incremental Transformer with Visual Grounding for Visual Dialogue Generation Feilong Chen Fandong Meng Xiuyi Chen Peng Li Jie Zhou 102 23 0 17 Sep 2021
GoG: Relation-aware Graph-over-Graph Network for Visual Dialog Feilong Chen Xiuyi Chen Fandong Meng Peng Li Jie Zhou 145 35 0 17 Sep 2021
Cross Modification Attention Based Deliberation Model for Image Captioning Zheng Lian Yanan Zhang Haichang Li Rui Wang Xiaohui Hu 69 5 0 17 Sep 2021
Label-Attention Transformer with Geometrically Coherent Objects for Image Captioning Shikha Dubey Farrukh Olimov M. Rafique Joonmo Kim M. Jeon ViT 84 43 0 16 Sep 2021
SafeAccess+: An Intelligent System to make Smart Home Safer and Americans with Disability Act Compliant Shahinur Alam 40 2 0 14 Sep 2021
DAFNe: A One-Stage Anchor-Free Approach for Oriented Object Detection Steven Lang Fabrizio G. Ventola Kristian Kersting 88 15 0 13 Sep 2021
Learning to Ground Visual Objects for Visual Dialog Feilong Chen Xiuyi Chen Can Xu Daxin Jiang OOD 94 18 0 13 Sep 2021
Explain Me the Painting: Multi-Topic Knowledgeable Art Description Generation Zechen Bai Yuta Nakashima Noa Garcia 118 44 0 13 Sep 2021
Bornon: Bengali Image Captioning with Transformer-based Deep learning approach Faisal Muhammad Shah Mayeesha Humaira Md Abidur Rahman Khan Jim Amit Saha Ami Shimul Paul 55 19 0 11 Sep 2021
We went to look for meaning and all we got were these lousy representations: aspects of meaning representation for computational semantics Simon Dobnik R. Cooper Adam Ek Bill Noble Staffan Larsson N. Ilinykh Vladislav Maraev Vidya Somashekarappa 66 0 0 10 Sep 2021
Is Attention Better Than Matrix Decomposition? Zhengyang Geng Meng-Hao Guo Hongxu Chen Xia Li Ke Wei Zhouchen Lin 125 142 0 09 Sep 2021
Dynamic Modeling of Hand-Object Interactions via Tactile Sensing Qiang Zhang Yunzhu Li Yiyue Luo Wan Shou Michael Foshey Junchi Yan J. Tenenbaum Wojciech Matusik Antonio Torralba 69 18 0 09 Sep 2021
Sensor-Augmented Egocentric-Video Captioning with Dynamic Modal Attention Katsuyuki Nakamura Hiroki Ohashi Mitsuhiro Okada EgoV 94 13 0 07 Sep 2021
Journalistic Guidelines Aware News Image Captioning Xuewen Yang Svebor Karaman Joel R. Tetreault Alex Jaimes 90 27 0 07 Sep 2021
Ultra-high Resolution Image Segmentation via Locality-aware Context Fusion and Alternating Local Enhancement Wenxi Liu Qi Li Xin Lin Weixiang Yang Shengfeng He Yuanlong Yu 78 8 0 06 Sep 2021
LAViTeR: Learning Aligned Visual and Textual Representations Assisted by Image and Caption Generation Mohammad Abuzar Shaikh Zhanghexuan Ji Dana Moukheiber Yan Shen S. Srihari Mingchen Gao VLM 65 1 0 04 Sep 2021
Attentive Neural Controlled Differential Equations for Time-series Classification and Forecasting Sheo Yon Jhin H. Shin Seoyoung Hong Solhee Park Noseong Park AI4TS 66 24 0 04 Sep 2021
IMG2SMI: Translating Molecular Structure Images to Simplified Molecular-input Line-entry System Daniel Fernando Campos Heng Ji 66 12 0 03 Sep 2021
Sequence-to-Sequence Learning with Latent Neural Grammars Yoon Kim 168 40 0 02 Sep 2021
Causal Inference in Natural Language Processing: Estimation, Prediction, Interpretation and Beyond Amir Feder Katherine A. Keith Emaad A. Manzoor Reid Pryzant Dhanya Sridhar ... Roi Reichart Margaret E. Roberts Brandon M Stewart Victor Veitch Diyi Yang CML 123 246 0 02 Sep 2021
Working Memory Connections for LSTM Federico Landi Lorenzo Baraldi Marcella Cornia Rita Cucchiara KELM 74 173 0 31 Aug 2021
Automated Generation of Accurate & Fluent Medical X-ray Reports Hoang T.N. Nguyen Dong Nie Taivanbat Badamdorj Yujie Liu Yingying Zhu J. Truong Li Cheng MedIm LM&MA 73 40 0 27 Aug 2021
Similar Scenes arouse Similar Emotions: Parallel Data Augmentation for Stylized Image Captioning Guodun Li Yuchen Zhai Zehao Lin Yin Zhang 114 21 0 26 Aug 2021
Glimpse-Attend-and-Explore: Self-Attention for Active Visual Exploration Soroush Seifi Abhishek Jha Tinne Tuytelaars 44 10 0 26 Aug 2021