Show, Attend and Tell: Neural Image Caption Generation with Visual Attention

10 February 2015

Jimmy Ba

Aaron Courville

Papers citing "Show, Attend and Tell: Neural Image Caption Generation with Visual Attention"

50 / 3,520 papers shown

Title
Bayesian Attention Modules Xinjie Fan Shujian Zhang Bo Chen Mingyuan Zhou 183 62 0 20 Oct 2020
Interpreting convolutional networks trained on textual data Reza Marzban Christopher Crick FAtt 39 3 0 20 Oct 2020
A Survey on Deep Learning and Explainability for Automatic Report Generation from Medical Images Pablo Messina Pablo Pino Denis Parra Alvaro Soto Cecilia Besa S. Uribe Marcelo Andía C. Tejos Claudia Prieto Daniel Capurro MedIm 135 65 0 20 Oct 2020
BiST: Bi-directional Spatio-Temporal Reasoning for Video-Grounded Dialogues Hung Le Doyen Sahoo Nancy F. Chen Guosheng Lin 117 31 0 20 Oct 2020
Improving Factual Completeness and Consistency of Image-to-Text Radiology Report Generation Yasuhide Miura Yuhao Zhang Emily Bao Tsai C. Langlotz Dan Jurafsky MedIm 250 159 0 20 Oct 2020
Learning to Reconstruct and Segment 3D Objects Bo Yang 3DPC 65 1 0 19 Oct 2020
Multimodal Research in Vision and Language: A Review of Current and Emerging Trends Shagun Uppal Sarthak Bhagat Devamanyu Hazarika Navonil Majumdar Soujanya Poria Roger Zimmermann Amir Zadeh 101 6 0 19 Oct 2020
Image Captioning with Visual Object Representations Grounded in the Textual Modality Dušan Variš Katsuhito Sudoh Satoshi Nakamura 39 1 0 19 Oct 2020
Language and Visual Entity Relationship Graph for Agent Navigation Yicong Hong Cristian Rodriguez-Opazo Yuankai Qi Qi Wu Stephen Gould LM&Ro 229 135 0 19 Oct 2020
TextMage: The Automated Bangla Caption Generator Based On Deep Learning Abrar Hasin Kamal Md Asifuzzaman Jishan N. Mansoor VLM 46 21 0 15 Oct 2020
Improving Natural Language Processing Tasks with Human Gaze-Guided Neural Attention Ekta Sood Simon Tannert Philipp Mueller Andreas Bulling 94 74 0 15 Oct 2020
Interpreting Deep Learning Model Using Rule-based Method Xiaojian Wang Jingyuan Wang Ke Tang 23 3 0 15 Oct 2020
Interpreting Attention Models with Human Visual Attention in Machine Reading Comprehension Ekta Sood Simon Tannert Diego Frassinelli Andreas Bulling Ngoc Thang Vu HAI 75 57 0 13 Oct 2020
MAF: Multimodal Alignment Framework for Weakly-Supervised Phrase Grounding Qinxin Wang Hao Tan Sheng Shen Michael W. Mahoney Z. Yao ObjD 154 11 0 12 Oct 2020
SMYRF: Efficient Attention using Asymmetric Clustering Giannis Daras Nikita Kitaev Augustus Odena A. Dimakis 106 46 0 11 Oct 2020
Glance and Focus: a Dynamic Approach to Reducing Spatial Redundancy in Image Classification Yulin Wang Kangchen Lv Rui Huang Shiji Song Le Yang Gao Huang 3DH 65 151 0 11 Oct 2020
Boosted EfficientNet: Detection of Lymph Node Metastases in Breast Cancer Using Convolutional Neural Network Jun Wang Qianying Liu Haotian Xie Zhaogang Yang Hefeng Zhou MedIm 70 79 0 10 Oct 2020
Beyond Language: Learning Commonsense from Images for Reasoning Wanqing Cui Yanyan Lan Liang Pang Jiafeng Guo Xueqi Cheng LRM 71 5 0 10 Oct 2020
Widget Captioning: Generating Natural Language Description for Mobile User Interface Elements Yongqian Li Gang Li Luheng He Jingjie Zheng Hong Li Zhiwei Guan 71 110 0 08 Oct 2020
Dense Relational Image Captioning via Multi-task Triple-Stream Networks Dong-Jin Kim Tae-Hyun Oh Jinsoo Choi In So Kweon 115 27 0 08 Oct 2020
Visual News: Benchmark and Challenges in News Image Captioning Fuxiao Liu Yinghan Wang Tianlu Wang Vicente Ordonez VLM 88 116 0 08 Oct 2020
Towards Understanding Sample Variance in Visually Grounded Language Generation: Evaluations and Observations Wanrong Zhu Xinze Wang P. Narayana Kazoo Sone Sugato Basu William Yang Wang 44 8 0 07 Oct 2020
Narrative Text Generation with a Latent Discrete Plan Harsh Jhamtani Taylor Berg-Kirkpatrick 53 17 0 07 Oct 2020
Learning to Represent Image and Text with Denotation Graph Bowen Zhang Hexiang Hu Vihan Jain Eugene Ie Fei Sha 78 22 0 06 Oct 2020
Visualizing Color-wise Saliency of Black-Box Image Classification Models Yuhki Hatakeyama Hiroki Sakuma Yoshinori Konishi Kohei Suenaga FAtt 72 3 0 06 Oct 2020
Fine-Grained Grounding for Multimodal Speech Recognition Tejas Srinivasan Ramon Sanabria Florian Metze Desmond Elliott 78 11 0 05 Oct 2020
A Novel Actor Dual-Critic Model for Remote Sensing Image Captioning Ruchika Chavhan Biplab Banerjee Xiaoxiang Zhu S. Chaudhuri 32 8 0 05 Oct 2020
AFN: Attentional Feedback Network based 3D Terrain Super-Resolution A. Kubade D. Patel Avinash Sharma K. Rajan SupR 48 10 0 04 Oct 2020
Explaining Deep Neural Networks Oana-Maria Camburu XAI FAtt 110 26 0 04 Oct 2020
Goal-GAN: Multimodal Trajectory Prediction Based on Goal Position Estimation Patrick Dendorfer Aljosa Osep Laura Leal-Taixé 100 110 0 02 Oct 2020
Multi-Modal Open-Domain Dialogue Kurt Shuster Eric Michael Smith Da Ju Jason Weston AI4CE 141 44 0 02 Oct 2020
MGD-GAN: Text-to-Pedestrian generation through Multi-Grained Discrimination Shengyu Zhang Donghui Wang Zhou Zhao Siliang Tang Di Xie Leilei Gan 32 0 0 02 Oct 2020
Contrastive Learning of Medical Visual Representations from Paired Images and Text Yuhao Zhang Hang Jiang Yasuhide Miura Christopher D. Manning C. Langlotz MedIm 238 774 0 02 Oct 2020
MQTransformer: Multi-Horizon Forecasts with Context Dependent and Feedback-Aware Attention Carson Eisenach Yagna Patel Dhruv Madeka AI4TS 107 37 0 30 Sep 2020
Teacher-Critical Training Strategies for Image Captioning Yiqing Huang Jiansheng Chen VLM 63 9 0 30 Sep 2020
Finding It at Another Side: A Viewpoint-Adapted Matching Encoder for Change Captioning Xiangxi Shi Xu Yang Jiuxiang Gu Shafiq Joty Jianfei Cai 71 53 0 30 Sep 2020
Attention-Driven Body Pose Encoding for Human Activity Recognition B Debnath Mary O'Brien Swagat Kumar Ardhendu Behera 3DH CVBM 90 5 0 29 Sep 2020
Spatial Attention as an Interface for Image Captioning Models P. Sadler 60 0 0 29 Sep 2020
Knowledge Fusion Transformers for Video Action Recognition Ganesh Samarth Sheetal Ojha Nikhil Pareek ViT 63 1 0 29 Sep 2020
Distillation of Weighted Automata from Recurrent Neural Networks using a Spectral Approach Rémi Eyraud Stéphane Ayache 63 16 0 28 Sep 2020
Interventional Few-Shot Learning Zhongqi Yue Hanwang Zhang Qianru Sun Xiansheng Hua 120 235 0 28 Sep 2020
Causal Intervention for Weakly-Supervised Semantic Segmentation Dong Zhang Hanwang Zhang Jinhui Tang Xiansheng Hua Qianru Sun CML ISeg 134 455 0 26 Sep 2020
Tied Block Convolution: Leaner and Better CNNs with Shared Thinner Filters Xudong Wang Stella X. Yu 62 38 0 25 Sep 2020
An embedded deep learning system for augmented reality in firefighting applications Manish Bhattarai Aura Rose Jensen-Curtis Manel Martínez-Ramón 40 29 0 22 Sep 2020
Multi-Modal Reasoning Graph for Scene-Text Based Fine-Grained Image Classification and Retrieval Andrés Mafla S. Dey Ali Furkan Biten Lluís Gómez Dimosthenis Karatzas 80 25 0 21 Sep 2020
Reinforcement Learning Approaches in Social Robotics Neziha Akalin Amy Loutfi OffRL 98 105 0 21 Sep 2020
SSCR: Iterative Language-Based Image Editing via Self-Supervised Counterfactual Reasoning Tsu-Jui Fu Xinze Wang Scott T. Grafton Miguel P. Eckstein William Yang Wang 100 41 0 21 Sep 2020
An Interpretable and Uncertainty Aware Multi-Task Framework for Multi-Aspect Sentiment Analysis Tian Shi Ping Wang Chandan K. Reddy 40 0 0 18 Sep 2020
Image Captioning with Attention for Smart Local Tourism using EfficientNet D. H. Fudholi Yurio Windiatmoko Nurdi Afrianto Prastyo Eko Susanto Magfirah Suyuti A. Hidayatullah R. Rahmadi 3DH 33 11 0 18 Sep 2020
Commands 4 Autonomous Vehicles (C4AV) Workshop Summary Thierry Deruyttere Simon Vandenhende Dusan Grujicic Yu Liu Luc Van Gool Matthew Blaschko Tinne Tuytelaars Marie-Francine Moens 70 6 0 18 Sep 2020