Show and Tell: A Neural Image Caption Generator

17 November 2014

Papers citing "Show and Tell: A Neural Image Caption Generator"

50 / 2,022 papers shown

Title | Authors | Tags | Counts | Date
An End-to-End Approach to Natural Language Object Retrieval via Context-Aware Deep Reinforcement Learning | Fan Wu, Zhongwen Xu, Yi Yang | ObjD | 34 11 0 | 22 Mar 2017
The Use of Autoencoders for Discovering Patient Phenotypes | Harini Suresh, Peter Szolovits, Marzyeh Ghassemi | DRL | 16 28 0 | 20 Mar 2017
Learning Cooperative Visual Dialog Agents with Deep Reinforcement Learning | Abhishek Das, Satwik Kottur, J. M. F. Moura, Stefan Lee, Dhruv Batra | OffRL | 31 423 0 | 20 Mar 2017
VQABQ: Visual Question Answering by Basic Questions | Jia-Hong Huang, Modar Alfadly, Guohao Li | - | 27 24 0 | 19 Mar 2017
Recurrent Models for Situation Recognition | Arun Mallya, Svetlana Lazebnik | - | 20 30 0 | 18 Mar 2017
Towards Diverse and Natural Image Descriptions via a Conditional GAN | Bo Dai, Sanja Fidler, R. Urtasun, Dahua Lin | GAN | 22 450 0 | 17 Mar 2017
Learning Robust Visual-Semantic Embeddings | Yao-Hung Hubert Tsai, Liang-Kang Huang, Ruslan Salakhutdinov | SSL, AI4TS | 27 166 0 | 17 Mar 2017
Massive Exploration of Neural Machine Translation Architectures | D. Britz, Anna Goldie, Minh-Thang Luong, Quoc V. Le | - | 29 516 0 | 11 Mar 2017
Unsupervised Visual-Linguistic Reference Resolution in Instructional Videos | De-An Huang, Joseph J. Lim, Li Fei-Fei, Juan Carlos Niebles | - | 24 56 0 | 07 Mar 2017
Neural Machine Translation and Sequence-to-sequence Models: A Tutorial | Graham Neubig | AIMat | 37 171 0 | 05 Mar 2017
Machine Learning on Sequential Data Using a Recurrent Weighted Average | Jared Ostmeyer, L. Cowell | - | 22 32 0 | 03 Mar 2017
Toward Controlled Generation of Text | Zhiting Hu, Zichao Yang, Xiaodan Liang, Ruslan Salakhutdinov, Eric Xing | - | 61 984 0 | 02 Mar 2017
Using Synthetic Data to Train Neural Networks is Model-Based Reasoning | T. Le, A. G. Baydin, R. Zinkov, Frank Wood | SyDa, OOD | 25 89 0 | 02 Mar 2017
Evolving Deep Neural Networks | Risto Miikkulainen, J. Liang, Elliot Meyerson, Aditya Rawal, Daniel Fink, ..., B. Raju, H. Shahrzad, Arshak Navruzyan, Nigel P. Duffy, B. Hodjat | - | 21 884 0 | 01 Mar 2017
Asymmetric Tri-training for Unsupervised Domain Adaptation | Kuniaki Saito, Yoshitaka Ushiku, Tatsuya Harada | - | 49 582 0 | 27 Feb 2017
Person Search with Natural Language Description | Shuang Li, Tong Xiao, Hongsheng Li, Bolei Zhou, Dayu Yue, Xiaogang Wang | - | 24 386 0 | 19 Feb 2017
MAT: A Multimodal Attentive Translator for Image Captioning | Chang Liu, F. Sun, Changhu Wang, Feng Wang, Alan Yuille | - | 20 58 0 | 18 Feb 2017
Dataset Augmentation in Feature Space | Terrance Devries, Graham W. Taylor | - | 23 423 0 | 17 Feb 2017
End-to-End Interpretation of the French Street Name Signs Dataset | Raymond W. Smith, Chunhui Gu, Dar-Shyang Lee, Huiyi Hu, Ranjith Unnikrishnan, Julian Ibarz, Sacha Arnoud, Sophia Lin | - | 11 42 0 | 13 Feb 2017
Parallel Long Short-Term Memory for Multi-stream Classification | Mohamed Bouaziz, Mohamed Morchid, Richard Dufour, G. Linarès, R. Mori | - | 12 11 0 | 11 Feb 2017
A Hybrid Convolutional Variational Autoencoder for Text Generation | Stanislau Semeniuta, Aliaksei Severyn, Erhardt Barth | - | 26 251 0 | 08 Feb 2017
Gated Multimodal Units for Information Fusion | John Arevalo, Thamar Solorio, Manuel Montes-y-Gómez, Fabio Gonzalez | - | 33 371 0 | 07 Feb 2017
Toward Abstraction from Multi-modal Data: Empirical Studies on Multiple Time-scale Recurrent Models | Junpei Zhong, Angelo Cangelosi, T. Ogata | - | 14 14 0 | 07 Feb 2017
Doubly-Attentive Decoder for Multi-modal Neural Machine Translation | Iacer Calixto, Qun Liu, N. Campbell | - | 40 179 0 | 04 Feb 2017
YouTube-BoundingBoxes: A Large High-Precision Human-Annotated Data Set for Object Detection in Video | Esteban Real, Jonathon Shlens, S. Mazzocchi, Xin Pan, Vincent Vanhoucke | VOS, ObjD | 40 534 0 | 02 Feb 2017
Symbolic, Distributed and Distributional Representations for Natural Language Processing in the Era of Deep Learning: a Survey | L. Ferrone, Fabio Massimo Zanzotto | - | 39 37 0 | 02 Feb 2017
Image-Grounded Conversations: Multimodal Context for Natural Question and Response Generation | N. Mostafazadeh, Chris Brockett, W. Dolan, Michel Galley, Jianfeng Gao, Georgios P. Spithourakis, Lucy Vanderwende | - | 21 181 0 | 28 Jan 2017
Learning Word-Like Units from Joint Audio-Visual Analysis | David Harwath, James R. Glass | - | 32 106 0 | 25 Jan 2017
Incorporating Global Visual Features into Attention-Based Neural Machine Translation | Iacer Calixto, Qun Liu, Nick Campbell | - | 32 154 0 | 23 Jan 2017
dna2vec: Consistent vector representations of variable-length k-mers | Patrick Ng | - | 32 173 0 | 23 Jan 2017
Comprehension-guided referring expressions | Ruotian Luo, Gregory Shakhnarovich | ObjD | 29 171 0 | 12 Jan 2017
Attention-Based Multimodal Fusion for Video Description | Chiori Hori, Takaaki Hori, Teng-Yok Lee, Kazuhiro Sumi, J. Hershey, Tim K. Marks | - | 41 359 0 | 11 Jan 2017
Context-aware Captions from Context-agnostic Supervision | Ramakrishna Vedantam, Samy Bengio, Kevin Patrick Murphy, Devi Parikh, Gal Chechik | - | 22 152 0 | 11 Jan 2017
Learning From Noisy Large-Scale Datasets With Minimal Supervision | Andreas Veit, N. Alldrin, Gal Chechik, Ivan Krasin, Abhinav Gupta, Serge J. Belongie | - | 34 476 0 | 06 Jan 2017
End-to-End Attention based Text-Dependent Speaker Verification | Shi-Xiong Zhang, Zhuo Chen, Yong Zhao, Jinyu Li, Jiawei Liu | - | 18 177 0 | 03 Jan 2017
A Joint Speaker-Listener-Reinforcer Model for Referring Expressions | Licheng Yu, Hao Tan, Joey Tianyi Zhou, Tamara L. Berg | ObjD | 46 273 0 | 30 Dec 2016
Learning Visual N-Grams from Web Data | Ang Li, Allan Jabri, Armand Joulin, L. V. D. van der Maaten | VLM | 20 136 0 | 29 Dec 2016
Image-Text Multi-Modal Representation Learning by Adversarial Backpropagation | Gwangbeen Park, Woobin Im | GAN | 16 25 0 | 26 Dec 2016
Understanding Image and Text Simultaneously: a Dual Vision-Language Machine Comprehension Task | Nan Ding, Sebastian Goodman, Fei Sha, Radu Soricut | VLM | 27 9 0 | 22 Dec 2016
Structured Sequence Modeling with Graph Convolutional Recurrent Networks | Youngjoo Seo, M. Defferrard, P. Vandergheynst, Xavier Bresson | GNN | 36 758 0 | 22 Dec 2016
Re-evaluating Automatic Metrics for Image Captioning | Mert Kilickaya, Aykut Erdem, Nazli Ikizler-Cinbis, Erkut Erdem | - | 17 180 0 | 22 Dec 2016
Top-down Visual Saliency Guided by Captions | Vasili Ramanishka, Abir Das, Jianming Zhang, Kate Saenko | - | 21 142 0 | 21 Dec 2016
An Empirical Study of Language CNN for Image Captioning | Jiuxiang Gu, G. Wang, Jianfei Cai, Tsuhan Chen | - | 31 132 0 | 21 Dec 2016
Temporal Tessellation: A Unified Approach for Video Analysis | Dotan Kaufman, Gil Levi, Tal Hassner, Lior Wolf | - | 19 16 0 | 21 Dec 2016
Automatic Generation of Grounded Visual Questions | Shijie Zhang, Lizhen Qu, Shaodi You, Zhenglu Yang, Jiawan Zhang | OOD | 27 79 0 | 20 Dec 2016
Few-Shot Object Recognition from Machine-Labeled Web Images | Zhongwen Xu, Linchao Zhu, Yi Yang | VLM | 18 66 0 | 19 Dec 2016
Beyond Holistic Object Recognition: Enriching Image Understanding with Part States | Cewu Lu, Hao Su, Yongyi Lu, L. Yi, Chi-Keung Tang, Leonidas J. Guibas | - | 15 33 0 | 15 Dec 2016
Recurrent Image Captioner: Describing Images with Spatial-Invariant Transformation and Attention Filtering | Hao Liu, Yang Yang, Fumin Shen, Lixin Duan, Heng Tao Shen | - | 38 9 0 | 15 Dec 2016
Learning to Hash-tag Videos with Tag2Vec | A. Singh, Saurabh Saini, R. Shah, P. J. Narayanan | - | 22 1 0 | 13 Dec 2016
Text-guided Attention Model for Image Captioning | Jonghwan Mun, Minsu Cho, Bohyung Han | VLM | 15 92 0 | 12 Dec 2016