v1v2v3v4v5 (latest)

Deep Captioning with Multimodal Recurrent Neural Networks (m-RNN)

20 December 2014

Yi Yang

Papers citing "Deep Captioning with Multimodal Recurrent Neural Networks (m-RNN)"

50 / 417 papers shown

Title
Improving Classification by Improving Labelling: Introducing Probabilistic Multi-Label Object Interaction Recognition Michael Wray Davide Moltisanti W. Mayol-Cuevas Dima Damen 67 2 0 24 Mar 2017
Recurrent Multimodal Interaction for Referring Image Segmentation Chenxi Liu Zhe Lin Xiaohui Shen Jimei Yang Xin Lu Alan Yuille EgoV 94 241 0 23 Mar 2017
Recurrent Topic-Transition GAN for Visual Paragraph Generation Xiaodan Liang Zhiting Hu Huatian Zhang Chuang Gan Eric Xing GAN 92 203 0 21 Mar 2017
Person Search with Natural Language Description Shuang Li Tong Xiao Hongsheng Li Bolei Zhou Dayu Yue Xiaogang Wang 105 397 0 19 Feb 2017
Auxiliary Multimodal LSTM for Audio-visual Speech Recognition and Lipreading Chunlin Tian Weijun Ji 32 7 0 16 Jan 2017
Attention-Based Multimodal Fusion for Video Description Chiori Hori Takaaki Hori Teng-Yok Lee Kazuhiro Sumi J. Hershey Tim K. Marks 95 361 0 11 Jan 2017
Learning Visual N-Grams from Web Data Ang Li Allan Jabri Armand Joulin Laurens van der Maaten VLM 85 138 0 29 Dec 2016
Image-Text Multi-Modal Representation Learning by Adversarial Backpropagation Gwangbeen Park Woobin Im GAN 68 25 0 26 Dec 2016
Understanding Image and Text Simultaneously: a Dual Vision-Language Machine Comprehension Task Nan Ding Sebastian Goodman Fei Sha Radu Soricut VLM 80 9 0 22 Dec 2016
An Empirical Study of Language CNN for Image Captioning Jiuxiang Gu G. Wang Jianfei Cai Tsuhan Chen 95 134 0 21 Dec 2016
Recurrent Image Captioner: Describing Images with Spatial-Invariant Transformation and Attention Filtering Hao Liu Yang Yang Fumin Shen Lixin Duan Heng Tao Shen 65 9 0 15 Dec 2016
Text-guided Attention Model for Image Captioning Jonghwan Mun Minsu Cho Bohyung Han VLM 59 93 0 12 Dec 2016
Knowing When to Look: Adaptive Attention via A Visual Sentinel for Image Captioning Jiasen Lu Caiming Xiong Devi Parikh R. Socher 142 1,458 0 06 Dec 2016
Areas of Attention for Image Captioning M. Pedersoli Thomas Lucas Cordelia Schmid Jakob Verbeek 115 206 0 03 Dec 2016
Guided Open Vocabulary Image Captioning with Constrained Beam Search Peter Anderson Basura Fernando Mark Johnson Stephen Gould 95 237 0 02 Dec 2016
Video Captioning with Multi-Faceted Attention Xiang Long Chuang Gan Gerard de Melo 87 88 0 01 Dec 2016
Training and Evaluating Multimodal Word Embeddings with Large-scale Web Annotated Images Junhua Mao Jiajing Xu Yushi Jing Alan Yuille 48 48 0 24 Nov 2016
Semantic Compositional Networks for Visual Captioning Zhe Gan Chuang Gan Xiaodong He Yunchen Pu Kenneth Tran Jianfeng Gao Lawrence Carin Li Deng CoGe 112 427 0 23 Nov 2016
Learning Generic Sentence Representations Using Convolutional Neural Networks Zhe Gan Yunchen Pu Ricardo Henao Chunyuan Li Xiaodong He Lawrence Carin SSL 95 98 0 23 Nov 2016
Adaptive Feature Abstraction for Translating Video to Text Yunchen Pu Martin Renqiang Min Zhe Gan Lawrence Carin 72 14 0 23 Nov 2016
Dense Captioning with Joint Inference and Visual Context L. Yang K. Tang Jianchao Yang Li Li VLM 103 170 0 21 Nov 2016
A Hierarchical Approach for Generating Descriptive Image Paragraphs J. Krause Justin Johnson Ranjay Krishna Li Fei-Fei VLM 106 379 0 20 Nov 2016
SCA-CNN: Spatial and Channel-wise Attention in Convolutional Networks for Image Captioning Long Chen Hanwang Zhang Jun Xiao Liqiang Nie Jian Shao Wei Liu Tat-Seng Chua 84 1,666 0 17 Nov 2016
Semantic Regularisation for Recurrent Image Annotation Feng Liu Tao Xiang Timothy M. Hospedales Wankou Yang Changyin Sun 107 104 0 16 Nov 2016
Dual Attention Networks for Multimodal Reasoning and Matching Hyeonseob Nam Jung-Woo Ha Jeonghee Kim 134 670 0 02 Nov 2016
Visual Question Answering: Datasets, Algorithms, and Future Challenges Kushal Kafle Christopher Kanan OOD 112 244 0 05 Oct 2016
A Survey of Multi-View Representation Learning Yingming Li Ming Yang Zhongfei Zhang AI4TS 3DV 346 517 0 03 Oct 2016
Show and Tell: Lessons learned from the 2015 MSCOCO Image Captioning Challenge Oriol Vinyals Alexander Toshev Samy Bengio D. Erhan 138 856 0 21 Sep 2016
GeThR-Net: A Generalized Temporally Hybrid Recurrent Neural Network for Multimodal Information Fusion Ankit Gandhi Arjun Sharma Arijit Biswas Om Deshmukh AI4TS 34 12 0 17 Sep 2016
Predicting Shot Making in Basketball Learnt from Adversarial Multiagent Trajectories Mark Harmon Abdolghani Ebrahimi P. Lucey Diego Klabjan GAN 78 20 0 15 Sep 2016
Multimodal Attention for Neural Machine Translation Ozan Caglayan Loïc Barrault Fethi Bougares 84 76 0 13 Sep 2016
Measuring Machine Intelligence Through Visual Question Answering C. L. Zitnick Aishwarya Agrawal Stanislaw Antol Margaret Mitchell Dhruv Batra Devi Parikh 83 37 0 31 Aug 2016
Utilizing Large Scale Vision and Text Datasets for Image Segmentation from Referring Expressions Ronghang Hu Marcus Rohrbach Subhashini Venugopalan Trevor Darrell VLM 65 18 0 30 Aug 2016
phi-LSTM: A Phrase-based Hierarchical LSTM Model for Image Captioning Y. Tan Chee Seng Chan VLM 114 29 0 20 Aug 2016
Detecting Sarcasm in Multimodal Social Platforms Rossano Schifanella Paloma de Juan Joel R. Tetreault Liangliang Cao 77 170 0 08 Aug 2016
Modeling Context in Referring Expressions Licheng Yu Patrick Poirson Shan Yang Alexander C. Berg Tamara L. Berg 135 1,281 0 31 Jul 2016
SPICE: Semantic Propositional Image Caption Evaluation Peter Anderson Basura Fernando Mark Johnson Stephen Gould EGVM 162 1,930 0 29 Jul 2016
A Comprehensive Survey on Cross-modal Retrieval Kun Wang Qiyue Yin Wei Wang Shu Wu Liang Wang 88 298 0 21 Jul 2016
Visual Question Answering: A Survey of Methods and Datasets Qi Wu Damien Teney Peng Wang Chunhua Shen A. Dick Anton Van Den Hengel 118 418 0 20 Jul 2016
Captioning Images with Diverse Objects Subhashini Venugopalan Lisa Anne Hendricks Marcus Rohrbach Raymond J. Mooney Trevor Darrell Kate Saenko VLM 91 178 0 24 Jun 2016
Picture It In Your Mind: Generating High Level Visual Representations From Textual Descriptions F. Carrara Andrea Esuli T. Fagni Fabrizio Falchi Alejandro Moreo DiffM 56 30 0 23 Jun 2016
Watch What You Just Said: Image Captioning with Text-Conditional Attention Luowei Zhou Chenliang Xu Parker A. Koch Jason J. Corso VLM 72 44 0 15 Jun 2016
Deep Recurrent Models with Fast-Forward Connections for Neural Machine Translation Jie Zhou Ying Cao Xuguang Wang Peng Li Wenyuan Xu AIMat 92 217 0 14 Jun 2016
Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding Akira Fukui Dong Huk Park Daylen Yang Anna Rohrbach Trevor Darrell Marcus Rohrbach 353 1,471 0 06 Jun 2016
Storytelling of Photo Stream with Bidirectional Multi-thread Recurrent Neural Network Yu Liu Jianlong Fu Tao Mei C. Chen 51 4 0 02 Jun 2016
Attention Correctness in Neural Image Captioning Chenxi Liu Junhua Mao Fei Sha Alan Yuille 3DV 102 221 0 31 May 2016
SNN: Stacked Neural Networks Milad Mohammadi Subhasis Das 36 15 0 27 May 2016
Generative Adversarial Text to Image Synthesis Scott E. Reed Zeynep Akata Xinchen Yan Lajanugen Logeswaran Bernt Schiele Honglak Lee GAN 211 3,152 0 17 May 2016
Learning Deep Representations of Fine-grained Visual Descriptions Scott E. Reed Zeynep Akata Bernt Schiele Honglak Lee OCL VLM 211 843 0 17 May 2016
Movie Description Anna Rohrbach Atousa Torabi Marcus Rohrbach Niket Tandon C. Pal Hugo Larochelle Aaron Courville Bernt Schiele 3DV VGen 86 361 0 12 May 2016