DenseCap: Fully Convolutional Localization Networks for Dense Captioning

24 November 2015

Li Fei-Fei

Papers citing "DenseCap: Fully Convolutional Localization Networks for Dense Captioning"

50 / 452 papers shown

Deep Variation-structured Reinforcement Learning for Visual Relationship and Attribute Detection Xiaodan Liang Lisa Lee Eric P. Xing 29 250 0 08 Mar 2017
Unsupervised Visual-Linguistic Reference Resolution in Instructional Videos De-An Huang Joseph J. Lim Li Fei-Fei Juan Carlos Niebles 24 56 0 07 Mar 2017
Visual Translation Embedding Network for Visual Relation Detection Hanwang Zhang Zawlin Kyaw Shih-Fu Chang Tat-Seng Chua ViT 154 560 0 27 Feb 2017
ViP-CNN: Visual Phrase Guided Convolutional Neural Network Yikang Li Wanli Ouyang Xiaogang Wang Xiaoou Tang ObjD 22 48 0 23 Feb 2017
Person Search with Natural Language Description Shuang Li Tong Xiao Hongsheng Li Bolei Zhou Dayu Yue Xiaogang Wang 24 386 0 19 Feb 2017
Learning to Detect Human-Object Interactions Yu-Wei Chao Yunfan Liu Michael Xieyang Liu Huayi Zeng Jia Deng 28 502 0 17 Feb 2017
Gated Multimodal Units for Information Fusion John Arevalo Thamar Solorio Manuel Montes-y-Gómez Fabio Gonzalez 33 371 0 07 Feb 2017
Concurrent Activity Recognition with Multimodal CNN-LSTM Structure Xinyu Li Yanyi Zhang Jianyu Zhang Shuhong Chen I. Marsic Richard A. Farneth R. Burd HAI 15 35 0 06 Feb 2017
Learning Word-Like Units from Joint Audio-Visual Analysis David Harwath James R. Glass 24 106 0 25 Jan 2017
Incremental Learning for Robot Perception through HRI Sepehr Valipour C. P. Quintero Martin Jägersand SSL CLL 14 32 0 17 Jan 2017
Comprehension-guided referring expressions Ruotian Luo Gregory Shakhnarovich ObjD 29 171 0 12 Jan 2017
A Joint Speaker-Listener-Reinforcer Model for Referring Expressions Licheng Yu Hao Tan Joey Tianyi Zhou Tamara L. Berg ObjD 46 273 0 30 Dec 2016
Top-down Visual Saliency Guided by Captions Vasili Ramanishka Abir Das Jianming Zhang Kate Saenko 21 142 0 21 Dec 2016
An Empirical Study of Language CNN for Image Captioning Jiuxiang Gu G. Wang Jianfei Cai Tsuhan Chen 31 132 0 21 Dec 2016
Automatic Generation of Grounded Visual Questions Shijie Zhang Lizhen Qu Shaodi You Zhenglu Yang Jiawan Zhang OOD 19 79 0 20 Dec 2016
Sparse Factorization Layers for Neural Networks with Limited Supervision Parker A. Koch Jason J. Corso 24 2 0 14 Dec 2016
ImageNet pre-trained models with batch normalization Marcel Simon E. Rodner Joachim Denzler VLM SSeg 44 165 0 05 Dec 2016
Multi-Label Image Classification with Regional Latent Semantic Dependencies Junjie Zhang Qi Wu Chunhua Shen Jian Zhang Jianfeng Lu 25 165 0 04 Dec 2016
Areas of Attention for Image Captioning M. Pedersoli Thomas Lucas Cordelia Schmid Jakob Verbeek 33 205 0 03 Dec 2016
Training Bit Fully Convolutional Network for Fast Semantic Segmentation He Wen Shuchang Zhou Zhe Liang Yuxiang Zhang Dieqiao Feng Xinyu Zhou Cong Yao MQ SSeg 37 10 0 01 Dec 2016
Modeling Relationships in Referential Expressions with Compositional Modular Networks Ronghang Hu Marcus Rohrbach Jacob Andreas Trevor Darrell Kate Saenko 42 401 0 30 Nov 2016
Social Scene Understanding: End-to-End Multi-Person Action Localization and Collective Activity Recognition Timur M. Bagautdinov Alexandre Alahi F. Fleuret Pascal Fua Silvio Savarese 19 217 0 28 Nov 2016
DeepSetNet: Predicting Sets with Deep Neural Networks S. Hamid Rezatofighi B. V. Kumar Anton Milan Ehsan Abbasnejad A. Dick Ian Reid BDL 34 51 0 28 Nov 2016
Grad-CAM: Why did you say that? Ramprasaath R. Selvaraju Abhishek Das Ramakrishna Vedantam Michael Cogswell Devi Parikh Dhruv Batra FAtt 20 462 0 22 Nov 2016
Sampled Image Tagging and Retrieval Methods on User Generated Content Karl S. Ni Kyle Zaragoza Charles Foster C. Carrano Barry Y. Chen Yonas Tesfaye A. Gude 22 6 0 21 Nov 2016
Dense Captioning with Joint Inference and Visual Context L. Yang K. Tang Jianchao Yang Li-Jia Li VLM 30 169 0 21 Nov 2016
Phrase Localization and Visual Relationship Detection with Comprehensive Image-Language Cues Bryan A. Plummer Arun Mallya Christopher M. Cervantes J. Hockenmaier Svetlana Lazebnik 33 189 0 21 Nov 2016
A Hierarchical Approach for Generating Descriptive Image Paragraphs J. Krause Justin Johnson Ranjay Krishna Li Fei-Fei VLM 36 373 0 20 Nov 2016
Recurrent Memory Addressing for describing videos A. Jain Abhinav Agarwalla Kumar Krishna Agrawal Pabitra Mitra 38 10 0 20 Nov 2016
Convolutional Gated Recurrent Networks for Video Segmentation Mennatullah Siam Sepehr Valipour Martin Jägersand Nilanjan Ray VOS 22 98 0 16 Nov 2016
Diversity encouraged learning of unsupervised LSTM ensemble for neural activity video prediction Yilin Song J. Viventi Yao Wang AI4TS 30 2 0 15 Nov 2016
Zero-resource Machine Translation by Multimodal Encoder-decoder Network with Multimedia Pivot Hideki Nakayama Noriki Nishida 24 62 0 14 Nov 2016
Memory-augmented Attention Modelling for Videos Rasool Fakoor Abdel-rahman Mohamed Margaret Mitchell S. B. Kang Pushmeet Kohli 43 20 0 07 Nov 2016
Spatio-Temporal Attention Models for Grounded Video Captioning M. Zanfir Elisabeta Marinoiu C. Sminchisescu 27 50 0 17 Oct 2016
Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization Ramprasaath R. Selvaraju Michael Cogswell Abhishek Das Ramakrishna Vedantam Devi Parikh Dhruv Batra FAtt 41 19,576 0 07 Oct 2016
Visual Question Answering: Datasets, Algorithms, and Future Challenges Kushal Kafle Christopher Kanan OOD 27 235 0 05 Oct 2016
Learning to generalize to new compositions in image understanding Y. Atzmon Jonathan Berant Vahid Kazemi Amir Globerson Gal Chechik 26 67 0 27 Aug 2016
Title Generation for User Generated Videos Kuo-Hao Zeng Tseng-Hung Chen Juan Carlos Niebles Min Sun 35 69 0 25 Aug 2016
Modeling Context Between Objects for Referring Expression Understanding Varun K. Nagaraja Vlad I. Morariu Larry S. Davis 29 143 0 01 Aug 2016
Modeling Context in Referring Expressions Licheng Yu Patrick Poirson Shan Yang Alexander C. Berg Tamara L. Berg 28 1,227 0 31 Jul 2016
Watch What You Just Said: Image Captioning with Text-Conditional Attention Luowei Zhou Chenliang Xu Parker A. Koch Jason J. Corso VLM 22 44 0 15 Jun 2016
Attend Refine Repeat: Active Box Proposal Generation via In-Out Localization Spyridon Gidaris N. Komodakis ObjD 24 79 0 14 Jun 2016
Deep neural networks are robust to weight binarization and other non-linear distortions P. Merolla R. Appuswamy John V. Arthur S. K. Esser D. Modha OOD MQ 25 96 0 07 Jun 2016
Recurrent Fully Convolutional Networks for Video Segmentation Sepehr Valipour Mennatullah Siam Martin Jägersand Nilanjan Ray VOS 21 89 0 01 Jun 2016
Joint Line Segmentation and Transcription for End-to-End Handwritten Paragraph Recognition Théodore Bluche AI4TS 18 189 0 28 Apr 2016
Attributes as Semantic Units between Natural Language and Visual Recognition Marcus Rohrbach VLM 14 3 0 12 Apr 2016
Dense Image Representation with Spatial Pyramid VLAD Coding of CNN for Locally Robust Captioning Andrew Shin Masataka Yamaguchi Katsunori Ohnishi Tatsuya Harada 45 8 0 30 Mar 2016
Rich Image Captioning in the Wild Kenneth Tran Xiaodong He Lei Zhang Jian Sun Cornelia Carapcea Chris Thrasher Chris Buehler Chris Sienkiewicz VLM 19 123 0 30 Mar 2016
BreakingNews: Article Annotation by Image and Text Processing Arnau Ramisa F. Yan Francesc Moreno-Noguer K. Mikolajczyk 29 105 0 23 Mar 2016
Generation and Comprehension of Unambiguous Object Descriptions Junhua Mao Jonathan Huang Alexander Toshev Oana-Maria Camburu Alan Yuille Kevin Patrick Murphy ObjD 33 1,314 0 07 Nov 2015