1511.07571
DenseCap: Fully Convolutional Localization Networks for Dense Captioning
24 November 2015
Justin Johnson
A. Karpathy
Li Fei-Fei
VLM
Papers citing
"DenseCap: Fully Convolutional Localization Networks for Dense Captioning"
50 / 452 papers shown
Title
Deep Variation-structured Reinforcement Learning for Visual Relationship and Attribute Detection
Xiaodan Liang
Lisa Lee
Eric P. Xing
29
250
0
08 Mar 2017
Unsupervised Visual-Linguistic Reference Resolution in Instructional Videos
De-An Huang
Joseph J. Lim
Li Fei-Fei
Juan Carlos Niebles
24
56
0
07 Mar 2017
Visual Translation Embedding Network for Visual Relation Detection
Hanwang Zhang
Zawlin Kyaw
Shih-Fu Chang
Tat-Seng Chua
ViT
154
560
0
27 Feb 2017
ViP-CNN: Visual Phrase Guided Convolutional Neural Network
Yikang Li
Wanli Ouyang
Xiaogang Wang
Xiaoou Tang
ObjD
22
48
0
23 Feb 2017
Person Search with Natural Language Description
Shuang Li
Tong Xiao
Hongsheng Li
Bolei Zhou
Dayu Yue
Xiaogang Wang
24
386
0
19 Feb 2017
Learning to Detect Human-Object Interactions
Yu-Wei Chao
Yunfan Liu
Michael Xieyang Liu
Huayi Zeng
Jia Deng
28
502
0
17 Feb 2017
Gated Multimodal Units for Information Fusion
John Arevalo
Thamar Solorio
Manuel Montes-y-Gómez
Fabio Gonzalez
33
371
0
07 Feb 2017
Concurrent Activity Recognition with Multimodal CNN-LSTM Structure
Xinyu Li
Yanyi Zhang
Jianyu Zhang
Shuhong Chen
I. Marsic
Richard A. Farneth
R. Burd
HAI
15
35
0
06 Feb 2017
Learning Word-Like Units from Joint Audio-Visual Analysis
David Harwath
James R. Glass
24
106
0
25 Jan 2017
Incremental Learning for Robot Perception through HRI
Sepehr Valipour
C. P. Quintero
Martin Jägersand
SSL
CLL
14
32
0
17 Jan 2017
Comprehension-guided referring expressions
Ruotian Luo
Gregory Shakhnarovich
ObjD
29
171
0
12 Jan 2017
A Joint Speaker-Listener-Reinforcer Model for Referring Expressions
Licheng Yu
Hao Tan
Joey Tianyi Zhou
Tamara L. Berg
ObjD
46
273
0
30 Dec 2016
Top-down Visual Saliency Guided by Captions
Vasili Ramanishka
Abir Das
Jianming Zhang
Kate Saenko
21
142
0
21 Dec 2016
An Empirical Study of Language CNN for Image Captioning
Jiuxiang Gu
G. Wang
Jianfei Cai
Tsuhan Chen
31
132
0
21 Dec 2016
Automatic Generation of Grounded Visual Questions
Shijie Zhang
Lizhen Qu
Shaodi You
Zhenglu Yang
Jiawan Zhang
OOD
19
79
0
20 Dec 2016
Sparse Factorization Layers for Neural Networks with Limited Supervision
Parker A. Koch
Jason J. Corso
24
2
0
14 Dec 2016
ImageNet pre-trained models with batch normalization
Marcel Simon
E. Rodner
Joachim Denzler
VLM
SSeg
44
165
0
05 Dec 2016
Multi-Label Image Classification with Regional Latent Semantic Dependencies
Junjie Zhang
Qi Wu
Chunhua Shen
Jian Zhang
Jianfeng Lu
25
165
0
04 Dec 2016
Areas of Attention for Image Captioning
M. Pedersoli
Thomas Lucas
Cordelia Schmid
Jakob Verbeek
33
205
0
03 Dec 2016
Training Bit Fully Convolutional Network for Fast Semantic Segmentation
He Wen
Shuchang Zhou
Zhe Liang
Yuxiang Zhang
Dieqiao Feng
Xinyu Zhou
Cong Yao
MQ
SSeg
37
10
0
01 Dec 2016
Modeling Relationships in Referential Expressions with Compositional Modular Networks
Ronghang Hu
Marcus Rohrbach
Jacob Andreas
Trevor Darrell
Kate Saenko
42
401
0
30 Nov 2016
Social Scene Understanding: End-to-End Multi-Person Action Localization and Collective Activity Recognition
Timur M. Bagautdinov
Alexandre Alahi
F. Fleuret
Pascal Fua
Silvio Savarese
19
217
0
28 Nov 2016
DeepSetNet: Predicting Sets with Deep Neural Networks
S. Hamid Rezatofighi
B. V. Kumar
Anton Milan
Ehsan Abbasnejad
A. Dick
Ian Reid
BDL
34
51
0
28 Nov 2016
Grad-CAM: Why did you say that?
Ramprasaath R. Selvaraju
Abhishek Das
Ramakrishna Vedantam
Michael Cogswell
Devi Parikh
Dhruv Batra
FAtt
20
462
0
22 Nov 2016
Sampled Image Tagging and Retrieval Methods on User Generated Content
Karl S. Ni
Kyle Zaragoza
Charles Foster
C. Carrano
Barry Y. Chen
Yonas Tesfaye
A. Gude
22
6
0
21 Nov 2016
Dense Captioning with Joint Inference and Visual Context
L. Yang
K. Tang
Jianchao Yang
Li-Jia Li
VLM
30
169
0
21 Nov 2016
Phrase Localization and Visual Relationship Detection with Comprehensive Image-Language Cues
Bryan A. Plummer
Arun Mallya
Christopher M. Cervantes
J. Hockenmaier
Svetlana Lazebnik
33
189
0
21 Nov 2016
A Hierarchical Approach for Generating Descriptive Image Paragraphs
J. Krause
Justin Johnson
Ranjay Krishna
Li Fei-Fei
VLM
36
373
0
20 Nov 2016
Recurrent Memory Addressing for describing videos
A. Jain
Abhinav Agarwalla
Kumar Krishna Agrawal
Pabitra Mitra
38
10
0
20 Nov 2016
Convolutional Gated Recurrent Networks for Video Segmentation
Mennatullah Siam
Sepehr Valipour
Martin Jägersand
Nilanjan Ray
VOS
22
98
0
16 Nov 2016
Diversity encouraged learning of unsupervised LSTM ensemble for neural activity video prediction
Yilin Song
J. Viventi
Yao Wang
AI4TS
30
2
0
15 Nov 2016
Zero-resource Machine Translation by Multimodal Encoder-decoder Network with Multimedia Pivot
Hideki Nakayama
Noriki Nishida
24
62
0
14 Nov 2016
Memory-augmented Attention Modelling for Videos
Rasool Fakoor
Abdel-rahman Mohamed
Margaret Mitchell
S. B. Kang
Pushmeet Kohli
43
20
0
07 Nov 2016
Spatio-Temporal Attention Models for Grounded Video Captioning
M. Zanfir
Elisabeta Marinoiu
C. Sminchisescu
27
50
0
17 Oct 2016
Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization
Ramprasaath R. Selvaraju
Michael Cogswell
Abhishek Das
Ramakrishna Vedantam
Devi Parikh
Dhruv Batra
FAtt
41
19,576
0
07 Oct 2016
Visual Question Answering: Datasets, Algorithms, and Future Challenges
Kushal Kafle
Christopher Kanan
OOD
27
235
0
05 Oct 2016
Learning to generalize to new compositions in image understanding
Y. Atzmon
Jonathan Berant
Vahid Kezami
Amir Globerson
Gal Chechik
26
67
0
27 Aug 2016
Title Generation for User Generated Videos
Kuo-Hao Zeng
Tseng-Hung Chen
Juan Carlos Niebles
Min Sun
35
69
0
25 Aug 2016
Modeling Context Between Objects for Referring Expression Understanding
Varun K. Nagaraja
Vlad I. Morariu
Larry S. Davis
29
143
0
01 Aug 2016
Modeling Context in Referring Expressions
Licheng Yu
Patrick Poirson
Shan Yang
Alexander C. Berg
Tamara L. Berg
28
1,227
0
31 Jul 2016
Watch What You Just Said: Image Captioning with Text-Conditional Attention
Luowei Zhou
Chenliang Xu
Parker A. Koch
Jason J. Corso
VLM
22
44
0
15 Jun 2016
Attend Refine Repeat: Active Box Proposal Generation via In-Out Localization
Spyridon Gidaris
N. Komodakis
ObjD
24
79
0
14 Jun 2016
Deep neural networks are robust to weight binarization and other non-linear distortions
P. Merolla
R. Appuswamy
John V. Arthur
S. K. Esser
D. Modha
OOD
MQ
25
96
0
07 Jun 2016
Recurrent Fully Convolutional Networks for Video Segmentation
Sepehr Valipour
Mennatullah Siam
Martin Jägersand
Nilanjan Ray
VOS
21
89
0
01 Jun 2016
Joint Line Segmentation and Transcription for End-to-End Handwritten Paragraph Recognition
Théodore Bluche
AI4TS
18
189
0
28 Apr 2016
Attributes as Semantic Units between Natural Language and Visual Recognition
Marcus Rohrbach
VLM
14
3
0
12 Apr 2016
Dense Image Representation with Spatial Pyramid VLAD Coding of CNN for Locally Robust Captioning
Andrew Shin
Masataka Yamaguchi
Katsunori Ohnishi
Tatsuya Harada
45
8
0
30 Mar 2016
Rich Image Captioning in the Wild
Kenneth Tran
Xiaodong He
Lei Zhang
Jian Sun
Cornelia Carapcea
Chris Thrasher
Chris Buehler
Chris Sienkiewicz
VLM
19
123
0
30 Mar 2016
BreakingNews: Article Annotation by Image and Text Processing
Arnau Ramisa
F. Yan
Francesc Moreno-Noguer
K. Mikolajczyk
29
105
0
23 Mar 2016
Generation and Comprehension of Unambiguous Object Descriptions
Junhua Mao
Jonathan Huang
Alexander Toshev
Oana-Maria Camburu
Alan Yuille
Kevin Patrick Murphy
ObjD
33
1,314
0
07 Nov 2015