1412.4729
Translating Videos to Natural Language Using Deep Recurrent Neural Networks
15 December 2014
Subhashini Venugopalan
Huijuan Xu
Jeff Donahue
Marcus Rohrbach
Raymond J. Mooney
Kate Saenko
Papers citing "Translating Videos to Natural Language Using Deep Recurrent Neural Networks"
26 / 26 papers shown
Title
CLEVRER-Humans: Describing Physical and Causal Events the Human Way
Jiayuan Mao
Xuelin Yang
Xikun Zhang
Noah D. Goodman
Jiajun Wu
NAI
94
22
0
05 Oct 2023
Global2Local: A Joint-Hierarchical Attention for Video Captioning
Chengpeng Dai
Fuhai Chen
Xiaoshuai Sun
Rongrong Ji
QiXiang Ye
Yongjian Wu
71
1
0
13 Mar 2022
Dense Video Captioning Using Unsupervised Semantic Information
Valter Estevam
Rayson Laroca
Hélio Pedrini
David Menotti
84
10
0
15 Dec 2021
Vision Meets Wireless Positioning: Effective Person Re-identification with Recurrent Context Propagation
Yiheng Liu
Wen-gang Zhou
Mao Xi
Sanjing Shen
Houqiang Li
77
8
0
10 Aug 2020
Learning a bidirectional mapping between human whole-body motion and natural language using deep recurrent neural networks
Matthias Plappert
Christian Mandery
Tamim Asfour
3DH
143
132
0
18 May 2017
A Survey on Content-Aware Video Analysis for Sports
H. Shih
67
193
0
03 Mar 2017
Attention-Based Multimodal Fusion for Video Description
Chiori Hori
Takaaki Hori
Teng-Yok Lee
Kazuhiro Sumi
J. Hershey
Tim K. Marks
80
360
0
11 Jan 2017
Spatio-Temporal Attention Models for Grounded Video Captioning
M. Zanfir
Elisabeta Marinoiu
C. Sminchisescu
102
50
0
17 Oct 2016
End-to-end Concept Word Detection for Video Captioning, Retrieval, and Question Answering
Youngjae Yu
Hyungjin Ko
Jongwook Choi
Gunhee Kim
137
231
0
10 Oct 2016
A Survey of Multi-View Representation Learning
Yingming Li
Ming Yang
Zhongfei Zhang
AI4TS
3DV
324
515
0
03 Oct 2016
From Captions to Visual Concepts and Back
Hao Fang
Saurabh Gupta
F. Iandola
R. Srivastava
Li Deng
...
Xiaodong He
Margaret Mitchell
John C. Platt
C. L. Zitnick
Geoffrey Zweig
VLM
131
1,312
0
18 Nov 2014
Show and Tell: A Neural Image Caption Generator
Oriol Vinyals
Alexander Toshev
Samy Bengio
D. Erhan
3DV
265
6,042
0
17 Nov 2014
Long-term Recurrent Convolutional Networks for Visual Recognition and Description
Jeff Donahue
Lisa Anne Hendricks
Marcus Rohrbach
Subhashini Venugopalan
S. Guadarrama
Kate Saenko
Trevor Darrell
VLM
173
6,057
0
17 Nov 2014
Unifying Visual-Semantic Embeddings with Multimodal Neural Language Models
Ryan Kiros
Ruslan Salakhutdinov
R. Zemel
VLM
135
1,401
0
10 Nov 2014
Learning to Execute
Wojciech Zaremba
Ilya Sutskever
ODL
99
560
0
17 Oct 2014
Explain Images with Multimodal Recurrent Neural Networks
Junhua Mao
Wenyuan Xu
Yi Yang
Jiang Wang
Alan Yuille
VLM
GAN
115
385
0
04 Oct 2014
Sequence to Sequence Learning with Neural Networks
Ilya Sutskever
Oriol Vinyals
Quoc V. Le
AIMat
450
20,606
0
10 Sep 2014
On the Properties of Neural Machine Translation: Encoder-Decoder Approaches
Kyunghyun Cho
B. V. Merrienboer
Dzmitry Bahdanau
Yoshua Bengio
AI4CE
AIMat
270
6,791
0
03 Sep 2014
ImageNet Large Scale Visual Recognition Challenge
Olga Russakovsky
Jia Deng
Hao Su
J. Krause
S. Satheesh
...
A. Karpathy
A. Khosla
Michael S. Bernstein
Alexander C. Berg
Li Fei-Fei
VLM
ObjD
1.7K
39,637
0
01 Sep 2014
Video In Sentences Out
Andrei Barbu
Alexander Bridge
Zachary Burchill
D. Coroian
Sven J. Dickinson
...
Jarrell W. Waggoner
Song Wang
Jinlian Wei
Yifan Yin
Zhiqi Zhang
69
156
0
09 Aug 2014
Deep Fragment Embeddings for Bidirectional Image Sentence Mapping
A. Karpathy
Armand Joulin
Li Fei-Fei
VLM
116
937
0
22 Jun 2014
Caffe: Convolutional Architecture for Fast Feature Embedding
Yangqing Jia
Evan Shelhamer
Jeff Donahue
Sergey Karayev
Jonathan Long
Ross B. Girshick
S. Guadarrama
Trevor Darrell
VLM
BDL
3DV
296
14,715
0
20 Jun 2014
Microsoft COCO: Common Objects in Context
Tsung-Yi Lin
Michael Maire
Serge J. Belongie
Lubomir Bourdev
Ross B. Girshick
James Hays
Pietro Perona
Deva Ramanan
C. L. Zitnick
Piotr Dollár
ObjD
437
43,875
0
01 May 2014
Coherent Multi-Sentence Video Description with Variable Level of Detail
Anna Rohrbach
Marcus Rohrbach
Weijian Qiu
Annemarie Friedrich
Sikandar Amin
Mykhaylo Andriluka
Manfred Pinkal
Bernt Schiele
91
218
0
24 Mar 2014
Visualizing and Understanding Convolutional Networks
Matthew D. Zeiler
Rob Fergus
FAtt
SSL
605
15,907
0
12 Nov 2013
DeCAF: A Deep Convolutional Activation Feature for Generic Visual Recognition
Jeff Donahue
Yangqing Jia
Oriol Vinyals
Judy Hoffman
Ning Zhang
Eric Tzeng
Trevor Darrell
VLM
ObjD
193
4,954
0
06 Oct 2013