Describing Videos by Exploiting Temporal Structure

27 February 2015

Aaron Courville

Papers citing "Describing Videos by Exploiting Temporal Structure"

22 / 372 papers shown

Title | Authors | Tag | Counts | Date
Learning Articulated Motion Models from Visual and Lingual Signals | Zhengyang Wu, Joey Tianyi Zhou, Matthew R. Walter | | 27 0 0 | 17 Nov 2015
Ask, Attend and Answer: Exploring Question-Guided Spatial Attention for Visual Question Answering | Huijuan Xu, Kate Saenko | | 27 760 0 | 17 Nov 2015
Uncovering Temporal Context for Video Question and Answering | Linchao Zhu, Zhongwen Xu, Yi Yang, Alexander G. Hauptmann | BDL | 27 44 0 | 15 Nov 2015
Oracle performance for visual captioning | L. Yao, Nicolas Ballas, Kyunghyun Cho, John R. Smith, Yoshua Bengio | VLM | 39 8 0 | 14 Nov 2015
Action Recognition using Visual Attention | Shikhar Sharma, Ryan Kiros, Ruslan Salakhutdinov | | 24 666 0 | 12 Nov 2015
Grounding of Textual Phrases in Images by Reconstruction | Anna Rohrbach, Marcus Rohrbach, Ronghang Hu, Trevor Darrell, Bernt Schiele | | 27 494 0 | 12 Nov 2015
Hierarchical Recurrent Neural Encoder for Video Representation with Application to Captioning | Pingbo Pan, Zhongwen Xu, Yi Yang, Fei Wu, Yueting Zhuang | | 18 385 0 | 11 Nov 2015
Attention to Scale: Scale-aware Semantic Image Segmentation | Liang-Chieh Chen, Yi Yang, Jiang Wang, Wei Xu, Alan Yuille | SSeg | 54 1,316 0 | 10 Nov 2015
Detecting events and key actors in multi-person videos | Vignesh Ramanathan, Jonathan Huang, Sami Abu-El-Haija, Alexander N. Gorban, Kevin Patrick Murphy, Li Fei-Fei | | 24 208 0 | 09 Nov 2015
Video Paragraph Captioning Using Hierarchical Recurrent Neural Networks | Haonan Yu, Jiang Wang, Zhiheng Huang, Yi Yang, Wenyuan Xu | | 44 560 0 | 26 Oct 2015
Recursive Training of 2D-3D Convolutional Networks for Neuronal Boundary Detection | Kisuk Lee, A. Zlateski, Ashwin Vishwanathan, H. S. Seung | 3DV | 11 59 0 | 20 Aug 2015
Towards Storytelling from Visual Lifelogging: An Overview | Marc Bolaños, Mariella Dimiccoli, Petia Radeva | EgoV | 26 135 0 | 22 Jul 2015
Every Moment Counts: Dense Detailed Labeling of Actions in Complex Videos | Serena Yeung, Olga Russakovsky, Ning Jin, Mykhaylo Andriluka, Greg Mori, Li Fei-Fei | VLM | 42 436 0 | 21 Jul 2015
Describing Multimedia Content using Attention-based Encoder--Decoder Networks | Kyunghyun Cho, Aaron Courville, Yoshua Bengio | | 32 411 0 | 04 Jul 2015
The Long-Short Story of Movie Description | Anna Rohrbach, Marcus Rohrbach, Bernt Schiele | VLM | 30 110 0 | 04 Jun 2015
What value do explicit high level concepts have in vision to language problems? | Qi Wu, Chunhua Shen, Lingqiao Liu, A. Dick, Anton Van Den Hengel | | 27 443 0 | 03 Jun 2015
A Multi-scale Multiple Instance Video Description Network | Huijuan Xu, Subhashini Venugopalan, Vasili Ramanishka, Marcus Rohrbach, Kate Saenko | | 40 64 0 | 21 May 2015
Jointly Modeling Embedding and Translation to Bridge Video and Language | Yingwei Pan, Tao Mei, Ting Yao, Houqiang Li, Y. Rui | | 41 535 0 | 07 May 2015
Sequence to Sequence -- Video to Text | Subhashini Venugopalan, Marcus Rohrbach, Jeff Donahue, Raymond J. Mooney, Trevor Darrell, Kate Saenko | | 27 1,416 0 | 03 May 2015
ReNet: A Recurrent Neural Network Based Alternative to Convolutional Networks | Francesco Visin, Kyle Kastner, Kyunghyun Cho, Matteo Matteucci, Aaron Courville, Yoshua Bengio | SSeg | 23 271 0 | 03 May 2015
Show, Attend and Tell: Neural Image Caption Generation with Visual Attention | Ke Xu, Jimmy Ba, Ryan Kiros, Kyunghyun Cho, Aaron Courville, Ruslan Salakhutdinov, R. Zemel, Yoshua Bengio | DiffM | 115 10,006 0 | 10 Feb 2015
Long-term Recurrent Convolutional Networks for Visual Recognition and Description | Jeff Donahue, Lisa Anne Hendricks, Marcus Rohrbach, Subhashini Venugopalan, S. Guadarrama, Kate Saenko, Trevor Darrell | VLM | 94 6,032 0 | 17 Nov 2014