Movie Description

12 May 2016

Aaron Courville

Bernt Schiele

3DV

VGen

ArXiv PDF HTML

Papers citing "Movie Description"

47 / 47 papers shown

Title
Generative Modeling of Class Probability for Multi-Modal Representation Learning Jungkyoo Shin Bumsoo Kim Eunwoo Kim 88 1 0 21 Mar 2025
Audio-Language Datasets of Scenes and Events: A Survey Gijs Wijngaard Elia Formisano Michele Esposito M. Dumontier 116 2 0 10 Jan 2025
Do Language Models Understand Time? Xi Ding Lei Wang 242 1 0 18 Dec 2024
Mono-InternVL: Pushing the Boundaries of Monolithic Multimodal Large Language Models with Endogenous Visual Pre-training Gen Luo Xue Yang Wenhan Dou Zhaokai Wang Jifeng Dai Jifeng Dai Yu Qiao Xizhou Zhu VLM MLLM 119 28 0 10 Oct 2024
Deep Variational Multivariate Information Bottleneck -- A Framework for Variational Losses Eslam Abdelaleem I. Nemenman K. M. Martini 58 6 0 05 Oct 2023
End-to-end Concept Word Detection for Video Captioning, Retrieval, and Question Answering Youngjae Yu Hyungjin Ko Jongwook Choi Gunhee Kim 118 231 0 10 Oct 2016
Learning Language-Visual Embedding for Movie Understanding with Natural-Language Atousa Torabi Niket Tandon Leonid Sigal 65 97 0 26 Sep 2016
Title Generation for User Generated Videos Kuo-Hao Zeng Tseng-Hung Chen Juan Carlos Niebles Min Sun 67 69 0 25 Aug 2016
Frame- and Segment-Level Features and Candidate Pool Evaluation for Video Caption Generation Rakshith Shetty Jorma T. Laaksonen 61 94 0 17 Aug 2016
SPICE: Semantic Propositional Image Caption Evaluation Peter Anderson Basura Fernando Mark Johnson Stephen Gould EGVM 100 1,914 0 29 Jul 2016
TGIF: A New Dataset and Benchmark on Animated GIF Description Yuncheng Li Yale Song Liangliang Cao Joel R. Tetreault Larry Goldberg A. Jaimes Jiebo Luo 79 271 0 10 Apr 2016
Improving LSTM-based Video Description with Linguistic Knowledge Mined from Text Subhashini Venugopalan Lisa Anne Hendricks Raymond J. Mooney Kate Saenko VLM 42 117 0 06 Apr 2016
RNN Fisher Vectors for Action Recognition and Image Annotation Guy Lev Gil Sadeh Benjamin Klein Lior Wolf 51 164 0 12 Dec 2015
Video captioning with recurrent networks based on frame- and video-level features and visual content classification Rakshith Shetty Jorma T. Laaksonen 42 31 0 09 Dec 2015
MovieQA: Understanding Stories in Movies through Question-Answering Makarand Tapaswi Yukun Zhu Rainer Stiefelhagen Antonio Torralba R. Urtasun Sanja Fidler 112 746 0 09 Dec 2015
Delving Deeper into Convolutional Networks for Learning Video Representations Nicolas Ballas L. Yao C. Pal Aaron Courville MDE 85 701 0 19 Nov 2015
Deep Compositional Captioning: Describing Novel Object Categories without Paired Training Data Lisa Anne Hendricks Subhashini Venugopalan Marcus Rohrbach Raymond J. Mooney Kate Saenko Trevor Darrell CoGe 61 284 0 17 Nov 2015
Uncovering Temporal Context for Video Question and Answering Linchao Zhu Zhongwen Xu Yi Yang Alexander G. Hauptmann BDL 65 45 0 15 Nov 2015
Hierarchical Recurrent Neural Encoder for Video Representation with Application to Captioning Pingbo Pan Zhongwen Xu Yi Yang Leilei Gan Yueting Zhuang 43 385 0 11 Nov 2015
Video Paragraph Captioning Using Hierarchical Recurrent Neural Networks Haonan Yu Jiang Wang Zhiheng Huang Yi Yang Wenyuan Xu 88 560 0 26 Oct 2015
Aligning Books and Movies: Towards Story-like Visual Explanations by Watching Movies and Reading Books Yukun Zhu Ryan Kiros R. Zemel Ruslan Salakhutdinov R. Urtasun Antonio Torralba Sanja Fidler 120 2,548 0 22 Jun 2015
The Long-Short Story of Movie Description Anna Rohrbach Marcus Rohrbach Bernt Schiele VLM 64 111 0 04 Jun 2015
Jointly Modeling Embedding and Translation to Bridge Video and Language Yingwei Pan Tao Mei Ting Yao Houqiang Li Y. Rui 77 534 0 07 May 2015
Language Models for Image Captioning: The Quirks and What Works Jacob Devlin Hao Cheng Hao Fang Saurabh Gupta Li Deng Xiaodong He Geoffrey Zweig Margaret Mitchell 83 281 0 07 May 2015
Sequence to Sequence -- Video to Text Subhashini Venugopalan Marcus Rohrbach Jeff Donahue Raymond J. Mooney Trevor Darrell Kate Saenko 140 1,418 0 03 May 2015
Microsoft COCO Captions: Data Collection and Evaluation Server Xinlei Chen Hao Fang Nayeon Lee Ramakrishna Vedantam Saurabh Gupta Piotr Dollar C. L. Zitnick 211 2,475 0 01 Apr 2015
Using Descriptive Video Services to Create a Large Data Source for Video Annotation Research Atousa Torabi C. Pal Hugo Larochelle Aaron Courville VGen 83 205 0 03 Mar 2015
Describing Videos by Exploiting Temporal Structure L. Yao Atousa Torabi Kyunghyun Cho Nicolas Ballas C. Pal Hugo Larochelle Aaron Courville 141 1,064 0 27 Feb 2015
Show, Attend and Tell: Neural Image Caption Generation with Visual Attention Ke Xu Jimmy Ba Ryan Kiros Kyunghyun Cho Aaron Courville Ruslan Salakhutdinov R. Zemel Yoshua Bengio DiffM 334 10,069 0 10 Feb 2015
A Dataset for Movie Description Anna Rohrbach Marcus Rohrbach Niket Tandon Bernt Schiele VGen 108 500 0 12 Jan 2015
Deep Captioning with Multimodal Recurrent Neural Networks (m-RNN) Junhua Mao Wenyuan Xu Yi Yang Jiang Wang Zhiheng Huang Alan Yuille VLM 168 1,240 0 20 Dec 2014
Translating Videos to Natural Language Using Deep Recurrent Neural Networks Subhashini Venugopalan Huijuan Xu Jeff Donahue Marcus Rohrbach Raymond J. Mooney Kate Saenko 135 952 0 15 Dec 2014
Deep Visual-Semantic Alignments for Generating Image Descriptions A. Karpathy Li Fei-Fei 124 5,584 0 07 Dec 2014
CIDEr: Consensus-based Image Description Evaluation Ramakrishna Vedantam C. L. Zitnick Devi Parikh 283 4,484 0 20 Nov 2014
From Captions to Visual Concepts and Back Hao Fang Saurabh Gupta F. Iandola R. Srivastava Li Deng ... Xiaodong He Margaret Mitchell John C. Platt C. L. Zitnick Geoffrey Zweig VLM 103 1,311 0 18 Nov 2014
Show and Tell: A Neural Image Caption Generator Oriol Vinyals Alexander Toshev Samy Bengio D. Erhan 3DV 237 6,028 0 17 Nov 2014
Long-term Recurrent Convolutional Networks for Visual Recognition and Description Jeff Donahue Lisa Anne Hendricks Marcus Rohrbach Subhashini Venugopalan S. Guadarrama Kate Saenko Trevor Darrell VLM 162 6,051 0 17 Nov 2014
Unifying Visual-Semantic Embeddings with Multimodal Neural Language Models Ryan Kiros Ruslan Salakhutdinov R. Zemel VLM 125 1,399 0 10 Nov 2014
Going Deeper with Convolutions Christian Szegedy Wei Liu Yangqing Jia P. Sermanet Scott E. Reed Dragomir Anguelov D. Erhan Vincent Vanhoucke Andrew Rabinovich 457 43,649 0 17 Sep 2014
Very Deep Convolutional Networks for Large-Scale Image Recognition Karen Simonyan Andrew Zisserman FAtt MDE 1.6K 100,348 0 04 Sep 2014
ImageNet Large Scale Visual Recognition Challenge Olga Russakovsky Jia Deng Hao Su J. Krause S. Satheesh ... A. Karpathy A. Khosla Michael S. Bernstein Alexander C. Berg Li Fei-Fei VLM ObjD 1.7K 39,525 0 01 Sep 2014
Video In Sentences Out Andrei Barbu Alexander Bridge Zachary Burchill D. Coroian Sven J. Dickinson ... Jarrell W. Waggoner Song Wang Jinlian Wei Yifan Yin Zhiqi Zhang 64 156 0 09 Aug 2014
LSDA: Large Scale Detection Through Adaptation Judy Hoffman S. Guadarrama Eric Tzeng Ronghang Hu Jeff Donahue Ross B. Girshick Trevor Darrell Kate Saenko ObjD 95 334 0 18 Jul 2014
Weakly Supervised Action Labeling in Videos Under Ordering Constraints Piotr Bojanowski Rémi Lajugie Francis R. Bach Ivan Laptev Jean Ponce Cordelia Schmid Josef Sivic 63 237 0 04 Jul 2014
Microsoft COCO: Common Objects in Context Nayeon Lee Michael Maire Serge J. Belongie Lubomir Bourdev Ross B. Girshick James Hays Pietro Perona Deva Ramanan C. L. Zitnick Piotr Dollár ObjD 413 43,638 0 01 May 2014
Coherent Multi-Sentence Video Description with Variable Level of Detail Anna Rohrbach Marcus Rohrbach Weijian Qiu Annemarie Friedrich Sikandar Amin Mykhaylo Andriluka Manfred Pinkal Bernt Schiele 73 218 0 24 Mar 2014
Improving neural networks by preventing co-adaptation of feature detectors Geoffrey E. Hinton Nitish Srivastava A. Krizhevsky Ilya Sutskever Ruslan Salakhutdinov VLM 450 7,661 0 03 Jul 2012