A Dataset for Movie Description

12 January 2015

Bernt Schiele

Papers citing "A Dataset for Movie Description"

50 / 257 papers shown

Title
MDMMT: Multidomain Multimodal Transformer for Video Retrieval Maksim Dzabraev M. Kalashnikov Stepan Alekseevich Komkov Aleksandr Petiushko 24 128 0 19 Mar 2021
On Semantic Similarity in Video Retrieval Michael Wray Hazel Doughty Dima Damen 33 66 0 18 Mar 2021
A Straightforward Framework For Video Retrieval Using CLIP Jesús Andrés Portillo-Quintero J. C. Ortíz-Bayliss Hugo Terashima-Marín CLIP 324 117 0 24 Feb 2021
Less is More: ClipBERT for Video-and-Language Learning via Sparse Sampling Jie Lei Linjie Li Luowei Zhou Zhe Gan Tamara L. Berg Mohit Bansal Jingjing Liu CLIP 46 648 0 11 Feb 2021
The Role of the Input in Natural Language Video Description S. Cascianelli G. Costante Alessandro Devo Thomas Alessandro Ciarfuglia P. Valigi M. L. Fravolini 21 5 0 09 Feb 2021
Narration Generation for Cartoon Videos Nikos Papasarantopoulos Shay B. Cohen VGen 25 2 0 17 Jan 2021
Recent Advances in Video Question Answering: A Review of Datasets and Methods Devshree Patel Ratnam Parikh Yesha Shastri 15 18 0 15 Jan 2021
Learning Temporal Dynamics from Cycles in Narrated Video Dave Epstein Jiajun Wu Cordelia Schmid Chen Sun AI4TS 38 14 0 07 Jan 2021
DVD: A Diagnostic Dataset for Multi-step Reasoning in Video Grounded Dialogue Hung Le Chinnadhurai Sankar Seungwhan Moon Ahmad Beirami A. Geramifard Satwik Kottur VGen 39 18 0 01 Jan 2021
Movie Summarization via Sparse Graph Construction Pinelopi Papalampidi Frank Keller Mirella Lapata 27 32 0 14 Dec 2020
MSVD-Turkish: A Comprehensive Multimodal Dataset for Integrated Vision and Language Research in Turkish Begum Citamak Ozan Caglayan Menekse Kuyu Erkut Erdem Aykut Erdem Pranava Madhyastha Lucia Specia 31 8 0 13 Dec 2020
A Comprehensive Review on Recent Methods and Challenges of Video Description Ashutosh Kumar Singh Thoudam Doren Singh Sivaji Bandyopadhyay 3DV VLM 19 5 0 30 Nov 2020
QuerYD: A video dataset with high-quality text and audio narrations Andreea-Maria Oncescu João F. Henriques Yang Liu Andrew Zisserman Samuel Albanie VGen 22 11 0 22 Nov 2020
Video Action Understanding Matthew Hutchinson V. Gadepally 43 20 0 13 Oct 2020
Dual Encoding for Video Retrieval by Text Jianfeng Dong Xirong Li Chaoxi Xu Xun Yang Gang Yang Xun Wang Meng Wang 24 2 0 10 Sep 2020
Identity-Aware Multi-Sentence Video Description J. S. Park Trevor Darrell Anna Rohrbach 26 17 0 22 Aug 2020
Text-based Localization of Moments in a Video Corpus Sudipta Paul Niluthpol Chowdhury Mithun Amit K. Roy-Chowdhury 10 14 0 20 Aug 2020
Poet: Product-oriented Video Captioner for E-commerce Shengyu Zhang Ziqi Tan Jin Yu Zhou Zhao Kun Kuang Jie Liu Jingren Zhou Hongxia Yang Fei Wu 14 34 0 16 Aug 2020
Enriching Video Captions With Contextual Text Philipp Rimle Pelin Dogan Markus Gross 30 3 0 29 Jul 2020
Active Learning for Video Description With Cluster-Regularized Ensemble Ranking David M. Chan Sudheendra Vijayanarasimhan David A. Ross John F. Canny VLM 14 6 0 27 Jul 2020
MovieNet: A Holistic Dataset for Movie Understanding Qingqiu Huang Yu Xiong Anyi Rao Jiaze Wang Dahua Lin VGen 45 235 0 21 Jul 2020
Multi-modal Transformer for Video Retrieval Valentin Gabeur Chen Sun Karteek Alahari Cordelia Schmid ViT 430 596 0 21 Jul 2020
Knowledge-Based Video Question Answering with Unsupervised Scene Descriptions Noa Garcia Yuta Nakashima 26 32 0 17 Jul 2020
Tree-Augmented Cross-Modal Encoding for Complex-Query Video Retrieval Xun Yang Jianfeng Dong Yixin Cao Xun Wang Meng Wang Tat-Seng Chua 33 137 0 06 Jul 2020
Auto-captions on GIF: A Large-scale Video-sentence Dataset for Vision-language Pre-training Yingwei Pan Yehao Li Jianjie Luo Jun Xu Ting Yao Tao Mei 38 57 0 05 Jul 2020
Comprehensive Information Integration Modeling Framework for Video Titling Shengyu Zhang Ziqi Tan Jin Yu Zhou Zhao Kun Kuang Tan Jiang Jingren Zhou Hongxia Yang Fei Wu 31 40 0 24 Jun 2020
Rescaling Egocentric Vision Dima Damen Hazel Doughty G. Farinella Antonino Furnari Evangelos Kazakos ... Davide Moltisanti Jonathan Munro Toby Perrett Will Price Michael Wray EgoV 19 437 0 23 Jun 2020
The EPIC-KITCHENS Dataset: Collection, Challenges and Baselines Dima Damen Hazel Doughty G. Farinella Sanja Fidler Antonino Furnari ... Davide Moltisanti Jonathan Munro Toby Perrett Will Price Michael Wray EgoV 23 225 0 29 Apr 2020
Noise Estimation Using Density Estimation for Self-Supervised Multimodal Learning Elad Amrani Rami Ben-Ari Daniel Rotman A. Bronstein 17 121 0 06 Mar 2020
ManyModalQA: Modality Disambiguation and QA over Diverse Inputs Darryl Hannan Akshay Jain Mohit Bansal AAML 38 57 0 22 Jan 2020
On the Evaluation of Intelligent Process Automation Deborah Ferreira Julia Rozanova K. Dubba Dell Zhang André Freitas 9 9 0 08 Jan 2020
End-to-End Learning of Visual Representations from Uncurated Instructional Videos Antoine Miech Jean-Baptiste Alayrac Lucas Smaira Ivan Laptev Josef Sivic Andrew Zisserman VGen SSL 42 703 0 13 Dec 2019
Assessing the Robustness of Visual Question Answering Models Jia-Hong Huang Modar Alfadly Bernard Ghanem M. Worring AAML OOD 23 23 0 30 Nov 2019
A Graph-Based Framework to Bridge Movies and Synopses Yu Xiong Chengyi Zhang Lingfeng Guo Hang Zhou Bolei Zhou Dahua Lin 27 62 0 24 Oct 2019
Embodied Language Grounding with 3D Visual Feature Representations Mihir Prabhudesai H. Tung Syed Ashar Javed Maximilian Sieb Adam W. Harley Katerina Fragkiadaki 25 21 0 02 Oct 2019
Proposal-free Temporal Moment Localization of a Natural-Language Query in Video using Guided Attention Cristian Rodriguez-Opazo Edison Marrese-Taylor F. Saleh Hongdong Li Stephen Gould 30 147 0 20 Aug 2019
Use What You Have: Video Retrieval Using Representations From Collaborative Experts Yang Liu Samuel Albanie Arsha Nagrani Andrew Zisserman 36 387 0 31 Jul 2019
Finding Moments in Video Collections Using Natural Language Victor Escorcia Mattia Soldan Josef Sivic Bernard Ghanem Bryan C. Russell 31 6 0 30 Jul 2019
Trends in Integration of Vision and Language Research: A Survey of Tasks, Datasets, and Methods Aditya Mogadala M. Kalimuthu Dietrich Klakow VLM 25 133 0 22 Jul 2019
Cross-Lingual Transfer Learning for Question Answering Chia-Hsuan Lee Hung-yi Lee 28 23 0 13 Jul 2019
HowTo100M: Learning a Text-Video Embedding by Watching Hundred Million Narrated Video Clips Antoine Miech Dimitri Zhukov Jean-Baptiste Alayrac Makarand Tapaswi Ivan Laptev Josef Sivic VGen 27 1,175 0 07 Jun 2019
Synthetic Defocus and Look-Ahead Autofocus for Casual Videography X. Zhang Kevin Blackburn-Matzen Vivien Nguyen Dillon Yao You Zhang Ren Ng VGen 26 37 0 15 May 2019
VATEX: A Large-Scale, High-Quality Multilingual Dataset for Video-and-Language Research Xin Eric Wang Jiawei Wu Junkun Chen Lei Li Yuan-fang Wang William Yang Wang 32 539 0 06 Apr 2019
M-VAD Names: a Dataset for Video Captioning with Naming S. Pini Marcella Cornia Federico Bolelli Lorenzo Baraldi Rita Cucchiara 27 29 0 04 Mar 2019
Hierarchical LSTMs with Adaptive Attention for Visual Captioning Jingkuan Song Xiangpeng Li Lianli Gao Heng Tao Shen 23 221 0 26 Dec 2018
MTLE: A Multitask Learning Encoder of Visual Feature Representations for Video and Movie Description Oliver A. Nina Washington Garcia Scott Clouse Alper Yilmaz 23 4 0 19 Sep 2018
Dual Encoding for Zero-Example Video Retrieval Jianfeng Dong Xirong Li Chaoxi Xu S. Ji Yuan He Gang Yang Xun Wang 30 268 0 17 Sep 2018
LiveBot: Generating Live Video Comments Based on Visual and Textual Contexts Shuming Ma Lei Cui Damai Dai Furu Wei Xu Sun VGen 28 61 0 13 Sep 2018
TVQA: Localized, Compositional Video Question Answering Jie Lei Licheng Yu Mohit Bansal Tamara L. Berg 36 617 0 05 Sep 2018
A Joint Sequence Fusion Model for Video Question Answering and Retrieval Youngjae Yu Jongseok Kim Gunhee Kim 40 340 0 07 Aug 2018