Clover: Towards A Unified Video-Language Alignment and Fusion Model

16 July 2022

Papers citing "Clover: Towards A Unified Video-Language Alignment and Fusion Model"

4 / 54 papers shown

Title
A dataset and exploration of models for understanding video data through fill-in-the-blank question-answering | Tegan Maharaj, Nicolas Ballas, Anna Rohrbach, Aaron Courville, C. Pal | VGen 49 108 0 | 23 Nov 2016
Learning Language-Visual Embedding for Movie Understanding with Natural-Language | Atousa Torabi, Niket Tandon, Leonid Sigal | 65 97 0 | 26 Sep 2016
Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations | Ranjay Krishna, Yuke Zhu, Oliver Groth, Justin Johnson, Kenji Hata, ..., Yannis Kalantidis, Li-Jia Li, David A. Shamma, Michael S. Bernstein, Fei-Fei Li | 215 5,743 0 | 23 Feb 2016
Microsoft COCO Captions: Data Collection and Evaluation Server | Xinlei Chen, Hao Fang, Nayeon Lee, Ramakrishna Vedantam, Saurabh Gupta, Piotr Dollar, C. L. Zitnick | 211 2,475 0 | 01 Apr 2015