Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2207.07885
Cited By
Clover: Towards A Unified Video-Language Alignment and Fusion Model
16 July 2022
Jingjia Huang
Yinan Li
Jiashi Feng
Xinglong Wu
Xiaoshuai Sun
Rongrong Ji
VLM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Clover: Towards A Unified Video-Language Alignment and Fusion Model"
4 / 54 papers shown
Title
A dataset and exploration of models for understanding video data through fill-in-the-blank question-answering
Tegan Maharaj
Nicolas Ballas
Anna Rohrbach
Aaron Courville
C. Pal
VGen
49
108
0
23 Nov 2016
Learning Language-Visual Embedding for Movie Understanding with Natural-Language
Atousa Torabi
Niket Tandon
Leonid Sigal
65
97
0
26 Sep 2016
Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations
Ranjay Krishna
Yuke Zhu
Oliver Groth
Justin Johnson
Kenji Hata
...
Yannis Kalantidis
Li Li
David A. Shamma
Michael S. Bernstein
Fei-Fei Li
215
5,743
0
23 Feb 2016
Microsoft COCO Captions: Data Collection and Evaluation Server
Xinlei Chen
Hao Fang
Nayeon Lee
Ramakrishna Vedantam
Saurabh Gupta
Piotr Dollar
C. L. Zitnick
211
2,475
0
01 Apr 2015
Previous
1
2