arXiv:2307.06942
InternVid: A Large-scale Video-Text Dataset for Multimodal Understanding and Generation
13 July 2023
Yi Wang, Yinan He, Yizhuo Li, Kunchang Li, Jiashuo Yu, X. Ma, Xinhao Li, Guo Chen, Xinyuan Chen, Yaohui Wang, Conghui He, Ping Luo, Ziwei Liu, Yali Wang, Limin Wang, Yu Qiao
Tags: VLM, VGen
Papers citing "InternVid: A Large-scale Video-Text Dataset for Multimodal Understanding and Generation" (7 of 57 shown)
How Much Can CLIP Benefit Vision-and-Language Tasks?
Sheng Shen, Liunian Harold Li, Hao Tan, Joey Tianyi Zhou, Anna Rohrbach, Kai-Wei Chang, Z. Yao, Kurt Keutzer
Tags: CLIP, VLM, MLLM · 242 · 407 · 0 · 13 Jul 2021
Conceptual 12M: Pushing Web-Scale Image-Text Pre-Training To Recognize Long-Tail Visual Concepts
Soravit Changpinyo, P. Sharma, Nan Ding, Radu Soricut
Tags: VLM · 386 · 1,103 · 0 · 17 Feb 2021
ActBERT: Learning Global-Local Video-Text Representations
Linchao Zhu, Yi Yang
Tags: ViT · 101 · 419 · 0 · 14 Nov 2020
HowTo100M: Learning a Text-Video Embedding by Watching Hundred Million Narrated Video Clips
Antoine Miech, Dimitri Zhukov, Jean-Baptiste Alayrac, Makarand Tapaswi, Ivan Laptev, Josef Sivic
Tags: VGen · 91 · 1,192 · 0 · 07 Jun 2019
How2: A Large-scale Dataset for Multimodal Language Understanding
Ramon Sanabria, Ozan Caglayan, Shruti Palaskar, Desmond Elliott, Loïc Barrault, Lucia Specia, Florian Metze
Tags: VGen, MLLM · 61 · 287 · 0 · 01 Nov 2018
Localizing Moments in Video with Natural Language
Lisa Anne Hendricks, Oliver Wang, Eli Shechtman, Josef Sivic, Trevor Darrell, Bryan C. Russell
91 · 940 · 0 · 04 Aug 2017
Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset
João Carreira, Andrew Zisserman
199 · 7,961 · 0 · 22 May 2017