2310.01852
LanguageBind: Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment
3 October 2023
Bin Zhu
Bin Lin
Munan Ning
Yang Yan
Jiaxi Cui
HongFa Wang
Yatian Pang
Wenhao Jiang
Junwu Zhang
Zongwei Li
Wancai Zhang
Zhifeng Li
Wei Liu
Li Yuan
VLM
MLLM
Github (810★)
Papers citing
"LanguageBind: Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment"
5 / 55 papers shown
Title
Localizing Moments in Video with Natural Language
Lisa Anne Hendricks, Oliver Wang, Eli Shechtman, Josef Sivic, Trevor Darrell, Bryan C. Russell
115 | 946 | 0 | 04 Aug 2017

AVA: A Video Dataset of Spatio-temporally Localized Atomic Visual Actions
Chunhui Gu, Chen Sun, David A. Ross, Carl Vondrick, C. Pantofaru, ..., G. Toderici, Susanna Ricco, Rahul Sukthankar, Cordelia Schmid, Jitendra Malik
VGen
107 | 1,030 | 0 | 23 May 2017

YouTube-8M: A Large-Scale Video Classification Benchmark
Sami Abu-El-Haija, Nisarg Kothari, Joonseok Lee, Apostol Natsev, G. Toderici, Balakrishnan Varadarajan, Sudheendra Vijayanarasimhan
VLM
151 | 1,270 | 0 | 27 Sep 2016

Two-Stream Convolutional Networks for Action Recognition in Videos
Karen Simonyan, Andrew Zisserman
247 | 7,535 | 0 | 09 Jun 2014

UCF101: A Dataset of 101 Human Actions Classes From Videos in The Wild
K. Soomro, Amir Zamir, M. Shah
CLIP, VGen
155 | 6,162 | 0 | 03 Dec 2012