2207.10362
LocVTP: Video-Text Pre-training for Temporal Localization
21 July 2022
Meng Cao
Tianyu Yang
Junwu Weng
Can Zhang
Jue Wang
Yuexian Zou
Papers citing
"LocVTP: Video-Text Pre-training for Temporal Localization"
16 / 16 papers shown
MUSE: Mamba is Efficient Multi-scale Learner for Text-video Retrieval
Haoran Tang
Meng Cao
Jinfa Huang
Ruyang Liu
Peng Jin
Ge Li
Xiaodan Liang
Mamba
94
4
0
24 Feb 2025
Knowing Your Target: Target-Aware Transformer Makes Better Spatio-Temporal Video Grounding
Xin Gu
Yaojie Shen
Chenxi Luo
Tiejian Luo
Yan Huang
Yuewei Lin
Heng Fan
L. Zhang
58
1
0
16 Feb 2025
Qilin-Med: Multi-stage Knowledge Injection Advanced Medical Large Language Model
Qichen Ye
Junling Liu
Dading Chong
Peilin Zhou
Yining Hua
...
Meng Cao
Ziming Wang
Xuxin Cheng
Andrew Liu
Zhenhua Guo
AI4MH
LM&MA
ELM
25
20
0
13 Oct 2023
Self-Chained Image-Language Model for Video Localization and Question Answering
Shoubin Yu
Jaemin Cho
Prateek Yadav
Mohit Bansal
36
129
0
11 May 2023
Vid2Seq: Large-Scale Pretraining of a Visual Language Model for Dense Video Captioning
Antoine Yang
Arsha Nagrani
Paul Hongsuck Seo
Antoine Miech
Jordi Pont-Tuset
Ivan Laptev
Josef Sivic
Cordelia Schmid
AI4TS
VLM
23
220
0
27 Feb 2023
Temporal Perceiving Video-Language Pre-training
Fan Ma
Xiaojie Jin
Heng Wang
Jingjia Huang
Linchao Zhu
Jiashi Feng
Yi Yang
VLM
19
15
0
18 Jan 2023
Expectation-Maximization Contrastive Learning for Compact Video-and-Language Representations
Peng Jin
Jinfa Huang
Fenglin Liu
Xian Wu
Shen Ge
Guoli Song
David A. Clifton
Jing Chen
VLM
32
63
0
21 Nov 2022
CLIP-Driven Fine-grained Text-Image Person Re-identification
Shuanglin Yan
Neng Dong
Liyan Zhang
Jinhui Tang
23
87
0
19 Oct 2022
METS-CoV: A Dataset of Medical Entity and Targeted Sentiment on COVID-19 Related Tweets
Peilin Zhou
Zeqiang Wang
Dading Chong
Zhijiang Guo
Yining Hua
Zichang Su
Zhiyang Teng
Jiageng Wu
Jie Yang
26
20
0
28 Sep 2022
VideoCLIP: Contrastive Pre-training for Zero-shot Video-Text Understanding
Hu Xu
Gargi Ghosh
Po-Yao (Bernie) Huang
Dmytro Okhonko
Armen Aghajanyan
Florian Metze
Luke Zettlemoyer
Christoph Feichtenhofer
CLIP
VLM
250
558
0
28 Sep 2021
Deep Motion Prior for Weakly-Supervised Temporal Action Localization
Meng Cao
Can Zhang
Long Chen
Mike Zheng Shou
Yuexian Zou
16
21
0
12 Aug 2021
T2VLAD: Global-Local Sequence Alignment for Text-Video Retrieval
Xiaohan Wang
Linchao Zhu
Yi Yang
156
169
0
20 Apr 2021
CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval
Huaishao Luo
Lei Ji
Ming Zhong
Yang Chen
Wen Lei
Nan Duan
Tianrui Li
CLIP
VLM
309
780
0
18 Apr 2021
Is Space-Time Attention All You Need for Video Understanding?
Gedas Bertasius
Heng Wang
Lorenzo Torresani
ViT
280
1,981
0
09 Feb 2021
Self-supervised Co-training for Video Representation Learning
Tengda Han
Weidi Xie
Andrew Zisserman
SSL
206
308
0
19 Oct 2020
Multi-modal Transformer for Video Retrieval
Valentin Gabeur
Chen Sun
Alahari Karteek
Cordelia Schmid
ViT
412
595
0
21 Jul 2020