Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2305.07910
Cited By
Mask to reconstruct: Cooperative Semantics Completion for Video-text Retrieval
13 May 2023
Han Fang
Zhifei Yang
Xianghao Zang
Chao Ban
Hao Sun
VGen
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Mask to reconstruct: Cooperative Semantics Completion for Video-text Retrieval"
9 / 9 papers shown
Title
Cap4Video: What Can Auxiliary Captions Do for Text-Video Retrieval?
Wenhao Wu
Haipeng Luo
Bo Fang
Jingdong Wang
Wanli Ouyang
95
80
0
31 Dec 2022
Transformer Quality in Linear Time
Weizhe Hua
Zihang Dai
Hanxiao Liu
Quoc V. Le
73
222
0
21 Feb 2022
Masked Autoencoders Are Scalable Vision Learners
Kaiming He
Xinlei Chen
Saining Xie
Yanghao Li
Piotr Dollár
Ross B. Girshick
ViT
TPM
305
7,434
0
11 Nov 2021
T2VLAD: Global-Local Sequence Alignment for Text-Video Retrieval
Xiaohan Wang
Linchao Zhu
Yi Yang
158
169
0
20 Apr 2021
CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval
Huaishao Luo
Lei Ji
Ming Zhong
Yang Chen
Wen Lei
Nan Duan
Tianrui Li
CLIP
VLM
317
780
0
18 Apr 2021
A Straightforward Framework For Video Retrieval Using CLIP
Jesús Andrés Portillo-Quintero
J. C. Ortíz-Bayliss
Hugo Terashima-Marín
CLIP
318
116
0
24 Feb 2021
Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision
Chao Jia
Yinfei Yang
Ye Xia
Yi-Ting Chen
Zarana Parekh
Hieu H. Pham
Quoc V. Le
Yun-hsuan Sung
Zhen Li
Tom Duerig
VLM
CLIP
298
3,693
0
11 Feb 2021
Probabilistic Embeddings for Cross-Modal Retrieval
Sanghyuk Chun
Seong Joon Oh
Rafael Sampaio de Rezende
Yannis Kalantidis
Diane Larlus
UQCV
401
200
0
13 Jan 2021
Multi-modal Transformer for Video Retrieval
Valentin Gabeur
Chen Sun
Alahari Karteek
Cordelia Schmid
ViT
415
595
0
21 Jul 2020
1