2111.01300
Masking Modalities for Cross-modal Video Retrieval
1 November 2021
Valentin Gabeur
Arsha Nagrani
Chen Sun
Karteek Alahari
Cordelia Schmid
Papers citing
"Masking Modalities for Cross-modal Video Retrieval"
23 / 23 papers shown
Title
Leveraging Modality Tags for Enhanced Cross-Modal Video Retrieval
A. Fragomeni
Dima Damen
Michael Wray
33
0
0
02 Apr 2025
Generating Illustrated Instructions
Sachit Menon
Ishan Misra
Rohit Girdhar
DiffM
31
4
0
07 Dec 2023
Auxiliary Tasks Benefit 3D Skeleton-based Human Motion Prediction
Chenxin Xu
R. Tan
Yuhong Tan
Siheng Chen
Xinchao Wang
Yanfeng Wang
3DH
45
20
0
17 Aug 2023
Towards Video Anomaly Retrieval from Video Anomaly Detection: New Benchmarks and Model
Peng Wu
Jing Liu
Xiangteng He
Yuxin Peng
Peng Wang
Yanning Zhang
48
29
0
24 Jul 2023
Modality Influence in Multimodal Machine Learning
Abdelhamid Haouhat
Slimane Bellaouar
A. Nehar
H. Cherroun
23
2
0
10 Jun 2023
VAST: A Vision-Audio-Subtitle-Text Omni-Modality Foundation Model and Dataset
Sihan Chen
Handong Li
Qunbo Wang
Zijia Zhao
Ming-Ting Sun
Xinxin Zhu
Jiaheng Liu
32
97
0
29 May 2023
Fusion for Visual-Infrared Person ReID in Real-World Surveillance Using Corrupted Multimodal Data
Arthur Josi
Mahdi Alehdaghi
Rafael M. O. Cruz
Eric Granger
18
2
0
29 Apr 2023
VALOR: Vision-Audio-Language Omni-Perception Pretraining Model and Dataset
Sihan Chen
Xingjian He
Longteng Guo
Xinxin Zhu
Weining Wang
Jinhui Tang
VLM
31
102
0
17 Apr 2023
Exposing and Mitigating Spurious Correlations for Cross-Modal Retrieval
Jae Myung Kim
A. Sophia Koepke
Cordelia Schmid
Zeynep Akata
78
25
0
06 Apr 2023
What You Say Is What You Show: Visual Narration Detection in Instructional Videos
Kumar Ashutosh
Rohit Girdhar
Lorenzo Torresani
Kristen Grauman
24
4
0
05 Jan 2023
Multimodal Data Augmentation for Visual-Infrared Person ReID with Corrupted Data
Arthur Josi
Mahdi Alehdaghi
Rafael M. O. Cruz
Eric Granger
23
13
0
22 Nov 2022
Expectation-Maximization Contrastive Learning for Compact Video-and-Language Representations
Peng Jin
Jinfa Huang
Fenglin Liu
Xian Wu
Shen Ge
Guoli Song
David A. Clifton
Jing Chen
VLM
42
63
0
21 Nov 2022
Cross-Lingual Cross-Modal Retrieval with Noise-Robust Learning
Yabing Wang
Jianfeng Dong
Tianxiang Liang
Minsong Zhang
Rui Cai
Xun Wang
29
20
0
26 Aug 2022
RoME: Role-aware Mixture-of-Expert Transformer for Text-to-Video Retrieval
Burak Satar
Hongyuan Zhu
Hanwang Zhang
J. Lim
26
11
0
26 Jun 2022
Symmetric Network with Spatial Relationship Modeling for Natural Language-based Vehicle Retrieval
Chuyang Zhao
Haobo Chen
Wenyuan Zhang
Junru Chen
Sipeng Zhang
Yadong Li
Boxun Li
21
10
0
22 Jun 2022
Multimodal Learning with Transformers: A Survey
P. Xu
Xiatian Zhu
David A. Clifton
ViT
54
527
0
13 Jun 2022
A CLIP-Hitchhiker's Guide to Long Video Retrieval
Max Bain
Arsha Nagrani
Gül Varol
Andrew Zisserman
CLIP
126
62
0
17 May 2022
ECLIPSE: Efficient Long-range Video Retrieval using Sight and Sound
Yan-Bo Lin
Jie Lei
Joey Tianyi Zhou
Gedas Bertasius
41
39
0
06 Apr 2022
Learning Audio-Video Modalities from Image Captions
Arsha Nagrani
Paul Hongsuck Seo
Bryan Seybold
Anja Hauth
Santiago Manén
Chen Sun
Cordelia Schmid
CLIP
16
82
0
01 Apr 2022
Towards Visual-Prompt Temporal Answering Grounding in Medical Instructional Video
Bin Li
Yixuan Weng
Bin Sun
Shutao Li
32
24
0
13 Mar 2022
Multi-Query Video Retrieval
Zeyu Wang
Yu Wu
Karthik Narasimhan
Olga Russakovsky
41
17
0
10 Jan 2022
A Straightforward Framework For Video Retrieval Using CLIP
Jesús Andrés Portillo-Quintero
J. C. Ortíz-Bayliss
Hugo Terashima-Marín
CLIP
321
117
0
24 Feb 2021
Multi-modal Transformer for Video Retrieval
Valentin Gabeur
Chen Sun
Karteek Alahari
Cordelia Schmid
ViT
424
596
0
21 Jul 2020