2309.04628
Leveraging Pretrained Image-text Models for Improving Audio-Visual Learning
8 September 2023
Saurabhchand Bhati
Jesús Villalba
Laureano Moro Velázquez
Thomas Thebaud
Najim Dehak
CLIP
Papers citing
"Leveraging Pretrained Image-text Models for Improving Audio-Visual Learning"
4 / 4 papers shown
Title
Cross-Modal Denoising: A Novel Training Paradigm for Enhancing Speech-Image Retrieval
Lifeng Zhou
Yuke Li
Rui Deng
Yuting Yang
Haoqi Zhu
29
0
0
15 Aug 2024
SpeechCLIP: Integrating Speech with Pre-Trained Vision and Language Model
Yi-Jen Shih
Hsuan-Fu Wang
Heng-Jui Chang
Layne Berry
Hung-yi Lee
David Harwath
VLM
CLIP
46
32
0
03 Oct 2022
Unsupervised Speech Segmentation and Variable Rate Representation Learning using Segmental Contrastive Predictive Coding
Saurabhchand Bhati
Jesús Villalba
Piotr Żelasko
Laureano Moro Velázquez
Najim Dehak
SSL
53
22
0
05 Oct 2021
Emerging Properties in Self-Supervised Vision Transformers
Mathilde Caron
Hugo Touvron
Ishan Misra
Hervé Jégou
Julien Mairal
Piotr Bojanowski
Armand Joulin
320
5,785
0
29 Apr 2021