ResearchTrend.AI
Unsupervised Improvement of Audio-Text Cross-Modal Representations

3 May 2023
Zhepei Wang
Cem Subakan
Krishna Subramani
Junkai Wu
Tiago Tavares
Fabio Ayres
Paris Smaragdis
    SSL

Papers citing "Unsupervised Improvement of Audio-Text Cross-Modal Representations"

11 / 11 papers shown
Large-scale Contrastive Language-Audio Pretraining with Feature Fusion and Keyword-to-Caption Augmentation
Yusong Wu
Kai Chen
Tianyu Zhang
Yuchen Hui
Marianna Nezhurina
Taylor Berg-Kirkpatrick
Shlomo Dubnov
CLIP
143
542
0
12 Nov 2022
Learning Representations for New Sound Classes With Continual Self-Supervised Learning
Zhepei Wang
Cem Subakan
Xilin Jiang
Junkai Wu
Efthymios Tzinis
Mirco Ravanelli
Paris Smaragdis
CLL, SSL
111
19
0
15 May 2022
Robust Cross-Modal Representation Learning with Progressive Self-Distillation
A. Andonian
Shixing Chen
Raffay Hamid
VLM
84
56
0
10 Apr 2022
Florence: A New Foundation Model for Computer Vision
Lu Yuan
Dongdong Chen
Yi-Ling Chen
Noel Codella
Xiyang Dai
...
Zhen Xiao
Jianwei Yang
Michael Zeng
Luowei Zhou
Pengchuan Zhang
VLM
147
908
0
22 Nov 2021
Wav2CLIP: Learning Robust Audio Representations From CLIP
Ho-Hsiang Wu
Prem Seetharaman
Kundan Kumar
J. P. Bello
CLIP, VLM
145
273
0
21 Oct 2021
AudioCLIP: Extending CLIP to Image, Text and Audio
A. Guzhov
Federico Raue
Jörn Hees
Andreas Dengel
CLIP, VLM
127
370
0
24 Jun 2021
Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision
Chao Jia
Yinfei Yang
Ye Xia
Yi-Ting Chen
Zarana Parekh
Hieu H. Pham
Quoc V. Le
Yun-hsuan Sung
Zhen Li
Tom Duerig
VLM, CLIP
469
3,906
0
11 Feb 2021
FSD50K: An Open Dataset of Human-Labeled Sound Events
Eduardo Fonseca
Xavier Favory
Jordi Pons
F. Font
Xavier Serra
109
467
0
01 Oct 2020
A Simple Framework for Contrastive Learning of Visual Representations
Ting Chen
Simon Kornblith
Mohammad Norouzi
Geoffrey E. Hinton
SSL
395
18,897
0
13 Feb 2020
PANNs: Large-Scale Pretrained Audio Neural Networks for Audio Pattern Recognition
Qiuqiang Kong
Yin Cao
Turab Iqbal
Yuxuan Wang
Wenwu Wang
Mark D. Plumbley
VLM, SSL
199
1,084
0
21 Dec 2019
Clotho: An Audio Captioning Dataset
Konstantinos Drossos
Samuel Lipping
Tuomas Virtanen
109
395
0
21 Oct 2019