ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2309.05767
  4. Cited By
Natural Language Supervision for General-Purpose Audio Representations

Natural Language Supervision for General-Purpose Audio Representations

11 September 2023
Benjamin Elizalde
Soham Deshmukh
Huaming Wang
    AuLLM
    AI4TS
ArXivPDFHTML

Papers citing "Natural Language Supervision for General-Purpose Audio Representations"

14 / 14 papers shown
Title
Audio-Language Datasets of Scenes and Events: A Survey
Audio-Language Datasets of Scenes and Events: A Survey
Gijs Wijngaard
Elia Formisano
Michele Esposito
M. Dumontier
81
2
0
10 Jan 2025
MMAudio: Taming Multimodal Joint Training for High-Quality Video-to-Audio Synthesis
MMAudio: Taming Multimodal Joint Training for High-Quality Video-to-Audio Synthesis
Ho Kei Cheng
Masato Ishii
Akio Hayakawa
Takashi Shibuya
A. Schwing
Yuki Mitsufuji
VGen
126
12
0
19 Dec 2024
Do Audio-Language Models Understand Linguistic Variations?
Do Audio-Language Models Understand Linguistic Variations?
Ramaneswaran Selvakumar
Sonal Kumar
Hemant Kumar Giri
Nishit Anand
Ashish Seth
Sreyan Ghosh
Dinesh Manocha
AuLLM
VLM
55
1
0
21 Oct 2024
MT2KD: Towards A General-Purpose Encoder for Speech, Speaker, and Audio Events
MT2KD: Towards A General-Purpose Encoder for Speech, Speaker, and Audio Events
Xiaoyu Yang
Qiujia Li
Chao Zhang
P. Woodland
29
0
0
25 Sep 2024
Semi-Supervised Cognitive State Classification from Speech with Multi-View Pseudo-Labeling
Semi-Supervised Cognitive State Classification from Speech with Multi-View Pseudo-Labeling
Yuanchao Li
Zixing Zhang
Jing Han
P. Bell
Catherine Lai
77
0
0
25 Sep 2024
Language-based Audio Moment Retrieval
Language-based Audio Moment Retrieval
Hokuto Munakata
Taichi Nishimura
Shota Nakada
Tatsuya Komatsu
40
1
0
24 Sep 2024
Semi-intrusive audio evaluation: Casting non-intrusive assessment as a multi-modal text prediction task
Semi-intrusive audio evaluation: Casting non-intrusive assessment as a multi-modal text prediction task
Jozef Coldenhoff
Milos Cernak
41
0
0
21 Sep 2024
Leveraging Audio-Only Data for Text-Queried Target Sound Extraction
Leveraging Audio-Only Data for Text-Queried Target Sound Extraction
Kohei Saijo
Janek Ebbers
François Germain
Sameer Khurana
G. Wichern
Jonathan Le Roux
44
1
0
20 Sep 2024
Sequential Contrastive Audio-Visual Learning
Sequential Contrastive Audio-Visual Learning
Ioannis Tsiamas
Santiago Pascual
Chunghsin Yeh
Joan Serrà
44
2
0
08 Jul 2024
Bridging Language Gaps in Audio-Text Retrieval
Bridging Language Gaps in Audio-Text Retrieval
Zhiyong Yan
Heinrich Dinkel
Yongqing Wang
Jizhong Liu
Junbo Zhang
Yujun Wang
Bin Wang
VLM
39
4
0
11 Jun 2024
Correlation of Fréchet Audio Distance With Human Perception of
  Environmental Audio Is Embedding Dependant
Correlation of Fréchet Audio Distance With Human Perception of Environmental Audio Is Embedding Dependant
Modan Tailleur
Junwon Lee
Mathieu Lagrange
Keunwoo Choi
Laurie M. Heller
Keisuke Imoto
Yuki Okamoto
30
10
0
26 Mar 2024
Audio Retrieval with WavText5K and CLAP Training
Audio Retrieval with WavText5K and CLAP Training
Soham Deshmukh
Benjamin Elizalde
Huaming Wang
3DV
CLIP
124
50
0
28 Sep 2022
HTS-AT: A Hierarchical Token-Semantic Audio Transformer for Sound
  Classification and Detection
HTS-AT: A Hierarchical Token-Semantic Audio Transformer for Sound Classification and Detection
Ke Chen
Xingjian Du
Bilei Zhu
Zejun Ma
Taylor Berg-Kirkpatrick
Shlomo Dubnov
ViT
127
264
0
02 Feb 2022
Identifying Actions for Sound Event Classification
Identifying Actions for Sound Event Classification
Benjamin Elizalde
Radu Revutchi
Samarjit Das
Bhiksha Raj
Ian Lane
Laurie M. Heller
19
5
0
26 Apr 2021
1