ResearchTrend.AI
arXiv: 2309.05767
Natural Language Supervision for General-Purpose Audio Representations

11 September 2023
Benjamin Elizalde
Soham Deshmukh
Huaming Wang
    AuLLM, AI4TS

Papers citing "Natural Language Supervision for General-Purpose Audio Representations"

15 / 15 papers shown
GLAP: General contrastive audio-text pretraining across domains and languages
Heinrich Dinkel
Zhiyong Yan
Tianzi Wang
Yongqing Wang
Xingwei Sun
Yadong Niu
Jizhong Liu
Gang Li
Junbo Zhang
Jian Luan
CLIP, VLM
27
0
0
12 Jun 2025
Audio Flamingo 2: An Audio-Language Model with Long-Audio Understanding and Expert Reasoning Abilities
Sreyan Ghosh
Zhifeng Kong
Sonal Kumar
S. Sakshi
Jaehyeon Kim
Ming-Yu Liu
Rafael Valle
Dinesh Manocha
Bryan Catanzaro
MLLM, AuLLM, LRM
136
21
0
06 Mar 2025
Audio-Language Models for Audio-Centric Tasks: A survey
Yi Su
Jisheng Bai
Qisheng Xu
Kele Xu
Yong Dou
AuLLM
172
4
0
28 Jan 2025
Audio-Language Datasets of Scenes and Events: A Survey
Gijs Wijngaard
Elia Formisano
Michele Esposito
M. Dumontier
193
3
0
10 Jan 2025
MMAudio: Taming Multimodal Joint Training for High-Quality Video-to-Audio Synthesis
Ho Kei Cheng
Masato Ishii
Akio Hayakawa
Takashi Shibuya
Alex Schwing
Yuki Mitsufuji
VGen
294
18
0
19 Dec 2024
Do Audio-Language Models Understand Linguistic Variations?
Ramaneswaran Selvakumar
Sonal Kumar
Hemant Kumar Giri
Nishit Anand
Ashish Seth
Sreyan Ghosh
Dinesh Manocha
AuLLM, VLM
147
1
0
21 Oct 2024
DRCap: Decoding CLAP Latents with Retrieval-Augmented Generation for Zero-shot Audio Captioning
Xiquan Li
Wenxi Chen
Ziyang Ma
Xuenan Xu
Yuzhe Liang
Zhisheng Zheng
Qiuqiang Kong
Xie Chen
VLM
121
6
0
12 Oct 2024
Semi-Supervised Cognitive State Classification from Speech with Multi-View Pseudo-Labeling
Yuanchao Li
Zixing Zhang
Jing Han
P. Bell
Catherine Lai
158
1
0
25 Sep 2024
MT2KD: Towards A General-Purpose Encoder for Speech, Speaker, and Audio Events
Xiaoyu Yang
Qiujia Li
Chao Zhang
P. Woodland
134
1
0
25 Sep 2024
Language-based Audio Moment Retrieval
Hokuto Munakata
Taichi Nishimura
Shota Nakada
Tatsuya Komatsu
131
2
0
24 Sep 2024
Semi-intrusive audio evaluation: Casting non-intrusive assessment as a multi-modal text prediction task
Jozef Coldenhoff
Milos Cernak
105
0
0
21 Sep 2024
Towards Diverse and Efficient Audio Captioning via Diffusion Models
Manjie Xu
Chenxing Li
Xinyi Tu
Yong Ren
Ruibo Fu
Wei Liang
Dong Yu
DiffM
95
2
0
14 Sep 2024
Sequential Contrastive Audio-Visual Learning
Ioannis Tsiamas
Santiago Pascual
Chunghsin Yeh
Joan Serrà
100
3
0
08 Jul 2024
Prompt-Singer: Controllable Singing-Voice-Synthesis with Natural Language Prompt
Yongqi Wang
Ruofan Hu
Rongjie Huang
Zhiqing Hong
Ruiqi Li
Wenrui Liu
Fuming You
Tao Jin
Zhou Zhao
116
13
0
18 Mar 2024
Listen, Chat, and Remix: Text-Guided Soundscape Remixing for Enhanced Auditory Experience
Xilin Jiang
Cong Han
Yinghao Aaron Li
N. Mesgarani
KELM
99
5
0
06 Feb 2024