2505.16369
X-ARES: A Comprehensive Framework for Assessing Audio Encoder Performance
22 May 2025
Junbo Zhang
Heinrich Dinkel
Yadong Niu
Chenyu Liu
Si Cheng
Anbei Zhao
Jian Luan
Papers citing
"X-ARES: A Comprehensive Framework for Assessing Audio Encoder Performance"
26 / 26 papers shown
Title
SALMONN-omni: A Codec-free LLM for Full-duplex Speech Understanding and Generation
Wenyi Yu
Siyin Wang
Xiaoyu Yang
Xianzhao Chen
Xiaohai Tian
Jing Zhang
Guangzhi Sun
Lu Lu
Yansen Wang
Chao Zhang
AuLLM
101
9
0
27 Nov 2024
A Comparative Study of Discrete Speech Tokens for Semantic-Related Tasks with Large Language Models
Dingdong Wang
Mingyu Cui
Dongchao Yang
Xueyuan Chen
Helen Meng
28
2
0
13 Nov 2024
Moshi: a speech-text foundation model for real-time dialogue
Alexandre Défossez
Laurent Mazaré
Manu Orsini
Amélie Royer
P. Pérez
Hervé Jégou
Edouard Grave
Neil Zeghidour
AuLLM
72
122
0
17 Sep 2024
DASB -- Discrete Audio and Speech Benchmark
Pooneh Mousavi
Luca Della Libera
J. Duret
Artem Ploujnikov
Cem Subakan
Mirco Ravanelli
40
16
0
20 Jun 2024
Scaling up masked audio encoder learning for general audio classification
Heinrich Dinkel
Zhiyong Yan
Yongqing Wang
Junbo Zhang
Yujun Wang
Bin Wang
59
5
0
11 Jun 2024
Qwen-Audio: Advancing Universal Audio Understanding via Unified Large-Scale Audio-Language Models
Yunfei Chu
Jin Xu
Xiaohuan Zhou
Qian Yang
Shiliang Zhang
Zhijie Yan
Chang Zhou
Jingren Zhou
AuLLM
58
315
0
14 Nov 2023
CED: Consistent ensemble distillation for audio tagging
Heinrich Dinkel
Yongqing Wang
Zhiyong Yan
Junbo Zhang
Yujun Wang
37
21
0
23 Aug 2023
Self-supervised Audio Teacher-Student Transformer for Both Clip-level and Frame-level Tasks
Xian Li
Nian Shao
Xiaofei Li
ViT
CLIP
58
28
0
07 Jun 2023
BEATs: Audio Pre-Training with Acoustic Tokenizers
Sanyuan Chen
Yu-Huan Wu
Chengyi Wang
Shujie Liu
Daniel C. Tompkins
Zhuo Chen
Furu Wei
60
278
0
18 Dec 2022
Robust Speech Recognition via Large-Scale Weak Supervision
Alec Radford
Jong Wook Kim
Tao Xu
Greg Brockman
C. McLeavey
Ilya Sutskever
OffRL
93
3,515
0
06 Dec 2022
Masked Modeling Duo: Learning Representations by Encouraging Both Networks to Model the Input
Daisuke Niizumi
Daiki Takeuchi
Yasunori Ohishi
Noboru Harada
K. Kashino
SSL
51
31
0
26 Oct 2022
BYOL-S: Learning Self-supervised Speech Representations by Bootstrapping
Gasser Elbanna
Neil Scheidwasser
M. Kegler
P. Beckmann
Karl El Hajal
Milos Cernak
SSL
41
23
0
24 Jun 2022
Vocalsound: A Dataset for Improving Human Vocal Sounds Recognition
Yuan Gong
Jingbo Yu
James R. Glass
30
39
0
06 May 2022
HEAR: Holistic Evaluation of Audio Representations
Joseph P. Turian
Jordie Shier
H. Khan
Bhiksha Raj
Björn W. Schuller
...
P. Esling
Pranay Manocha
Shinji Watanabe
Zeyu Jin
Yonatan Bisk
49
103
0
06 Mar 2022
data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language
Alexei Baevski
Wei-Ning Hsu
Qiantong Xu
Arun Babu
Jiatao Gu
Michael Auli
SSL
VLM
ViT
62
845
0
07 Feb 2022
WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing
Sanyuan Chen
Chengyi Wang
Zhengyang Chen
Yu-Huan Wu
Shujie Liu
...
Yao Qian
Jian Wu
Michael Zeng
Xiangzhan Yu
Furu Wei
SSL
164
1,794
0
26 Oct 2021
HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units
Wei-Ning Hsu
Benjamin Bolte
Yao-Hung Hubert Tsai
Kushal Lakhotia
Ruslan Salakhutdinov
Abdel-rahman Mohamed
SSL
103
2,879
0
14 Jun 2021
SUPERB: Speech processing Universal PERformance Benchmark
Shu-Wen Yang
Po-Han Chi
Yung-Sung Chuang
Cheng-I Jeff Lai
Kushal Lakhotia
...
Shuyan Dong
Shang-Wen Li
Shinji Watanabe
Abdel-rahman Mohamed
Hung-yi Lee
SSL
64
910
0
03 May 2021
FSD50K: An Open Dataset of Human-Labeled Sound Events
Eduardo Fonseca
Xavier Favory
Jordi Pons
F. Font
Xavier Serra
41
446
0
01 Oct 2020
wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations
Alexei Baevski
Henry Zhou
Abdel-rahman Mohamed
Michael Auli
SSL
30
5,677
0
20 Jun 2020
Speech Model Pre-training for End-to-End Spoken Language Understanding
Loren Lugosch
Mirco Ravanelli
Patrick Ignoto
Vikrant Singh Tomar
Yoshua Bengio
SyDa
AuLLM
39
349
0
07 Apr 2019
Enabling Factorized Piano Music Modeling and Generation with the MAESTRO Dataset
Curtis Hawthorne
Andriy Stasyuk
Adam Roberts
Ian Simon
Cheng-Zhi Anna Huang
Sander Dieleman
Erich Elsen
Jesse Engel
Douglas Eck
59
447
0
29 Oct 2018
General-purpose Tagging of Freesound Audio with AudioSet Labels: Task Description, Dataset, and Baseline
Eduardo Fonseca
Manoj Plakal
F. Font
D. Ellis
Xavier Favory
Jordi Pons
Xavier Serra
25
147
0
26 Jul 2018
Speech Commands: A Dataset for Limited-Vocabulary Speech Recognition
Pete Warden
34
1,599
0
09 Apr 2018
Neural Audio Synthesis of Musical Notes with WaveNet Autoencoders
Jesse Engel
Cinjon Resnick
Adam Roberts
Sander Dieleman
Douglas Eck
Karen Simonyan
Mohammad Norouzi
74
618
0
05 Apr 2017
FMA: A Dataset For Music Analysis
M. Defferrard
Kirell Benzi
P. Vandergheynst
Xavier Bresson
35
434
0
06 Dec 2016