ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2505.16369
  4. Cited By
X-ARES: A Comprehensive Framework for Assessing Audio Encoder Performance

X-ARES: A Comprehensive Framework for Assessing Audio Encoder Performance

22 May 2025
Junbo Zhang
Heinrich Dinkel
Yadong Niu
Chenyu Liu
Si Cheng
Anbei Zhao
Jian Luan
ArXivPDFHTML

Papers citing "X-ARES: A Comprehensive Framework for Assessing Audio Encoder Performance"

26 / 26 papers shown
Title
SALMONN-omni: A Codec-free LLM for Full-duplex Speech Understanding and
  Generation
SALMONN-omni: A Codec-free LLM for Full-duplex Speech Understanding and Generation
Wenyi Yu
Siyin Wang
Xiaoyu Yang
Xianzhao Chen
Xiaohai Tian
Jing Zhang
Guangzhi Sun
Lu Lu
Yansen Wang
Chao Zhang
AuLLM
101
9
0
27 Nov 2024
A Comparative Study of Discrete Speech Tokens for Semantic-Related Tasks
  with Large Language Models
A Comparative Study of Discrete Speech Tokens for Semantic-Related Tasks with Large Language Models
Dingdong Wang
Mingyu Cui
Dongchao Yang
Xueyuan Chen
Helen Meng
28
2
0
13 Nov 2024
Moshi: a speech-text foundation model for real-time dialogue
Moshi: a speech-text foundation model for real-time dialogue
Alexandre Défossez
Laurent Mazaré
Manu Orsini
Amélie Royer
P. Pérez
Hervé Jégou
Edouard Grave
Neil Zeghidour
AuLLM
72
122
0
17 Sep 2024
DASB -- Discrete Audio and Speech Benchmark
DASB -- Discrete Audio and Speech Benchmark
Pooneh Mousavi
Luca Della Libera
J. Duret
Artem Ploujnikov
Cem Subakan
Mirco Ravanelli
40
16
0
20 Jun 2024
Scaling up masked audio encoder learning for general audio
  classification
Scaling up masked audio encoder learning for general audio classification
Heinrich Dinkel
Zhiyong Yan
Yongqing Wang
Junbo Zhang
Yujun Wang
Bin Wang
59
5
0
11 Jun 2024
Qwen-Audio: Advancing Universal Audio Understanding via Unified
  Large-Scale Audio-Language Models
Qwen-Audio: Advancing Universal Audio Understanding via Unified Large-Scale Audio-Language Models
Yunfei Chu
Jin Xu
Xiaohuan Zhou
Qian Yang
Shiliang Zhang
Zhijie Yan
Chang Zhou
Jingren Zhou
AuLLM
58
315
0
14 Nov 2023
CED: Consistent ensemble distillation for audio tagging
CED: Consistent ensemble distillation for audio tagging
Heinrich Dinkel
Yongqing Wang
Zhiyong Yan
Junbo Zhang
Yujun Wang
37
21
0
23 Aug 2023
Self-supervised Audio Teacher-Student Transformer for Both Clip-level
  and Frame-level Tasks
Self-supervised Audio Teacher-Student Transformer for Both Clip-level and Frame-level Tasks
Xian Li
Nian Shao
Xiaofei Li
ViT
CLIP
58
28
0
07 Jun 2023
BEATs: Audio Pre-Training with Acoustic Tokenizers
BEATs: Audio Pre-Training with Acoustic Tokenizers
Sanyuan Chen
Yu-Huan Wu
Chengyi Wang
Shujie Liu
Daniel C. Tompkins
Zhuo Chen
Furu Wei
60
278
0
18 Dec 2022
Robust Speech Recognition via Large-Scale Weak Supervision
Robust Speech Recognition via Large-Scale Weak Supervision
Alec Radford
Jong Wook Kim
Tao Xu
Greg Brockman
C. McLeavey
Ilya Sutskever
OffRL
93
3,515
0
06 Dec 2022
Masked Modeling Duo: Learning Representations by Encouraging Both
  Networks to Model the Input
Masked Modeling Duo: Learning Representations by Encouraging Both Networks to Model the Input
Daisuke Niizumi
Daiki Takeuchi
Yasunori Ohishi
Noboru Harada
K. Kashino
SSL
51
31
0
26 Oct 2022
BYOL-S: Learning Self-supervised Speech Representations by Bootstrapping
BYOL-S: Learning Self-supervised Speech Representations by Bootstrapping
Gasser Elbanna
Neil Scheidwasser
M. Kegler
P. Beckmann
Karl El Hajal
Milos Cernak
SSL
41
23
0
24 Jun 2022
Vocalsound: A Dataset for Improving Human Vocal Sounds Recognition
Vocalsound: A Dataset for Improving Human Vocal Sounds Recognition
Yuan Gong
Jingbo Yu
James R. Glass
30
39
0
06 May 2022
HEAR: Holistic Evaluation of Audio Representations
HEAR: Holistic Evaluation of Audio Representations
Joseph P. Turian
Jordie Shier
H. Khan
Bhiksha Raj
Björn W. Schuller
...
P. Esling
Pranay Manocha
Shinji Watanabe
Zeyu Jin
Yonatan Bisk
49
103
0
06 Mar 2022
data2vec: A General Framework for Self-supervised Learning in Speech,
  Vision and Language
data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language
Alexei Baevski
Wei-Ning Hsu
Qiantong Xu
Arun Babu
Jiatao Gu
Michael Auli
SSL
VLM
ViT
62
845
0
07 Feb 2022
WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech
  Processing
WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing
Sanyuan Chen
Chengyi Wang
Zhengyang Chen
Yu-Huan Wu
Shujie Liu
...
Yao Qian
Jian Wu
Micheal Zeng
Xiangzhan Yu
Furu Wei
SSL
164
1,794
0
26 Oct 2021
HuBERT: Self-Supervised Speech Representation Learning by Masked
  Prediction of Hidden Units
HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units
Wei-Ning Hsu
Benjamin Bolte
Yao-Hung Hubert Tsai
Kushal Lakhotia
Ruslan Salakhutdinov
Abdel-rahman Mohamed
SSL
103
2,879
0
14 Jun 2021
SUPERB: Speech processing Universal PERformance Benchmark
SUPERB: Speech processing Universal PERformance Benchmark
Shu-Wen Yang
Po-Han Chi
Yung-Sung Chuang
Cheng-I Jeff Lai
Kushal Lakhotia
...
Shuyan Dong
Shang-Wen Li
Shinji Watanabe
Abdel-rahman Mohamed
Hung-yi Lee
SSL
64
910
0
03 May 2021
FSD50K: An Open Dataset of Human-Labeled Sound Events
FSD50K: An Open Dataset of Human-Labeled Sound Events
Eduardo Fonseca
Xavier Favory
Jordi Pons
F. Font
Xavier Serra
41
446
0
01 Oct 2020
wav2vec 2.0: A Framework for Self-Supervised Learning of Speech
  Representations
wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations
Alexei Baevski
Henry Zhou
Abdel-rahman Mohamed
Michael Auli
SSL
30
5,677
0
20 Jun 2020
Speech Model Pre-training for End-to-End Spoken Language Understanding
Speech Model Pre-training for End-to-End Spoken Language Understanding
Loren Lugosch
Mirco Ravanelli
Patrick Ignoto
Vikrant Singh Tomar
Yoshua Bengio
SyDa
AuLLM
39
349
0
07 Apr 2019
Enabling Factorized Piano Music Modeling and Generation with the MAESTRO
  Dataset
Enabling Factorized Piano Music Modeling and Generation with the MAESTRO Dataset
Curtis Hawthorne
Andriy Stasyuk
Adam Roberts
Ian Simon
Cheng-Zhi Anna Huang
Sander Dieleman
Erich Elsen
Jesse Engel
Douglas Eck
59
447
0
29 Oct 2018
General-purpose Tagging of Freesound Audio with AudioSet Labels: Task
  Description, Dataset, and Baseline
General-purpose Tagging of Freesound Audio with AudioSet Labels: Task Description, Dataset, and Baseline
Eduardo Fonseca
Manoj Plakal
F. Font
D. Ellis
Xavier Favory
Jordi Pons
Xavier Serra
25
147
0
26 Jul 2018
Speech Commands: A Dataset for Limited-Vocabulary Speech Recognition
Speech Commands: A Dataset for Limited-Vocabulary Speech Recognition
Pete Warden
34
1,599
0
09 Apr 2018
Neural Audio Synthesis of Musical Notes with WaveNet Autoencoders
Neural Audio Synthesis of Musical Notes with WaveNet Autoencoders
Jesse Engel
Cinjon Resnick
Adam Roberts
Sander Dieleman
Douglas Eck
Karen Simonyan
Mohammad Norouzi
74
618
0
05 Apr 2017
FMA: A Dataset For Music Analysis
FMA: A Dataset For Music Analysis
M. Defferrard
Kirell Benzi
P. Vandergheynst
Xavier Bresson
35
434
0
06 Dec 2016
1