
Audio-Aware Large Language Models as Judges for Speaking Styles

6 June 2025
Cheng-Han Chiang
Xiaofei Wang
Chung-Ching Lin
Kevin Lin
Linjie Li
Radu Kopetz
Yao Qian
Zhendong Wang
Zhengyuan Yang
Hung-yi Lee
Lijuan Wang
    AuLLM
Main: 4 pages · 2 figures · 9 tables · Bibliography: 2 pages · Appendix: 7 pages
Abstract

Audio-aware large language models (ALLMs) can understand both the textual and non-textual information in an audio input. In this paper, we explore using ALLMs as automatic judges to assess the speaking styles of speeches. We use ALLM judges to evaluate the speeches generated by spoken language models (SLMs) on two tasks: voice style instruction following and role-playing. The speaking styles we consider include emotion, volume, speaking pace, word emphasis, pitch control, and non-verbal elements. We use four SLMs to complete the two tasks and use both humans and ALLMs to judge the SLMs' responses. We compare two ALLM judges, GPT-4o-audio and Gemini-2.5-pro, against human evaluation results and show that the agreement between Gemini and human judges is comparable to the agreement among human evaluators themselves. These promising results show that ALLMs can be used as judges to evaluate SLMs. Our results also reveal that current SLMs, even GPT-4o-audio, still have room for improvement in controlling speaking style and generating natural dialogues.
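To make the judging setup concrete, below is a minimal sketch of sending one speech clip plus a style rubric to GPT-4o-audio as a judge. It uses OpenAI's documented audio-input chat-completions format; the rubric wording, the 1-5 scale, and the function name are illustrative assumptions, not the authors' actual prompts or protocol.

# Minimal sketch: an audio-aware LLM as a speaking-style judge.
# The rubric and scoring scale below are illustrative, not the paper's
# prompts; the API call follows OpenAI's documented audio-input format.
import base64
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def judge_speaking_style(wav_path: str, instruction: str) -> str:
    """Ask an audio-aware LLM how well a speech clip follows a voice-style
    instruction (emotion, volume, pace, emphasis, pitch, non-verbal cues)."""
    with open(wav_path, "rb") as f:
        audio_b64 = base64.b64encode(f.read()).decode("utf-8")

    rubric = (
        "You are a judge of speaking style. The speaker was instructed to: "
        f"'{instruction}'. On a 1-5 scale, rate how well the audio follows "
        "the instruction, considering emotion, volume, speaking pace, word "
        "emphasis, pitch control, and non-verbal elements. Reply with the "
        "score and a one-sentence justification."
    )

    response = client.chat.completions.create(
        model="gpt-4o-audio-preview",
        modalities=["text"],  # text-only verdict from the judge
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": rubric},
                {"type": "input_audio",
                 "input_audio": {"data": audio_b64, "format": "wav"}},
            ],
        }],
    )
    return response.choices[0].message.content

# Example: judge a clip that was supposed to sound excited and fast-paced.
# print(judge_speaking_style("response.wav", "Speak excitedly and quickly."))

The same clip could be sent to Gemini-2.5-pro through its own audio-input API to compare the two ALLM judges against human ratings, as the abstract describes.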

View on arXiv
@article{chiang2025_2506.05984,
  title={Audio-Aware Large Language Models as Judges for Speaking Styles},
  author={Cheng-Han Chiang and Xiaofei Wang and Chung-Ching Lin and Kevin Lin and Linjie Li and Radu Kopetz and Yao Qian and Zhendong Wang and Zhengyuan Yang and Hung-yi Lee and Lijuan Wang},
  journal={arXiv preprint arXiv:2506.05984},
  year={2025}
}