S2SBench: A Benchmark for Quantifying Intelligence Degradation in Speech-to-Speech Large Language Models

20 May 2025
Yuanbo Fang
Haoze Sun
Jun Liu
Tao Zhang
Zenan Zhou
Weipeng Chen
Xiaofen Xing
Xiangmin Xu
    AuLLM
    ELM
Abstract

End-to-end speech large language models (LLMs) extend the capabilities of text-based models to directly process and generate audio tokens. However, this often leads to a decline in reasoning and generation performance compared to text input, a phenomenon referred to as intelligence degradation. To systematically evaluate this gap, we propose S2SBench, a benchmark designed to quantify performance degradation in Speech LLMs. It includes diagnostic datasets targeting sentence continuation and commonsense reasoning under audio input. We further introduce a pairwise evaluation protocol based on perplexity differences between plausible and implausible samples to measure degradation relative to text input. We apply S2SBench to analyze the training process of Baichuan-Audio, which further demonstrates the benchmark's effectiveness. All datasets and evaluation code are available at this https URL.
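The pairwise protocol mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not spell out the scoring rule, so the following is an assumption rather than the authors' implementation: a pair counts as correct when the plausible sample has lower perplexity than the implausible one, and degradation is taken as the drop in pairwise accuracy from text input to audio input. The function names and the toy log-probabilities are hypothetical and for illustration only.

```python
import math
from typing import List, Sequence, Tuple

# A pair holds per-token natural-log probabilities for
# (plausible sample, implausible sample). Hypothetical data layout.
Pair = Tuple[Sequence[float], Sequence[float]]


def perplexity(token_logprobs: Sequence[float]) -> float:
    """Perplexity = exp of the negative mean token log-probability."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def pairwise_accuracy(pairs: List[Pair]) -> float:
    """Fraction of pairs where the plausible sample gets lower perplexity."""
    if not pairs:
        raise ValueError("need at least one pair")
    wins = sum(perplexity(good) < perplexity(bad) for good, bad in pairs)
    return wins / len(pairs)


def degradation(text_pairs: List[Pair], audio_pairs: List[Pair]) -> float:
    """Assumed metric: accuracy under text input minus accuracy under audio input."""
    return pairwise_accuracy(text_pairs) - pairwise_accuracy(audio_pairs)


# Toy, made-up log-probabilities for the same two items under each modality.
text_pairs = [([-0.1, -0.2], [-1.0, -1.5]), ([-0.3, -0.3], [-0.9, -0.8])]
audio_pairs = [([-0.4, -0.5], [-1.0, -1.2]), ([-1.1, -0.9], [-0.6, -0.7])]

gap = degradation(text_pairs, audio_pairs)
```

In this toy data the model ranks both pairs correctly with text input but only one of two with audio input, so the assumed degradation score is positive. A real evaluation would obtain the token log-probabilities from the model under test.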

@article{fang2025_2505.14438,
  title={S2SBench: A Benchmark for Quantifying Intelligence Degradation in Speech-to-Speech Large Language Models},
  author={Yuanbo Fang and Haoze Sun and Jun Liu and Tao Zhang and Zenan Zhou and Weipeng Chen and Xiaofen Xing and Xiangmin Xu},
  journal={arXiv preprint arXiv:2505.14438},
  year={2025}
}