Dysfluent WFST: A Framework for Zero-Shot Speech Dysfluency Transcription and Detection

22 May 2025
Chenxu Guo
Jiachen Lian
Xuanru Zhou
Jinming Zhang
Shuhe Li
Zongli Ye
Hwi Joo Park
Anaisha Das
Zoe Ezzes
Jet M. J. Vonk
Brittany Morin
Rian Bogley
Lisa Wauters
Zachary Miller
Maria Gorno-Tempini
Gopala Anumanchipalli
Abstract

Automatic detection of speech dysfluency aids speech-language pathologists in efficient transcription of disordered speech, enhancing diagnostics and treatment planning. Traditional methods, often limited to classification, provide insufficient clinical insight, and text-independent models misclassify dysfluency, especially in context-dependent cases. This work introduces Dysfluent-WFST, a zero-shot decoder that simultaneously transcribes phonemes and detects dysfluency. Unlike previous models, Dysfluent-WFST operates with upstream encoders like WavLM and requires no additional training. It achieves state-of-the-art performance in both phonetic error rate and dysfluency detection on simulated and real speech data. Our approach is lightweight, interpretable, and effective, demonstrating that explicit modeling of pronunciation behavior in decoding, rather than complex architectures, is key to improving dysfluency processing systems.
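
The abstract's key claim is that explicit modeling of pronunciation behavior in the decoder, rather than a heavier architecture, drives the gains. The sketch below illustrates that general idea in miniature and is not the paper's implementation: a shortest-path search through an implicit edit-style weighted automaton whose arcs label matches, repetitions, insertions, and deletions, so the best path yields a phoneme transcript annotated with dysfluency tags. The function name, arc costs, and label set are illustrative assumptions; a real zero-shot decoder would derive arc weights from upstream encoder posteriors (e.g., a WavLM-based phoneme recognizer) and search full lattices rather than a single decoded sequence.

# Toy sketch (illustrative only, not the authors' Dysfluent-WFST code):
# align a decoded phoneme sequence against a reference pronunciation with
# an edit-style weighted automaton whose extra arcs carry dysfluency labels.
# Costs and labels below are assumptions chosen for illustration.

MATCH, REPEAT, INSERT, DELETE = 0.0, 1.0, 1.5, 1.5  # hypothetical arc costs


def align_with_dysfluency(decoded, reference):
    """Shortest path through an implicit WFST with match / repetition /
    insertion / deletion arcs. Returns (total_cost, [(label, phoneme), ...])."""
    n, m = len(decoded), len(reference)
    INF = float("inf")
    # cost[i][j]: best cost after consuming i decoded and j reference phonemes
    cost = [[INF] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0

    for i in range(n + 1):
        for j in range(m + 1):
            c = cost[i][j]
            if c == INF:
                continue
            # Match arc: decoded phoneme equals the next reference phoneme.
            if i < n and j < m and decoded[i] == reference[j]:
                if c + MATCH < cost[i + 1][j + 1]:
                    cost[i + 1][j + 1] = c + MATCH
                    back[i + 1][j + 1] = (i, j, ("match", decoded[i]))
            # Repetition arc: decoded phoneme repeats the reference phoneme
            # just consumed (a simple stand-in for dysfluent repetition).
            if i < n and j > 0 and decoded[i] == reference[j - 1]:
                if c + REPEAT < cost[i + 1][j]:
                    cost[i + 1][j] = c + REPEAT
                    back[i + 1][j] = (i, j, ("repetition", decoded[i]))
            # Insertion arc: extra decoded phoneme not in the reference.
            if i < n and c + INSERT < cost[i + 1][j]:
                cost[i + 1][j] = c + INSERT
                back[i + 1][j] = (i, j, ("insertion", decoded[i]))
            # Deletion arc: reference phoneme that was never produced.
            if j < m and c + DELETE < cost[i][j + 1]:
                cost[i][j + 1] = c + DELETE
                back[i][j + 1] = (i, j, ("deletion", reference[j]))

    # Backtrace the best path to recover phoneme-level dysfluency tags.
    steps, i, j = [], n, m
    while back[i][j] is not None:
        pi, pj, step = back[i][j]
        steps.append(step)
        i, j = pi, pj
    return cost[n][m], list(reversed(steps))


if __name__ == "__main__":
    # Decoding "p p l iy z" against the reference "p l iy z" ("please")
    # tags the extra "p" as a repetition rather than a generic error.
    total, steps = align_with_dysfluency(
        ["p", "p", "l", "iy", "z"], ["p", "l", "iy", "z"]
    )
    print(total, steps)

In this toy run the repeated "p" is labeled as a repetition instead of being counted as a plain phonetic error, which is the kind of clinically interpretable output the abstract argues for; the paper's actual system obtains this behavior zero-shot by decoding encoder posteriors through its WFST rather than by aligning discrete sequences.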

@article{guo2025_2505.16351,
  title={Dysfluent WFST: A Framework for Zero-Shot Speech Dysfluency Transcription and Detection},
  author={Chenxu Guo and Jiachen Lian and Xuanru Zhou and Jinming Zhang and Shuhe Li and Zongli Ye and Hwi Joo Park and Anaisha Das and Zoe Ezzes and Jet Vonk and Brittany Morin and Rian Bogley and Lisa Wauters and Zachary Miller and Maria Gorno-Tempini and Gopala Anumanchipalli},
  journal={arXiv preprint arXiv:2505.16351},
  year={2025}
}