QUADS: QUAntized Distillation Framework for Efficient Speech Language Understanding

Main: 4 pages · 3 figures · 2 tables · Bibliography: 1 page
Abstract

Spoken Language Understanding (SLU) systems must balance performance and efficiency, particularly in resource-constrained environments. Existing methods apply distillation and quantization separately, leading to suboptimal compression because distillation ignores quantization constraints. We propose QUADS, a unified framework that optimizes both through multi-stage training with a pre-tuned model, enhancing adaptability to low-bit regimes while maintaining accuracy. QUADS achieves 71.13% accuracy on SLURP and 99.20% on FSC, with only minor degradations of up to 5.56% relative to state-of-the-art models. Additionally, it reduces computational complexity by 60–73× (GMACs) and model size by 83–700×, demonstrating strong robustness under extreme quantization. These results establish QUADS as a highly efficient solution for real-world, resource-constrained SLU applications.

@article{biswas2025_2505.14723,
  title={QUADS: QUAntized Distillation Framework for Efficient Speech Language Understanding},
  author={Subrata Biswas and Mohammad Nur Hossain Khan and Bashima Islam},
  journal={arXiv preprint arXiv:2505.14723},
  year={2025}
}