QUADS: QUAntized Distillation Framework for Efficient Speech Language Understanding

Spoken Language Understanding (SLU) systems must balance performance and efficiency, particularly in resource-constrained environments. Existing methods apply distillation and quantization separately, leading to suboptimal compression because distillation ignores quantization constraints. We propose QUADS, a unified framework that optimizes both through multi-stage training with a pre-tuned model, enhancing adaptability to low-bit regimes while maintaining accuracy. QUADS achieves 71.13% accuracy on SLURP and 99.20% on FSC, with only minor degradations of up to 5.56% compared to state-of-the-art models. Additionally, it reduces computational complexity by 60--73× (GMACs) and model size by 83--700× (MB), demonstrating strong robustness under extreme quantization. These results establish QUADS as a highly efficient solution for real-world, resource-constrained SLU applications.
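The abstract does not spell out QUADS's multi-stage training objective, so the following is only a rough, hypothetical sketch of what jointly optimizing distillation under quantization constraints can look like in PyTorch: a straight-through fake-quantizer applied to student weights, paired with a standard distillation loss (hard-label cross-entropy plus temperature-scaled KL against the teacher). The names (`fake_quantize`, `distillation_loss`, `alpha`, `temperature`) and the 4-bit symmetric scheme are illustrative assumptions, not details from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def fake_quantize(w: torch.Tensor, num_bits: int = 4) -> torch.Tensor:
    """Symmetric uniform fake quantization with a straight-through estimator.

    The bit width and per-tensor scaling are illustrative choices, not QUADS's.
    """
    qmax = 2 ** (num_bits - 1) - 1
    scale = w.detach().abs().max().clamp(min=1e-8) / qmax
    w_q = torch.clamp(torch.round(w / scale), -qmax - 1, qmax) * scale
    # Straight-through estimator: forward uses w_q, gradients flow to w.
    return w + (w_q - w).detach()


def distillation_loss(student_logits, teacher_logits, labels,
                      temperature: float = 2.0, alpha: float = 0.5):
    """Hard-label cross-entropy blended with soft-label KL distillation."""
    ce = F.cross_entropy(student_logits, labels)
    kl = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.log_softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
        log_target=True,
    ) * temperature ** 2
    return alpha * ce + (1.0 - alpha) * kl


# Toy usage: distill a full-precision teacher into a fake-quantized student layer.
teacher = nn.Linear(40, 10)
student = nn.Linear(40, 10)
x = torch.randn(8, 40)
labels = torch.randint(0, 10, (8,))

with torch.no_grad():
    t_logits = teacher(x)
# Quantize the student's weights in the forward pass so the distillation
# gradient is computed under the low-bit constraint rather than after the fact.
s_logits = F.linear(x, fake_quantize(student.weight), student.bias)
loss = distillation_loss(s_logits, t_logits, labels)
loss.backward()
```

The key point this sketch tries to capture is the one the abstract makes: the distillation signal is computed on the quantized student, so compression and knowledge transfer are optimized together rather than in separate stages.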
arXiv: https://arxiv.org/abs/2505.14723

@article{biswas2025_2505.14723,
  title={QUADS: QUAntized Distillation Framework for Efficient Speech Language Understanding},
  author={Subrata Biswas and Mohammad Nur Hossain Khan and Bashima Islam},
  journal={arXiv preprint arXiv:2505.14723},
  year={2025}
}