Critique-Guided Distillation: Improving Supervised Fine-tuning via Better Distillation

Supervised fine-tuning (SFT) on expert demonstrations often suffers from the imitation problem: the model learns to reproduce correct responses without \emph{understanding} the underlying rationale. To address this limitation, we propose \textsc{Critique-Guided Distillation (CGD)}, a novel multi-stage framework that integrates teacher-generated \emph{explanatory critiques} and \emph{refined responses} into the SFT process. A student model is then trained to map the triplet of prompt, teacher critique, and its own initial response to the corresponding refined teacher response, thereby learning both \emph{what} to imitate and \emph{why}. Using entropy-based analysis, we show that \textsc{CGD} reduces refinement uncertainty and can be interpreted as a Bayesian posterior update. Extensive empirical evaluation of \textsc{CGD} on a variety of benchmark tasks demonstrates significant gains on both math (AMC23 +17.5%) and language understanding tasks (MMLU-Pro +6.3%), while mitigating the format drift observed in prior critique fine-tuning (CFT) techniques.
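The triplet-to-refined-response mapping described above can be sketched as a data-construction step for SFT. The dataclass, field names, and prompt template below are illustrative assumptions for exposition, not the paper's actual format:

```python
from dataclasses import dataclass


@dataclass
class CGDExample:
    """One Critique-Guided Distillation training example.

    The student is trained to map (prompt, its own initial response,
    teacher critique) to the teacher's refined response.
    """
    prompt: str
    initial_response: str   # student's own first attempt
    critique: str           # teacher's explanatory critique
    refined_response: str   # teacher's refined answer (the SFT target)


def format_sft_pair(ex: CGDExample) -> tuple[str, str]:
    """Serialize the triplet into an (input, target) pair for SFT.

    The template here is a hypothetical choice; the paper does not
    prescribe this exact serialization.
    """
    model_input = (
        f"Question:\n{ex.prompt}\n\n"
        f"Initial answer:\n{ex.initial_response}\n\n"
        f"Critique:\n{ex.critique}\n\n"
        f"Refined answer:"
    )
    return model_input, ex.refined_response


example = CGDExample(
    prompt="What is 12 * 13?",
    initial_response="12 * 13 = 146",
    critique="The product is off by 10: 12*13 = 12*10 + 12*3 = 120 + 36.",
    refined_response="12 * 13 = 156",
)
model_input, target = format_sft_pair(example)
```

Training on such pairs conditions the student's refinement on the critique, so the model sees not only the corrected answer but also the reasoning that motivates the correction.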
@article{kapusuzoglu2025_2505.11628,
  title={Critique-Guided Distillation: Improving Supervised Fine-tuning via Better Distillation},
  author={Berkcan Kapusuzoglu and Supriyo Chakraborty and Chia-Hsuan Lee and Sambit Sahu},
  journal={arXiv preprint arXiv:2505.11628},
  year={2025}
}