Memorization Inheritance in Sequence-Level Knowledge Distillation for Neural Machine Translation

3 February 2025
Verna Dankers
Vikas Raunak
Abstract

In this work, we explore how instance-level memorization in the teacher Neural Machine Translation (NMT) model is inherited by the student model in sequence-level knowledge distillation (SeqKD). We find that despite never directly seeing the original training data, students memorize more than baseline models (models of the same size, trained on the original data) -- 3.4% more for exact matches and 57% more for extractive memorization -- and show increased hallucination rates. Further, under this SeqKD setting, we characterize how students behave on specific training-data subgroups, such as subgroups with low quality and specific counterfactual memorization (CM) scores, and find that students exhibit amplified denoising on low-quality subgroups. Finally, we propose a modification to SeqKD named Adaptive-SeqKD, which intervenes in SeqKD to reduce memorization and hallucinations. Overall, we recommend caution when applying SeqKD: students inherit both their teachers' superior performance and their failure modes, thereby requiring active monitoring.
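As a rough illustration of the SeqKD setup studied here, the sketch below builds a distilled training set by replacing the original references with a teacher's beam-search outputs, and computes a simple exact-match rate as one proxy for instance-level memorization. This is a minimal sketch, not the authors' code: the Hugging Face MarianMT checkpoint, beam width, and helper names are assumptions for illustration.

```python
# Minimal SeqKD sketch: distill teacher outputs into a synthetic training set,
# then measure a crude exact-match memorization proxy on model outputs.
# The checkpoint name, beam width, and length limit are illustrative only.
from transformers import MarianMTModel, MarianTokenizer

TEACHER = "Helsinki-NLP/opus-mt-de-en"  # assumed teacher checkpoint

tokenizer = MarianTokenizer.from_pretrained(TEACHER)
teacher = MarianMTModel.from_pretrained(TEACHER)


def distill_corpus(sources, num_beams=5, max_new_tokens=128):
    """Replace original references with teacher beam-search translations.

    The student is then trained on (source, teacher_output) pairs instead of
    (source, reference) pairs -- the core idea of sequence-level KD.
    """
    distilled = []
    for src in sources:
        inputs = tokenizer(src, return_tensors="pt", truncation=True)
        out = teacher.generate(
            **inputs, num_beams=num_beams, max_new_tokens=max_new_tokens
        )
        hyp = tokenizer.decode(out[0], skip_special_tokens=True)
        distilled.append((src, hyp))
    return distilled


def exact_match_rate(model_outputs, training_targets):
    """Fraction of outputs that exactly reproduce some training target --
    a simple proxy for instance-level memorization."""
    targets = set(training_targets)
    hits = sum(1 for hyp in model_outputs if hyp in targets)
    return hits / max(len(model_outputs), 1)
```

In this sketch, the student would be trained on the pairs returned by `distill_corpus`, and `exact_match_rate` could be applied to both the student and a same-size baseline to compare how much training-set material each reproduces verbatim.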

@article{dankers2025_2502.01491,
  title={Memorization Inheritance in Sequence-Level Knowledge Distillation for Neural Machine Translation},
  author={Verna Dankers and Vikas Raunak},
  journal={arXiv preprint arXiv:2502.01491},
  year={2025}
}