Unveiling and Addressing Pseudo Forgetting in Large Language Models

18 November 2024

Main: 9 pages, Bibliography: 3 pages, Appendix: 5 pages, 12 figures, 10 tables

Abstract

Although substantial efforts have been made to mitigate catastrophic forgetting in continual learning, its intrinsic mechanisms are not well understood. In this work, we demonstrate the existence of "pseudo forgetting": performance degradation on previous tasks stems not from a loss of capabilities, but from the failure of the instructions to activate the appropriate model abilities. We show that the model's performance on previous tasks can be restored through two simple interventions that guide the generation of correct rationales: (1) providing part of a correct rationale externally, and (2) appending semantically meaningless suffixes to the original instructions. Through empirical analysis of the internal mechanisms governing rationale generation, we reveal that models exhibiting pseudo forgetting depend less on the instruction during rationale generation, leading to suboptimal activation of their inherent capabilities. Based on this insight, we propose the Rationale-Guidance Difficulty based Replay (RGD-R) framework, which dynamically allocates replay data based on the model's ability to correctly leverage its intrinsic capabilities. Experimental results demonstrate that RGD-R effectively mitigates pseudo forgetting while maintaining model plasticity.
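
The abstract does not specify how RGD-R computes its difficulty metric or how it turns that metric into replay quantities. As a rough illustration of the general idea of difficulty-based replay allocation only, the sketch below splits a fixed replay budget across previous tasks in proportion to a per-task difficulty score. The function name `allocate_replay_budget`, the score values, and the proportional rule are all assumptions made for illustration, not the paper's method.

```python
def allocate_replay_budget(difficulty, total_budget):
    """Split `total_budget` replay samples across tasks in proportion to
    a per-task difficulty score (hypothetical stand-in for the paper's metric).

    difficulty: dict mapping task name -> non-negative float score
    total_budget: non-negative int, total number of replay samples
    """
    if total_budget < 0:
        raise ValueError("total_budget must be non-negative")
    if any(score < 0 for score in difficulty.values()):
        raise ValueError("difficulty scores must be non-negative")
    if not difficulty:
        return {}

    total = sum(difficulty.values())
    if total == 0:
        # No task is harder than another: fall back to an even split.
        shares = {task: total_budget / len(difficulty) for task in difficulty}
    else:
        shares = {task: total_budget * score / total
                  for task, score in difficulty.items()}

    # Largest-remainder rounding so the integer counts sum to total_budget.
    counts = {task: int(share) for task, share in shares.items()}
    leftover = total_budget - sum(counts.values())
    by_remainder = sorted(shares, key=lambda t: shares[t] - counts[t],
                          reverse=True)
    for task in by_remainder[:max(leftover, 0)]:
        counts[task] += 1
    return counts


# Hypothetical scores: a higher score means the instructions of that task
# currently have more trouble guiding the model to a correct rationale.
scores = {"task_a": 3.0, "task_b": 1.0}
print(allocate_replay_budget(scores, 100))
```

In a continual-learning loop, the scores would presumably be re-estimated as training proceeds, so that the allocation shifts toward tasks whose rationale generation is currently hardest to guide; that schedule is likewise an assumption here.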

@article{sun2025_2411.11932,
  title={Unveiling and Addressing Pseudo Forgetting in Large Language Models},
  author={Huashan Sun and Yizhe Yang and Yinghao Li and Jiawei Li and Yang Gao},
  journal={arXiv preprint arXiv:2411.11932},
  year={2025}
}