Distillation-Guided Structural Transfer for Continual Learning Beyond Sparse Distributed Memory

Huiyan Xue
Xuming Ran
Yaxin Li
Qi Xu
Enhui Li
Yi Xu
Qiang Zhang
Main: 7 pages · Appendix: 2 pages · Bibliography: 2 pages · 7 figures · 5 tables
Abstract

Sparse neural systems are gaining traction for efficient continual learning due to their modularity and low interference. Architectures such as the Sparse Distributed Memory Multi-Layer Perceptron (SDMLP) construct task-specific subnetworks via Top-K activation and have shown resilience against catastrophic forgetting. However, their rigid modularity limits cross-task knowledge reuse and degrades performance under high sparsity. We propose Selective Subnetwork Distillation (SSD), a structurally guided continual learning framework that treats distillation not as a regularizer but as a topology-aligned information conduit. SSD identifies neurons with high activation frequency and selectively distills knowledge from the previous Top-K subnetworks and the output logits, without requiring replay or task labels. This enables structural realignment while preserving sparse modularity. Experiments on Split CIFAR-10, CIFAR-100, and MNIST demonstrate that SSD improves accuracy, retention, and representation coverage, offering a structurally grounded solution for sparse continual learning.
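
The sketch below is a minimal PyTorch illustration, under stated assumptions, of the two mechanisms the abstract names: a Top-K hidden layer that records how often each unit survives the sparsity mask, and a distillation loss restricted to the previous model's frequently active units plus its softened output logits. The names TopKMLP and ssd_loss and the hyperparameters freq_threshold, temperature, and alpha are hypothetical placeholders, not the authors' implementation.

# Hedged sketch: Top-K sparse activation plus selective distillation over
# frequently active hidden units and output logits. All names and
# hyperparameters are illustrative assumptions, not the paper's code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class TopKMLP(nn.Module):
    """MLP whose hidden layer keeps only the K most active units per sample."""

    def __init__(self, in_dim, hidden_dim, out_dim, k):
        super().__init__()
        self.fc1 = nn.Linear(in_dim, hidden_dim)
        self.fc2 = nn.Linear(hidden_dim, out_dim)
        self.k = k
        # Running count of how often each hidden unit survives the Top-K mask.
        self.register_buffer("activation_counts", torch.zeros(hidden_dim))

    def forward(self, x):
        h = F.relu(self.fc1(x))
        # Top-K activation: zero out all but the K largest hidden activations.
        _, topk_idx = h.topk(self.k, dim=1)
        mask = torch.zeros_like(h).scatter_(1, topk_idx, 1.0)
        if self.training:
            self.activation_counts += mask.sum(dim=0)
        h_sparse = h * mask
        return self.fc2(h_sparse), h_sparse


def ssd_loss(student, teacher, x, y, freq_threshold=0.5, temperature=2.0, alpha=0.5):
    """Cross-entropy on the new task plus (i) feature distillation restricted to
    hidden units that fired frequently under the previous (teacher) model and
    (ii) KL divergence on softened output logits. A sketch, not the method."""
    logits_s, feats_s = student(x)
    with torch.no_grad():
        logits_t, feats_t = teacher(x)

    # Select units whose activation frequency under the teacher exceeds a
    # fraction of the mean frequency -- a stand-in for "high activation frequency".
    freq = teacher.activation_counts / teacher.activation_counts.sum().clamp(min=1)
    selected = freq > freq_threshold * freq.mean()
    if not selected.any():
        selected = torch.ones_like(selected)

    ce = F.cross_entropy(logits_s, y)
    feat_kd = F.mse_loss(feats_s[:, selected], feats_t[:, selected])
    logit_kd = F.kl_div(
        F.log_softmax(logits_s / temperature, dim=1),
        F.softmax(logits_t / temperature, dim=1),
        reduction="batchmean",
    ) * temperature ** 2
    return ce + alpha * (feat_kd + logit_kd)


if __name__ == "__main__":
    teacher = TopKMLP(in_dim=784, hidden_dim=256, out_dim=10, k=32)
    student = TopKMLP(in_dim=784, hidden_dim=256, out_dim=10, k=32)
    student.load_state_dict(teacher.state_dict())  # warm-start from the old task
    x, y = torch.randn(8, 784), torch.randint(0, 10, (8,))
    teacher.train()
    teacher(x)  # populate activation counts for the demo
    teacher.eval()
    loss = ssd_loss(student, teacher, x, y)
    loss.backward()
    print(float(loss))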
