Mitigating Memorization in LLMs using Activation Steering

8 March 2025
Manan Suri, Nishit Anand, Amisha Bhaskar
Abstract

The memorization of training data by Large Language Models (LLMs) poses significant risks, including privacy leaks and the regurgitation of copyrighted content. Activation steering, a technique that directly intervenes in model activations, has emerged as a promising approach for manipulating LLMs. In this work, we explore the effectiveness of activation steering in reducing memorization while preserving generalization capabilities. We conduct empirical evaluations using a controlled memorization benchmark of literary material and demonstrate that our method successfully suppresses memorized content with minimal degradation in model performance on Gemma. Additionally, we analyze the trade-offs between suppression effectiveness and linguistic fluency, highlighting the advantages and limitations of activation-based interventions. Our findings contribute to ongoing efforts in developing safer and more privacy-preserving LLMs by providing a practical and efficient mechanism to mitigate unintended memorization.
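To make the idea of "directly intervening in model activations" concrete, here is a minimal sketch of activation steering at inference time: a fixed steering vector is added to the residual-stream activations of one transformer layer via a forward hook. This is not the authors' exact method; the Gemma checkpoint name, the layer index, the steering coefficient, and the randomly initialized steering vector are all illustrative assumptions (in practice the vector would be estimated, e.g., from contrasting activations on memorized versus non-memorized continuations).

```python
# Illustrative sketch of activation steering with a forward hook.
# Assumptions (not from the paper): model variant, layer_idx, alpha,
# and the steering vector itself.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "google/gemma-2b"  # the paper evaluates on Gemma; exact variant assumed
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

layer_idx = 12   # hypothetical layer to intervene on
alpha = -4.0     # steering strength; negative sign suppresses the direction

# Hypothetical steering vector; a real one would be derived from activations.
hidden_size = model.config.hidden_size
steering_vector = torch.randn(hidden_size)
steering_vector = steering_vector / steering_vector.norm()

def steering_hook(module, inputs, output):
    # Decoder layers return a tuple whose first element is the hidden states;
    # add the (scaled) steering vector to every token position.
    hidden = output[0] if isinstance(output, tuple) else output
    vec = steering_vector.to(device=hidden.device, dtype=hidden.dtype)
    hidden = hidden + alpha * vec
    if isinstance(output, tuple):
        return (hidden,) + output[1:]
    return hidden

handle = model.model.layers[layer_idx].register_forward_hook(steering_hook)

# Prompt with a well-known literary opening to probe memorized continuations.
prompt = "It was the best of times,"
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(out[0], skip_special_tokens=True))

handle.remove()  # detach the hook to restore unsteered behavior
```

The hook-based design keeps the model weights untouched, which is what makes this style of intervention cheap: steering can be switched on or off per request simply by registering or removing the hook.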

@article{suri2025_2503.06040,
  title={Mitigating Memorization in LLMs using Activation Steering},
  author={Manan Suri and Nishit Anand and Amisha Bhaskar},
  journal={arXiv preprint arXiv:2503.06040},
  year={2025}
}