CTRAP: Embedding Collapse Trap to Safeguard Large Language Models from Harmful Fine-Tuning

22 May 2025
Biao Yi, Tiansheng Huang, Baolei Zhang, Tong Li, Lihai Nie, Zheli Liu, Li Shen
MU, AAML

Papers citing "CTRAP: Embedding Collapse Trap to Safeguard Large Language Models from Harmful Fine-Tuning"

10 / 10 papers shown
NLSR: Neuron-Level Safety Realignment of Large Language Models Against Harmful Fine-Tuning
Xin Yi, Shunfan Zheng, Linlin Wang, Gerard de Melo, Xiaoling Wang, Liang He
17 Dec 2024
On Evaluating the Durability of Safeguards for Open-Weight LLMs
Xiangyu Qi, Boyi Wei, Nicholas Carlini, Yangsibo Huang, Tinghao Xie, Luxi He, Matthew Jagielski, Milad Nasr, Prateek Mittal, Peter Henderson
AAML
10 Dec 2024
Meta-Unlearning on Diffusion Models: Preventing Relearning Unlearned Concepts
Hongcheng Gao, Tianyu Pang, Chao Du, Taihang Hu, Zhijie Deng, Min Lin
DiffM
16 Oct 2024
A Closer Look at Machine Unlearning for Large Language Models
Xiaojian Yuan, Tianyu Pang, Chao Du, Kejiang Chen, Weiming Zhang, Min Lin
MU
10 Oct 2024
An Adversarial Perspective on Machine Unlearning for AI Safety
Jakub Łucki, Boyi Wei, Yangsibo Huang, Peter Henderson, F. Tramèr, Javier Rando
MU, AAML
26 Sep 2024
Covert Malicious Finetuning: Challenges in Safeguarding LLM Adaptation
Danny Halawi, Alexander Wei, Eric Wallace, Tony T. Wang, Nika Haghtalab, Jacob Steinhardt
SILM, AAML
28 Jun 2024
Eight Methods to Evaluate Robust Unlearning in LLMs
Aengus Lynch, Phillip Guo, Aidan Ewart, Stephen Casper, Dylan Hadfield-Menell
ELM, MU
26 Feb 2024
Fine-tuning can cripple your foundation model; preserving features may be the solution
Jishnu Mukhoti, Y. Gal, Philip Torr, P. Dokania
CLL
25 Aug 2023
Universal and Transferable Adversarial Attacks on Aligned Language Models
Andy Zou, Zifan Wang, Nicholas Carlini, Milad Nasr, J. Zico Kolter, Matt Fredrikson
27 Jul 2023
RAFT: Reward rAnked FineTuning for Generative Foundation Model Alignment
Hanze Dong, Wei Xiong, Deepanshu Goyal, Yihan Zhang, Winnie Chow, Rui Pan, Shizhe Diao, Jipeng Zhang, Kashun Shum, Tong Zhang
ALM
13 Apr 2023