CTRAP: Embedding Collapse Trap to Safeguard Large Language Models from Harmful Fine-Tuning

22 May 2025 · arXiv:2505.16559
Biao Yi, Tiansheng Huang, Baolei Zhang, Tong Li, Lihai Nie, Zheli Liu, Li Shen
MU, AAML

Papers citing "CTRAP: Embedding Collapse Trap to Safeguard Large Language Models from Harmful Fine-Tuning"

5 papers shown

NLSR: Neuron-Level Safety Realignment of Large Language Models Against Harmful Fine-Tuning
Xin Yi, Shunfan Zheng, Linlin Wang, Gerard de Melo, Xiaoling Wang, Liang He
17 Dec 2024

On Evaluating the Durability of Safeguards for Open-Weight LLMs
Xiangyu Qi, Boyi Wei, Nicholas Carlini, Yangsibo Huang, Tinghao Xie, Luxi He, Matthew Jagielski, Milad Nasr, Prateek Mittal, Peter Henderson
AAML
10 Dec 2024

Covert Malicious Finetuning: Challenges in Safeguarding LLM Adaptation
Danny Halawi, Alexander Wei, Eric Wallace, Tony T. Wang, Nika Haghtalab, Jacob Steinhardt
SILM, AAML
28 Jun 2024

Eight Methods to Evaluate Robust Unlearning in LLMs
Aengus Lynch, Phillip Guo, Aidan Ewart, Stephen Casper, Dylan Hadfield-Menell
ELM, MU
26 Feb 2024

RAFT: Reward rAnked FineTuning for Generative Foundation Model Alignment
Hanze Dong, Wei Xiong, Deepanshu Goyal, Yihan Zhang, Winnie Chow, Rui Pan, Shizhe Diao, Jipeng Zhang, Kashun Shum, Tong Zhang
ALM
13 Apr 2023