SaLoRA: Safety-Alignment Preserved Low-Rank Adaptation

3 January 2025
Mingjie Li
Wai Man Si
Michael Backes
Yang Zhang
Yisen Wang

Papers citing "SaLoRA: Safety-Alignment Preserved Low-Rank Adaptation"

21 / 21 papers shown
Understanding Pre-training and Fine-tuning from Loss Landscape Perspectives
Huanran Chen
Yinpeng Dong
Zeming Wei
Yao Huang
Yichi Zhang
Hang Su
Jun Zhu
MoMe
59
0
0
23 May 2025
Mitigating Fine-tuning Risks in LLMs via Safety-Aware Probing Optimization
Chengcan Wu
Zhixin Zhang
Zeming Wei
Yihao Zhang
Meng Sun
AAML
50
0
0
22 May 2025
CTRAP: Embedding Collapse Trap to Safeguard Large Language Models from Harmful Fine-Tuning
Biao Yi
Tiansheng Huang
Baolei Zhang
Tong Li
Lihai Nie
Zheli Liu
Li Shen
MU
AAML
45
0
0
22 May 2025
Benign Samples Matter! Fine-tuning On Outlier Benign Samples Severely Breaks Safety
Zihan Guan
Mengxuan Hu
Ronghang Zhu
Sheng Li
Anil Vullikanti
AAML
45
0
0
11 May 2025
Panacea: Mitigating Harmful Fine-tuning for Large Language Models via Post-fine-tuning Perturbation
Yun Wang
Tiansheng Huang
Li Shen
Huanjin Yao
Haotian Luo
Rui Liu
Naiqiang Tan
Jiaxing Huang
Dacheng Tao
AAML
MoMe
CLL
159
3
0
30 Jan 2025
Beyond Interpretability: The Gains of Feature Monosemanticity on Model Robustness
Qi Zhang
Yifei Wang
Jingyi Cui
Xiang Pan
Qi Lei
Stefanie Jegelka
Yisen Wang
AAML
83
1
0
27 Oct 2024
Harmful Fine-tuning Attacks and Defenses for Large Language Models: A Survey
Tiansheng Huang
Sihao Hu
Fatih Ilhan
Selim Furkan Tekin
Ling Liu
AAML
76
32
0
26 Sep 2024
Generated Data with Fake Privacy: Hidden Dangers of Fine-tuning Large Language Models on Generated Data
Atilla Akkus
Mingjie Li
Junjie Chu
Michael Backes
Sinem Sav
SILM
SyDa
70
3
0
12 Sep 2024
Lazy Safety Alignment for Large Language Models against Harmful Fine-tuning
Tiansheng Huang
Sihao Hu
Fatih Ilhan
Selim Furkan Tekin
Ling Liu
94
29
0
28 May 2024
Safe LoRA: the Silver Lining of Reducing Safety Risks when Fine-tuning Large Language Models
Chia-Yi Hsu
Yu-Lin Tsai
Chih-Hsun Lin
Pin-Yu Chen
Chia-Mu Yu
Chun-ying Huang
91
45
0
27 May 2024
PiSSA: Principal Singular Values and Singular Vectors Adaptation of Large Language Models
Fanxu Meng
Zhaohui Wang
Muhan Zhang
VLM
98
92
0
03 Apr 2024
Fight Back Against Jailbreaking via Prompt Adversarial Tuning
Yichuan Mo
Yuji Wang
Zeming Wei
Yisen Wang
AAML
SILM
65
28
0
09 Feb 2024
Vaccine: Perturbation-aware Alignment for Large Language Model
Tiansheng Huang
Sihao Hu
Ling Liu
85
39
0
02 Feb 2024
InferAligner: Inference-Time Alignment for Harmlessness through Cross-Model Guidance
Pengyu Wang
Dong Zhang
Linyang Li
Chenkun Tan
Xinghao Wang
Ke Ren
Botian Jiang
Xipeng Qiu
LLMSV
43
45
0
20 Jan 2024
Fine-tuning can cripple your foundation model; preserving features may be the solution
Jishnu Mukhoti
Y. Gal
Philip Torr
P. Dokania
CLL
61
37
0
25 Aug 2023
Universal and Transferable Adversarial Attacks on Aligned Language Models
Andy Zou
Zifan Wang
Nicholas Carlini
Milad Nasr
J. Zico Kolter
Matt Fredrikson
160
1,376
0
27 Jul 2023
The Power of Scale for Parameter-Efficient Prompt Tuning
Brian Lester
Rami Al-Rfou
Noah Constant
VPVLM
448
3,952
0
18 Apr 2021
Intrinsic Dimensionality Explains the Effectiveness of Language Model Fine-Tuning
Armen Aghajanyan
Luke Zettlemoyer
Sonal Gupta
75
549
1
22 Dec 2020
Language Models are Few-Shot Learners
Tom B. Brown
Benjamin Mann
Nick Ryder
Melanie Subbiah
Jared Kaplan
...
Christopher Berner
Sam McCandlish
Alec Radford
Ilya Sutskever
Dario Amodei
BDL
467
41,106
0
28 May 2020
SNIP: Single-shot Network Pruning based on Connection Sensitivity
Namhoon Lee
Thalaiyasingam Ajanthan
Philip Torr
VLM
207
1,190
0
04 Oct 2018
Dissecting Contextual Word Embeddings: Architecture and Representation
Matthew E. Peters
Mark Neumann
Luke Zettlemoyer
Wen-tau Yih
77
429
0
27 Aug 2018