SaLoRA: Safety-Alignment Preserved Low-Rank Adaptation
Mingjie Li, Wai Man Si, Michael Backes, Yang Zhang, Yisen Wang
3 January 2025 (arXiv: 2501.01765)
Papers citing "SaLoRA: Safety-Alignment Preserved Low-Rank Adaptation" (21 of 21 papers shown):
Understanding Pre-training and Fine-tuning from Loss Landscape Perspectives. Huanran Chen, Yinpeng Dong, Zeming Wei, Yao Huang, Yichi Zhang, Hang Su, Jun Zhu. 23 May 2025.
Mitigating Fine-tuning Risks in LLMs via Safety-Aware Probing Optimization. Chengcan Wu, Zhixin Zhang, Zeming Wei, Yihao Zhang, Meng Sun. 22 May 2025.
CTRAP: Embedding Collapse Trap to Safeguard Large Language Models from Harmful Fine-Tuning. Biao Yi, Tiansheng Huang, Baolei Zhang, Tong Li, Lihai Nie, Zheli Liu, Li Shen. 22 May 2025.
Benign Samples Matter! Fine-tuning On Outlier Benign Samples Severely Breaks Safety. Zihan Guan, Mengxuan Hu, Ronghang Zhu, Sheng Li, Anil Vullikanti. 11 May 2025.
Panacea: Mitigating Harmful Fine-tuning for Large Language Models via Post-fine-tuning Perturbation. Yun Wang, Tiansheng Huang, Li Shen, Huanjin Yao, Haotian Luo, Rui Liu, Naiqiang Tan, Jiaxing Huang, Dacheng Tao. 30 Jan 2025.
Beyond Interpretability: The Gains of Feature Monosemanticity on Model Robustness. Qi Zhang, Yifei Wang, Jingyi Cui, Xiang Pan, Qi Lei, Stefanie Jegelka, Yisen Wang. 27 Oct 2024.
Harmful Fine-tuning Attacks and Defenses for Large Language Models: A Survey. Tiansheng Huang, Sihao Hu, Fatih Ilhan, Selim Furkan Tekin, Ling Liu. 26 Sep 2024.
Generated Data with Fake Privacy: Hidden Dangers of Fine-tuning Large Language Models on Generated Data. Atilla Akkus, Mingjie Li, Junjie Chu, Michael Backes, Sinem Sav. 12 Sep 2024.
Lazy Safety Alignment for Large Language Models against Harmful Fine-tuning. Tiansheng Huang, Sihao Hu, Fatih Ilhan, Selim Furkan Tekin, Ling Liu. 28 May 2024.
Safe LoRA: the Silver Lining of Reducing Safety Risks when Fine-tuning Large Language Models. Chia-Yi Hsu, Yu-Lin Tsai, Chih-Hsun Lin, Pin-Yu Chen, Chia-Mu Yu, Chun-ying Huang. 27 May 2024.
PiSSA: Principal Singular Values and Singular Vectors Adaptation of Large Language Models. Fanxu Meng, Zhaohui Wang, Muhan Zhang. 03 Apr 2024.
Fight Back Against Jailbreaking via Prompt Adversarial Tuning. Yichuan Mo, Yuji Wang, Zeming Wei, Yisen Wang. 09 Feb 2024.
Vaccine: Perturbation-aware Alignment for Large Language Model. Tiansheng Huang, Sihao Hu, Ling Liu. 02 Feb 2024.
InferAligner: Inference-Time Alignment for Harmlessness through Cross-Model Guidance. Pengyu Wang, Dong Zhang, Linyang Li, Chenkun Tan, Xinghao Wang, Ke Ren, Botian Jiang, Xipeng Qiu. 20 Jan 2024.
Fine-tuning can cripple your foundation model; preserving features may be the solution. Jishnu Mukhoti, Y. Gal, Philip Torr, P. Dokania. 25 Aug 2023.
Universal and Transferable Adversarial Attacks on Aligned Language Models. Andy Zou, Zifan Wang, Nicholas Carlini, Milad Nasr, J. Zico Kolter, Matt Fredrikson. 27 Jul 2023.
The Power of Scale for Parameter-Efficient Prompt Tuning. Brian Lester, Rami Al-Rfou, Noah Constant. 18 Apr 2021.
Intrinsic Dimensionality Explains the Effectiveness of Language Model Fine-Tuning. Armen Aghajanyan, Luke Zettlemoyer, Sonal Gupta. 22 Dec 2020.
Language Models are Few-Shot Learners. Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, ..., Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, Dario Amodei. 28 May 2020.
SNIP: Single-shot Network Pruning based on Connection Sensitivity. Namhoon Lee, Thalaiyasingam Ajanthan, Philip Torr. 04 Oct 2018.
Dissecting Contextual Word Embeddings: Architecture and Representation. Matthew E. Peters, Mark Neumann, Luke Zettlemoyer, Wen-tau Yih. 27 Aug 2018.