Safety-Aware Fine-Tuning of Large Language Models
Hyeong Kyu Choi, Xuefeng Du, Yixuan Li
arXiv:2410.10014 · 13 October 2024

Papers citing "Safety-Aware Fine-Tuning of Large Language Models" (4 papers):

1. Analysing Safety Risks in LLMs Fine-Tuned with Pseudo-Malicious Cyber Security Data
   Adel ElZemity, Budi Arief, Shujun Li
   15 May 2025

2. Benign Samples Matter! Fine-tuning On Outlier Benign Samples Severely Breaks Safety (AAML)
   Zihan Guan, Mengxuan Hu, Ronghang Zhu, Sheng R. Li, Anil Vullikanti
   11 May 2025

3. Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs (AAML)
   Jan Betley, Daniel Tan, Niels Warncke, Anna Sztyber-Betley, Xuchan Bao, Martín Soto, Nathan Labenz, Owain Evans
   24 Feb 2025

4. Process Reward Model with Q-Value Rankings (LRM)
   W. Li, Yixuan Li
   15 Oct 2024