2410.15821
Cited By
The effect of fine-tuning on language model toxicity
21 October 2024
Will Hawkins
Brent Mittelstadt
Chris Russell
Papers citing
"The effect of fine-tuning on language model toxicity"
7 / 7 papers shown
Title
Safety Subspaces are Not Distinct: A Fine-Tuning Case Study
Kaustubh Ponkshe
Shaan Shah
Raghav Singhal
Praneeth Vepakomma
106
0
0
20 May 2025
Fine-tuning Language Models for Factuality
Katherine Tian
Eric Mitchell
Huaxiu Yao
Christopher D. Manning
Chelsea Finn
KELM
HILM
SyDa
73
179
0
14 Nov 2023
Removing RLHF Protections in GPT-4 via Fine-Tuning
Qiusi Zhan
Richard Fang
R. Bindu
Akul Gupta
Tatsunori Hashimoto
Daniel Kang
MU
AAML
56
101
0
09 Nov 2023
The Expressive Power of Low-Rank Adaptation
Yuchen Zeng
Kangwook Lee
96
62
0
26 Oct 2023
Does fine-tuning GPT-3 with the OpenAI API leak personally-identifiable information?
A. Sun
Eliott Zemour
Arushi Saxena
Udith Vaidyanathan
Eric Lin
Christian Lau
Vaikkunth Mugunthan
SILM
85
21
0
31 Jul 2023
On the Effectiveness of Parameter-Efficient Fine-Tuning
Z. Fu
Haoran Yang
Anthony Man-Cho So
Wai Lam
Lidong Bing
Nigel Collier
68
158
0
28 Nov 2022
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
877
12,973
0
04 Mar 2022