The effect of fine-tuning on language model toxicity

21 October 2024
Will Hawkins
Brent Mittelstadt
Chris Russell

Papers citing "The effect of fine-tuning on language model toxicity"

7 / 7 papers shown
Safety Subspaces are Not Distinct: A Fine-Tuning Case Study
Kaustubh Ponkshe
Shaan Shah
Raghav Singhal
Praneeth Vepakomma
106
0
0
20 May 2025
Fine-tuning Language Models for Factuality
Katherine Tian
Eric Mitchell
Huaxiu Yao
Christopher D. Manning
Chelsea Finn
KELM, HILM, SyDa
73
179
0
14 Nov 2023
Removing RLHF Protections in GPT-4 via Fine-Tuning
Qiusi Zhan
Richard Fang
R. Bindu
Akul Gupta
Tatsunori Hashimoto
Daniel Kang
MU, AAML
58
101
0
09 Nov 2023
The Expressive Power of Low-Rank Adaptation
Yuchen Zeng
Kangwook Lee
96
62
0
26 Oct 2023
Does fine-tuning GPT-3 with the OpenAI API leak personally-identifiable information?
A. Sun
Eliott Zemour
Arushi Saxena
Udith Vaidyanathan
Eric Lin
Christian Lau
Vaikkunth Mugunthan
SILM
85
21
0
31 Jul 2023
On the Effectiveness of Parameter-Efficient Fine-Tuning
Z. Fu
Haoran Yang
Anthony Man-Cho So
Wai Lam
Lidong Bing
Nigel Collier
68
158
0
28 Nov 2022
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM, ALM
880
12,973
0
04 Mar 2022