ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2403.14472
  4. Cited By
Detoxifying Large Language Models via Knowledge Editing

Detoxifying Large Language Models via Knowledge Editing

21 March 2024
Meng Wang
Ningyu Zhang
Ziwen Xu
Zekun Xi
Shumin Deng
Yunzhi Yao
Qishen Zhang
Linyi Yang
Jindong Wang
Huajun Chen
    KELM
ArXivPDFHTML

Papers citing "Detoxifying Large Language Models via Knowledge Editing"

15 / 15 papers shown
Title
Representation-based Reward Modeling for Efficient Safety Alignment of Large Language Model
Qiyuan Deng
X. Bai
Kehai Chen
Yaowei Wang
Liqiang Nie
Min Zhang
OffRL
66
0
0
13 Mar 2025
LED-Merging: Mitigating Safety-Utility Conflicts in Model Merging with Location-Election-Disjoint
LED-Merging: Mitigating Safety-Utility Conflicts in Model Merging with Location-Election-Disjoint
Qianli Ma
Dongrui Liu
Qian Chen
Linfeng Zhang
Jing Shao
MoMe
159
0
0
24 Feb 2025
Can LLMs Rank the Harmfulness of Smaller LLMs? We are Not There Yet
Can LLMs Rank the Harmfulness of Smaller LLMs? We are Not There Yet
Berk Atil
Vipul Gupta
Sarkar Snigdha Sarathi Das
R. Passonneau
196
0
0
07 Feb 2025
Layer-Level Self-Exposure and Patch: Affirmative Token Mitigation for Jailbreak Attack Defense
Layer-Level Self-Exposure and Patch: Affirmative Token Mitigation for Jailbreak Attack Defense
Yang Ouyang
Hengrui Gu
Shuhang Lin
Wenyue Hua
Jie Peng
B. Kailkhura
Tianlong Chen
Kaixiong Zhou
Kaixiong Zhou
AAML
31
1
0
05 Jan 2025
Lifelong Knowledge Editing for Vision Language Models with Low-Rank Mixture-of-Experts
Lifelong Knowledge Editing for Vision Language Models with Low-Rank Mixture-of-Experts
Qizhou Chen
Chengyu Wang
Dakan Wang
Taolin Zhang
Wangyue Li
Xiaofeng He
KELM
83
1
0
23 Nov 2024
Extracting and Transferring Abilities For Building Multi-lingual Ability-enhanced Large Language Models
Extracting and Transferring Abilities For Building Multi-lingual Ability-enhanced Large Language Models
Zhipeng Chen
Liang Song
K. Zhou
Wayne Xin Zhao
Binghui Wang
Weipeng Chen
Ji-Rong Wen
68
0
0
10 Oct 2024
Beyond Single Concept Vector: Modeling Concept Subspace in LLMs with Gaussian Distribution
Beyond Single Concept Vector: Modeling Concept Subspace in LLMs with Gaussian Distribution
Haiyan Zhao
Heng Zhao
Bo Shen
Ali Payani
Fan Yang
Mengnan Du
59
2
0
30 Sep 2024
To Forget or Not? Towards Practical Knowledge Unlearning for Large
  Language Models
To Forget or Not? Towards Practical Knowledge Unlearning for Large Language Models
Bozhong Tian
Xiaozhuan Liang
Siyuan Cheng
Qingbin Liu
Mengru Wang
Dianbo Sui
Xi Chen
Huajun Chen
Ningyu Zhang
MU
35
6
0
02 Jul 2024
In-Context Editing: Learning Knowledge from Self-Induced Distributions
In-Context Editing: Learning Knowledge from Self-Induced Distributions
Siyuan Qi
Bangcheng Yang
Kailin Jiang
Xiaobo Wang
Jiaqi Li
Yifan Zhong
Yaodong Yang
Zilong Zheng
KELM
109
8
0
17 Jun 2024
When LLMs Meet Cybersecurity: A Systematic Literature Review
When LLMs Meet Cybersecurity: A Systematic Literature Review
Jie Zhang
Haoyu Bu
Hui Wen
Yu Chen
Lun Li
Hongsong Zhu
45
36
0
06 May 2024
Propagation and Pitfalls: Reasoning-based Assessment of Knowledge
  Editing through Counterfactual Tasks
Propagation and Pitfalls: Reasoning-based Assessment of Knowledge Editing through Counterfactual Tasks
Wenyue Hua
Jiang Guo
Mingwen Dong
He Zhu
Patrick K. L. Ng
Zhiguo Wang
KELM
76
17
0
31 Jan 2024
A Mechanistic Understanding of Alignment Algorithms: A Case Study on DPO
  and Toxicity
A Mechanistic Understanding of Alignment Algorithms: A Case Study on DPO and Toxicity
Andrew Lee
Xiaoyan Bai
Itamar Pres
Martin Wattenberg
Jonathan K. Kummerfeld
Rada Mihalcea
77
96
0
03 Jan 2024
Knowledge Editing for Large Language Models: A Survey
Knowledge Editing for Large Language Models: A Survey
Song Wang
Yaochen Zhu
Haochen Liu
Zaiyi Zheng
Chen Chen
Wenlin Yao
KELM
77
133
0
24 Oct 2023
GPTFUZZER: Red Teaming Large Language Models with Auto-Generated
  Jailbreak Prompts
GPTFUZZER: Red Teaming Large Language Models with Auto-Generated Jailbreak Prompts
Jiahao Yu
Xingwei Lin
Zheng Yu
Xinyu Xing
SILM
117
303
0
19 Sep 2023
Language Anisotropic Cross-Lingual Model Editing
Language Anisotropic Cross-Lingual Model Editing
Yang Xu
Yutai Hou
Wanxiang Che
Min Zhang
KELM
96
24
0
25 May 2022
1