Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2406.12257
Cited By
CleanGen: Mitigating Backdoor Attacks for Generation Tasks in Large Language Models
18 June 2024
Yuetai Li
Zhangchen Xu
Fengqing Jiang
Luyao Niu
D. Sahabandu
Bhaskar Ramasubramanian
Radha Poovendran
SILM
AAML
Re-assign community
ArXiv
PDF
HTML
Papers citing
"CleanGen: Mitigating Backdoor Attacks for Generation Tasks in Large Language Models"
6 / 6 papers shown
Title
Exposing the Ghost in the Transformer: Abnormal Detection for Large Language Models via Hidden State Forensics
Shide Zhou
Kaidi Wang
Ling Shi
Hairu Wang
47
0
0
01 Apr 2025
CROW: Eliminating Backdoors from Large Language Models via Internal Consistency Regularization
Nay Myat Min
Long H. Pham
Yige Li
Tianlong Chen
AAML
64
4
0
18 Nov 2024
SafeDecoding: Defending against Jailbreak Attacks via Safety-Aware Decoding
Zhangchen Xu
Fengqing Jiang
Luyao Niu
Jinyuan Jia
Bill Yuchen Lin
Radha Poovendran
AAML
131
85
0
14 Feb 2024
Backdoor Attacks and Countermeasures in Natural Language Processing Models: A Comprehensive Security Review
Pengzhou Cheng
Zongru Wu
Wei Du
Haodong Zhao
Wei Lu
Gongshen Liu
SILM
AAML
29
17
0
12 Sep 2023
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
313
11,953
0
04 Mar 2022
Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation
Yonghui Wu
M. Schuster
Z. Chen
Quoc V. Le
Mohammad Norouzi
...
Alex Rudnick
Oriol Vinyals
G. Corrado
Macduff Hughes
J. Dean
AIMat
716
6,746
0
26 Sep 2016
1