
Refuse Whenever You Feel Unsafe: Improving Safety in LLMs via Decoupled Refusal Training
Papers citing "Refuse Whenever You Feel Unsafe: Improving Safety in LLMs via Decoupled Refusal Training"
- Detoxifying Large Language Models via Knowledge Editing. Meng Wang, Ningyu Zhang, Ziwen Xu, Zekun Xi, Shumin Deng, Yunzhi Yao, Qishen Zhang, Linyi Yang, Jindong Wang, Huajun Chen
- Llama Guard: LLM-based Input-Output Safeguard for Human-AI Conversations. Hakan Inan, Kartikeya Upasani, Jianfeng Chi, Rashi Rungta, Krithika Iyer, ..., Michael Tontchev, Qing Hu, Brian Fuller, Davide Testuggine, Madian Khabsa