How (un)ethical are instruction-centric responses of LLMs? Unveiling the vulnerabilities of safety guardrails to harmful queries

23 February 2024

Papers citing "How (un)ethical are instruction-centric responses of LLMs? Unveiling the vulnerabilities of safety guardrails to harmful queries"

4 / 4 papers shown

Title
Breaking Boundaries: Investigating the Effects of Model Editing on Cross-linguistic Performance Somnath Banerjee Avik Halder Rajarshi Mandal Sayan Layek Ian Soboroff Rima Hazra Animesh Mukherjee 54 0 0 17 Jun 2024
JailbreakEval: An Integrated Toolkit for Evaluating Jailbreak Attempts Against Large Language Models Delong Ran Jinyuan Liu Yichen Gong Jingyi Zheng Xinlei He Tianshuo Cong Anyu Wang ELM 47 10 0 13 Jun 2024
Poisoning Language Models During Instruction Tuning Alexander Wan Eric Wallace Sheng Shen Dan Klein SILM 92 124 0 01 May 2023
Large Language Models are Zero-Shot Reasoners Takeshi Kojima S. Gu Machel Reid Yutaka Matsuo Yusuke Iwasawa ReLM LRM 325 4,077 0 24 May 2022