2402.15302
Cited By
How (un)ethical are instruction-centric responses of LLMs? Unveiling the vulnerabilities of safety guardrails to harmful queries
23 February 2024
Somnath Banerjee
Sayan Layek
Rima Hazra
Animesh Mukherjee
Papers citing "How (un)ethical are instruction-centric responses of LLMs? Unveiling the vulnerabilities of safety guardrails to harmful queries"
4 / 4 papers shown
Breaking Boundaries: Investigating the Effects of Model Editing on Cross-linguistic Performance
Somnath Banerjee, Avik Halder, Rajarshi Mandal, Sayan Layek, Ian Soboroff, Rima Hazra, Animesh Mukherjee
54 | 0 | 0 | 17 Jun 2024

JailbreakEval: An Integrated Toolkit for Evaluating Jailbreak Attempts Against Large Language Models
Delong Ran, Jinyuan Liu, Yichen Gong, Jingyi Zheng, Xinlei He, Tianshuo Cong, Anyu Wang
ELM | 47 | 10 | 0 | 13 Jun 2024

Poisoning Language Models During Instruction Tuning
Alexander Wan, Eric Wallace, Sheng Shen, Dan Klein
SILM | 92 | 124 | 0 | 01 May 2023

Large Language Models are Zero-Shot Reasoners
Takeshi Kojima, S. Gu, Machel Reid, Yutaka Matsuo, Yusuke Iwasawa
ReLM, LRM | 325 | 4,077 | 0 | 24 May 2022