2106.05664
Ruddit: Norms of Offensiveness for English Reddit Comments
10 June 2021
Rishav Hada
S. Sudhir
Pushkar Mishra
H. Yannakoudakis
Saif M. Mohammad
Ekaterina Shutova
Papers citing "Ruddit: Norms of Offensiveness for English Reddit Comments"
SafeRoute: Adaptive Model Selection for Efficient and Accurate Safety Guardrails in Large Language Models
Seanie Lee
Dong Bok Lee
Dominik Wagner
Minki Kang
Haebin Seong
Tobias Bocklet
Juho Lee
Sung Ju Hwang
14
1
0
18 Feb 2025
HarmAug: Effective Data Augmentation for Knowledge Distillation of Safety Guard Models
Seanie Lee
Haebin Seong
Dong Bok Lee
Minki Kang
Xiaoyin Chen
Dominik Wagner
Yoshua Bengio
Juho Lee
Sung Ju Hwang
70
3
0
02 Oct 2024
Llama Guard: LLM-based Input-Output Safeguard for Human-AI Conversations
Hakan Inan
Kartikeya Upasani
Jianfeng Chi
Rashi Rungta
Krithika Iyer
...
Michael Tontchev
Qing Hu
Brian Fuller
Davide Testuggine
Madian Khabsa
AI4MH
36
379
0
07 Dec 2023
A Benchmark for Understanding Dialogue Safety in Mental Health Support
Huachuan Qiu
Tong Zhao
Anqi Li
Shuai Zhang
Hongliang He
Zhenzhong Lan
35
10
0
31 Jul 2023
Just Say No: Analyzing the Stance of Neural Dialogue Generation in Offensive Contexts
Ashutosh Baheti
Maarten Sap
Alan Ritter
Mark O. Riedl
21
84
0
26 Aug 2021