A Data-Centric Approach for Safe and Secure Large Language Models against Threatening and Toxic Content
Chaima Njeh, Haïfa Nakouri, Fehmi Jaafar
arXiv 2504.16120 · 19 April 2025
Papers citing "A Data-Centric Approach for Safe and Secure Large Language Models against Threatening and Toxic Content" (5 papers)
N-Critics: Self-Refinement of Large Language Models with Ensemble of Critics
Sajad Mousavi, Ricardo Luna Gutierrez, Desik Rengarajan, Vineet Gundecha, Ashwin Ramesh Babu, Avisek Naug, Antonio Guillen-Perez, Soumyendu Sarkar
28 Oct 2023
CRITIC: Large Language Models Can Self-Correct with Tool-Interactive Critiquing
Zhibin Gou, Zhihong Shao, Yeyun Gong, Yelong Shen, Yujiu Yang, Nan Duan, Weizhu Chen
19 May 2023
Principle-Driven Self-Alignment of Language Models from Scratch with Minimal Human Supervision
Zhiqing Sun, Songlin Yang, Qinhong Zhou, Hongxin Zhang, Zhenfang Chen, David D. Cox, Yiming Yang, Chuang Gan
04 May 2023
Large Language Models Can Self-Improve
Jiaxin Huang, S. Gu, Le Hou, Yuexin Wu, Xuezhi Wang, Hongkun Yu, Jiawei Han
20 Oct 2022
HateBERT: Retraining BERT for Abusive Language Detection in English
Tommaso Caselli, Valerio Basile, Jelena Mitrović, Michael Granitzer
23 Oct 2020