ResearchTrend.AI
HateBench: Benchmarking Hate Speech Detectors on LLM-Generated Content and Hate Campaigns

28 January 2025
Xinyue Shen
Yixin Wu
Y. Qu
Michael Backes
Savvas Zannettou
Yang Zhang

Papers citing "HateBench: Benchmarking Hate Speech Detectors on LLM-Generated Content and Hate Campaigns"

17 / 17 papers shown
PoisonSwarm: Universal Harmful Information Synthesis via Model Crowdsourcing
Yu Yan
Sheng Sun
Zhifei Zheng
Ziji Hao
Teli Liu
Min Liu
AAML
62
0
0
27 May 2025
LAMP: Extracting Locally Linear Decision Surfaces from LLM World Models
Ryan Chen
Youngmin Ko
Zeyu Zhang
Catherine Cho
Sunny Chung
Mauro Giuffré
Dennis L. Shung
Bradly C. Stadie
79
0
0
17 May 2025
Echoes of Power: Investigating Geopolitical Bias in US and China Large Language Models
Andre G. C. Pacheco
Athus Cavalini
Giovanni Comarela
56
1
0
20 Mar 2025
Peering Behind the Shield: Guardrail Identification in Large Language Models
Ziqing Yang
Yixin Wu
Rui Wen
Michael Backes
Yang Zhang
68
1
0
03 Feb 2025
Moderating New Waves of Online Hate with Chain-of-Thought Reasoning in Large Language Models
Nishant Vishwamitra
Keyan Guo
Farhan Tajwar Romit
Isabelle Ondracek
Long Cheng
Ziming Zhao
Hongxin Hu
29
13
0
22 Dec 2023
Baichuan 2: Open Large-scale Language Models
Ai Ming Yang
Bin Xiao
Bingning Wang
Borong Zhang
Ce Bian
...
Youxin Jiang
Yuchen Gao
Yupeng Zhang
Guosheng Dong
Zhiying Wu
ELM
LRM
129
731
0
19 Sep 2023
No Easy Way Out: the Effectiveness of Deplatforming an Extremist Forum to Suppress Hate and Harassment
Anh V. Vu
Alice Hutchings
Ross Anderson
16
9
0
14 Apr 2023
GPT-4 Technical Report
OpenAI
Josh Achiam
Steven Adler
Sandhini Agarwal
Lama Ahmad
...
Shengjia Zhao
Tianhao Zheng
Juntang Zhuang
William Zhuk
Barret Zoph
LLMAG
MLLM
403
13,788
0
15 Mar 2023
I Know What You Trained Last Summer: A Survey on Stealing Machine Learning Models and Defences
Daryna Oliynyk
Rudolf Mayer
Andreas Rauber
84
109
0
16 Jun 2022
Fight Fire with Fire: Fine-tuning Hate Detectors using Large Samples of Generated Hate Speech
Tomer Wullach
A. Adler
Einat Minkov
19
41
0
01 Sep 2021
TweetBLM: A Hate Speech Dataset and Analysis of Black Lives Matter-related Microblogs on Twitter
Sumit Kumar
Raj Ratn Pranesh
32
18
0
27 Aug 2021
HateCheck: Functional Tests for Hate Speech Detection Models
Paul Röttger
B. Vidgen
Dong Nguyen
Zeerak Talat
Helen Z. Margetts
J. Pierrehumbert
47
263
0
31 Dec 2020
Is BERT Really Robust? A Strong Baseline for Natural Language Attack on Text Classification and Entailment
Di Jin
Zhijing Jin
Qiufeng Wang
Peter Szolovits
SILM
AAML
97
1,064
0
27 Jul 2019
RoBERTa: A Robustly Optimized BERT Pretraining Approach
Yinhan Liu
Myle Ott
Naman Goyal
Jingfei Du
Mandar Joshi
Danqi Chen
Omer Levy
M. Lewis
Luke Zettlemoyer
Veselin Stoyanov
AIMat
372
24,160
0
26 Jul 2019
TextBugger: Generating Adversarial Text Against Real-world Applications
Jinfeng Li
S. Ji
Tianyu Du
Bo Li
Ting Wang
SILM
AAML
136
731
0
13 Dec 2018
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Jacob Devlin
Ming-Wei Chang
Kenton Lee
Kristina Toutanova
VLM
SSL
SSeg
854
93,936
0
11 Oct 2018
Deceiving Google's Perspective API Built for Detecting Toxic Comments
Hossein Hosseini
Sreeram Kannan
Baosen Zhang
Radha Poovendran
AAML
26
328
0
27 Feb 2017