Fight Fire with Fire: Fine-tuning Hate Detectors using Large Samples of Generated Hate Speech

1 September 2021

Papers citing "Fight Fire with Fire: Fine-tuning Hate Detectors using Large Samples of Generated Hate Speech"

Title
ToxiCraft: A Novel Framework for Synthetic Generation of Harmful Information | Zheng Hui, Zhaoxiao Guo, Hang Zhao, Juanyong Duan, Congrui Huang | 59 7 0 | 23 Sep 2024
Improving Cross-Domain Hate Speech Generalizability with Emotion Knowledge | Shi Yin Hong, Susan Gauch | 43 2 0 | 24 Nov 2023
Generative AI for Hate Speech Detection: Evaluation and Findings | Sagi Pendzel, Tomer Wullach, Amir Adler, Einat Minkov | 38 11 0 | 16 Nov 2023
Simple synthetic data reduces sycophancy in large language models | Jerry W. Wei, Da Huang, Yifeng Lu, Denny Zhou, Quoc V. Le | 48 70 0 | 07 Aug 2023
Detecting Multidimensional Political Incivility on Social Media | Sagi Pendzel, Nir Lotan, Alon Zoizner, Einat Minkov | 19 1 0 | 24 May 2023
Model-Agnostic Meta-Learning for Multilingual Hate Speech Detection | Rabiul Awal, Roy Ka-wei Lee, Eshaan Tanwar, Tanmay Garg, Tanmoy Chakraborty | 34 27 0 | 04 Mar 2023
State-of-the-art generalisation research in NLP: A taxonomy and review | Dieuwke Hupkes, Mario Giulianelli, Verna Dankers, Mikel Artetxe, Yanai Elazar, ..., Leila Khalatbari, Maria Ryskina, Rita Frieske, Ryan Cotterell, Zhijing Jin | 133 95 0 | 06 Oct 2022
SoK: Content Moderation in Social Media, from Guidelines to Enforcement, and Research to Practice | Mohit Singhal, Chen Ling, Pujan Paudel, Poojitha Thota, Nihal Kumarswamy, Gianluca Stringhini, Shirin Nilizadeh | 75 30 0 | 29 Jun 2022
Going Extreme: Comparative Analysis of Hate Speech in Parler and Gab | Abraham Israeli, Oren Tsur | 40 1 0 | 27 Jan 2022
Character-level HyperNetworks for Hate Speech Detection | Tomer Wullach, A. Adler, Einat Minkov | 31 12 0 | 11 Nov 2021