HateBERT: Retraining BERT for Abusive Language Detection in English

23 October 2020

Papers citing "HateBERT: Retraining BERT for Abusive Language Detection in English"

Towards Safer Pretraining: Analyzing and Filtering Harmful Content in Webscale datasets for Responsible LLMs Sai Krishna Mendu Harish Yenala Aditi Gulati Shanu Kumar Parag Agrawal 36 0 0 04 May 2025
Combating Toxic Language: A Review of LLM-Based Strategies for Software Engineering Hao Zhuo Yicheng Yang Kewen Peng 30 0 0 21 Apr 2025
An evaluation of LLMs and Google Translate for translation of selected Indian languages via sentiment and semantic analyses Rohitash Chandra Aryan Chaudhary Yeshwanth Rayavarapu 51 0 0 27 Mar 2025
Evolving Hate Speech Online: An Adaptive Framework for Detection and Mitigation Shiza Ali Jeremy Blackburn Gianluca Stringhini 66 0 0 24 Feb 2025
SafeRoute: Adaptive Model Selection for Efficient and Accurate Safety Guardrails in Large Language Models Seanie Lee Dong Bok Lee Dominik Wagner Minki Kang Haebin Seong Tobias Bocklet Juho Lee Sung Ju Hwang 21 1 0 18 Feb 2025
Is LLM an Overconfident Judge? Unveiling the Capabilities of LLMs in Detecting Offensive Language with Annotation Disagreement Junyu Lu Kai Ma Kaichun Wang Kelaiti Xiao Roy Ka-Wei Lee Bo Xu Liang Yang Hongfei Lin 53 0 0 10 Feb 2025
Can LLMs Rank the Harmfulness of Smaller LLMs? We are Not There Yet Berk Atil Vipul Gupta Sarkar Snigdha Sarathi Das R. Passonneau 252 0 0 07 Feb 2025
Longitudinal Abuse and Sentiment Analysis of Hollywood Movie Dialogues using LLMs Rohitash Chandra Guoxiang Ren G. Houseman 54 0 0 20 Jan 2025
Towards Efficient and Explainable Hate Speech Detection via Model Distillation Paloma Piot Javier Parapar 89 173 0 18 Dec 2024
LLMScan: Causal Scan for LLM Misbehavior Detection Mengdi Zhang Kai Kiat Goh Peixin Zhang Jun Sun Rose Lin Xin Hongyu Zhang 28 0 0 22 Oct 2024
HarmAug: Effective Data Augmentation for Knowledge Distillation of Safety Guard Models Seanie Lee Haebin Seong Dong Bok Lee Minki Kang Xiaoyin Chen Dominik Wagner Yoshua Bengio Juho Lee Sung Ju Hwang 70 3 0 02 Oct 2024
Towards Generalized Offensive Language Identification A. Dmonte Tejas Arya Tharindu Ranasinghe Marcos Zampieri 52 3 0 26 Jul 2024
ToVo: Toxicity Taxonomy via Voting Tinh Son Luong Thanh-Thien Le Thang Viet Doan Linh Ngo Van Thien Huu Nguyen Diep Thi-Ngoc Nguyen 36 0 0 21 Jun 2024
"They are uncultured": Unveiling Covert Harms and Social Threats in LLM Generated Conversations Preetam Prabhu Srikar Dammu Hayoung Jung Anjali Singh Monojit Choudhury Tanushree Mitra 42 8 0 08 May 2024
MisgenderMender: A Community-Informed Approach to Interventions for Misgendering Tamanna Hossain Sunipa Dev Sameer Singh 40 5 0 23 Apr 2024
Cross-Platform Hate Speech Detection with Weakly Supervised Causal Disentanglement Paras Sheth Tharindu Kumarage Raha Moraffah Amanat Chadha Huan Liu 40 1 0 17 Apr 2024
Target Span Detection for Implicit Harmful Content Nazanin Jafari James Allan Sheikh Muhammad Sarwar 50 1 0 28 Mar 2024
Towards Interpretable Hate Speech Detection using Large Language Model-extracted Rationales Ayushi Nirmal Amrita Bhattacharjee Paras Sheth Huan Liu AAML 45 10 0 19 Mar 2024
InfFeed: Influence Functions as a Feedback to Improve the Performance of Subjective Tasks Somnath Banerjee Maulindu Sarkar Punyajoy Saha Binny Mathew Animesh Mukherjee TDI 36 0 0 22 Feb 2024
Efficient Models for the Detection of Hate, Abuse and Profanity Christoph Tillmann Aashka Trivedi Bishwaranjan Bhattacharjee VLM 21 0 0 08 Feb 2024
Alternative Speech: Complementary Method to Counter-Narrative for Better Discourse Seungyoon Lee Dahyun Jung Chanjun Park Seolhwa Lee Heu-Jeoung Lim 34 1 0 26 Jan 2024
Enhanced Labeling Technique for Reddit Text and Fine-Tuned Longformer Models for Classifying Depression Severity in English and Luganda Richard Kimera Daniela N. Rim Joseph Kirabira Ubong Godwin Udomah Heeyoul Choi AI4MH 33 1 0 25 Jan 2024
Efficient Toxic Content Detection by Bootstrapping and Distilling Large Language Models Jiang Zhang Qiong Wu Yiming Xu Cheng Cao Zheng Du Konstantinos Psounis 36 15 0 13 Dec 2023
Latent Feature-based Data Splits to Improve Generalisation Evaluation: A Hate Speech Detection Case Study Maike Zufle Verna Dankers Ivan Titov 47 0 0 16 Nov 2023
Generative AI for Hate Speech Detection: Evaluation and Findings Sagi Pendzel Tomer Wullach Amir Adler Einat Minkov 33 11 0 16 Nov 2023
Pre-training LLMs using human-like development data corpus Khushi Bhardwaj Raj Sanjay Shah Sashank Varma 32 6 0 08 Nov 2023
Mixed-Distil-BERT: Code-mixed Language Modeling for Bangla, English, and Hindi Md. Nishat Raihan Dhiman Goswami Antara Mahmud 53 1 0 19 Sep 2023
LCT-1 at SemEval-2023 Task 10: Pre-training and Multi-task Learning for Sexism Detection and Classification K. Chernyshev E. Garanina Duygu Bayram Qiankun Zheng Lukas Edman 13 0 0 08 Jun 2023
CFL: Causally Fair Language Models Through Token-level Attribute Controlled Generation Rahul Madhavan Rishabh Garg Kahini Wadhawan S. Mehta 36 5 0 01 Jun 2023
Detecting Multidimensional Political Incivility on Social Media Sagi Pendzel Nir Lotan Alon Zoizner Einat Minkov 19 1 0 24 May 2023
Helping the Helper: Supporting Peer Counselors via AI-Empowered Practice and Feedback Shang-ling Hsu Raj Sanjay Shah Prathik Senthil Zahra Ashktorab Casey Dugan Werner Geyer Diyi Yang 52 20 0 15 May 2023
NLP-LTU at SemEval-2023 Task 10: The Impact of Data Augmentation and Semi-Supervised Learning Techniques on Text Classification Performance on an Imbalanced Dataset Sana Al-Azzawi Gyorgy Kovács Filip Nilsson Tosin Adewumi Marcus Liwicki 33 6 0 25 Apr 2023
SemEval-2023 Task 10: Explainable Detection of Online Sexism Hannah Rose Kirk Wenjie Yin Bertie Vidgen Paul Röttger 24 117 0 07 Mar 2023
DiTTO: A Feature Representation Imitation Approach for Improving Cross-Lingual Transfer Shanu Kumar Abbaraju Soujanya Sandipan Dandapat Sunayana Sitaram Monojit Choudhury VLM 33 1 0 04 Mar 2023
Towards Agile Text Classifiers for Everyone Maximilian Mozes Jessica Hoffmann Katrin Tomanek Muhamed Kouate Nithum Thain Ann Yuan Tolga Bolukbasi Lucas Dixon 52 13 0 13 Feb 2023
A benchmark for toxic comment classification on Civil Comments dataset Corentin Duchene Henri Jamet Pierre Guillaume Reda Dehak 41 8 0 26 Jan 2023
A Twitter BERT Approach for Offensive Language Detection in Marathi Tanmay Chavan Shantanu Patankar Aditya Kane Omkar Gokhale Raviraj Joshi 41 11 0 20 Dec 2022
SOLD: Sinhala Offensive Language Dataset Tharindu Ranasinghe Isuri Anuradha Damith Premasiri Kanishka Silva Hansi Hettiarachchi Lasitha Uyangodage Marcos Zampieri 41 8 0 01 Dec 2022
Tell Me What Happened: Unifying Text-guided Video Completion via Multimodal Masked Video Generation Tsu-Jui Fu Licheng Yu Ning Zhang Cheng-Yang Fu Jong-Chyi Su William Yang Wang Sean Bell VGen 61 37 0 23 Nov 2022
Dictionary-Assisted Supervised Contrastive Learning Patrick Y. Wu Richard Bonneau Joshua A. Tucker Jonathan Nagler CLIP 35 0 0 27 Oct 2022
The State of Profanity Obfuscation in Natural Language Processing Debora Nozza Dirk Hovy 47 7 0 14 Oct 2022
T5 for Hate Speech, Augmented Data and Ensemble Tosin Adewumi Sana Sabah Sabry Nosheen Abid F. Liwicki Marcus Liwicki 11 10 0 11 Oct 2022
Spread Love Not Hate: Undermining the Importance of Hateful Pre-training for Hate Speech Detection Omkar Gokhale Aditya Kane Shantanu Patankar Tanmay Chavan Raviraj Joshi VLM 37 7 0 09 Oct 2022
SoK: Content Moderation in Social Media, from Guidelines to Enforcement, and Research to Practice Mohit Singhal Chen Ling Pujan Paudel Poojitha Thota Nihal Kumarswamy Gianluca Stringhini Shirin Nilizadeh 75 28 0 29 Jun 2022
Detecting Harmful Online Conversational Content towards LGBTQIA+ Individuals Jamell Dacon Harry Shomer Shaylynn Crum-Dacon Jiliang Tang 32 8 0 15 Jun 2022
bitsa_nlp@LT-EDI-ACL2022: Leveraging Pretrained Language Models for Detecting Homophobia and Transphobia in Social Media Comments Vitthal Bhandari Poonam Goyal 33 16 0 27 Mar 2022
ToxiGen: A Large-Scale Machine-Generated Dataset for Adversarial and Implicit Hate Speech Detection Thomas Hartvigsen Saadia Gabriel Hamid Palangi Maarten Sap Dipankar Ray Ece Kamar 38 354 0 17 Mar 2022
Large-Scale Hate Speech Detection with Cross-Domain Transfer Cagri Toraman Furkan Şahinuç E. Yilmaz 32 60 0 02 Mar 2022
Few-shot Instruction Prompts for Pretrained Language Models to Detect Social Biases Shrimai Prabhumoye Rafal Kocielnik M. Shoeybi Anima Anandkumar Bryan Catanzaro 35 20 0 15 Dec 2021
Combining Textual Features for the Detection of Hateful and Offensive Language Sherzod Hakimov Ralph Ewerth 20 4 0 09 Dec 2021