2010.12472
HateBERT: Retraining BERT for Abusive Language Detection in English
23 October 2020
Tommaso Caselli
Valerio Basile
Jelena Mitrović
Michael Granitzer
Papers citing
"HateBERT: Retraining BERT for Abusive Language Detection in English"
50 / 54 papers shown
Title
Towards Safer Pretraining: Analyzing and Filtering Harmful Content in Webscale datasets for Responsible LLMs
Sai Krishna Mendu
Harish Yenala
Aditi Gulati
Shanu Kumar
Parag Agrawal
36
0
0
04 May 2025
Combating Toxic Language: A Review of LLM-Based Strategies for Software Engineering
Hao Zhuo
Yicheng Yang
Kewen Peng
30
0
0
21 Apr 2025
An evaluation of LLMs and Google Translate for translation of selected Indian languages via sentiment and semantic analyses
Rohitash Chandra
Aryan Chaudhary
Yeshwanth Rayavarapu
51
0
0
27 Mar 2025
Evolving Hate Speech Online: An Adaptive Framework for Detection and Mitigation
Shiza Ali
Jeremy Blackburn
Gianluca Stringhini
66
0
0
24 Feb 2025
SafeRoute: Adaptive Model Selection for Efficient and Accurate Safety Guardrails in Large Language Models
Seanie Lee
Dong Bok Lee
Dominik Wagner
Minki Kang
Haebin Seong
Tobias Bocklet
Juho Lee
Sung Ju Hwang
21
1
0
18 Feb 2025
Is LLM an Overconfident Judge? Unveiling the Capabilities of LLMs in Detecting Offensive Language with Annotation Disagreement
Junyu Lu
Kai Ma
Kaichun Wang
Kelaiti Xiao
Roy Ka-Wei Lee
Bo Xu
Liang Yang
Hongfei Lin
53
0
0
10 Feb 2025
Can LLMs Rank the Harmfulness of Smaller LLMs? We are Not There Yet
Berk Atil
Vipul Gupta
Sarkar Snigdha Sarathi Das
R. Passonneau
252
0
0
07 Feb 2025
Longitudinal Abuse and Sentiment Analysis of Hollywood Movie Dialogues using LLMs
Rohitash Chandra
Guoxiang Ren
G. Houseman
54
0
0
20 Jan 2025
Towards Efficient and Explainable Hate Speech Detection via Model Distillation
Paloma Piot
Javier Parapar
89
173
0
18 Dec 2024
LLMScan: Causal Scan for LLM Misbehavior Detection
Mengdi Zhang
Kai Kiat Goh
Peixin Zhang
Jun Sun
Rose Lin Xin
Hongyu Zhang
28
0
0
22 Oct 2024
HarmAug: Effective Data Augmentation for Knowledge Distillation of Safety Guard Models
Seanie Lee
Haebin Seong
Dong Bok Lee
Minki Kang
Xiaoyin Chen
Dominik Wagner
Yoshua Bengio
Juho Lee
Sung Ju Hwang
70
3
0
02 Oct 2024
Towards Generalized Offensive Language Identification
A. Dmonte
Tejas Arya
Tharindu Ranasinghe
Marcos Zampieri
52
3
0
26 Jul 2024
ToVo: Toxicity Taxonomy via Voting
Tinh Son Luong
Thanh-Thien Le
Thang Viet Doan
Linh Ngo Van
Thien Huu Nguyen
Diep Thi-Ngoc Nguyen
36
0
0
21 Jun 2024
"They are uncultured": Unveiling Covert Harms and Social Threats in LLM Generated Conversations
Preetam Prabhu Srikar Dammu
Hayoung Jung
Anjali Singh
Monojit Choudhury
Tanushree Mitra
42
8
0
08 May 2024
MisgenderMender: A Community-Informed Approach to Interventions for Misgendering
Tamanna Hossain
Sunipa Dev
Sameer Singh
40
5
0
23 Apr 2024
Cross-Platform Hate Speech Detection with Weakly Supervised Causal Disentanglement
Paras Sheth
Tharindu Kumarage
Raha Moraffah
Amanat Chadha
Huan Liu
40
1
0
17 Apr 2024
Target Span Detection for Implicit Harmful Content
Nazanin Jafari
James Allan
Sheikh Muhammad Sarwar
50
1
0
28 Mar 2024
Towards Interpretable Hate Speech Detection using Large Language Model-extracted Rationales
Ayushi Nirmal
Amrita Bhattacharjee
Paras Sheth
Huan Liu
AAML
45
10
0
19 Mar 2024
InfFeed: Influence Functions as a Feedback to Improve the Performance of Subjective Tasks
Somnath Banerjee
Maulindu Sarkar
Punyajoy Saha
Binny Mathew
Animesh Mukherjee
TDI
36
0
0
22 Feb 2024
Efficient Models for the Detection of Hate, Abuse and Profanity
Christoph Tillmann
Aashka Trivedi
Bishwaranjan Bhattacharjee
VLM
21
0
0
08 Feb 2024
Alternative Speech: Complementary Method to Counter-Narrative for Better Discourse
Seungyoon Lee
Dahyun Jung
Chanjun Park
Seolhwa Lee
Heu-Jeoung Lim
34
1
0
26 Jan 2024
Enhanced Labeling Technique for Reddit Text and Fine-Tuned Longformer Models for Classifying Depression Severity in English and Luganda
Richard Kimera
Daniela N. Rim
Joseph Kirabira
Ubong Godwin Udomah
Heeyoul Choi
AI4MH
33
1
0
25 Jan 2024
Efficient Toxic Content Detection by Bootstrapping and Distilling Large Language Models
Jiang Zhang
Qiong Wu
Yiming Xu
Cheng Cao
Zheng Du
Konstantinos Psounis
36
15
0
13 Dec 2023
Latent Feature-based Data Splits to Improve Generalisation Evaluation: A Hate Speech Detection Case Study
Maike Zufle
Verna Dankers
Ivan Titov
47
0
0
16 Nov 2023
Generative AI for Hate Speech Detection: Evaluation and Findings
Sagi Pendzel
Tomer Wullach
Amir Adler
Einat Minkov
33
11
0
16 Nov 2023
Pre-training LLMs using human-like development data corpus
Khushi Bhardwaj
Raj Sanjay Shah
Sashank Varma
32
6
0
08 Nov 2023
Mixed-Distil-BERT: Code-mixed Language Modeling for Bangla, English, and Hindi
Md. Nishat Raihan
Dhiman Goswami
Antara Mahmud
53
1
0
19 Sep 2023
LCT-1 at SemEval-2023 Task 10: Pre-training and Multi-task Learning for Sexism Detection and Classification
K. Chernyshev
E. Garanina
Duygu Bayram
Qiankun Zheng
Lukas Edman
13
0
0
08 Jun 2023
CFL: Causally Fair Language Models Through Token-level Attribute Controlled Generation
Rahul Madhavan
Rishabh Garg
Kahini Wadhawan
S. Mehta
36
5
0
01 Jun 2023
Detecting Multidimensional Political Incivility on Social Media
Sagi Pendzel
Nir Lotan
Alon Zoizner
Einat Minkov
19
1
0
24 May 2023
Helping the Helper: Supporting Peer Counselors via AI-Empowered Practice and Feedback
Shang-ling Hsu
Raj Sanjay Shah
Prathik Senthil
Zahra Ashktorab
Casey Dugan
Werner Geyer
Diyi Yang
52
20
0
15 May 2023
NLP-LTU at SemEval-2023 Task 10: The Impact of Data Augmentation and Semi-Supervised Learning Techniques on Text Classification Performance on an Imbalanced Dataset
Sana Al-Azzawi
Gyorgy Kovács
Filip Nilsson
Tosin Adewumi
Marcus Liwicki
33
6
0
25 Apr 2023
SemEval-2023 Task 10: Explainable Detection of Online Sexism
Hannah Rose Kirk
Wenjie Yin
Bertie Vidgen
Paul Röttger
24
117
0
07 Mar 2023
DiTTO: A Feature Representation Imitation Approach for Improving Cross-Lingual Transfer
Shanu Kumar
Abbaraju Soujanya
Sandipan Dandapat
Sunayana Sitaram
Monojit Choudhury
VLM
33
1
0
04 Mar 2023
Towards Agile Text Classifiers for Everyone
Maximilian Mozes
Jessica Hoffmann
Katrin Tomanek
Muhamed Kouate
Nithum Thain
Ann Yuan
Tolga Bolukbasi
Lucas Dixon
52
13
0
13 Feb 2023
A benchmark for toxic comment classification on Civil Comments dataset
Corentin Duchene
Henri Jamet
Pierre Guillaume
Reda Dehak
41
8
0
26 Jan 2023
A Twitter BERT Approach for Offensive Language Detection in Marathi
Tanmay Chavan
Shantanu Patankar
Aditya Kane
Omkar Gokhale
Raviraj Joshi
41
11
0
20 Dec 2022
SOLD: Sinhala Offensive Language Dataset
Tharindu Ranasinghe
Isuri Anuradha
Damith Premasiri
Kanishka Silva
Hansi Hettiarachchi
Lasitha Uyangodage
Marcos Zampieri
41
8
0
01 Dec 2022
Tell Me What Happened: Unifying Text-guided Video Completion via Multimodal Masked Video Generation
Tsu-Jui Fu
Licheng Yu
Ning Zhang
Cheng-Yang Fu
Jong-Chyi Su
William Yang Wang
Sean Bell
VGen
61
37
0
23 Nov 2022
Dictionary-Assisted Supervised Contrastive Learning
Patrick Y. Wu
Richard Bonneau
Joshua A. Tucker
Jonathan Nagler
CLIP
35
0
0
27 Oct 2022
The State of Profanity Obfuscation in Natural Language Processing
Debora Nozza
Dirk Hovy
47
7
0
14 Oct 2022
T5 for Hate Speech, Augmented Data and Ensemble
Tosin Adewumi
Sana Sabah Sabry
Nosheen Abid
F. Liwicki
Marcus Liwicki
11
10
0
11 Oct 2022
Spread Love Not Hate: Undermining the Importance of Hateful Pre-training for Hate Speech Detection
Omkar Gokhale
Aditya Kane
Shantanu Patankar
Tanmay Chavan
Raviraj Joshi
VLM
37
7
0
09 Oct 2022
SoK: Content Moderation in Social Media, from Guidelines to Enforcement, and Research to Practice
Mohit Singhal
Chen Ling
Pujan Paudel
Poojitha Thota
Nihal Kumarswamy
Gianluca Stringhini
Shirin Nilizadeh
75
28
0
29 Jun 2022
Detecting Harmful Online Conversational Content towards LGBTQIA+ Individuals
Jamell Dacon
Harry Shomer
Shaylynn Crum-Dacon
Jiliang Tang
32
8
0
15 Jun 2022
bitsa_nlp@LT-EDI-ACL2022: Leveraging Pretrained Language Models for Detecting Homophobia and Transphobia in Social Media Comments
Vitthal Bhandari
Poonam Goyal
33
16
0
27 Mar 2022
ToxiGen: A Large-Scale Machine-Generated Dataset for Adversarial and Implicit Hate Speech Detection
Thomas Hartvigsen
Saadia Gabriel
Hamid Palangi
Maarten Sap
Dipankar Ray
Ece Kamar
38
354
0
17 Mar 2022
Large-Scale Hate Speech Detection with Cross-Domain Transfer
Cagri Toraman
Furkan Şahinuç
E. Yilmaz
32
60
0
02 Mar 2022
Few-shot Instruction Prompts for Pretrained Language Models to Detect Social Biases
Shrimai Prabhumoye
Rafal Kocielnik
M. Shoeybi
Anima Anandkumar
Bryan Catanzaro
35
20
0
15 Dec 2021
Combining Textual Features for the Detection of Hateful and Offensive Language
Sherzod Hakimov
Ralph Ewerth
20
4
0
09 Dec 2021