Beyond Denouncing Hate: Strategies for Countering Implied Biases and
Stereotypes in Language

31 October 2023

Akhila Yerukola

Sarah-Jane Leslie

Papers citing "Beyond Denouncing Hate: Strategies for Countering Implied Biases and Stereotypes in Language"

Debunking with Dialogue? Exploring AI-Generated Counterspeech to Challenge Conspiracy Theories Mareike Lisker Christina Gottschalk Helena Mihaljević 45 0 0 23 Apr 2025
Northeastern Uni at Multilingual Counterspeech Generation: Enhancing Counter Speech Generation with LLM Alignment through Direct Preference Optimization Sahil Wadhwa Chengtian Xu Haoming Chen Aakash Mahalingam Akankshya Kar Divya Chaudhary 83 0 0 19 Dec 2024
Generics are puzzling. Can language models find the missing piece? Gustavo Cilleruelo Calderón Emily Allaway Barry Haddow Alexandra Birch 79 0 0 15 Dec 2024
Minion: A Technology Probe for Resolving Value Conflicts through Expert-Driven and User-Driven Strategies in AI Companion Applications Xianzhe Fan Qing Xiao Xuhui Zhou Yuran Su Zhicong Lu Maarten Sap Hong Shen 43 0 0 11 Nov 2024
Examining Human-AI Collaboration for Co-Writing Constructive Comments Online Farhana Shahid Maximilian Dittgen Mor Naaman Aditya Vashistha 40 1 0 05 Nov 2024
Towards Effective Counter-Responses: Aligning Human Preferences with Strategies to Combat Online Trolling Huije Lee Hoyun Song Jisu Shin Sukmin Cho SeungYoon Han Jong C. Park 33 0 0 05 Oct 2024
Is Safer Better? The Impact of Guardrails on the Argumentative Strength of LLMs in Hate Speech Countering Helena Bonaldi Greta Damo Nicolás Benjamín Ocampo Elena Cabrio S. Villata Marco Guerini 43 4 0 04 Oct 2024
How Are LLMs Mitigating Stereotyping Harms? Learning from Search Engine Studies Alina Leidinger Richard Rogers 39 5 0 16 Jul 2024
When in Doubt, Cascade: Towards Building Efficient and Capable Guardrails Manish Nagireddy Inkit Padhi Soumya Ghosh P. Sattigeri 46 1 0 08 Jul 2024
Tox-BART: Leveraging Toxicity Attributes for Explanation Generation of Implicit Hate Speech Neemesh Yadav Sarah Masud Vikram Goyal Md. Shad Akhtar Tanmoy Chakraborty 36 5 0 06 Jun 2024
MultiParaDetox: Extending Text Detoxification with Parallel Data to New Languages Daryna Dementieva N. Babakov Alexander Panchenko 43 7 0 02 Apr 2024
NLP Systems That Can't Tell Use from Mention Censor Counterspeech, but Teaching the Distinction Helps Kristina Gligorić Myra Cheng Lucia Zheng Esin Durmus Dan Jurafsky 45 9 0 02 Apr 2024
NLP for Counterspeech against Hate: A Survey and How-To Guide Helena Bonaldi Yi-Ling Chung Gavin Abercrombie Marco Guerini AAML 44 13 0 29 Mar 2024
Hatred Stems from Ignorance! Distillation of the Persuasion Modes in Countering Conversational Hate Speech Ghadi Alyahya Abeer Aldayel 46 2 0 18 Mar 2024
Counterspeakers' Perspectives: Unveiling Barriers and AI Needs in the Fight against Online Hate Jimin Mun Cathy Buerger Jenny T Liang Joshua Garland Maarten Sap 42 10 0 29 Feb 2024
What Does the Bot Say? Opportunities and Risks of Large Language Models in Social Media Bot Detection Shangbin Feng Herun Wan Ningnan Wang Zhaoxuan Tan Minnan Luo Yulia Tsvetkov AAML DeLMO 30 16 0 01 Feb 2024
Understanding Counterspeech for Online Harm Mitigation Yi-Ling Chung Gavin Abercrombie Florence E. Enock Jonathan Bright Verena Rieser 25 16 0 01 Jul 2023
What Makes Online Communities 'Better'? Measuring Values, Consensus, and Conflict across Thousands of Subreddits Galen Cassebeer Weld Amy X. Zhang Tim Althoff 62 31 0 10 Nov 2021