ResearchTrend.AI

© 2025 ResearchTrend.AI, All rights reserved.

Annotators with Attitudes: How Annotator Beliefs And Identities Bias Toxic Language Detection (arXiv:2111.07997)

15 November 2021
Maarten Sap, Swabha Swayamdipta, Laura Vianna, Xuhui Zhou, Yejin Choi, Noah A. Smith

Papers citing "Annotators with Attitudes: How Annotator Beliefs And Identities Bias Toxic Language Detection"

Showing 50 of 165 citing papers.
Conflicts in Texts: Data, Implications and Challenges
Siyi Liu, Dan Roth (28 Apr 2025)

Out of Sight Out of Mind, Out of Sight Out of Mind: Measuring Bias in Language Models Against Overlooked Marginalized Groups in Regional Contexts
Fatma Elsafoury, David Hartmann (17 Apr 2025)

MuSeD: A Multimodal Spanish Dataset for Sexism Detection in Social Media Videos
Laura De Grazia, Pol Pastells, Mauro Vázquez Chas, Desmond Elliott, Danae Sánchez Villegas, Mireia Farrús, Mariona Taulé (15 Apr 2025)

CoKe: Customizable Fine-Grained Story Evaluation via Chain-of-Keyword Rationalization [LRM]
Brihi Joshi, Sriram Venkatapathy, Mohit Bansal, Nanyun Peng, Haw-Shiuan Chang (21 Mar 2025)

Redefining Toxicity: An Objective and Context-Aware Approach for Stress-Level-Based Detection
Sergey Berezin, R. Farahbakhsh, Noel Crespi (20 Mar 2025)

Aligned Probing: Relating Toxic Behavior and Model Internals
Andreas Waldis, Vagrant Gautam, Anne Lauscher, Dietrich Klakow, Iryna Gurevych (17 Mar 2025)
Data Caricatures: On the Representation of African American Language in Pretraining Corpora
Nicholas Deas, Blake Vente, Amith Ananthram, Jessica A. Grieser, D. Patton, Shana Kleiner, James Shepard, Kathleen McKeown (13 Mar 2025)

Lost in Moderation: How Commercial Content Moderation APIs Over- and Under-Moderate Group-Targeted Hate Speech and Linguistic Variations
David Hartmann, Amin Oueslati, Dimitri Staufer, Lena Pohlmann, Simon Munzert, Hendrik Heuer (03 Mar 2025)

Beyond Demographics: Fine-tuning Large Language Models to Predict Individuals' Subjective Text Perceptions
Matthias Orlikowski, Jiaxin Pei, Paul Röttger, Philipp Cimiano, David Jurgens, Dirk Hovy (28 Feb 2025)

Hope vs. Hate: Understanding User Interactions with LGBTQ+ News Content in Mainstream US News Media through the Lens of Hope Speech
Jonathan Pofcher, Christopher Homan, Randall Sell, Ashiqur R. KhudaBukhsh (13 Feb 2025)

AI Alignment at Your Discretion
Maarten Buyl, Hadi Khalaf, C. M. Verdun, Lucas Monteiro Paes, Caio Vieira Machado, Flavio du Pin Calmon (10 Feb 2025)

Correcting Annotator Bias in Training Data: Population-Aligned Instance Replication (PAIR)
Stephanie Eckman, Bolei Ma, Christoph Kern, Rob Chew, Barbara Plank, Frauke Kreuter (12 Jan 2025)
Beyond Dataset Creation: Critical View of Annotation Variation and Bias Probing of a Dataset for Online Radical Content Detection
Arij Riabi, Virginie Mouilleron, Menel Mahamdi, Wissam Antoun, Djamé Seddah (16 Dec 2024)

Exploring the Influence of Label Aggregation on Minority Voices: Implications for Dataset Bias and Model Training
Mugdha Pandya, Nafise Sadat Moosavi, Diana Maynard (05 Dec 2024)

Examining Human-AI Collaboration for Co-Writing Constructive Comments Online
Farhana Shahid, Maximilian Dittgen, Mor Naaman, Aditya Vashistha (05 Nov 2024)

Insights on Disagreement Patterns in Multimodal Safety Perception across Diverse Rater Groups [EGVM]
Charvi Rastogi, Tian Huey Teh, Pushkar Mishra, Roma Patel, Zoe C. Ashwood, ..., Alicia Parrish, Ding Wang, Vinodkumar Prabhakaran, Lora Aroyo, Verena Rieser (22 Oct 2024)

Reducing annotator bias by belief elicitation
Terne Sasha Thorn Jakobsen, Andreas Bjerre-Nielsen, Robert Böhm (21 Oct 2024)
Diverging Preferences: When do Annotators Disagree and do Models Know?
Michael J.Q. Zhang, Zhilin Wang, Jena D. Hwang, Yi Dong, Olivier Delalleau, Yejin Choi, Eunsol Choi, Xiang Ren, Valentina Pyatkin (18 Oct 2024)

Ethics Whitepaper: Whitepaper on Ethical Research into Large Language Models
Eddie L. Ungless, Nikolas Vitsakis, Zeerak Talat, James Garforth, Bjorn Ross, Arno Onken, Atoosa Kasirzadeh, Alexandra Birch (17 Oct 2024)

Aggregation Artifacts in Subjective Tasks Collapse Large Language Models' Posteriors
Georgios Chochlakis, Alexandros Potamianos, Kristina Lerman, Shrikanth Narayanan (17 Oct 2024)

Personas with Attitudes: Controlling LLMs for Diverse Data Annotation
Leon Fröhling, Gianluca Demartini, Dennis Assenmacher (15 Oct 2024)

Human and LLM Biases in Hate Speech Annotations: A Socio-Demographic Analysis of Annotators and Targets
Tommaso Giorgi, Lorenzo Cima, T. Fagni, M. Avvenuti, S. Cresci (10 Oct 2024)
Persona Knowledge-Aligned Prompt Tuning Method for Online Debate
Chunkit Chan, Cheng Jiayang, Xin Liu, Yauwai Yim, Yuxin Jiang, Zheye Deng, Haoran Li, Yangqiu Song, Ginny Y. Wong, Simon See (05 Oct 2024)

Re-examining Sexism and Misogyny Classification with Annotator Attitudes
Aiqi Jiang, Nikolas Vitsakis, Tanvi Dinkar, Gavin Abercrombie, Ioannis Konstas (04 Oct 2024)

Hate Personified: Investigating the role of LLMs in content moderation
Sarah Masud, Sahajpreet Singh, Viktor Hangya, Alexander M. Fraser, Tanmoy Chakraborty (03 Oct 2024)

Copying style, Extracting value: Illustrators' Perception of AI Style Transfer and its Impact on Creative Labor
Julien Porquet, Sitong Wang, Lydia B. Chilton (25 Sep 2024)

Can Unconfident LLM Annotations Be Used for Confident Conclusions?
Kristina Gligorić, Tijana Zrnic, Cinoo Lee, Emmanuel J. Candès, Dan Jurafsky (27 Aug 2024)

Crowd-Calibrator: Can Annotator Disagreement Inform Calibration in Subjective Tasks?
Urja Khurana, Eric T. Nalisnick, Antske Fokkens, Swabha Swayamdipta (26 Aug 2024)
Rater Cohesion and Quality from a Vicarious Perspective
Deepak Pandita, Tharindu Cyril Weerasooriya, Sujan Dutta, Sarah K. K. Luger, Tharindu Ranasinghe, Ashiqur R. KhudaBukhsh, Marcos Zampieri, Christopher M. Homan (15 Aug 2024)

Helpful assistant or fruitful facilitator? Investigating how personas affect language model behavior
Pedro Henrique Luz de Araujo, Benjamin Roth (02 Jul 2024)

Let Guidelines Guide You: A Prescriptive Guideline-Centered Data Annotation Methodology
Federico Ruggeri, Eleonora Misino, Arianna Muti, Katerina Korre, Paolo Torroni, Alberto Barrón-Cedeño (20 Jun 2024)

Disentangling Dialect from Social Bias via Multitask Learning to Improve Fairness
Maximilian Spliethover, Sai Nikhil Menon, Henning Wachsmuth (14 Jun 2024)

A Taxonomy of Challenges to Curating Fair Datasets
Dora Zhao, M. Scheuerman, Pooja Chitre, Jerone T. A. Andrews, Georgia Panagiotidou, Shawn Walker, Kathleen H. Pine, Alice Xiang (10 Jun 2024)

Whose Preferences? Differences in Fairness Preferences and Their Impact on the Fairness of AI Utilizing Human Feedback
Emilia Agis Lerner, Florian E. Dorner, Elliott Ash, Naman Goel (09 Jun 2024)
Culturally Aware and Adapted NLP: A Taxonomy and a Survey of the State of the Art
Chen Cecilia Liu, Iryna Gurevych, Anna Korhonen (06 Jun 2024)

On the Algorithmic Bias of Aligning Large Language Models with RLHF: Preference Collapse and Matching Regularization
Jiancong Xiao, Ziniu Li, Xingyu Xie, E. Getzen, Cong Fang, Qi Long, Weijie J. Su (26 May 2024)

Estimating the Level of Dialectness Predicts Interannotator Agreement in Multi-dialect Arabic Datasets
Amr Keleg, Walid Magdy, Sharon Goldwater (18 May 2024)

Designing NLP Systems That Adapt to Diverse Worldviews
Claudiu Creanga, Liviu P. Dinu (18 May 2024)

Mitigating Text Toxicity with Counterfactual Generation [MedIm]
Milan Bhan, Jean-Noël Vittaut, Nina Achache, Victor Legrand, N. Chesneau, A. Blangero, Juliette Murris, Marie-Jeanne Lesot (16 May 2024)

The Unseen Targets of Hate -- A Systematic Review of Hateful Communication Datasets
Zehui Yu, Indira Sen, Dennis Assenmacher, Mattia Samory, Leon Fröhling, Christina Dahn, Debora Nozza, Claudia Wagner (14 May 2024)
D3CODE: Disentangling Disagreements in Data across Cultures on Offensiveness Detection and Evaluation
Aida Mostafazadeh Davani, Mark Díaz, Dylan K. Baker, Vinodkumar Prabhakaran (16 Apr 2024)

What's Mine becomes Yours: Defining, Annotating and Detecting Context-Dependent Paraphrases in News Interview Dialogs
Anna Wegmann, T. Broek, Dong Nguyen (10 Apr 2024)

Corpus Considerations for Annotator Modeling and Scaling
O. O. Sarumi, Béla Neuendorf, Joan Plepi, Lucie Flek, Jorg Schlotterer, Charles F Welch (02 Apr 2024)

Survey of Bias In Text-to-Image Generation: Definition, Evaluation, and Mitigation [EGVM]
Yixin Wan, Arjun Subramonian, Anaelia Ovalle, Zongyu Lin, Ashima Suvarna, Christina Chance, Hritik Bansal, Rebecca Pattichis, Kai-Wei Chang (01 Apr 2024)

From Representational Harms to Quality-of-Service Harms: A Case Study on Llama 2 Safety Safeguards
Khaoula Chehbouni, Megha Roshan, Emmanuel Ma, Futian Andrew Wei, Afaf Taik, Jackie CK Cheung, G. Farnadi (20 Mar 2024)
Detectors for Safe and Reliable LLMs: Implementations, Uses, and Limitations
Swapnaja Achintalwar, Adriana Alvarado Garcia, Ateret Anaby-Tavor, Ioana Baldini, Sara E. Berger, ..., Aashka Trivedi, Kush R. Varshney, Dennis L. Wei, Shalisha Witherspoon, Marcel Zalmanovici (09 Mar 2024)

Don't Blame the Data, Blame the Model: Understanding Noise and Bias When Learning from Subjective Annotations
Abhishek Anand, Negar Mokhberian, Prathyusha Naresh Kumar, Anweasha Saha, Zihao He, Ashwin Rao, Fred Morstatter, Kristina Lerman (06 Mar 2024)

Towards Measuring and Modeling "Culture" in LLMs: A Survey
Muhammad Farid Adilazuarda, Sagnik Mukherjee, Pradhyumna Lavania, Siddhant Singh, Alham Fikri Aji, Jacki O'Neill, Ashutosh Modi, Monojit Choudhury (05 Mar 2024)

Position: Insights from Survey Methodology can Improve Training Data [SyDa]
Stephanie Eckman, Barbara Plank, Frauke Kreuter (02 Mar 2024)

Counterspeakers' Perspectives: Unveiling Barriers and AI Needs in the Fight against Online Hate
Jimin Mun, Cathy Buerger, Jenny T Liang, Joshua Garland, Maarten Sap (29 Feb 2024)