Nuanced Metrics for Measuring Unintended Bias with Real Data for Text Classification

11 March 2019

Daniel Borkan

Lucas Dixon

Jeffrey Scott Sorensen

Nithum Thain

Lucy Vasserman

Papers citing "Nuanced Metrics for Measuring Unintended Bias with Real Data for Text Classification"

Showing 50 of 115 citing papers

Title
Enforcing Fairness Where It Matters: An Approach Based on Difference-of-Convex Constraints Yutian He Yankun Huang Yao Yao Qihang Lin FaML 9 0 0 18 May 2025
Fine-Grained Bias Exploration and Mitigation for Group-Robust Classification Miaoyun Zhao Qiang Zhang C. Li 31 0 0 11 May 2025
Teaching Models to Understand (but not Generate) High-risk Data Ryan Yixiang Wang Matthew Finlayson Luca Soldaini Swabha Swayamdipta Robin Jia 154 0 0 05 May 2025
Validating LLM-as-a-Judge Systems in the Absence of Gold Labels Luke M. Guerdan Solon Barocas Kenneth Holstein Hanna M. Wallach Zhiwei Steven Wu Alexandra Chouldechova ALM ELM 257 0 0 13 Mar 2025
Out-of-Distribution Detection using Synthetic Data Generation Momin Abbas Muneeza Azmat R. Horesh Mikhail Yurochkin 47 1 0 05 Feb 2025
Focus On This, Not That! Steering LLMs With Adaptive Feature Specification Tom A. Lamb Adam Davies Alasdair Paren Philip Torr Francesco Pinto 52 0 0 30 Oct 2024
Compositional Risk Minimization Divyat Mahajan Mohammad Pezeshki Ioannis Mitliagkas Kartik Ahuja Pascal Vincent 26 3 0 08 Oct 2024
Identity-related Speech Suppression in Generative AI Content Moderation Oghenefejiro Isaacs Anigboro Charlie M. Crawford Danaë Metaxa Sorelle A. Friedler 26 0 0 09 Sep 2024
Towards Generalized Offensive Language Identification A. Dmonte Tejas Arya Tharindu Ranasinghe Marcos Zampieri 52 3 0 26 Jul 2024
Split, Unlearn, Merge: Leveraging Data Attributes for More Effective Unlearning in LLMs S. Kadhe Farhan Ahmed Dennis Wei Nathalie Baracaldo Inkit Padhi MoMe MU 28 7 0 17 Jun 2024
Automated Program Repair: Emerging trends pose and expose problems for benchmarks J. Renzullo Pemma Reiter Westley Weimer Stephanie Forrest 42 1 0 08 May 2024
From One to Many: Expanding the Scope of Toxicity Mitigation in Language Models Luiza Amador Pozzobon Patrick Lewis Sara Hooker Beyza Ermis 38 7 0 06 Mar 2024
Implicit Bias and Fast Convergence Rates for Self-attention Bhavya Vasudeva Puneesh Deora Christos Thrampoulidis 37 13 0 08 Feb 2024
Enhancing Robustness of Foundation Model Representations under Provenance-related Distribution Shifts Xiruo Ding Zhecheng Sheng Brian Hur Feng Chen Serguei V. S. Pakhomov Trevor Cohen OOD 23 0 0 09 Dec 2023
Model Merging by Uncertainty-Based Gradient Matching Nico Daheim Thomas Möllenhoff E. Ponti Iryna Gurevych Mohammad Emtiyaz Khan MoMe FedML 32 44 0 19 Oct 2023
Goodtriever: Adaptive Toxicity Mitigation with Retrieval-augmented Models Luiza Amador Pozzobon Beyza Ermis Patrick Lewis Sara Hooker 36 20 0 11 Oct 2023
Foundation Metrics for Evaluating Effectiveness of Healthcare Conversations Powered by Generative AI Mahyar Abbasian Elahe Khatibi Iman Azimi David Oniani Zahra Shakeri Hossein Abad ... Bryant Lin Olivier Gevaert Li-Jia Li Ramesh C. Jain Amir M. Rahmani LM&MA ELM AI4MH 43 66 0 21 Sep 2023
Bias Amplification Enhances Minority Group Performance Gaotang Li Jiarui Liu Wei Hu 28 5 0 13 Sep 2023
Zero-Shot Robustification of Zero-Shot Models Dyah Adila Changho Shin Lin Cai Frederic Sala 43 19 0 08 Sep 2023
Thesis Distillation: Investigating The Impact of Bias in NLP Models on Hate Speech Detection Fatma Elsafoury 29 3 0 31 Aug 2023
Cultural Alignment in Large Language Models: An Explanatory Analysis Based on Hofstede's Cultural Dimensions Reem I. Masoud Ziquan Liu Martin Ferianc Philip C. Treleaven Miguel R. D. Rodrigues 27 50 0 25 Aug 2023
Separate the Wheat from the Chaff: Model Deficiency Unlearning via Parameter-Efficient Module Operation Xinshuo Hu Dongfang Li Baotian Hu Zihao Zheng Zhenyu Liu Hao Fei KELM MU 35 26 0 16 Aug 2023
LCT-1 at SemEval-2023 Task 10: Pre-training and Multi-task Learning for Sexism Detection and Classification K. Chernyshev E. Garanina Duygu Bayram Qiankun Zheng Lukas Edman 13 0 0 08 Jun 2023
Revisiting Out-of-distribution Robustness in NLP: Benchmark, Analysis, and LLMs Evaluations Lifan Yuan Yangyi Chen Ganqu Cui Hongcheng Gao Fangyuan Zou Xingyi Cheng Heng Ji Zhiyuan Liu Maosong Sun 39 73 0 07 Jun 2023
An Invariant Learning Characterization of Controlled Text Generation Carolina Zheng Claudia Shi Keyon Vafa Amir Feder David M. Blei OOD 38 8 0 31 May 2023
Rectifying Group Irregularities in Explanations for Distribution Shift Adam Stein Yinjun Wu Eric Wong Mayur Naik 37 1 0 25 May 2023
Understanding and Mitigating Spurious Correlations in Text Classification with Neighborhood Analysis Oscar Chew Hsuan-Tien Lin Kai-Wei Chang Kuan-Hao Huang 38 5 0 23 May 2023
Modeling the Q-Diversity in a Min-max Play Game for Robust Optimization Ting Wu Rui Zheng Tao Gui Qi Zhang Xuanjing Huang 51 2 0 20 May 2023
PaLM 2 Technical Report Rohan Anil Andrew M. Dai Orhan Firat Melvin Johnson Dmitry Lepikhin ... Ce Zheng Wei Zhou Denny Zhou Slav Petrov Yonghui Wu ReLM LRM 125 1,152 0 17 May 2023
Addressing Biases in the Texts using an End-to-End Pipeline Approach Shaina Raza Syed Raza Bashir Sneha Urooj Qamar 38 0 0 13 Mar 2023
Distributionally Robust Optimization with Probabilistic Group Soumya Suvra Ghosal Yixuan Li OOD 13 7 0 10 Mar 2023
Make Every Example Count: On the Stability and Utility of Self-Influence for Learning from Noisy NLP Datasets Irina Bejan Artem Sokolov Katja Filippova TDI 32 9 0 27 Feb 2023
Same Same, But Different: Conditional Multi-Task Learning for Demographic-Specific Toxicity Detection Soumyajit Gupta Sooyong Lee Maria De-Arteaga Matthew Lease 27 13 0 14 Feb 2023
Towards Agile Text Classifiers for Everyone Maximilian Mozes Jessica Hoffmann Katrin Tomanek Muhamed Kouate Nithum Thain Ann Yuan Tolga Bolukbasi Lucas Dixon 52 13 0 13 Feb 2023
A benchmark for toxic comment classification on Civil Comments dataset Corentin Duchene Henri Jamet Pierre Guillaume Reda Dehak 35 8 0 26 Jan 2023
ViHOS: Hate Speech Spans Detection for Vietnamese Phu Gia Hoang Canh Duc Luu K. Tran Kiet Van Nguyen Ngan Luu-Thuy Nguyen 31 20 0 24 Jan 2023
Fair Infinitesimal Jackknife: Mitigating the Influence of Biased Training Data Points Without Refitting P. Sattigeri S. Ghosh Inkit Padhi Pierre L. Dognin Kush R. Varshney FaML 25 28 0 13 Dec 2022
Editing Models with Task Arithmetic Gabriel Ilharco Marco Tulio Ribeiro Mitchell Wortsman Suchin Gururangan Ludwig Schmidt Hannaneh Hajishirzi Ali Farhadi KELM MoMe MU 72 439 0 08 Dec 2022
Addressing Distribution Shift at Test Time in Pre-trained Language Models Ayush Singh J. Ortega VLM 27 4 0 05 Dec 2022
SOLD: Sinhala Offensive Language Dataset Tharindu Ranasinghe Isuri Anuradha Damith Premasiri Kanishka Silva Hansi Hettiarachchi Lasitha Uyangodage Marcos Zampieri 41 8 0 01 Dec 2022
A Fair Loss Function for Network Pruning Robbie Meyer Alexander Wong CVBM 27 3 0 18 Nov 2022
Striving for data-model efficiency: Identifying data externalities on group performance Esther Rolf Ben Packer Alex Beutel Fernando Diaz TDI 30 2 0 11 Nov 2022
Okapi: Generalising Better by Making Statistical Matches Match Myles Bartlett Sara Romiti V. Sharmanska Novi Quadrianto 42 3 0 07 Nov 2022
Why Is It Hate Speech? Masked Rationale Prediction for Explainable Hate Speech Detection Jiyun Kim Byounghan Lee Kyung-ah Sohn 26 13 0 01 Nov 2022
Nearest Neighbor Language Models for Stylistic Controllable Generation Severino Trotta Lucie Flek Charles F Welch 31 4 0 27 Oct 2022
Sufficient Invariant Learning for Distribution Shift Taero Kim Sungjun Lim Kyungwoo Song OOD 31 2 0 24 Oct 2022
Detecting Unintended Social Bias in Toxic Language Datasets Nihar Ranjan Sahoo Himanshu Gupta P. Bhattacharyya 18 18 0 21 Oct 2022
Scaling Instruction-Finetuned Language Models Hyung Won Chung Le Hou Shayne Longpre Barret Zoph Yi Tay ... Jacob Devlin Adam Roberts Denny Zhou Quoc V. Le Jason W. Wei ReLM LRM 97 2,999 0 20 Oct 2022
On Feature Learning in the Presence of Spurious Correlations Pavel Izmailov Polina Kirichenko Nate Gruver A. Wilson 36 118 0 20 Oct 2022
How Hate Speech Varies by Target Identity: A Computational Analysis Michael Miller Yoder Lynnette Hui Xian Ng D. W. Brown Kathleen M. Carley 33 20 0 19 Oct 2022