1903.04561
Cited By
Nuanced Metrics for Measuring Unintended Bias with Real Data for Text Classification
11 March 2019
Daniel Borkan
Lucas Dixon
Jeffrey Scott Sorensen
Nithum Thain
Lucy Vasserman
Papers citing
"Nuanced Metrics for Measuring Unintended Bias with Real Data for Text Classification"
50 / 115 papers shown
Enforcing Fairness Where It Matters: An Approach Based on Difference-of-Convex Constraints
Yutian He
Yankun Huang
Yao Yao
Qihang Lin
FaML
9
0
0
18 May 2025
Fine-Grained Bias Exploration and Mitigation for Group-Robust Classification
Miaoyun Zhao
Qiang Zhang
C. Li
31
0
0
11 May 2025
Teaching Models to Understand (but not Generate) High-risk Data
Ryan Yixiang Wang
Matthew Finlayson
Luca Soldaini
Swabha Swayamdipta
Robin Jia
154
0
0
05 May 2025
Validating LLM-as-a-Judge Systems in the Absence of Gold Labels
Luke M. Guerdan
Solon Barocas
Kenneth Holstein
Hanna M. Wallach
Zhiwei Steven Wu
Alexandra Chouldechova
ALM
ELM
257
0
0
13 Mar 2025
Out-of-Distribution Detection using Synthetic Data Generation
Momin Abbas
Muneeza Azmat
R. Horesh
Mikhail Yurochkin
47
1
0
05 Feb 2025
Focus On This, Not That! Steering LLMs With Adaptive Feature Specification
Tom A. Lamb
Adam Davies
Alasdair Paren
Philip Torr
Francesco Pinto
52
0
0
30 Oct 2024
Compositional Risk Minimization
Divyat Mahajan
Mohammad Pezeshki
Ioannis Mitliagkas
Kartik Ahuja
Pascal Vincent
26
3
0
08 Oct 2024
Identity-related Speech Suppression in Generative AI Content Moderation
Oghenefejiro Isaacs Anigboro
Charlie M. Crawford
Danaë Metaxa
Sorelle A. Friedler
26
0
0
09 Sep 2024
Towards Generalized Offensive Language Identification
A. Dmonte
Tejas Arya
Tharindu Ranasinghe
Marcos Zampieri
52
3
0
26 Jul 2024
Split, Unlearn, Merge: Leveraging Data Attributes for More Effective Unlearning in LLMs
S. Kadhe
Farhan Ahmed
Dennis Wei
Nathalie Baracaldo
Inkit Padhi
MoMe
MU
28
7
0
17 Jun 2024
Automated Program Repair: Emerging trends pose and expose problems for benchmarks
J. Renzullo
Pemma Reiter
Westley Weimer
Stephanie Forrest
42
1
0
08 May 2024
From One to Many: Expanding the Scope of Toxicity Mitigation in Language Models
Luiza Amador Pozzobon
Patrick Lewis
Sara Hooker
Beyza Ermis
38
7
0
06 Mar 2024
Implicit Bias and Fast Convergence Rates for Self-attention
Bhavya Vasudeva
Puneesh Deora
Christos Thrampoulidis
37
13
0
08 Feb 2024
Enhancing Robustness of Foundation Model Representations under Provenance-related Distribution Shifts
Xiruo Ding
Zhecheng Sheng
Brian Hur
Feng Chen
Serguei V. S. Pakhomov
Trevor Cohen
OOD
23
0
0
09 Dec 2023
Model Merging by Uncertainty-Based Gradient Matching
Nico Daheim
Thomas Möllenhoff
E. Ponti
Iryna Gurevych
Mohammad Emtiyaz Khan
MoMe
FedML
32
44
0
19 Oct 2023
Goodtriever: Adaptive Toxicity Mitigation with Retrieval-augmented Models
Luiza Amador Pozzobon
Beyza Ermis
Patrick Lewis
Sara Hooker
36
20
0
11 Oct 2023
Foundation Metrics for Evaluating Effectiveness of Healthcare Conversations Powered by Generative AI
Mahyar Abbasian
Elahe Khatibi
Iman Azimi
David Oniani
Zahra Shakeri Hossein Abad
...
Bryant Lin
Olivier Gevaert
Li-Jia Li
Ramesh C. Jain
Amir M. Rahmani
LM&MA
ELM
AI4MH
43
66
0
21 Sep 2023
Bias Amplification Enhances Minority Group Performance
Gaotang Li
Jiarui Liu
Wei Hu
28
5
0
13 Sep 2023
Zero-Shot Robustification of Zero-Shot Models
Dyah Adila
Changho Shin
Lin Cai
Frederic Sala
43
19
0
08 Sep 2023
Thesis Distillation: Investigating The Impact of Bias in NLP Models on Hate Speech Detection
Fatma Elsafoury
29
3
0
31 Aug 2023
Cultural Alignment in Large Language Models: An Explanatory Analysis Based on Hofstede's Cultural Dimensions
Reem I. Masoud
Ziquan Liu
Martin Ferianc
Philip C. Treleaven
Miguel R. D. Rodrigues
27
50
0
25 Aug 2023
Separate the Wheat from the Chaff: Model Deficiency Unlearning via Parameter-Efficient Module Operation
Xinshuo Hu
Dongfang Li
Baotian Hu
Zihao Zheng
Zhenyu Liu
Hao Fei
KELM
MU
35
26
0
16 Aug 2023
LCT-1 at SemEval-2023 Task 10: Pre-training and Multi-task Learning for Sexism Detection and Classification
K. Chernyshev
E. Garanina
Duygu Bayram
Qiankun Zheng
Lukas Edman
13
0
0
08 Jun 2023
Revisiting Out-of-distribution Robustness in NLP: Benchmark, Analysis, and LLMs Evaluations
Lifan Yuan
Yangyi Chen
Ganqu Cui
Hongcheng Gao
Fangyuan Zou
Xingyi Cheng
Heng Ji
Zhiyuan Liu
Maosong Sun
39
73
0
07 Jun 2023
An Invariant Learning Characterization of Controlled Text Generation
Carolina Zheng
Claudia Shi
Keyon Vafa
Amir Feder
David M. Blei
OOD
38
8
0
31 May 2023
Rectifying Group Irregularities in Explanations for Distribution Shift
Adam Stein
Yinjun Wu
Eric Wong
Mayur Naik
37
1
0
25 May 2023
Understanding and Mitigating Spurious Correlations in Text Classification with Neighborhood Analysis
Oscar Chew
Hsuan-Tien Lin
Kai-Wei Chang
Kuan-Hao Huang
38
5
0
23 May 2023
Modeling the Q-Diversity in a Min-max Play Game for Robust Optimization
Ting Wu
Rui Zheng
Tao Gui
Qi Zhang
Xuanjing Huang
51
2
0
20 May 2023
PaLM 2 Technical Report
Rohan Anil
Andrew M. Dai
Orhan Firat
Melvin Johnson
Dmitry Lepikhin
...
Ce Zheng
Wei Zhou
Denny Zhou
Slav Petrov
Yonghui Wu
ReLM
LRM
125
1,152
0
17 May 2023
Addressing Biases in the Texts using an End-to-End Pipeline Approach
Shaina Raza
Syed Raza Bashir
Sneha
Urooj Qamar
38
0
0
13 Mar 2023
Distributionally Robust Optimization with Probabilistic Group
Soumya Suvra Ghosal
Yixuan Li
OOD
13
7
0
10 Mar 2023
Make Every Example Count: On the Stability and Utility of Self-Influence for Learning from Noisy NLP Datasets
Irina Bejan
Artem Sokolov
Katja Filippova
TDI
32
9
0
27 Feb 2023
Same Same, But Different: Conditional Multi-Task Learning for Demographic-Specific Toxicity Detection
Soumyajit Gupta
Sooyong Lee
Maria De-Arteaga
Matthew Lease
27
13
0
14 Feb 2023
Towards Agile Text Classifiers for Everyone
Maximilian Mozes
Jessica Hoffmann
Katrin Tomanek
Muhamed Kouate
Nithum Thain
Ann Yuan
Tolga Bolukbasi
Lucas Dixon
52
13
0
13 Feb 2023
A benchmark for toxic comment classification on Civil Comments dataset
Corentin Duchene
Henri Jamet
Pierre Guillaume
Reda Dehak
35
8
0
26 Jan 2023
ViHOS: Hate Speech Spans Detection for Vietnamese
Phu Gia Hoang
Canh Duc Luu
K. Tran
Kiet Van Nguyen
Ngan Luu-Thuy Nguyen
31
20
0
24 Jan 2023
Fair Infinitesimal Jackknife: Mitigating the Influence of Biased Training Data Points Without Refitting
P. Sattigeri
S. Ghosh
Inkit Padhi
Pierre L. Dognin
Kush R. Varshney
FaML
25
28
0
13 Dec 2022
Editing Models with Task Arithmetic
Gabriel Ilharco
Marco Tulio Ribeiro
Mitchell Wortsman
Suchin Gururangan
Ludwig Schmidt
Hannaneh Hajishirzi
Ali Farhadi
KELM
MoMe
MU
72
439
0
08 Dec 2022
Addressing Distribution Shift at Test Time in Pre-trained Language Models
Ayush Singh
J. Ortega
VLM
27
4
0
05 Dec 2022
SOLD: Sinhala Offensive Language Dataset
Tharindu Ranasinghe
Isuri Anuradha
Damith Premasiri
Kanishka Silva
Hansi Hettiarachchi
Lasitha Uyangodage
Marcos Zampieri
41
8
0
01 Dec 2022
A Fair Loss Function for Network Pruning
Robbie Meyer
Alexander Wong
CVBM
27
3
0
18 Nov 2022
Striving for data-model efficiency: Identifying data externalities on group performance
Esther Rolf
Ben Packer
Alex Beutel
Fernando Diaz
TDI
30
2
0
11 Nov 2022
Okapi: Generalising Better by Making Statistical Matches Match
Myles Bartlett
Sara Romiti
V. Sharmanska
Novi Quadrianto
42
3
0
07 Nov 2022
Why Is It Hate Speech? Masked Rationale Prediction for Explainable Hate Speech Detection
Jiyun Kim
Byounghan Lee
Kyung-ah Sohn
26
13
0
01 Nov 2022
Nearest Neighbor Language Models for Stylistic Controllable Generation
Severino Trotta
Lucie Flek
Charles F Welch
31
4
0
27 Oct 2022
Sufficient Invariant Learning for Distribution Shift
Taero Kim
Sungjun Lim
Kyungwoo Song
OOD
31
2
0
24 Oct 2022
Detecting Unintended Social Bias in Toxic Language Datasets
Nihar Ranjan Sahoo
Himanshu Gupta
P. Bhattacharyya
18
18
0
21 Oct 2022
Scaling Instruction-Finetuned Language Models
Hyung Won Chung
Le Hou
Shayne Longpre
Barret Zoph
Yi Tay
...
Jacob Devlin
Adam Roberts
Denny Zhou
Quoc V. Le
Jason W. Wei
ReLM
LRM
97
2,999
0
20 Oct 2022
On Feature Learning in the Presence of Spurious Correlations
Pavel Izmailov
Polina Kirichenko
Nate Gruver
A. Wilson
36
118
0
20 Oct 2022
How Hate Speech Varies by Target Identity: A Computational Analysis
Michael Miller Yoder
Lynnette Hui Xian Ng
D. W. Brown
Kathleen M. Carley
33
20
0
19 Oct 2022