Debiasing isn't enough! -- On the Effectiveness of Debiasing MLMs and their Social Biases in Downstream Tasks
International Conference on Computational Linguistics (COLING), 2022
6 October 2022
Masahiro Kaneko
Danushka Bollegala
Naoaki Okazaki
arXiv:2210.02938
Papers citing "Debiasing isn't enough! -- On the Effectiveness of Debiasing MLMs and their Social Biases in Downstream Tasks"
43 / 43 papers shown
Online Learning Defense against Iterative Jailbreak Attacks via Prompt Optimization
Masahiro Kaneko
Zeerak Talat
Timothy Baldwin
AAML
61
1
0
19 Oct 2025
Once Is Enough: Lightweight DiT-Based Video Virtual Try-On via One-Time Garment Appearance Injection
Yanjie Pan
Qingdong He
Lidong Wang
Bo Peng
Mingmin Chi
DiffM
VGen
43
0
0
09 Oct 2025
Bias after Prompting: Persistent Discrimination in Large Language Models
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2025
N. Sivakumar
Natalie Mackraz
Samira Khorshidi
Krishna Patel
B. Theobald
Luca Zappella
N. Apostoloff
AI4CE
48
1
0
09 Sep 2025
Do Biased Models Have Biased Thoughts?
Swati Rajwal
Shivank Garg
Reem Abdel-Salam
Abdelrahman Zayed
LRM
120
0
0
08 Aug 2025
Exploring Gender Bias in Large Language Models: An In-depth Dive into the German Language
Kristin Gnadt
David Thulke
Simone Kopeinik
Ralf Schlüter
121
0
0
22 Jul 2025
Safety Alignment via Constrained Knowledge Unlearning
Zesheng Shi
Yucheng Zhou
Jing Li
MU
KELM
AAML
172
4
0
24 May 2025
Evaluating the Effect of Retrieval Augmentation on Social Biases
Tianhui Zhang
Yi Zhou
Danushka Bollegala
185
1
0
24 Feb 2025
Smaller Large Language Models Can Do Moral Self-Correction
Guangliang Liu
Zhiyu Xue
Rongrong Wang
Kristen Marie Johnson
LRM
253
2
0
30 Oct 2024
BiasAlert: A Plug-and-play Tool for Social Bias Detection in LLMs
Zhiting Fan
Ruizhe Chen
Ruiling Xu
Zuozhu Liu
KELM
202
29
0
14 Jul 2024
Social Bias Evaluation for Large Language Models Requires Prompt Variations
Rem Hida
Masahiro Kaneko
Naoaki Okazaki
183
27
0
03 Jul 2024
A Study of Nationality Bias in Names and Perplexity using Off-the-Shelf Affect-related Tweet Classifiers
Valentin Barriere
Sebastian Cifuentes
140
3
0
01 Jul 2024
Why Don't Prompt-Based Fairness Metrics Correlate?
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
A. Zayed
Gonçalo Mordido
Ioana Baldini
Sarath Chandar
ALM
229
7
0
09 Jun 2024
Towards Understanding Task-agnostic Debiasing Through the Lenses of Intrinsic Bias and Forgetfulness
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Guangliang Liu
Milad Afshari
Xitong Zhang
Zhiyu Xue
Avrajit Ghosh
Bidhan Bashyal
Rongrong Wang
K. Johnson
125
2
0
06 Jun 2024
Anna Karenina Strikes Again: Pre-Trained LLM Embeddings May Favor High-Performing Learners
Abigail Gurin Schleifer
Beata Beigman Klebanov
Moriah Ariely
Giora Alexandron
131
5
0
06 Jun 2024
On the Intrinsic Self-Correction Capability of LLMs: Uncertainty and Latent Concept
Guangliang Liu
Haitao Mao
Bochuan Cao
Zhiyu Xue
K. Johnson
Shucheng Zhou
Rongrong Wang
LRM
160
16
0
04 Jun 2024
Exploring Subjectivity for more Human-Centric Assessment of Social Biases in Large Language Models
Paula Akemi Aoyagui
Sharon Ferguson
Anastasia Kuzminykh
184
1
0
17 May 2024
Twists, Humps, and Pebbles: Multilingual Speech Recognition Models Exhibit Gender Performance Gaps
Giuseppe Attanasio
Beatrice Savoldi
Dennis Fucci
Dirk Hovy
153
12
0
28 Feb 2024
Eagle: Ethical Dataset Given from Real Interactions
Masahiro Kaneko
Danushka Bollegala
Timothy Baldwin
151
4
0
22 Feb 2024
Bias in Language Models: Beyond Trick Tests and Toward RUTEd Evaluation
Kristian Lum
Jacy Reese Anthis
Chirag Nagpal
Alexander D'Amour
347
28
0
20 Feb 2024
A Note on Bias to Complete
Jia Xu
Mona Diab
210
2
0
18 Feb 2024
Semantic Properties of cosine based bias scores for word embeddings
International Conference on Pattern Recognition Applications and Methods (ICPRAM), 2024
Sarah Schröder
Alexander Schulz
Fabian Hinder
Barbara Hammer
178
1
0
27 Jan 2024
The Gaps between Pre-train and Downstream Settings in Bias Evaluation and Debiasing
Masahiro Kaneko
Danushka Bollegala
Timothy Baldwin
172
5
0
16 Jan 2024
Understanding the Effect of Model Compression on Social Bias in Large Language Models
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Gustavo Gonçalves
Emma Strubell
247
17
0
09 Dec 2023
General Phrase Debiaser: Debiasing Masked Language Models at a Multi-Token Level
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Bingkang Shi
Xiaodan Zhang
Dehan Kong
Yulei Wu
Zongzhen Liu
Honglei Lyu
Longtao Huang
AI4CE
230
3
0
23 Nov 2023
Fair Text Classification with Wasserstein Independence
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Thibaud Leteno
Antoine Gourru
Charlotte Laclau
Rémi Emonet
Christophe Gravier
FaML
175
5
0
21 Nov 2023
Does Pre-trained Language Model Actually Infer Unseen Links in Knowledge Graph Completion?
North American Chapter of the Association for Computational Linguistics (NAACL), 2023
Yusuke Sakai
Hidetaka Kamigaito
Katsuhiko Hayashi
Taro Watanabe
187
2
0
15 Nov 2023
Selecting Shots for Demographic Fairness in Few-Shot Learning with Large Language Models
Carlos Alejandro Aguirre
Kuleen Sasse
Isabel Cachola
Mark Dredze
245
2
0
14 Nov 2023
Evaluating Bias and Fairness in Gender-Neutral Pretrained Vision-and-Language Models
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Laura Cabello
Emanuele Bugliarello
Stephanie Brandl
Desmond Elliott
159
8
0
26 Oct 2023
A Predictive Factor Analysis of Social Biases and Task-Performance in Pretrained Masked Language Models
Yi Zhou
Jose Camacho-Collados
Danushka Bollegala
345
6
0
19 Oct 2023
Co²PT: Mitigating Bias in Pre-trained Language Models through Counterfactual Contrastive Prompt Tuning
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Xiangjue Dong
Ziwei Zhu
Zhuoer Wang
Maria Teleki
James Caverlee
212
15
0
19 Oct 2023
Evaluating Gender Bias of Pre-trained Language Models in Natural Language Inference by Considering All Labels
International Conference on Language Resources and Evaluation (LREC), 2023
Panatchakorn Anantaprayoon
Masahiro Kaneko
Naoaki Okazaki
304
21
0
18 Sep 2023
In-Contextual Gender Bias Suppression for Large Language Models
Findings, 2023
Daisuke Oba
Masahiro Kaneko
Danushka Bollegala
186
12
0
13 Sep 2023
Bias and Fairness in Large Language Models: A Survey
Computational Linguistics (CL), 2023
Isabel O. Gallegos
Ryan Rossi
Joe Barrow
Md Mehrab Tanjim
Sungchul Kim
Franck Dernoncourt
Tong Yu
Ruiyi Zhang
Nesreen Ahmed
AILaw
278
849
0
02 Sep 2023
Thesis Distillation: Investigating The Impact of Bias in NLP Models on Hate Speech Detection
Fatma Elsafoury
157
4
0
31 Aug 2023
On Bias and Fairness in NLP: Investigating the Impact of Bias and Debiasing in Language Models on the Fairness of Toxicity Detection
Fatma Elsafoury
Stamos Katsigiannis
153
1
0
22 May 2023
Word Embeddings Are Steers for Language Models
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Chi Han
Jialiang Xu
Pengfei Yu
Yi R. Fung
Chenkai Sun
Nan Jiang
Tarek Abdelzaher
Heng Ji
LLMSV
241
61
0
22 May 2023
Solving NLP Problems through Human-System Collaboration: A Discussion-based Approach
Findings, 2023
Masahiro Kaneko
Graham Neubig
Naoaki Okazaki
242
6
0
19 May 2023
On the Origins of Bias in NLP through the Lens of the Jim Code
Fatma Elsafoury
Gavin Abercrombie
174
5
0
16 May 2023
On the Independence of Association Bias and Empirical Fairness in Language Models
Conference on Fairness, Accountability and Transparency (FAccT), 2023
Laura Cabello
Anna Katrine van Zee
Anders Søgaard
131
35
0
20 Apr 2023
Comparing Intrinsic Gender Bias Evaluation Measures without using Human Annotated Examples
Conference of the European Chapter of the Association for Computational Linguistics (EACL), 2023
Masahiro Kaneko
Danushka Bollegala
Naoaki Okazaki
122
12
0
28 Jan 2023
Dissociating language and thought in large language models
Kyle Mahowald
Anna A. Ivanova
I. Blank
Nancy Kanwisher
J. Tenenbaum
Evelina Fedorenko
ELM
ReLM
228
228
0
16 Jan 2023
Gender Biases Unexpectedly Fluctuate in the Pre-training Stage of Masked Language Models
Kenan Tang
Hanchun Jiang
AI4CE
145
1
0
26 Nov 2022
MABEL: Attenuating Gender Bias using Textual Entailment Data
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Jacqueline He
Mengzhou Xia
C. Fellbaum
Danqi Chen
164
37
0
26 Oct 2022