The Woman Worked as a Babysitter: On Biases in Language Generation

3 September 2019 (arXiv:1909.01326)
Emily Sheng, Kai-Wei Chang, Premkumar Natarajan, Nanyun Peng

Papers citing "The Woman Worked as a Babysitter: On Biases in Language Generation"

Showing 50 of 102 citing papers.
• BiasGuard: A Reasoning-enhanced Bias Detection Tool For Large Language Models. Zhiting Fan, Ruizhe Chen, Zuozhu Liu. 30 Apr 2025.
• SAGE: A Generic Framework for LLM Safety Evaluation. Madhur Jindal, Hari Shrawgi, Parag Agrawal, Sandipan Dandapat. 28 Apr 2025.
• What's the Difference? Supporting Users in Identifying the Effects of Prompt and Model Changes Through Token Patterns. Michael A. Hedderich, Anyi Wang, Raoyuan Zhao, Florian Eichin, Barbara Plank. 22 Apr 2025.
• Bias Analysis and Mitigation through Protected Attribute Detection and Regard Classification. Takuma Udagawa, Yang Zhao, H. Kanayama, Bishwaranjan Bhattacharjee. 19 Apr 2025.
• Mind the Language Gap: Automated and Augmented Evaluation of Bias in LLMs for High- and Low-Resource Languages. Alessio Buscemi, Cedric Lothritz, Sergio Morales, Marcos Gomez-Vazquez, Robert Clarisó, Jordi Cabot, German Castignani. 19 Apr 2025.
• Representation-based Reward Modeling for Efficient Safety Alignment of Large Language Model. Qiyuan Deng, X. Bai, Kehai Chen, Yaowei Wang, Liqiang Nie, Min Zhang. 13 Mar 2025.
• Who Relies More on World Knowledge and Bias for Syntactic Ambiguity Resolution: Humans or LLMs? So Young Lee, Russell Scheinberg, Amber Shore, Ameeta Agrawal. 13 Mar 2025.
• Fine-Tuned LLMs are "Time Capsules" for Tracking Societal Bias Through Books. Sangmitra Madhusudan, Robert D Morabito, Skye Reid, Nikta Gohari Sadr, Ali Emami. 07 Feb 2025.
• Fairness through Difference Awareness: Measuring Desired Group Discrimination in LLMs. Angelina Wang, Michelle Phan, Daniel E. Ho, Sanmi Koyejo. 04 Feb 2025.
• Playing Devil's Advocate: Unmasking Toxicity and Vulnerabilities in Large Vision-Language Models. Abdulkadir Erol, Trilok Padhi, Agnik Saha, Ugur Kursuncu, Mehmet Emin Aktas. 17 Jan 2025.
• Natural Language Processing for the Legal Domain: A Survey of Tasks, Datasets, Models, and Challenges. Farid Ariai, Gianluca Demartini. 25 Oct 2024.
• Natural Language Processing for Human Resources: A Survey. Naoki Otani, Nikita Bhutani, Estevam R. Hruschka. 21 Oct 2024.
• AI Suggestions Homogenize Writing Toward Western Styles and Diminish Cultural Nuances. Dhruv Agarwal, Mor Naaman, Aditya Vashistha. 17 Sep 2024.
• GenderCARE: A Comprehensive Framework for Assessing and Reducing Gender Bias in Large Language Models. Kunsheng Tang, Wenbo Zhou, Jie Zhang, Aishan Liu, Gelei Deng, Shuai Li, Peigui Qi, Weiming Zhang, Tianwei Zhang, Nenghai Yu. 22 Aug 2024.
• Tamper-Resistant Safeguards for Open-Weight LLMs. Rishub Tamirisa, Bhrugu Bharathi, Long Phan, Andy Zhou, Alice Gatti, ..., Andy Zou, Dawn Song, Bo Li, Dan Hendrycks, Mantas Mazeika. 01 Aug 2024.
• Are Large Language Models Really Bias-Free? Jailbreak Prompts for Assessing Adversarial Robustness to Bias Elicitation. Riccardo Cantini, Giada Cosenza, A. Orsino, Domenico Talia. 11 Jul 2024.
• Composable Interventions for Language Models. Arinbjorn Kolbeinsson, Kyle O'Brien, Tianjin Huang, Shanghua Gao, Shiwei Liu, ..., Anurag J. Vaidya, Faisal Mahmood, Marinka Zitnik, Tianlong Chen, Thomas Hartvigsen. 09 Jul 2024.
• Beyond Perplexity: Multi-dimensional Safety Evaluation of LLM Compression. Zhichao Xu, Ashim Gupta, Tao Li, Oliver Bentham, Vivek Srikumar. 06 Jul 2024.
• FrenchToxicityPrompts: a Large Benchmark for Evaluating and Mitigating Toxicity in French Texts. Caroline Brun, Vassilina Nikoulina. 25 Jun 2024.
• Split, Unlearn, Merge: Leveraging Data Attributes for More Effective Unlearning in LLMs. S. Kadhe, Farhan Ahmed, Dennis Wei, Nathalie Baracaldo, Inkit Padhi. 17 Jun 2024.
• Exploring Safety-Utility Trade-Offs in Personalized Language Models. Anvesh Rao Vijjini, Somnath Basu Roy Chowdhury, Snigdha Chaturvedi. 17 Jun 2024.
• Ask LLMs Directly, "What shapes your bias?": Measuring Social Bias in Large Language Models. Jisu Shin, Hoyun Song, Huije Lee, Soyeong Jeong, Jong C. Park. 06 Jun 2024.
• Exploring Subjectivity for more Human-Centric Assessment of Social Biases in Large Language Models. Paula Akemi Aoyagui, Sharon Ferguson, Anastasia Kuzminykh. 17 May 2024.
• Quite Good, but Not Enough: Nationality Bias in Large Language Models -- A Case Study of ChatGPT. Shucheng Zhu, Weikang Wang, Ying Liu. 11 May 2024.
• Hire Me or Not? Examining Language Model's Behavior with Occupation Attributes. Damin Zhang, Yi Zhang, Geetanjali Bihani, Julia Taylor Rayz. 06 May 2024.
• Laissez-Faire Harms: Algorithmic Biases in Generative Language Models. Evan Shieh, Faye-Marie Vassel, Cassidy R. Sugimoto, T. Monroe-White. 11 Apr 2024.
• SafetyPrompts: a Systematic Review of Open Datasets for Evaluating and Improving Large Language Model Safety. Paul Röttger, Fabio Pernisi, Bertie Vidgen, Dirk Hovy. 08 Apr 2024.
• The Impact of Unstated Norms in Bias Analysis of Language Models. Farnaz Kohankhaki, David B. Emerson, Laleh Seyyed-Kalantari, Faiza Khan Khattak. 04 Apr 2024.
• On the Essence and Prospect: An Investigation of Alignment Approaches for Big Models. Xinpeng Wang, Shitong Duan, Xiaoyuan Yi, Jing Yao, Shanlin Zhou, Zhihua Wei, Peng Zhang, Dongkuan Xu, Maosong Sun, Xing Xie. 07 Mar 2024.
• From One to Many: Expanding the Scope of Toxicity Mitigation in Language Models. Luiza Amador Pozzobon, Patrick Lewis, Sara Hooker, B. Ermiş. 06 Mar 2024.
• Angry Men, Sad Women: Large Language Models Reflect Gendered Stereotypes in Emotion Attribution. Flor Miriam Plaza del Arco, A. C. Curry, Alba Curry, Gavin Abercrombie, Dirk Hovy. 05 Mar 2024.
• What's in a Name? Auditing Large Language Models for Race and Gender Bias. Amit Haim, Alejandro Salinas, Julian Nyarko. 21 Feb 2024.
• Compositional Capabilities of Autoregressive Transformers: A Study on Synthetic, Interpretable Tasks. Rahul Ramesh, Ekdeep Singh Lubana, Mikail Khona, Robert P. Dick, Hidenori Tanaka. 21 Nov 2023.
• JAB: Joint Adversarial Prompting and Belief Augmentation. Ninareh Mehrabi, Palash Goyal, Anil Ramakrishna, Jwala Dhamala, Shalini Ghosh, Richard Zemel, Kai-Wei Chang, Aram Galstyan, Rahul Gupta. 16 Nov 2023.
• Emerging Challenges in Personalized Medicine: Assessing Demographic Effects on Biomedical Question Answering Systems. Sagi Shaier, Kevin Bennett, Lawrence E Hunter, K. Wense. 16 Oct 2023.
• Learning to Rank Context for Named Entity Recognition Using a Synthetic Dataset. Arthur Amalvy, Vincent Labatut, Richard Dufour. 16 Oct 2023.
"Kelly is a Warm Person, Joseph is a Role Model": Gender Biases in
  LLM-Generated Reference Letters
"Kelly is a Warm Person, Joseph is a Role Model": Gender Biases in LLM-Generated Reference Letters
Yixin Wan
George Pu
Jiao Sun
Aparna Garimella
Kai-Wei Chang
Nanyun Peng
34
159
0
13 Oct 2023
• Goodtriever: Adaptive Toxicity Mitigation with Retrieval-augmented Models. Luiza Amador Pozzobon, B. Ermiş, Patrick Lewis, Sara Hooker. 11 Oct 2023.
• BTLM-3B-8K: 7B Parameter Performance in a 3B Parameter Model. Nolan Dey, Daria Soboleva, Faisal Al-Khateeb, Bowen Yang, Ribhu Pathria, ..., Robert Myers, Jacob Robert Steeves, Natalia Vassilieva, Marvin Tom, Joel Hestness. 20 Sep 2023.
• OpinionGPT: Modelling Explicit Biases in Instruction-Tuned LLMs. Patrick Haller, Ansar Aynetdinov, A. Akbik. 07 Sep 2023.
• A Survey on Fairness in Large Language Models. Yingji Li, Mengnan Du, Rui Song, Xin Wang, Ying Wang. 20 Aug 2023.
• CMD: a framework for Context-aware Model self-Detoxification. Zecheng Tang, Keyan Zhou, Juntao Li, Yuyang Ding, Pinzheng Wang, Bowen Yan, Min Zhang. 16 Aug 2023.
• The Unequal Opportunities of Large Language Models: Revealing Demographic Bias through Job Recommendations. A. Salinas, Parth Vipul Shah, Yuzhong Huang, Robert McCormack, Fred Morstatter. 03 Aug 2023.
• Opinion Mining Using Population-tuned Generative Language Models. Allmin Pradhap Singh Susaiyah, Abhinay Pandya, Aki Härmä. 24 Jul 2023.
• An Empirical Analysis of Parameter-Efficient Methods for Debiasing Pre-Trained Language Models. Zhongbin Xie, Thomas Lukasiewicz. 06 Jun 2023.
• CFL: Causally Fair Language Models Through Token-level Attribute Controlled Generation. Rahul Madhavan, Rishabh Garg, Kahini Wadhawan, S. Mehta. 01 Jun 2023.
• Uncovering and Categorizing Social Biases in Text-to-SQL. Y. Liu, Yan Gao, Zhe Su, Xiaokang Chen, Elliott Ash, Jian-Guang Lou. 25 May 2023.
• Uncovering and Quantifying Social Biases in Code Generation. Y. Liu, Xiaokang Chen, Yan Gao, Zhe Su, Fengji Zhang, Daoguang Zan, Jian-Guang Lou, Pin-Yu Chen, Tsung-Yi Ho. 24 May 2023.
• Having Beer after Prayer? Measuring Cultural Bias in Large Language Models. Tarek Naous, Michael Joseph Ryan, Alan Ritter, Wei-ping Xu. 23 May 2023.
• A Trip Towards Fairness: Bias and De-Biasing in Large Language Models. Leonardo Ranaldi, Elena Sofia Ruzzetti, Davide Venditti, Dario Onorati, Fabio Massimo Zanzotto. 23 May 2023.