Profiling Bias in LLMs: Stereotype Dimensions in Contextual Word Embeddings
Carolin M. Schuster, Maria-Alexandra Dinisor, Shashwat Ghatiwala, Georg Groh
arXiv:2411.16527, 25 November 2024
Papers citing "Profiling Bias in LLMs: Stereotype Dimensions in Contextual Word Embeddings" (17 papers shown)
OLMo: Accelerating the Science of Language Models. Dirk Groeneveld, Iz Beltagy, Pete Walsh, Akshita Bhagia, Rodney Michael Kinney, ..., Jesse Dodge, Kyle Lo, Luca Soldaini, Noah A. Smith, Hanna Hajishirzi. 01 Feb 2024.
Bias and Fairness in Large Language Models: A Survey. Isabel O. Gallegos, Ryan Rossi, Joe Barrow, Md Mehrab Tanjim, Sungchul Kim, Franck Dernoncourt, Tong Yu, Ruiyi Zhang, Nesreen Ahmed. 02 Sep 2023.
Evaluating Biased Attitude Associations of Language Models in an Intersectional Context. Shiva Omrani Sabbaghi, Robert Wolfe, Aylin Caliskan. 07 Jul 2023.
SensePOLAR: Word sense aware interpretability for pre-trained contextual word embeddings. Jan Engler, Sandipan Sikdar, Marlene Lutz, M. Strohmaier. 11 Jan 2023.
BLOOM: A 176B-Parameter Open-Access Multilingual Language Model. BigScience Workshop: Teven Le Scao, Angela Fan, Christopher Akiki, ..., Zhongli Xie, Zifan Ye, M. Bras, Younes Belkada, Thomas Wolf. 09 Nov 2022.
A Robust Bias Mitigation Procedure Based on the Stereotype Content Model. Eddie L. Ungless, Amy Rafferty, Hrichika Nag, Bjorn Ross. 26 Oct 2022.
Understanding and Countering Stereotypes: A Computational Approach to the Stereotype Content Model. Kathleen C. Fraser, I. Nejadgholi, S. Kiritchenko. 04 Jun 2021.
Detecting Emergent Intersectional Biases: Contextualized Word Embeddings Contain a Distribution of Human-like Biases. W. Guo, Aylin Caliskan. 06 Jun 2020.
Language (Technology) is Power: A Critical Survey of "Bias" in NLP. Su Lin Blodgett, Solon Barocas, Hal Daumé, Hanna M. Wallach. 28 May 2020.
FrameAxis: Characterizing Microframe Bias and Intensity with Word Embedding. Haewoon Kwak, Jisun An, Elise Jing, Yong-Yeol Ahn. 20 Feb 2020.
The POLAR Framework: Polar Opposites Enable Interpretability of Pre-Trained Word Embeddings. Binny Mathew, Sandipan Sikdar, Florian Lemmerich, M. Strohmaier. 27 Jan 2020.
Assessing Social and Intersectional Biases in Contextualized Word Representations. Y. Tan, Elisa Celis. 04 Nov 2019.
ALBERT: A Lite BERT for Self-supervised Learning of Language Representations. Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, Radu Soricut. 26 Sep 2019.
On Measuring Social Biases in Sentence Encoders. Chandler May, Alex Jinpeng Wang, Shikha Bordia, Samuel R. Bowman, Rachel Rudinger. 25 Mar 2019.
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova. 11 Oct 2018.
SemAxis: A Lightweight Framework to Characterize Domain-Specific Word Semantics Beyond Sentiment. Jisun An, Haewoon Kwak, Yong-Yeol Ahn. 14 Jun 2018.
Semantics derived automatically from language corpora contain human-like biases. Aylin Caliskan, J. Bryson, Arvind Narayanan. 25 Aug 2016.