What Do Llamas Really Think? Revealing Preference Biases in Language Model Representations
arXiv: 2311.18812 · 30 November 2023
Raphael Tang, Xinyu Crystina Zhang, Jimmy J. Lin, Ferhan Ture
Papers citing "What Do Llamas Really Think? Revealing Preference Biases in Language Model Representations" (6 / 6 papers shown)

1. Representation Engineering for Large-Language Models: Survey and Research Challenges
   Lukasz Bartoszcze, Sarthak Munshi, Bryan Sukidi, Jennifer Yen, Zejia Yang, David Williams-King, Linh Le, Kosi Asuzu, Carsten Maple
   24 Feb 2025

2. Probing Explicit and Implicit Gender Bias through LLM Conditional Text Generation
   Xiangjue Dong, Yibo Wang, Philip S. Yu, James Caverlee
   01 Nov 2023

3. Training language models to follow instructions with human feedback
   Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, ..., Amanda Askell, Peter Welinder, Paul Christiano, Jan Leike, Ryan J. Lowe
   04 Mar 2022

4. Assessing the Reliability of Word Embedding Gender Bias Measures
   Yupei Du, Qixiang Fang, D. Nguyen
   10 Sep 2021

5. Probing Classifiers: Promises, Shortcomings, and Advances
   Yonatan Belinkov
   24 Feb 2021

6. The Pile: An 800GB Dataset of Diverse Text for Language Modeling
   Leo Gao, Stella Biderman, Sid Black, Laurence Golding, Travis Hoppe, ..., Horace He, Anish Thite, Noa Nabeshima, Shawn Presser, Connor Leahy
   31 Dec 2020