What Do Llamas Really Think? Revealing Preference Biases in Language Model Representations
arXiv: 2311.18812 · 30 November 2023
Raphael Tang, Xinyu Crystina Zhang, Jimmy J. Lin, Ferhan Ture
Papers citing "What Do Llamas Really Think? Revealing Preference Biases in Language Model Representations" (6 / 6 papers shown)

1. Representation Engineering for Large-Language Models: Survey and Research Challenges
   Lukasz Bartoszcze, Sarthak Munshi, Bryan Sukidi, Jennifer Yen, Zejia Yang, David Williams-King, Linh Le, Kosi Asuzu, Carsten Maple
   24 Feb 2025

2. Probing Explicit and Implicit Gender Bias through LLM Conditional Text Generation
   Xiangjue Dong, Yibo Wang, Philip S. Yu, James Caverlee
   01 Nov 2023

3. Training language models to follow instructions with human feedback
   Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, ..., Amanda Askell, Peter Welinder, Paul Christiano, Jan Leike, Ryan J. Lowe
   04 Mar 2022

4. Assessing the Reliability of Word Embedding Gender Bias Measures
   Yupei Du, Qixiang Fang, D. Nguyen
   10 Sep 2021

5. Probing Classifiers: Promises, Shortcomings, and Advances
   Yonatan Belinkov
   24 Feb 2021

6. The Pile: An 800GB Dataset of Diverse Text for Language Modeling
   Leo Gao, Stella Biderman, Sid Black, Laurence Golding, Travis Hoppe, ..., Horace He, Anish Thite, Noa Nabeshima, Shawn Presser, Connor Leahy
   31 Dec 2020