Fair-PP: A Synthetic Dataset for Aligning LLM with Personalized Preferences of Social Equity [SyDa]
arXiv:2505.11861 · 17 May 2025
Qi Zhou, Jie Zhang, Dongxia Wang, Qiang Liu, Tianlin Li, Jin Song Dong, Wenhai Wang, Qing Guo

Papers citing "Fair-PP: A Synthetic Dataset for Aligning LLM with Personalized Preferences of Social Equity" (19 / 19 papers shown)

SEA-LION: Southeast Asian Languages in One Network
Raymond Ng, Thanh Ngan Nguyen, Yuli Huang, Ngee Chia Tai, Wai Yi Leong, ..., David Ong Tat-Wee, B. Liu, William-Chandra Tjhi, Min Zhang, Leslie Teo
08 Apr 2025 · 105 · 14 · 0

SafeWorld: Geo-Diverse Safety Alignment
Da Yin, Haoyi Qiu, Kung-Hsiang Huang, Kai-Wei Chang, Nanyun Peng
09 Dec 2024 · 93 · 8 · 0

Interpretable Preferences via Multi-Objective Reward Modeling and Mixture-of-Experts
Haoxiang Wang, Wei Xiong, Tengyang Xie, Han Zhao, Tong Zhang
18 Jun 2024 · 97 · 166 · 0

Safety Alignment Should Be Made More Than Just a Few Tokens Deep
Xiangyu Qi, Ashwinee Panda, Kaifeng Lyu, Xiao Ma, Subhrajit Roy, Ahmad Beirami, Prateek Mittal, Peter Henderson
10 Jun 2024 · 95 · 126 · 0

Group Robust Preference Optimization in Reward-free RLHF
Shyam Sundhar Ramesh, Yifan Hu, Iason Chaimalas, Viraj Mehta, Pier Giuseppe Sessa, Haitham Bou-Ammar, Ilija Bogunovic
30 May 2024 · 65 · 36 · 0

WorldValuesBench: A Large-Scale Benchmark Dataset for Multi-Cultural Value Awareness of Language Models [VLM]
Wenlong Zhao, Debanjan Mondal, Niket Tandon, Danica Dillion, Kurt Gray, Yuling Gu
25 Apr 2024 · 97 · 15 · 0

Investigating Cultural Alignment of Large Language Models
Badr AlKhamissi, Muhammad N. ElNokrashy, Mai AlKhamissi, Mona T. Diab
20 Feb 2024 · 110 · 57 · 0

CultureLLM: Incorporating Cultural Differences into Large Language Models [VLM]
Cheng Li, Mengzhuo Chen, Jindong Wang, Sunayana Sitaram, Xing Xie
09 Feb 2024 · 90 · 19 · 0

Cultural Bias and Cultural Alignment of Large Language Models [ELM]
Yan Tao, Olga Viberg, Ryan S. Baker, René F. Kizilcec
23 Nov 2023 · 93 · 86 · 0

Value Kaleidoscope: Engaging AI with Pluralistic Human Values, Rights, and Duties [SLR]
Taylor Sorensen, Liwei Jiang, Jena D. Hwang, Sydney Levine, Valentina Pyatkin, ..., Kavel Rao, Chandra Bhagavatula, Maarten Sap, J. Tasioulas, Yejin Choi
02 Sep 2023 · 86 · 57 · 0

BeaverTails: Towards Improved Safety Alignment of LLM via a Human-Preference Dataset [ALM]
Jiaming Ji, Mickel Liu, Juntao Dai, Xuehai Pan, Chi Zhang, Ce Bian, Chi Zhang, Ruiyang Sun, Yizhou Wang, Yaodong Yang
10 Jul 2023 · 91 · 481 · 0

Towards Measuring the Representation of Subjective Global Opinions in Language Models
Esin Durmus, Karina Nguyen, Thomas I. Liao, Nicholas Schiefer, Amanda Askell, ..., Alex Tamkin, Janel Thamkul, Jared Kaplan, Jack Clark, Deep Ganguli
28 Jun 2023 · 71 · 238 · 0

Direct Preference Optimization: Your Language Model is Secretly a Reward Model [ALM]
Rafael Rafailov, Archit Sharma, E. Mitchell, Stefano Ermon, Christopher D. Manning, Chelsea Finn
29 May 2023 · 385 · 3,981 · 0

AlpacaFarm: A Simulation Framework for Methods that Learn from Human Feedback [ALM]
Yann Dubois, Xuechen Li, Rohan Taori, Tianyi Zhang, Ishaan Gulrajani, Jimmy Ba, Carlos Guestrin, Percy Liang, Tatsunori B. Hashimoto
22 May 2023 · 128 · 595 · 0

Whose Opinions Do Language Models Reflect?
Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, Tatsunori Hashimoto
30 Mar 2023 · 76 · 432 · 0

GPT-4 Technical Report [LLMAG, MLLM]
OpenAI: Josh Achiam, Steven Adler, Sandhini Agarwal, Lama Ahmad, ..., Shengjia Zhao, Tianhao Zheng, Juntang Zhuang, William Zhuk, Barret Zoph
15 Mar 2023 · 1.4K · 14,359 · 0

Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback
Yuntao Bai, Andy Jones, Kamal Ndousse, Amanda Askell, Anna Chen, ..., Jack Clark, Sam McCandlish, C. Olah, Benjamin Mann, Jared Kaplan
12 Apr 2022 · 249 · 2,561 · 0

Fine-Tuning Language Models from Human Preferences [ALM]
Daniel M. Ziegler, Nisan Stiennon, Jeff Wu, Tom B. Brown, Alec Radford, Dario Amodei, Paul Christiano, G. Irving
18 Sep 2019 · 466 · 1,734 · 0

Deep reinforcement learning from human preferences
Paul Christiano, Jan Leike, Tom B. Brown, Miljan Martic, Shane Legg, Dario Amodei
12 Jun 2017 · 169 · 3,302 · 0