A Roadmap to Pluralistic Alignment

7 February 2024 · arXiv:2402.05070
Taylor Sorensen, Jared Moore, Jillian R. Fisher, Mitchell L. Gordon, Niloofar Mireshghallah, Christopher Rytting, Andre Ye, Liwei Jiang, Ximing Lu, Nouha Dziri, Tim Althoff, Yejin Choi

Papers citing "A Roadmap to Pluralistic Alignment"

34 papers shown.

AI-Augmented LLMs Achieve Therapist-Level Responses in Motivational Interviewing (23 May 2025) [AI4MH]
  Yinghui Huang, Yuxuan Jiang, Hui Liu, Yixin Cai, Weiqing Li, Xiangen Hu

Societal Impacts Research Requires Benchmarks for Creative Composition Tasks (09 Apr 2025)
  Judy Hanwen Shen, Carlos Guestrin

Is Free Self-Alignment Possible? (24 Feb 2025) [MoMe]
  Dyah Adila, Changho Shin, Yijing Zhang, Frederic Sala

The Call for Socially Aware Language Technologies (24 Feb 2025) [VLM]
  Diyi Yang, Dirk Hovy, David Jurgens, Barbara Plank

Evaluating the Prompt Steerability of Large Language Models (19 Nov 2024) [LLMSV]
  Erik Miehling, Michael Desmond, Karthikeyan N. Ramamurthy, Elizabeth M. Daly, Pierre Dognin, Jesus Rios, Djallel Bouneffouf, Miao Liu

Moral Alignment for LLM Agents (02 Oct 2024)
  Elizaveta Tennant, Stephen Hailes, Mirco Musolesi

Open-World Evaluation for Retrieving Diverse Perspectives (26 Sep 2024)
  Hung-Ting Chen, Eunsol Choi

Programming Refusal with Conditional Activation Steering (06 Sep 2024) [LLMSV]
  Bruce W. Lee, Inkit Padhi, Karthikeyan N. Ramamurthy, Erik Miehling, Pierre Dognin, Manish Nagireddy, Amit Dhurandhar

From Distributional to Overton Pluralism: Investigating Large Language Model Alignment (25 Jun 2024)
  Thom Lake, Eunsol Choi, Greg Durrett

Aligning Large Language Models with Human Preferences through Representation Engineering (26 Dec 2023)
  Tianlong Li, Xiaohua Wang, Muling Wu, Changze Lv, Zixuan Ling, Jianhao Zhu, Cenyuan Zhang, Xiaoqing Zheng, Xuanjing Huang

Distributional Preference Learning: Understanding and Accounting for Hidden Context in RLHF (13 Dec 2023)
  Anand Siththaranjan, Cassidy Laidlaw, Dylan Hadfield-Menell

Understanding the Effects of RLHF on LLM Generalisation and Diversity (10 Oct 2023) [AI4CE, ALM]
  Robert Kirk, Ishita Mediratta, Christoforos Nalmpantis, Jelena Luketina, Eric Hambro, Edward Grefenstette, Roberta Raileanu

Teach LLMs to Personalize -- An Approach inspired by Writing Education (15 Aug 2023) [AI4Ed]
  Cheng Li, Mingyang Zhang, Qiaozhu Mei, Yaqing Wang, Spurthi Amba Hombaiah, Yi Liang, Michael Bendersky

Rewarded soups: towards Pareto-optimal alignment by interpolating weights fine-tuned on diverse rewards (07 Jun 2023) [MoMe]
  Alexandre Ramé, Guillaume Couairon, Mustafa Shukor, Corentin Dancette, Jean-Baptiste Gaya, Laure Soulier, Matthieu Cord

Aligning Language Models to User Opinions (24 May 2023)
  EunJeong Hwang, Bodhisattwa Prasad Majumder, Niket Tandon

When the Majority is Wrong: Modeling Annotator Disagreement for Subjective Tasks (11 May 2023)
  Eve Fleisig, Rediet Abebe, Dan Klein

Cognitive Reframing of Negative Thoughts through Human-Language Model Interaction (04 May 2023)
  Ashish Sharma, Kevin Rushton, Inna Wanyin Lin, David Wadden, Khendra G. Lucas, Adam S. Miner, Theresa Nguyen, Tim Althoff

Can Large Language Models Transform Computational Social Science? (12 Apr 2023) [LLMAG]
  Caleb Ziems, William B. Held, Omar Shaikh, Jiaao Chen, Zhehao Zhang, Diyi Yang

Constitutional AI: Harmlessness from AI Feedback (15 Dec 2022) [SyDa, MoMe]
  Yuntao Bai, Saurav Kadavath, Sandipan Kundu, Amanda Askell, John Kernion, ..., Dario Amodei, Nicholas Joseph, Sam McCandlish, Tom B. Brown, Jared Kaplan

Fine-tuning language models to find agreement among humans with diverse preferences (28 Nov 2022) [ALM]
  Michiel A. Bakker, Martin Chadwick, Hannah R. Sheahan, Michael Henry Tessler, Lucy Campbell-Gillingham, ..., Nat McAleese, Amelia Glaese, John Aslanides, M. Botvinick, Christopher Summerfield

Measuring Progress on Scalable Oversight for Large Language Models (04 Nov 2022) [ALM, ELM]
  Sam Bowman, Jeeyoon Hyun, Ethan Perez, Edwin Chen, Craig Pettit, ..., Tristan Hume, Yuntao Bai, Zac Hatfield-Dodds, Benjamin Mann, Jared Kaplan

Moral Mimicry: Large Language Models Produce Moral Rationalizations Tailored to Political Identity (24 Sep 2022)
  Gabriel Simmons

Out of One, Many: Using Language Models to Simulate Human Samples (14 Sep 2022) [SyDa]
  Lisa P. Argyle, Ethan C. Busby, Nancy Fulda, Joshua R. Gubler, Christopher Rytting, David Wingate

Using Large Language Models to Simulate Multiple Humans and Replicate Human Subject Studies (18 Aug 2022)
  Gati Aher, Rosa I. Arriaga, Adam Tauman Kalai

COLD Decoding: Energy-based Constrained Text Generation with Langevin Dynamics (23 Feb 2022) [AI4CE]
  Lianhui Qin, Sean Welleck, Daniel Khashabi, Yejin Choi

A General Language Assistant as a Laboratory for Alignment (01 Dec 2021) [ALM]
  Amanda Askell, Yuntao Bai, Anna Chen, Dawn Drain, Deep Ganguli, ..., Tom B. Brown, Jack Clark, Sam McCandlish, C. Olah, Jared Kaplan

Process for Adapting Language Models to Society (PALMS) with Values-Targeted Datasets (18 Jun 2021)
  Irene Solaiman, Christy Dennison

Early-stopped neural networks are consistent (10 Jun 2021)
  Ziwei Ji, Justin D. Li, Matus Telgarsky

DExperts: Decoding-Time Controlled Text Generation with Experts and Anti-Experts (07 May 2021) [MU]
  Alisa Liu, Maarten Sap, Ximing Lu, Swabha Swayamdipta, Chandra Bhagavatula, Noah A. Smith, Yejin Choi

NeuroLogic Decoding: (Un)supervised Neural Text Generation with Predicate Logic Constraints (24 Oct 2020) [NAI]
  Ximing Lu, Peter West, Rowan Zellers, Ronan Le Bras, Chandra Bhagavatula, Yejin Choi

Utility is in the Eye of the User: A Critique of NLP Leaderboards (29 Sep 2020) [ELM]
  Kawin Ethayarajh, Dan Jurafsky

Aligning AI With Shared Human Values (05 Aug 2020)
  Dan Hendrycks, Collin Burns, Steven Basart, Andrew Critch, Jingkai Li, D. Song, Jacob Steinhardt

A Generalized Algorithm for Multi-Objective Reinforcement Learning and Policy Adaptation (21 Aug 2019)
  Runzhe Yang, Xingyuan Sun, Karthik Narasimhan

Scalable agent alignment via reward modeling: a research direction (19 Nov 2018)
  Jan Leike, David M. Krueger, Tom Everitt, Miljan Martic, Vishal Maini, Shane Legg