ResearchTrend.AI

© 2025 ResearchTrend.AI, All rights reserved.

Human Preferences for Constructive Interactions in Language Model Alignment

5 March 2025
Yara Kyrychenko, Jon Roozenbeek, Brandon Davidson, S. V. D. Linden, Ramit Debnath
arXiv (abs) · PDF · HTML

Papers citing "Human Preferences for Constructive Interactions in Language Model Alignment"

10 / 10 papers shown
C3AI: Crafting and Evaluating Constitutions for Constitutional AI
Yara Kyrychenko, Ke Zhou, Edyta Bogucka, Daniele Quercia
ELM · 77 · 5 · 0 · 21 Feb 2025

Re-Ranking News Comments by Constructiveness and Curiosity Significantly Increases Perceived Respect, Trustworthiness, and Interest
Emily Saltz, Zaria Jalan, Tin Acosta
56 · 2 · 0 · 08 Apr 2024

Dissecting Human and LLM Preferences
Junlong Li, Fan Zhou, Shichao Sun, Yikai Zhang, Hai Zhao, Pengfei Liu
ALM · 68 · 6 · 0 · 17 Feb 2024

Towards Understanding Sycophancy in Language Models
Mrinank Sharma, Meg Tong, Tomasz Korbak, David Duvenaud, Amanda Askell, ..., Oliver Rausch, Nicholas Schiefer, Da Yan, Miranda Zhang, Ethan Perez
313 · 241 · 0 · 20 Oct 2023

Intersectionality in Conversational AI Safety: How Bayesian Multilevel Models Help Understand Diverse Perceptions of Safety
Christopher Homan, Greg Serapio-García, Lora Aroyo, Mark Díaz, Alicia Parrish, Vinodkumar Prabhakaran, Alex S. Taylor, Ding Wang
66 · 9 · 0 · 20 Jun 2023

Direct Preference Optimization: Your Language Model is Secretly a Reward Model
Rafael Rafailov, Archit Sharma, E. Mitchell, Stefano Ermon, Christopher D. Manning, Chelsea Finn
ALM · 387 · 4,125 · 0 · 29 May 2023

A General Language Assistant as a Laboratory for Alignment
Amanda Askell, Yuntao Bai, Anna Chen, Dawn Drain, Deep Ganguli, ..., Tom B. Brown, Jack Clark, Sam McCandlish, C. Olah, Jared Kaplan
ALM · 118 · 789 · 0 · 01 Dec 2021

Conversations Gone Alright: Quantifying and Predicting Prosocial Outcomes in Online Conversations
Jiajun Bao, J. Wu, Yiming Zhang, Eshwar Chandrasekharan, David Jurgens
96 · 49 · 0 · 16 Feb 2021

Persistent Anti-Muslim Bias in Large Language Models
Abubakar Abid, Maheen Farooqi, James Zou
AILaw · 108 · 555 · 0 · 14 Jan 2021

Identifying and Reducing Gender Bias in Word-Level Language Models
Shikha Bordia, Samuel R. Bowman
FaML · 118 · 327 · 0 · 05 Apr 2019