Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2303.06135
Cited By
Rewarding Chatbots for Real-World Engagement with Millions of Users
10 March 2023
R. Irvine
D. Boubert
Vyas Raina
Adian Liusie
Ziyi Zhu
Vineet Mudupalli
Aliaksei Korshuk
Z. Liu
Fritz Cremer
Valentin Assassi
Christie-Carol Beauchamp
Xiaoding Lu
Thomas Rialan
W. Beauchamp
ALM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Rewarding Chatbots for Real-World Engagement with Millions of Users"
24 / 24 papers shown
Title
Two Heads are Better Than One: Test-time Scaling of Multi-agent Collaborative Reasoning
Can Jin
Hongwu Peng
Qixin Zhang
Yujin Tang
Dimitris N. Metaxas
Tong Che
LLMAG
LRM
154
2
0
14 Apr 2025
From Trial to Triumph: Advancing Long Video Understanding via Visual Context Sample Scaling and Self-reward Alignment
Yucheng Suo
Fan Ma
Linchao Zhu
T. Wang
Fengyun Rao
Yi Yang
LRM
77
0
0
26 Mar 2025
Agentic Reward Modeling: Integrating Human Preferences with Verifiable Correctness Signals for Reliable Reward Systems
Hao Peng
Y. Qi
Xiaozhi Wang
Zijun Yao
Bin Xu
Lei Hou
Juanzi Li
ALM
LRM
62
4
0
26 Feb 2025
Large Language Monkeys: Scaling Inference Compute with Repeated Sampling
Bradley Brown
Jordan Juravsky
Ryan Ehrlich
Ronald Clark
Quoc V. Le
Christopher Ré
Azalia Mirhoseini
ALM
LRM
95
220
0
03 Jan 2025
MSc-SQL: Multi-Sample Critiquing Small Language Models For Text-To-SQL Translation
S. Gorti
Ilan Gofman
Zhaoyan Liu
Jiapeng Wu
Noël Vouitsis
Guangwei Yu
Jesse C. Cresswell
Rasa Hosseinzadeh
SyDa
58
6
0
16 Oct 2024
Enhancing AI Assisted Writing with One-Shot Implicit Negative Feedback
Benjamin Towle
Ke Zhou
26
0
0
14 Oct 2024
Regressing the Relative Future: Efficient Policy Optimization for Multi-turn RLHF
Zhaolin Gao
Wenhao Zhan
Jonathan D. Chang
Gokul Swamy
Kianté Brantley
Jason D. Lee
Wen Sun
OffRL
58
3
0
06 Oct 2024
PingPong: A Benchmark for Role-Playing Language Models with User Emulation and Multi-Model Evaluation
Ilya Gusev
LLMAG
58
3
0
10 Sep 2024
A Comparison of LLM Finetuning Methods & Evaluation Metrics with Travel Chatbot Use Case
Sonia Meyer
Shreya Singh
Bertha Tam
Christopher Ton
Angel Ren
39
3
0
07 Aug 2024
Unveiling the Impact of Multi-Modal Interactions on User Engagement: A Comprehensive Evaluation in AI-driven Conversations
Lichao Zhang
Jia Yu
Shuai Zhang
Long Li
Yangyang Zhong
...
Fangsheng Weng
Fayu Pan
Jing Li
Renjun Xu
Zhenzhong Lan
32
4
0
21 Jun 2024
PAL: Pluralistic Alignment Framework for Learning from Heterogeneous Preferences
Daiwei Chen
Yi Chen
Aniket Rege
Ramya Korlakai Vinayak
46
17
0
12 Jun 2024
Frontier AI Ethics: Anticipating and Evaluating the Societal Impacts of Generative Agents
Seth Lazar
SILM
37
1
0
10 Apr 2024
Improving Dialogue Agents by Decomposing One Global Explicit Annotation with Local Implicit Multimodal Feedback
Dong Won Lee
Hae Won Park
Yoon Kim
C. Breazeal
Louis-Philippe Morency
32
0
0
17 Mar 2024
Unveiling the Secrets of Engaging Conversations: Factors that Keep Users Hooked on Role-Playing Dialog Agents
Shuai Zhang
Yu Lu
Junwen Liu
Jia Yu
Huachuan Qiu
Yuming Yan
Zhenzhong Lan
45
5
0
18 Feb 2024
WARM: On the Benefits of Weight Averaged Reward Models
Alexandre Ramé
Nino Vieillard
Léonard Hussenot
Robert Dadashi
Geoffrey Cideron
Olivier Bachem
Johan Ferret
118
93
0
22 Jan 2024
Blending Is All You Need: Cheaper, Better Alternative to Trillion-Parameters LLM
Xiaoding Lu
Zongyi Liu
Adian Liusie
Vyas Raina
Vineet Mudupalli
Yuwen Zhang
W. Beauchamp
22
15
0
04 Jan 2024
LMRL Gym: Benchmarks for Multi-Turn Reinforcement Learning with Language Models
Marwa Abdulhai
Isadora White
Charles Burton Snell
Charles Sun
Joey Hong
Yuexiang Zhai
Kelvin Xu
Sergey Levine
LLMAG
OffRL
LRM
31
31
0
30 Nov 2023
Social AI Improves Well-Being Among Female Young Adults
Ebony Zhang
Xiaoding Lu
AI4MH
13
2
0
12 Nov 2023
Leveraging Implicit Feedback from Deployment Data in Dialogue
Richard Yuanzhe Pang
Stephen Roller
Kyunghyun Cho
He He
Jason Weston
51
7
0
26 Jul 2023
Rewarded soups: towards Pareto-optimal alignment by interpolating weights fine-tuned on diverse rewards
Alexandre Ramé
Guillaume Couairon
Mustafa Shukor
Corentin Dancette
Jean-Baptiste Gaya
Laure Soulier
Matthieu Cord
MoMe
35
136
0
07 Jun 2023
The Chai Platform's AI Safety Framework
Xiaoding Lu
Aleksey Korshuk
Z. Liu
W. Beauchamp
21
2
0
05 Jun 2023
Safer Conversational AI as a Source of User Delight
Xiaoding Lu
Aleksey Korshuk
Z. Liu
W. Beauchamp
Chai Research
31
3
0
18 Apr 2023
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
328
11,953
0
04 Mar 2022
Pre-trained Models for Natural Language Processing: A Survey
Xipeng Qiu
Tianxiang Sun
Yige Xu
Yunfan Shao
Ning Dai
Xuanjing Huang
LM&MA
VLM
243
1,452
0
18 Mar 2020
1