ResearchTrend.AI
Rewarding Chatbots for Real-World Engagement with Millions of Users


10 March 2023
R. Irvine
D. Boubert
Vyas Raina
Adian Liusie
Ziyi Zhu
Vineet Mudupalli
Aliaksei Korshuk
Z. Liu
Fritz Cremer
Valentin Assassi
Christie-Carol Beauchamp
Xiaoding Lu
Thomas Rialan
W. Beauchamp
    ALM

Papers citing "Rewarding Chatbots for Real-World Engagement with Millions of Users"

24 / 24 papers shown
Two Heads are Better Than One: Test-time Scaling of Multi-agent Collaborative Reasoning
Can Jin
Hongwu Peng
Qixin Zhang
Yujin Tang
Dimitris N. Metaxas
Tong Che
LLMAG
LRM
172
2
0
14 Apr 2025
From Trial to Triumph: Advancing Long Video Understanding via Visual Context Sample Scaling and Self-reward Alignment
Yucheng Suo
Fan Ma
Linchao Zhu
T. Wang
Fengyun Rao
Yi Yang
LRM
77
0
0
26 Mar 2025
Agentic Reward Modeling: Integrating Human Preferences with Verifiable Correctness Signals for Reliable Reward Systems
Hao Peng
Y. Qi
Xiaozhi Wang
Zijun Yao
Bin Xu
Lei Hou
Juanzi Li
ALM
LRM
62
4
0
26 Feb 2025
Large Language Monkeys: Scaling Inference Compute with Repeated Sampling
Bradley Brown
Jordan Juravsky
Ryan Ehrlich
Ronald Clark
Quoc V. Le
Christopher Ré
Azalia Mirhoseini
ALM
LRM
95
224
0
03 Jan 2025
MSc-SQL: Multi-Sample Critiquing Small Language Models For Text-To-SQL Translation
S. Gorti
Ilan Gofman
Zhaoyan Liu
Jiapeng Wu
Noël Vouitsis
Guangwei Yu
Jesse C. Cresswell
Rasa Hosseinzadeh
SyDa
58
6
0
16 Oct 2024
Enhancing AI Assisted Writing with One-Shot Implicit Negative Feedback
Benjamin Towle
Ke Zhou
26
0
0
14 Oct 2024
Regressing the Relative Future: Efficient Policy Optimization for Multi-turn RLHF
Zhaolin Gao
Wenhao Zhan
Jonathan D. Chang
Gokul Swamy
Kianté Brantley
Jason D. Lee
Wen Sun
OffRL
61
3
0
06 Oct 2024
PingPong: A Benchmark for Role-Playing Language Models with User Emulation and Multi-Model Evaluation
Ilya Gusev
LLMAG
58
3
0
10 Sep 2024
A Comparison of LLM Finetuning Methods & Evaluation Metrics with Travel Chatbot Use Case
Sonia Meyer
Shreya Singh
Bertha Tam
Christopher Ton
Angel Ren
42
4
0
07 Aug 2024
Unveiling the Impact of Multi-Modal Interactions on User Engagement: A Comprehensive Evaluation in AI-driven Conversations
Lichao Zhang
Jia Yu
Shuai Zhang
Long Li
Yangyang Zhong
...
Fangsheng Weng
Fayu Pan
Jing Li
Renjun Xu
Zhenzhong Lan
32
4
0
21 Jun 2024
PAL: Pluralistic Alignment Framework for Learning from Heterogeneous Preferences
Daiwei Chen
Yi Chen
Aniket Rege
Ramya Korlakai Vinayak
46
17
0
12 Jun 2024
Frontier AI Ethics: Anticipating and Evaluating the Societal Impacts of Generative Agents
Seth Lazar
SILM
37
0
0
10 Apr 2024
Improving Dialogue Agents by Decomposing One Global Explicit Annotation with Local Implicit Multimodal Feedback
Dong Won Lee
Hae Won Park
Yoon Kim
C. Breazeal
Louis-Philippe Morency
32
0
0
17 Mar 2024
Unveiling the Secrets of Engaging Conversations: Factors that Keep Users Hooked on Role-Playing Dialog Agents
Shuai Zhang
Yu Lu
Junwen Liu
Jia Yu
Huachuan Qiu
Yuming Yan
Zhenzhong Lan
45
5
0
18 Feb 2024
WARM: On the Benefits of Weight Averaged Reward Models
Alexandre Ramé
Nino Vieillard
Léonard Hussenot
Robert Dadashi
Geoffrey Cideron
Olivier Bachem
Johan Ferret
120
94
0
22 Jan 2024
Blending Is All You Need: Cheaper, Better Alternative to Trillion-Parameters LLM
Xiaoding Lu
Zongyi Liu
Adian Liusie
Vyas Raina
Vineet Mudupalli
Yuwen Zhang
W. Beauchamp
22
15
0
04 Jan 2024
LMRL Gym: Benchmarks for Multi-Turn Reinforcement Learning with Language Models
Marwa Abdulhai
Isadora White
Charles Burton Snell
Charles Sun
Joey Hong
Yuexiang Zhai
Kelvin Xu
Sergey Levine
LLMAG
OffRL
LRM
36
31
0
30 Nov 2023
Social AI Improves Well-Being Among Female Young Adults
Ebony Zhang
Xiaoding Lu
AI4MH
15
2
0
12 Nov 2023
Leveraging Implicit Feedback from Deployment Data in Dialogue
Richard Yuanzhe Pang
Stephen Roller
Kyunghyun Cho
He He
Jason Weston
51
7
0
26 Jul 2023
Rewarded soups: towards Pareto-optimal alignment by interpolating weights fine-tuned on diverse rewards
Alexandre Ramé
Guillaume Couairon
Mustafa Shukor
Corentin Dancette
Jean-Baptiste Gaya
Laure Soulier
Matthieu Cord
MoMe
35
136
0
07 Jun 2023
The Chai Platform's AI Safety Framework
Xiaoding Lu
Aleksey Korshuk
Z. Liu
W. Beauchamp
21
2
0
05 Jun 2023
Safer Conversational AI as a Source of User Delight
Xiaoding Lu
Aleksey Korshuk
Z. Liu
W. Beauchamp
Chai Research
31
3
0
18 Apr 2023
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
339
12,003
0
04 Mar 2022
Pre-trained Models for Natural Language Processing: A Survey
Xipeng Qiu
Tianxiang Sun
Yige Xu
Yunfan Shao
Ning Dai
Xuanjing Huang
LM&MA
VLM
243
1,452
0
18 Mar 2020