Title
Two Heads are Better Than One: Test-time Scaling of Multi-agent Collaborative Reasoning Can Jin Hongwu Peng Qixin Zhang Yujin Tang Dimitris N. Metaxas Tong Che LLMAG LRM 172 2 0 14 Apr 2025
From Trial to Triumph: Advancing Long Video Understanding via Visual Context Sample Scaling and Self-reward Alignment Yucheng Suo Fan Ma Linchao Zhu T. Wang Fengyun Rao Yi Yang LRM 77 0 0 26 Mar 2025
Agentic Reward Modeling: Integrating Human Preferences with Verifiable Correctness Signals for Reliable Reward Systems Hao Peng Y. Qi Xiaozhi Wang Zijun Yao Bin Xu Lei Hou Juanzi Li ALM LRM 62 4 0 26 Feb 2025
Large Language Monkeys: Scaling Inference Compute with Repeated Sampling Bradley Brown Jordan Juravsky Ryan Ehrlich Ronald Clark Quoc V. Le Christopher Ré Azalia Mirhoseini ALM LRM 95 224 0 03 Jan 2025
MSc-SQL: Multi-Sample Critiquing Small Language Models For Text-To-SQL Translation S. Gorti Ilan Gofman Zhaoyan Liu Jiapeng Wu Noël Vouitsis Guangwei Yu Jesse C. Cresswell Rasa Hosseinzadeh SyDa 58 6 0 16 Oct 2024
Enhancing AI Assisted Writing with One-Shot Implicit Negative Feedback Benjamin Towle Ke Zhou 26 0 0 14 Oct 2024
Regressing the Relative Future: Efficient Policy Optimization for Multi-turn RLHF Zhaolin Gao Wenhao Zhan Jonathan D. Chang Gokul Swamy Kianté Brantley Jason D. Lee Wen Sun OffRL 61 3 0 06 Oct 2024
PingPong: A Benchmark for Role-Playing Language Models with User Emulation and Multi-Model Evaluation Ilya Gusev LLMAG 58 3 0 10 Sep 2024
A Comparison of LLM Finetuning Methods & Evaluation Metrics with Travel Chatbot Use Case Sonia Meyer Shreya Singh Bertha Tam Christopher Ton Angel Ren 42 4 0 07 Aug 2024
Unveiling the Impact of Multi-Modal Interactions on User Engagement: A Comprehensive Evaluation in AI-driven Conversations Lichao Zhang Jia Yu Shuai Zhang Long Li Yangyang Zhong ... Fangsheng Weng Fayu Pan Jing Li Renjun Xu Zhenzhong Lan 32 4 0 21 Jun 2024
PAL: Pluralistic Alignment Framework for Learning from Heterogeneous Preferences Daiwei Chen Yi Chen Aniket Rege Ramya Korlakai Vinayak 46 17 0 12 Jun 2024
Frontier AI Ethics: Anticipating and Evaluating the Societal Impacts of Generative Agents Seth Lazar SILM 37 0 0 10 Apr 2024
Improving Dialogue Agents by Decomposing One Global Explicit Annotation with Local Implicit Multimodal Feedback Dong Won Lee Hae Won Park Yoon Kim C. Breazeal Louis-Philippe Morency 32 0 0 17 Mar 2024
Unveiling the Secrets of Engaging Conversations: Factors that Keep Users Hooked on Role-Playing Dialog Agents Shuai Zhang Yu Lu Junwen Liu Jia Yu Huachuan Qiu Yuming Yan Zhenzhong Lan 45 5 0 18 Feb 2024
WARM: On the Benefits of Weight Averaged Reward Models Alexandre Ramé Nino Vieillard Léonard Hussenot Robert Dadashi Geoffrey Cideron Olivier Bachem Johan Ferret 120 94 0 22 Jan 2024
Blending Is All You Need: Cheaper, Better Alternative to Trillion-Parameters LLM Xiaoding Lu Zongyi Liu Adian Liusie Vyas Raina Vineet Mudupalli Yuwen Zhang W. Beauchamp 22 15 0 04 Jan 2024
LMRL Gym: Benchmarks for Multi-Turn Reinforcement Learning with Language Models Marwa Abdulhai Isadora White Charles Burton Snell Charles Sun Joey Hong Yuexiang Zhai Kelvin Xu Sergey Levine LLMAG OffRL LRM 36 31 0 30 Nov 2023
Social AI Improves Well-Being Among Female Young Adults Ebony Zhang Xiaoding Lu AI4MH 15 2 0 12 Nov 2023
Leveraging Implicit Feedback from Deployment Data in Dialogue Richard Yuanzhe Pang Stephen Roller Kyunghyun Cho He He Jason Weston 51 7 0 26 Jul 2023
Rewarded soups: towards Pareto-optimal alignment by interpolating weights fine-tuned on diverse rewards Alexandre Ramé Guillaume Couairon Mustafa Shukor Corentin Dancette Jean-Baptiste Gaya Laure Soulier Matthieu Cord MoMe 35 136 0 07 Jun 2023
The Chai Platform's AI Safety Framework Xiaoding Lu Aleksey Korshuk Z. Liu W. Beauchamp 21 2 0 05 Jun 2023
Safer Conversational AI as a Source of User Delight Xiaoding Lu Aleksey Korshuk Z. Liu W. Beauchamp Chai Research 31 3 0 18 Apr 2023
Training language models to follow instructions with human feedback Long Ouyang Jeff Wu Xu Jiang Diogo Almeida Carroll L. Wainwright ... Amanda Askell Peter Welinder Paul Christiano Jan Leike Ryan J. Lowe OSLM ALM 339 12,003 0 04 Mar 2022
Pre-trained Models for Natural Language Processing: A Survey Xipeng Qiu Tianxiang Sun Yige Xu Yunfan Shao Ning Dai Xuanjing Huang LM&MA VLM 243 1,452 0 18 Mar 2020