ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2409.10164
  4. Cited By
Quantile Regression for Distributional Reward Models in RLHF

Quantile Regression for Distributional Reward Models in RLHF

16 September 2024
Nicolai Dorka
ArXivPDFHTML

Papers citing "Quantile Regression for Distributional Reward Models in RLHF"

7 / 7 papers shown
Title
Skywork-VL Reward: An Effective Reward Model for Multimodal Understanding and Reasoning
Skywork-VL Reward: An Effective Reward Model for Multimodal Understanding and Reasoning
Xiaokun Wang
Chris
Jiangbo Pei
Wei Shen
Yi Peng
...
Ai Jian
Tianyidan Xie
Xuchen Song
Yang Liu
Yahui Zhou
OffRL
LRM
28
0
0
12 May 2025
Sandcastles in the Storm: Revisiting the (Im)possibility of Strong Watermarking
Sandcastles in the Storm: Revisiting the (Im)possibility of Strong Watermarking
Fabrice Harel-Canada
Boran Erol
Connor Choi
J. Liu
Gary Jiarui Song
Nanyun Peng
Amit Sahai
AAML
29
0
0
11 May 2025
Energy-Based Reward Models for Robust Language Model Alignment
Energy-Based Reward Models for Robust Language Model Alignment
Anamika Lochab
Ruqi Zhang
137
0
0
17 Apr 2025
OASST-ETC Dataset: Alignment Signals from Eye-tracking Analysis of LLM Responses
OASST-ETC Dataset: Alignment Signals from Eye-tracking Analysis of LLM Responses
Angela Lopez-Cardona
Sebastian Idesis
Miguel Barreda-Ángeles
Sergi Abadal
Ioannis Arapakis
46
0
0
13 Mar 2025
Reinforcement Learning Enhanced LLMs: A Survey
Reinforcement Learning Enhanced LLMs: A Survey
Shuhe Wang
Shengyu Zhang
Jingyang Zhang
Runyi Hu
Xiaoya Li
Tianwei Zhang
Jiwei Li
Fei Wu
G. Wang
Eduard H. Hovy
OffRL
134
7
0
05 Dec 2024
Uncertainty-aware Reward Model: Teaching Reward Models to Know What is Unknown
Uncertainty-aware Reward Model: Teaching Reward Models to Know What is Unknown
Xingzhou Lou
Dong Yan
Wei Shen
Yuzi Yan
Jian Xie
Junge Zhang
47
22
0
01 Oct 2024
Conditional Quantile Estimation for Uncertain Watch Time in Short-Video Recommendation
Conditional Quantile Estimation for Uncertain Watch Time in Short-Video Recommendation
Chengzhi Lin
Shuchang Liu
Chuyuan Wang
Yongqi Liu
27
3
0
17 Jul 2024
1