HelpSteer2-Preference: Complementing Ratings with Preferences
arXiv:2410.01257 (v2, latest)
2 October 2024
Zhilin Wang, Alexander Bukharin, Olivier Delalleau, Daniel Egert, Gerald Shen, Jiaqi Zeng, Oleksii Kuchaiev, Yi Dong
ALM

Papers citing "HelpSteer2-Preference: Complementing Ratings with Preferences"

46 / 46 papers shown

SPARE: Single-Pass Annotation with Reference-Guided Evaluation for Automatic Process Supervision and Reward Modelling
Md Imbesat Hassan Rizvi, Xiaodan Zhu, Iryna Gurevych
LRM
18 Jun 2025

Textual Bayes: Quantifying Uncertainty in LLM-Based Systems
Brendan Leigh Ross, Noël Vouitsis, Atiyeh Ashari Ghomi, Rasa Hosseinzadeh, Ji Xin, ..., Yi Sui, Shiyi Hou, Kin Kwan Leung, Gabriel Loaiza-Ganem, Jesse C. Cresswell
11 Jun 2025

Well Begun is Half Done: Low-resource Preference Alignment by Weak-to-Strong Decoding
Feifan Song, Shaohang Wei, Wen Luo, Yuxuan Fan, Tianyu Liu, Guoyin Wang, Houfeng Wang
09 Jun 2025

Chasing Moving Targets with Online Self-Play Reinforcement Learning for Safer Language Models
Mickel Liu, L. Jiang, Yancheng Liang, S. Du, Yejin Choi, Tim Althoff, Natasha Jaques
AAML, LRM
09 Jun 2025

Quantitative LLM Judges
Aishwarya Sahoo, Jeevana Kruthi Karnuthala, Tushar Parmanand Budhwani, Pranchal Agarwal, Sankaran Vaidyanathan, ..., Jennifer Healey, Nedim Lipka, Ryan Rossi, Uttaran Bhattacharya, Branislav Kveton
ELM
03 Jun 2025

Towards Reward Fairness in RLHF: From a Resource Allocation Perspective
Sheng Ouyang, Yulan Hu, Ge Chen, Qingyang Li, Fuzheng Zhang, Yong Liu
29 May 2025

Optimal Transport-Based Token Weighting scheme for Enhanced Preference Optimization
Meng Li, Guangda Huzhang, Haibo Zhang, Xiting Wang, Anxiang Zeng
24 May 2025

Think-RM: Enabling Long-Horizon Reasoning in Generative Reward Models
Ilgee Hong, Changlong Yu, Liang Qiu, Weixiang Yan, Zhenghao Xu, ..., Qingru Zhang, Qin Lu, Xin Liu, Chao Zhang, Tuo Zhao
OffRL, ReLM, LRM
22 May 2025

Think-J: Learning to Think for Generative LLM-as-a-Judge
Hui Huang, Yancheng He, Hongli Zhou, Rui Zhang, Wei Liu, Weixun Wang, Wenbo Su, Bo Zheng, Jiaheng Liu
LLMAG, AILaw, ELM, LRM
20 May 2025

R3: Robust Rubric-Agnostic Reward Models
David Anugraha, Zilu Tang, Lester James V. Miranda, Hanyang Zhao, Mohammad Rifqi Farhansyah, Garry Kuwanto, Derry Wijaya, Genta Indra Winata
19 May 2025

Krikri: Advancing Open Large Language Models for Greek
Dimitris Roussis, Leon Voukoutis, Georgios Paraskevopoulos, Sokratis Sofianopoulos, Prokopis Prokopidis, Vassilis Papavasileiou, Athanasios Katsamanis, Stelios Piperidis, Vassilis Katsouros
ALM
19 May 2025

HelpSteer3-Preference: Open Human-Annotated Preference Data across Diverse Tasks and Languages
Ziyi Wang, Jiaqi Zeng, Olivier Delalleau, Hoo-Chang Shin, Felipe Soares, Alexander Bukharin, Ellie Evans, Yi Dong, Oleksii Kuchaiev
16 May 2025

A Systematic Analysis of Base Model Choice for Reward Modeling
Kian Ahrabian, Pegah Jandaghi, Negar Mokhberian, Sai Praneeth Karimireddy, Jay Pujara
16 May 2025

TRAIL: Trace Reasoning and Agentic Issue Localization
Darshan Deshpande, Varun Gangal, Hersh Mehta, Jitin Krishnan, Anand Kannappan, Rebecca Qian
13 May 2025

Synthetic Code Surgery: Repairing Bugs and Vulnerabilities with LLMs and Synthetic Data
David de-Fitero-Dominguez, Antonio Garcia-Cabot, Eva García-López
SyDa
12 May 2025

Mapping the Italian Telegram Ecosystem: Communities, Toxicity, and Hate Speech
Lorenzo Alvisi, S. Tardelli, Maurizio Tesconi
28 Apr 2025

Direct Advantage Regression: Aligning LLMs with Online AI Reward
Li He, He Zhao, Stephen Wan, Dadong Wang, Lina Yao, Tongliang Liu
19 Apr 2025

Persona-judge: Personalized Alignment of Large Language Models via Token-level Self-judgment
Xiaotian Zhang, Ruizhe Chen, Yang Feng, Zuozhu Liu
17 Apr 2025

FLIP Reasoning Challenge
Andreas Plesner, Turlan Kuzhagaliyev, Roger Wattenhofer
AAML, VLM, LRM
16 Apr 2025

Adversarial Training of Reward Models
Alexander Bukharin, Haifeng Qian, Shengyang Sun, Adithya Renduchintala, Soumye Singhal, Ziyi Wang, Oleksii Kuchaiev, Olivier Delalleau, T. Zhao
AAML
08 Apr 2025

NoveltyBench: Evaluating Language Models for Humanlike Diversity
Yiming Zhang, Harshita Diddee, Susan Holm, Hanchen Liu, Xinyue Liu, Vinay Samuel, Barry Wang, Daphne Ippolito
07 Apr 2025

AIR: A Systematic Analysis of Annotations, Instructions, and Response Pairs in Preference Dataset
Bingxiang He, Wenbin Zhang, Jiaxi Song, Cheng Qian, Z. Fu, ..., Hui Xue, Ganqu Cui, Wanxiang Che, Zhiyuan Liu, Maosong Sun
04 Apr 2025

Inference-Time Scaling for Generalist Reward Modeling
Zijun Liu, P. Wang, Ran Xu, Shirong Ma, Chong Ruan, Ziwei Sun, Yang Liu, Y. Wu
OffRL, LRM
03 Apr 2025

LEMMA: Learning from Errors for MatheMatical Advancement in LLMs
Zhuoshi Pan, Yu Li, Honglin Lin, Qizhi Pei, Zinan Tang, Wei Wu, Chenlin Ming, H. Vicky Zhao, Zeang Sheng, Lijun Wu
LRM
21 Mar 2025

Tuning LLMs by RAG Principles: Towards LLM-native Memory
Jiale Wei, Shuchi Wu, Ruochen Liu, Xiang Ying, Jingbo Shang, Fangbo Tao
RALM
20 Mar 2025

Can LLMs Formally Reason as Abstract Interpreters for Program Analysis?
Jacqueline L. Mitchell, Brian Hyeongseok Kim, Chenyu Zhou, Chao Wang
LRM
16 Mar 2025

OpeNLGauge: An Explainable Metric for NLG Evaluation with Open-Weights LLMs
Ivan Kartáč, Mateusz Lango, Ondrej Dusek
ELM
14 Mar 2025

VLRMBench: A Comprehensive and Challenging Benchmark for Vision-Language Reward Models
Jiacheng Ruan, Wenzhen Yuan, Xian Gao, Ye Guo, Daoxin Zhang, Zhe Xu, Yao Hu, Ting Liu, Yuzhuo Fu
LRM, VLM
10 Mar 2025

Improving LLM-as-a-Judge Inference with the Judgment Distribution
Victor Wang, Michael J.Q. Zhang, Eunsol Choi
04 Mar 2025

Preference Learning Unlocks LLMs' Psycho-Counseling Skills
Mian Zhang, S. Eack, Zhiyu Zoey Chen
27 Feb 2025

Expect the Unexpected: FailSafe Long Context QA for Finance
Kiran Kamble, M. Russak, Dmytro Mozolevskyi, Muayad Ali, Mateusz Russak, Waseem Alshikh
10 Feb 2025

Improving Video Generation with Human Feedback
Jie Liu, Gongye Liu, Jiajun Liang, Ziyang Yuan, Xiaokun Liu, ..., Pengfei Wan, Di Zhang, Kun Gai, Yujiu Yang, Wanli Ouyang
VGen, EGVM
23 Jan 2025

InternLM-XComposer2.5-Reward: A Simple Yet Effective Multi-Modal Reward Model
Yuhang Zang, Xiaoyi Dong, Pan Zhang, Yuhang Cao, Ziyu Liu, ..., Haodong Duan, Wentao Zhang, Kai Chen, Dahua Lin, Jiaqi Wang
VLM
21 Jan 2025

A Roadmap to Guide the Integration of LLMs in Hierarchical Planning
Israel Puerta-Merino, Carlos Núnez-Molina, Pablo Mesejo, Juan Fernández-Olivares
14 Jan 2025

An Overview and Discussion on Using Large Language Models for Implementation Generation of Solutions to Open-Ended Problems
Hashmath Shaik, Alex Doboli
OffRL, ELM
31 Dec 2024

RAG-RewardBench: Benchmarking Reward Models in Retrieval Augmented Generation for Preference Alignment
Zhuoran Jin, Hongbang Yuan, Tianyi Men, Pengfei Cao, Yubo Chen, Kang Liu, Jun Zhao
ALM
18 Dec 2024

Structured Extraction of Real World Medical Knowledge using LLMs for Summarization and Search
Edward Kim, Manil Shrestha, Richard Foty, Tom DeLay, Vicki Seyfert-Margolis
16 Dec 2024

Puzzle: Distillation-Based NAS for Inference-Optimized LLMs
Akhiad Bercovich, Tomer Ronen, Talor Abramovich, Nir Ailon, Nave Assaf, ..., Ido Shahaf, Oren Tropp, Omer Ullman Argov, Ran Zilberstein, Ran El-Yaniv
28 Nov 2024

VL-RewardBench: A Challenging Benchmark for Vision-Language Generative Reward Models
Lei Li, Y. X. Wei, Zhihui Xie, Xuqing Yang, Yifan Song, ..., Tianyu Liu, Sujian Li, Bill Yuchen Lin, Dianbo Sui, Qiang Liu
VLM, CoGe
26 Nov 2024

Self-Generated Critiques Boost Reward Modeling for Language Models
Yue Yu, Zhengxing Chen, Aston Zhang, L Tan, Chenguang Zhu, ..., Suchin Gururangan, Chao-Yue Zhang, Melanie Kambadur, Dhruv Mahajan, Rui Hou
LRM, ALM
25 Nov 2024

Hybrid Preferences: Learning to Route Instances for Human vs. AI Feedback
Lester James V. Miranda, Yizhong Wang, Yanai Elazar, Sachin Kumar, Valentina Pyatkin, Faeze Brahman, Noah A. Smith, Hannaneh Hajishirzi, Pradeep Dasigi
24 Oct 2024

CoPS: Empowering LLM Agents with Provable Cross-Task Experience Sharing
Chen Yang, Chenyang Zhao, Q. Gu, Dongruo Zhou
LRM
22 Oct 2024

DailyDilemmas: Revealing Value Preferences of LLMs with Quandaries of Daily Life
Yu Ying Chiu, Liwei Jiang, Yejin Choi
03 Oct 2024

RepairBench: Leaderboard of Frontier Models for Program Repair
André Silva, Martin Monperrus
KELM
27 Sep 2024

On the Workflows and Smells of Leaderboard Operations (LBOps): An Exploratory Study of Foundation Model Leaderboards
Zhimin Zhao, A. A. Bangash, F. Côgo, Bram Adams, Ahmed E. Hassan
04 Jul 2024

Length-Controlled AlpacaEval: A Simple Way to Debias Automatic Evaluators
Yann Dubois, Balázs Galambosi, Percy Liang, Tatsunori Hashimoto
ALM
06 Apr 2024