arXiv: 2502.11555
Equilibrate RLHF: Towards Balancing Helpfulness-Safety Trade-off in Large Language Models
17 February 2025
Yingshui Tan
Yilei Jiang
Heng Chang
Qingbin Liu
Xingyuan Bu
Wenbo Su
Xiangyu Yue
Xiaoyong Zhu
Bo Zheng
Papers citing "Equilibrate RLHF: Towards Balancing Helpfulness-Safety Trade-off in Large Language Models" (1 paper shown)
Ethics and Persuasion in Reinforcement Learning from Human Feedback: A Procedural Rhetorical Approach
Shannon Lodoen
Alexi Orchard
14 May 2025