Equilibrate RLHF: Towards Balancing Helpfulness-Safety Trade-off in Large Language Models

17 February 2025
Yingshui Tan, Yilei Jiang, Heng Chang, Qingbin Liu, Xingyuan Bu, Wenbo Su, Xiangyu Yue, Xiaoyong Zhu, Bo Zheng
    ALM
ArXiv · PDF · HTML

Papers citing "Equilibrate RLHF: Towards Balancing Helpfulness-Safety Trade-off in Large Language Models"

1 / 1 papers shown

Ethics and Persuasion in Reinforcement Learning from Human Feedback: A Procedural Rhetorical Approach
Shannon Lodoen, Alexi Orchard
14 May 2025