ResearchTrend.AI

arXiv:2404.00934 · Cited By
ChatGLM-RLHF: Practices of Aligning Large Language Models with Human Feedback

1 April 2024
Zhenyu Hou
Yilin Niu
Zhengxiao Du
Xiaohan Zhang
Xiao Liu
Aohan Zeng
Qinkai Zheng
Minlie Huang
Hongning Wang
Jie Tang
Yuxiao Dong
    ALM

Papers citing "ChatGLM-RLHF: Practices of Aligning Large Language Models with Human Feedback"

17 / 17 papers shown
Optimizing Safe and Aligned Language Generation: A Multi-Objective GRPO Approach
Xuying Li
Zhuo Li
Yuji Kosuga
Victor Bian
45
3
0
26 Mar 2025
Aligning Crowd-sourced Human Feedback for Reinforcement Learning on Code Generation by Large Language Models
M. Wong
C. Tan
ALM
83
4
0
19 Mar 2025
QExplorer: Large Language Model Based Query Extraction for Toxic Content Exploration
Shaola Ren
Li Ke
Longtao Huang
Dehong Gao
Hui Xue
38
0
0
06 Feb 2025
InternLM-XComposer2.5-Reward: A Simple Yet Effective Multi-Modal Reward Model
Yuhang Zang
Xiaoyi Dong
Pan Zhang
Yuhang Cao
Ziyu Liu
...
Haodong Duan
W. Zhang
Kai Chen
Dahua Lin
Jiaqi Wang
VLM
74
19
0
21 Jan 2025
SPaR: Self-Play with Tree-Search Refinement to Improve Instruction-Following in Large Language Models
Jiale Cheng
Xiao-Chang Liu
C. Wang
Xiaotao Gu
Yaojie Lu
Dan Zhang
Yuxiao Dong
J. Tang
Hongning Wang
Minlie Huang
LRM
126
3
0
16 Dec 2024
Reinforcement Learning Enhanced LLMs: A Survey
Shuhe Wang
Shengyu Zhang
Jingyang Zhang
Runyi Hu
Xiaoya Li
Tianwei Zhang
Jiwei Li
Fei Wu
G. Wang
Eduard H. Hovy
OffRL
134
7
0
05 Dec 2024
LongReward: Improving Long-context Large Language Models with AI Feedback
J. Zhang
Zhongni Hou
Xin Lv
S. Cao
Zhenyu Hou
Yilin Niu
Lei Hou
Yuxiao Dong
Ling Feng
Juanzi Li
OffRL
LRM
38
8
0
28 Oct 2024
EntityCLIP: Entity-Centric Image-Text Matching via Multimodal Attentive Contrastive Learning
Yaxiong Wang
Y. Wang
Lianwei Wu
Lechao Cheng
Zhun Zhong
Meng Wang
VLM
32
0
0
23 Oct 2024
Beyond Scalar Reward Model: Learning Generative Judge from Preference Data
Ziyi Ye
Xiangsheng Li
Qiuchi Li
Qingyao Ai
Yujia Zhou
Wei Shen
Dong Yan
Yiqun Liu
50
10
0
01 Oct 2024
Planning in the Dark: LLM-Symbolic Planning Pipeline without Experts
Sukai Huang
N. Lipovetzky
Trevor Cohn
DiffM
23
1
0
24 Sep 2024
LongWriter: Unleashing 10,000+ Word Generation from Long Context LLMs
Yushi Bai
Jiajie Zhang
Xin Lv
Linzhi Zheng
Siqi Zhu
Lei Hou
Yuxiao Dong
Jie Tang
Juanzi Li
VGen
LLMAG
ALM
39
39
0
13 Aug 2024
Better RAG using Relevant Information Gain
Marc Pickett
Jeremy Hartman
Ayan Kumar Bhowmick
Raquib-ul Alam
Aditya Vempaty
RALM
37
3
0
16 Jul 2024
PKU-SafeRLHF: A Safety Alignment Preference Dataset for Llama Family Models
Yalan Qin
Chongye Guo
Borong Zhang
Boyuan Chen
Josef Dai
Boren Zheng
Tianyi Qiu
Boxun Li
Yaodong Yang
45
25
0
20 Jun 2024
Eliminating Biased Length Reliance of Direct Preference Optimization via Down-Sampled KL Divergence
Junru Lu
Jiazheng Li
Siyu An
Meng Zhao
Yulan He
Di Yin
Xing Sun
44
14
0
16 Jun 2024
3D-Properties: Identifying Challenges in DPO and Charting a Path Forward
Yuzi Yan
Yibo Miao
J. Li
Yipin Zhang
Jian Xie
Zhijie Deng
Dong Yan
57
11
0
11 Jun 2024
Tele-FLM Technical Report
Xiang Li
Yiqun Yao
Xin Jiang
Xuezhi Fang
Chao Wang
...
Yequan Wang
Zhongjiang He
Zhongyuan Wang
Xuelong Li
Tiejun Huang
35
3
0
25 Apr 2024
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
319
11,953
0
04 Mar 2022