Course-Correction: Safety Alignment Using Synthetic Preferences

23 July 2024
Rongwu Xu, Yishuo Cai, Zhenhong Zhou, Renjie Gu, Haiqin Weng, Yan Liu, Lei Bai, Wei Xu, Han Qiu

Papers citing "Course-Correction: Safety Alignment Using Synthetic Preferences"

5 / 5 papers shown
AI Awareness
Xianrui Li, Haoyuan Shi, Rongwu Xu, Wei Xu
25 Apr 2025

On the Role of Attention Heads in Large Language Model Safety
Zhenhong Zhou, Haiyang Yu, Xinghua Zhang, Rongwu Xu, Fei Huang, Kun Wang, Yang Liu, Junfeng Fang, Yongbin Li
17 Oct 2024

PKU-SafeRLHF: A Safety Alignment Preference Dataset for Llama Family Models
Yalan Qin, Chongye Guo, Borong Zhang, Boyuan Chen, Josef Dai, Boren Zheng, Tianyi Qiu, Boxun Li, Yaodong Yang
20 Jun 2024

ChatGLM: A Family of Large Language Models from GLM-130B to GLM-4 All Tools
Team GLM: Aohan Zeng, Bin Xu, Bowen Wang, ..., Zhaoyu Wang, Zhen Yang, Zhengxiao Du, Zhenyu Hou, Zihan Wang
18 Jun 2024

Training language models to follow instructions with human feedback
Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, ..., Amanda Askell, Peter Welinder, Paul Christiano, Jan Leike, Ryan J. Lowe
04 Mar 2022