Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2402.02416
Cited By
Aligner: Efficient Alignment by Learning to Correct
4 February 2024
Jiaming Ji
Boyuan Chen
Hantao Lou
Chongye Guo
Borong Zhang
Xuehai Pan
Juntao Dai
Tianyi Qiu
Yaodong Yang
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Aligner: Efficient Alignment by Learning to Correct"
21 / 21 papers shown
Title
Token-level Accept or Reject: A Micro Alignment Approach for Large Language Models
Y. Zhang
Yu Yu
Bo Tang
Yu Zhu
Chuxiong Sun
...
Jie Hu
Zipeng Xie
Zhiyu Li
Feiyu Xiong
Edward Chung
41
0
0
26 May 2025
Redefining Superalignment: From Weak-to-Strong Alignment to Human-AI Co-Alignment to Sustainable Symbiotic Society
Yi Zeng
Yijiao Wang
Enmeng Lu
Dongcheng Zhao
Bing Han
...
Chao Liu
Yaodong Yang
Yi Zeng
Boyuan Chen
Jinyu Fan
110
0
0
24 Apr 2025
Fine-Tuning Diffusion Generative Models via Rich Preference Optimization
Hanyang Zhao
Haoxian Chen
Yucheng Guo
Genta Indra Winata
Tingting Ou
Ziyu Huang
D. Yao
Wenpin Tang
76
0
0
13 Mar 2025
Equilibrate RLHF: Towards Balancing Helpfulness-Safety Trade-off in Large Language Models
Yingshui Tan
Yilei Jiang
Yongbin Li
Qingbin Liu
Xingyuan Bu
Wenbo Su
Xiangyu Yue
Xiaoyong Zhu
Bo Zheng
ALM
111
4
0
17 Feb 2025
GuardReasoner: Towards Reasoning-based LLM Safeguards
Yue Liu
Hongcheng Gao
Shengfang Zhai
Jun Xia
Tianyi Wu
Zhiwei Xue
Yuxiao Chen
Kenji Kawaguchi
Jiaheng Zhang
Bryan Hooi
AI4TS
LRM
153
20
0
30 Jan 2025
ConTrans: Weak-to-Strong Alignment Engineering via Concept Transplantation
Weilong Dong
Xinwei Wu
Renren Jin
Shaoyang Xu
Deyi Xiong
78
8
0
31 Dec 2024
Length-Controlled AlpacaEval: A Simple Way to Debias Automatic Evaluators
Yann Dubois
Balázs Galambosi
Percy Liang
Tatsunori Hashimoto
ALM
74
359
0
06 Apr 2024
Baichuan 2: Open Large-scale Language Models
Ai Ming Yang
Bin Xiao
Bingning Wang
Borong Zhang
Ce Bian
...
Youxin Jiang
Yuchen Gao
Yupeng Zhang
Guosheng Dong
Zhiying Wu
ELM
LRM
140
731
0
19 Sep 2023
Reinforced Self-Training (ReST) for Language Modeling
Çağlar Gülçehre
T. Paine
S. Srinivasan
Ksenia Konyushkova
L. Weerts
...
Chenjie Gu
Wolfgang Macherey
Arnaud Doucet
Orhan Firat
Nando de Freitas
OffRL
95
293
0
17 Aug 2023
DeepSpeed-Chat: Easy, Fast and Affordable RLHF Training of ChatGPT-like Models at All Scales
Z. Yao
Reza Yazdani Aminabadi
Olatunji Ruwase
Samyam Rajbhandari
Xiaoxia Wu
...
Heyang Qin
Masahiro Tanaka
Shuai Che
Shuaiwen Leon Song
Yuxiong He
ALM
OffRL
71
72
0
02 Aug 2023
Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena
Lianmin Zheng
Wei-Lin Chiang
Ying Sheng
Siyuan Zhuang
Zhanghao Wu
...
Dacheng Li
Eric Xing
Haotong Zhang
Joseph E. Gonzalez
Ion Stoica
ALM
OSLM
ELM
236
4,186
0
09 Jun 2023
Small Language Models Improve Giants by Rewriting Their Outputs
Giorgos Vernikos
Arthur Bravzinskas
Jakub Adamek
Jonathan Mallinson
Aliaksei Severyn
Eric Malmi
BDL
LRM
52
15
0
22 May 2023
Constitutional AI: Harmlessness from AI Feedback
Yuntao Bai
Saurav Kadavath
Sandipan Kundu
Amanda Askell
John Kernion
...
Dario Amodei
Nicholas Joseph
Sam McCandlish
Tom B. Brown
Jared Kaplan
SyDa
MoMe
152
1,583
0
15 Dec 2022
Extracting Latent Steering Vectors from Pretrained Language Models
Nishant Subramani
Nivedita Suresh
Matthew E. Peters
LLMSV
60
88
0
10 May 2022
A General Language Assistant as a Laboratory for Alignment
Amanda Askell
Yuntao Bai
Anna Chen
Dawn Drain
Deep Ganguli
...
Tom B. Brown
Jack Clark
Sam McCandlish
C. Olah
Jared Kaplan
ALM
94
762
0
01 Dec 2021
DialogSum: A Real-Life Scenario Dialogue Summarization Dataset
Yulong Chen
Yang Liu
Liang Chen
Yue Zhang
105
227
0
14 May 2021
FUDGE: Controlled Text Generation With Future Discriminators
Kevin Kaichuang Yang
Dan Klein
89
324
0
12 Apr 2021
Scaling Laws for Neural Language Models
Jared Kaplan
Sam McCandlish
T. Henighan
Tom B. Brown
B. Chess
R. Child
Scott Gray
Alec Radford
Jeff Wu
Dario Amodei
451
4,662
0
23 Jan 2020
Plug and Play Language Models: A Simple Approach to Controlled Text Generation
Sumanth Dathathri
Andrea Madotto
Janice Lan
Jane Hung
Eric Frank
Piero Molino
J. Yosinski
Rosanne Liu
KELM
100
966
0
04 Dec 2019
Fine-Tuning Language Models from Human Preferences
Daniel M. Ziegler
Nisan Stiennon
Jeff Wu
Tom B. Brown
Alec Radford
Dario Amodei
Paul Christiano
G. Irving
ALM
422
1,664
0
18 Sep 2019
Proximal Policy Optimization Algorithms
John Schulman
Filip Wolski
Prafulla Dhariwal
Alec Radford
Oleg Klimov
OffRL
236
18,685
0
20 Jul 2017
1