Second Thoughts are Best: Learning to Re-Align With Human Values from Text Edits
arXiv:2301.00355 (v2, latest) · 1 January 2023
Ruibo Liu, Chenyan Jia, Ge Zhang, Ziyu Zhuang, Tony X. Liu, Soroush Vosoughi
Papers citing "Second Thoughts are Best: Learning to Re-Align With Human Values from Text Edits" (24 / 24 papers shown)
Augmented Adversarial Trigger Learning
Zhe Wang, Yanjun Qi · 16 Mar 2025
Evaluating and Aligning Human Economic Risk Preferences in LLMs
Qingbin Liu, Yi Yang, Kar Yan Tam · 09 Mar 2025
Can AI writing be salvaged? Mitigating Idiosyncrasies and Improving Human-AI Alignment in the Writing Process through Edits
Tuhin Chakrabarty, Philippe Laban, Chien-Sheng Wu · 22 Sep 2024
TLCR: Token-Level Continuous Reward for Fine-grained Reinforcement Learning from Human Feedback
Eunseop Yoon, Hee Suk Yoon, Soohwan Eom, Gunsoo Han, D. W. Nam, DaeJin Jo, Kyoung-Woon On, M. Hasegawa-Johnson, Sungwoong Kim, C. Yoo · 23 Jul 2024
A Survey on Human Preference Learning for Large Language Models
Ruili Jiang, Kehai Chen, Xuefeng Bai, Zhixuan He, Juntao Li, Muyun Yang, Tiejun Zhao, Liqiang Nie, Min Zhang · 17 Jun 2024
A Survey of Language-Based Communication in Robotics
William Hunt, Sarvapali D. Ramchurn, Mohammad D. Soorati · 06 Jun 2024
Aligning LLM Agents by Learning Latent Preference from User Edits
Ge Gao, Alexey Taymanov, Eduardo Salinas, Paul Mineiro, Dipendra Kumar Misra · 23 Apr 2024
RLHF Deciphered: A Critical Analysis of Reinforcement Learning from Human Feedback for LLMs
Shreyas Chaudhari, Pranjal Aggarwal, Vishvak Murahari, Tanmay Rajpurohit, Ashwin Kalyan, Karthik Narasimhan, Ameet Deshpande, Bruno Castro da Silva · 12 Apr 2024
SC-Tune: Unleashing Self-Consistent Referential Comprehension in Large Vision Language Models
Tongtian Yue, Jie Cheng, Longteng Guo, Xingyuan Dai, Zijia Zhao, Xingjian He, Gang Xiong, Yisheng Lv, Jing Liu · 20 Mar 2024
On the Essence and Prospect: An Investigation of Alignment Approaches for Big Models
Xinpeng Wang, Shitong Duan, Xiaoyuan Yi, Jing Yao, Shanlin Zhou, Zhihua Wei, Peng Zhang, Dongkuan Xu, Maosong Sun, Xing Xie · 07 Mar 2024
TruthX: Alleviating Hallucinations by Editing Large Language Models in Truthful Space
Shaolei Zhang, Tian Yu, Yang Feng · 27 Feb 2024
COPR: Continual Human Preference Learning via Optimal Policy Regularization
Han Zhang, Lin Gui, Yu Lei, Yuanzhao Zhai, Yehong Zhang, ..., Hui Wang, Yue Yu, Kam-Fai Wong, Bin Liang, Ruifeng Xu · 22 Feb 2024
CMDAG: A Chinese Metaphor Dataset with Annotated Grounds as CoT for Boosting Metaphor Generation
Yujie Shao, Xinrong Yao, Xingwei Qu, Chenghua Lin, Shi Wang, Stephen W. Huang, Ge Zhang, Jie Fu · 20 Feb 2024
Risk Taxonomy, Mitigation, and Assessment Benchmarks of Large Language Model Systems
Tianyu Cui, Yanling Wang, Chuanpu Fu, Yong Xiao, Sijia Li, ..., Junwu Xiong, Xinyu Kong, ZuJie Wen, Ke Xu, Qi Li · 11 Jan 2024
Beyond Imitation: Leveraging Fine-grained Quality Signals for Alignment
Geyang Guo, Ranchi Zhao, Tianyi Tang, Wayne Xin Zhao, Ji-Rong Wen · 07 Nov 2023
The Past, Present and Better Future of Feedback Learning in Large Language Models for Subjective Human Preferences and Values
Hannah Rose Kirk, Andrew M. Bean, Bertie Vidgen, Paul Röttger, Scott A. Hale · 11 Oct 2023
The Empty Signifier Problem: Towards Clearer Paradigms for Operationalising "Alignment" in Large Language Models
Hannah Rose Kirk, Bertie Vidgen, Paul Röttger, Scott A. Hale · 03 Oct 2023
Large Language Model Alignment: A Survey
Tianhao Shen, Renren Jin, Yufei Huang, Chuang Liu, Weilong Dong, Zishan Guo, Xinwei Wu, Yan Liu, Deyi Xiong · 26 Sep 2023
Subjective Crowd Disagreements for Subjective Data: Uncovering Meaningful CrowdOpinion with Population-level Learning
Tharindu Cyril Weerasooriya, Sarah K. K. Luger, Saloni Poddar, Ashiqur R. KhudaBukhsh, Christopher Homan · 07 Jul 2023
Personalized Abstractive Summarization by Tri-agent Generation Pipeline
Md Aminul Haque Palash, Sourav Saha, Faria Afrin, Pengcheng He · 04 May 2023
Enhancing Chain-of-Thoughts Prompting with Iterative Bootstrapping in Large Language Models
Jiashuo Sun, Yi Luo, Yeyun Gong, Chen Lin, Yelong Shen, Jian Guo, Nan Duan · 23 Apr 2023
Personalisation within bounds: A risk taxonomy and policy framework for the alignment of large language models with personalised feedback
Hannah Rose Kirk, Bertie Vidgen, Paul Röttger, Scott A. Hale · 09 Mar 2023
Towards Safer Generative Language Models: A Survey on Safety Risks, Evaluations, and Improvements
Jiawen Deng, Jiale Cheng, Hao Sun, Zhexin Zhang, Minlie Huang · 18 Feb 2023
The Touché23-ValueEval Dataset for Identifying Human Values behind Arguments
Nailia Mirzakhmedova, Johannes Kiesel, Milad Alshomary, Maximilian Heinrich, Nicolas Handke, ..., Mohammad Ali Sadraei, Ehsaneddin Asgari, Lea Kawaletz, Henning Wachsmuth, Benno Stein · 31 Jan 2023