ResearchTrend.AI
© 2025 ResearchTrend.AI, All rights reserved.

arXiv:2302.07459
The Capacity for Moral Self-Correction in Large Language Models

15 February 2023
Deep Ganguli
Amanda Askell
Nicholas Schiefer
Thomas I. Liao
Kamilė Lukošiūtė
Anna Chen
Anna Goldie
Azalia Mirhoseini
Catherine Olsson
Danny Hernandez
Dawn Drain
Dustin Li
Eli Tran-Johnson
Ethan Perez
Jackson Kernion
Jamie Kerr
J. Mueller
J. Landau
Kamal Ndousse
Karina Nguyen
Liane Lovitt
Michael Sellitto
Nelson Elhage
Noemí Mercado
Nova DasSarma
Oliver Rausch
R. Lasenby
Robin Larson
Sam Ringer
Sandipan Kundu
Saurav Kadavath
Scott Johnston
Shauna Kravec
S. E. Showk
Tamera Lanham
Timothy Telleen-Lawton
T. Henighan
Tristan Hume
Yuntao Bai
Zac Hatfield-Dodds
Benjamin Mann
Dario Amodei
Nicholas Joseph
Sam McCandlish
Tom B. Brown
C. Olah
Jack Clark
Sam Bowman
Jared Kaplan
    LRM
    ReLM

Papers citing "The Capacity for Moral Self-Correction in Large Language Models"

50 / 115 papers shown
Narrowing the Knowledge Evaluation Gap: Open-Domain Question Answering with Multi-Granularity Answers
G. Yona
Roee Aharoni
Mor Geva
ELM
36
11
0
09 Jan 2024
Why Solving Multi-agent Path Finding with Large Language Model has not Succeeded Yet
Weizhe Chen
Sven Koenig
B. Dilkina
LM&Ro
LLMAG
AI4CE
59
16
0
08 Jan 2024
Towards Truly Zero-shot Compositional Visual Reasoning with LLMs as Programmers
Aleksandar Stanić
Sergi Caelles
Michael Tschannen
LRM
VLM
25
9
0
03 Jan 2024
Learning and Forgetting Unsafe Examples in Large Language Models
Jiachen Zhao
Zhun Deng
David Madras
James Zou
Mengye Ren
MU
KELM
CLL
80
16
0
20 Dec 2023
Evaluating and Mitigating Discrimination in Language Model Decisions
Alex Tamkin
Amanda Askell
Liane Lovitt
Esin Durmus
Nicholas Joseph
Shauna Kravec
Karina Nguyen
Jared Kaplan
Deep Ganguli
38
66
0
06 Dec 2023
Exchange-of-Thought: Enhancing Large Language Model Capabilities through Cross-Model Communication
Zhangyue Yin
Qiushi Sun
Cheng Chang
Qipeng Guo
Junqi Dai
Xuanjing Huang
Xipeng Qiu
LRM
48
48
0
04 Dec 2023
A Survey of the Evolution of Language Model-Based Dialogue Systems
Hongru Wang
Lingzhi Wang
Yiming Du
Liang Chen
Jing Zhou
Yufei Wang
Kam-Fai Wong
LRM
53
20
0
28 Nov 2023
DUnE: Dataset for Unified Editing
Afra Feyza Akyürek
Eric Pan
Garry Kuwanto
Derry Wijaya
KELM
27
17
0
27 Nov 2023
Justifiable Artificial Intelligence: Engineering Large Language Models for Legal Applications
Sabine Wehnert
AILaw
29
4
0
27 Nov 2023
Value FULCRA: Mapping Large Language Models to the Multidimensional Spectrum of Basic Human Values
Jing Yao
Xiaoyuan Yi
Xiting Wang
Yifan Gong
Xing Xie
22
21
0
15 Nov 2023
Fake Alignment: Are LLMs Really Aligned Well?
Yixu Wang
Yan Teng
Kexin Huang
Chengqi Lyu
Songyang Zhang
Wenwei Zhang
Xingjun Ma
Yu-Gang Jiang
Yu Qiao
Yingchun Wang
29
15
0
10 Nov 2023
Neuroformer: Multimodal and Multitask Generative Pretraining for Brain Data
Antonis Antoniades
Yiyi Yu
Joseph Canzano
William Wang
Spencer L. Smith
AI4CE
40
11
0
31 Oct 2023
In-Context Learning Dynamics with Random Binary Sequences
Eric J. Bigelow
Ekdeep Singh Lubana
Robert P. Dick
Hidenori Tanaka
T. Ullman
29
4
0
26 Oct 2023
Unpacking the Ethical Value Alignment in Big Models
Xiaoyuan Yi
Jing Yao
Xiting Wang
Xing Xie
24
11
0
26 Oct 2023
Branch-Solve-Merge Improves Large Language Model Evaluation and Generation
Swarnadeep Saha
Omer Levy
Asli Celikyilmaz
Mohit Bansal
Jason Weston
Xian Li
MoMe
23
70
0
23 Oct 2023
Denevil: Towards Deciphering and Navigating the Ethical Values of Large Language Models via Instruction Learning
Shitong Duan
Xiaoyuan Yi
Peng Zhang
T. Lu
Xing Xie
Ning Gu
16
9
0
17 Oct 2023
Self-Detoxifying Language Models via Toxification Reversal
Chak Tou Leong
Yi Cheng
Jiashuo Wang
Jian Wang
Wenjie Li
MU
16
29
0
14 Oct 2023
The Consensus Game: Language Model Generation via Equilibrium Search
Athul Paul Jacob
Yikang Shen
Gabriele Farina
Jacob Andreas
33
19
0
13 Oct 2023
Case Law Grounding: Aligning Judgments of Humans and AI on Socially-Constructed Concepts
Quan Ze Chen
Amy X. Zhang
ELM
56
2
0
10 Oct 2023
SALMON: Self-Alignment with Instructable Reward Models
Zhiqing Sun
Yikang Shen
Hongxin Zhang
Qinhong Zhou
Zhenfang Chen
David D. Cox
Yiming Yang
Chuang Gan
ALM
SyDa
29
35
0
09 Oct 2023
Thought Propagation: An Analogical Approach to Complex Reasoning with Large Language Models
Junchi Yu
Ran He
Rex Ying
LRM
48
24
0
06 Oct 2023
Low-Resource Languages Jailbreak GPT-4
Zheng-Xin Yong
Cristina Menghini
Stephen H. Bach
SILM
23
169
0
03 Oct 2023
Large Language Models Cannot Self-Correct Reasoning Yet
Jie Huang
Xinyun Chen
Swaroop Mishra
Huaixiu Steven Zheng
Adams Wei Yu
Xinying Song
Denny Zhou
ReLM
LRM
27
419
0
03 Oct 2023
Suspicion-Agent: Playing Imperfect Information Games with Theory of Mind Aware GPT-4
Jiaxian Guo
Bo Yang
Paul D. Yoo
Bill Yuchen Lin
Yusuke Iwasawa
Yutaka Matsuo
LLMAG
13
40
0
29 Sep 2023
In-Contextual Gender Bias Suppression for Large Language Models
Daisuke Oba
Masahiro Kaneko
Danushka Bollegala
23
8
0
13 Sep 2023
Generative AI
Stefan Feuerriegel
Jochen Hartmann
Christian Janiesch
Patrick Zschech
39
546
0
13 Sep 2023
Large Language Models as Optimizers
Chengrun Yang
Xuezhi Wang
Yifeng Lu
Hanxiao Liu
Quoc V. Le
Denny Zhou
Xinyun Chen
ODL
35
375
0
07 Sep 2023
Cognitive Architectures for Language Agents
T. Sumers
Shunyu Yao
Karthik Narasimhan
Thomas L. Griffiths
LLMAG
LM&Ro
42
151
0
05 Sep 2023
Jais and Jais-chat: Arabic-Centric Foundation and Instruction-Tuned Open Generative Large Language Models
Neha Sengupta
Sunil Kumar Sahu
Bokang Jia
Satheesh Katipomu
Haonan Li
...
A. Jackson
Hector Xuguang Ren
Preslav Nakov
Timothy Baldwin
Eric P. Xing
LRM
21
40
0
30 Aug 2023
Rethinking Machine Ethics -- Can LLMs Perform Moral Reasoning through the Lens of Moral Theories?
Jingyan Zhou
Minda Hu
Junan Li
Xiaoying Zhang
Xixin Wu
Irwin King
Helen M. Meng
LRM
42
24
0
29 Aug 2023
Use of LLMs for Illicit Purposes: Threats, Prevention Measures, and Vulnerabilities
Maximilian Mozes
Xuanli He
Bennett Kleinberg
Lewis D. Griffin
36
76
0
24 Aug 2023
From Instructions to Intrinsic Human Values -- A Survey of Alignment Goals for Big Models
Jing Yao
Xiaoyuan Yi
Xiting Wang
Jindong Wang
Xing Xie
ALM
19
42
0
23 Aug 2023
Red-Teaming Large Language Models using Chain of Utterances for Safety-Alignment
Rishabh Bhardwaj
Soujanya Poria
ELM
17
127
0
18 Aug 2023
Self-Alignment with Instruction Backtranslation
Xian Li
Ping Yu
Chunting Zhou
Timo Schick
Omer Levy
Luke Zettlemoyer
Jason Weston
M. Lewis
SyDa
24
123
0
11 Aug 2023
On the Unexpected Abilities of Large Language Models
S. Nolfi
LRM
22
11
0
09 Aug 2023
Automatically Correcting Large Language Models: Surveying the landscape of diverse self-correction strategies
Liangming Pan
Michael Stephen Saxon
Wenda Xu
Deepak Nathani
Xinyi Wang
William Yang Wang
KELM
LRM
38
201
0
06 Aug 2023
Evaluating the Moral Beliefs Encoded in LLMs
Nino Scherrer
Claudia Shi
Amir Feder
David M. Blei
25
116
0
26 Jul 2023
Llama 2: Open Foundation and Fine-Tuned Chat Models
Hugo Touvron
Louis Martin
Kevin R. Stone
Peter Albert
Amjad Almahairi
...
Sharan Narang
Aurelien Rodriguez
Robert Stojnic
Sergey Edunov
Thomas Scialom
AI4MH
ALM
93
10,977
0
18 Jul 2023
Measuring Faithfulness in Chain-of-Thought Reasoning
Tamera Lanham
Anna Chen
Ansh Radhakrishnan
Benoit Steiner
Carson E. Denison
...
Zac Hatfield-Dodds
Jared Kaplan
J. Brauner
Sam Bowman
Ethan Perez
ReLM
LRM
22
164
0
17 Jul 2023
A Comprehensive Overview of Large Language Models
Humza Naveed
Asad Ullah Khan
Shi Qiu
Muhammad Saqib
Saeed Anwar
Muhammad Usman
Naveed Akhtar
Nick Barnes
Ajmal Saeed Mian
OffRL
61
523
0
12 Jul 2023
Minimum Levels of Interpretability for Artificial Moral Agents
Avish Vijayaraghavan
C. Badea
AI4CE
25
5
0
02 Jul 2023
Let Me Teach You: Pedagogical Foundations of Feedback for Language Models
Beatriz Borges
Niket Tandon
Tanja Käser
Antoine Bosselut
22
3
0
01 Jul 2023
Towards Measuring the Representation of Subjective Global Opinions in Language Models
Esin Durmus
Karina Nguyen
Thomas I. Liao
Nicholas Schiefer
Amanda Askell
...
Alex Tamkin
Janel Thamkul
Jared Kaplan
Jack Clark
Deep Ganguli
33
205
0
28 Jun 2023
CBBQ: A Chinese Bias Benchmark Dataset Curated with Human-AI Collaboration for Large Language Models
Yufei Huang
Deyi Xiong
ALM
34
17
0
28 Jun 2023
Potential Benefits of Employing Large Language Models in Research in Moral Education and Development
Hyemin Han
LLMAG
17
9
0
23 Jun 2023
Inverse Scaling: When Bigger Isn't Better
I. R. McKenzie
Alexander Lyzhov
Michael Pieler
Alicia Parrish
Aaron Mueller
...
Yuhui Zhang
Zhengping Zhou
Najoung Kim
Sam Bowman
Ethan Perez
25
126
0
15 Jun 2023
Fine-Grained Human Feedback Gives Better Rewards for Language Model Training
Zeqiu Wu
Yushi Hu
Weijia Shi
Nouha Dziri
Alane Suhr
Prithviraj Ammanabrolu
Noah A. Smith
Mari Ostendorf
Hannaneh Hajishirzi
ALM
30
303
0
02 Jun 2023
LM vs LM: Detecting Factual Errors via Cross Examination
Roi Cohen
May Hamri
Mor Geva
Amir Globerson
HILM
29
117
0
22 May 2023
"According to ...": Prompting Language Models Improves Quoting from Pre-Training Data
Orion Weller
Marc Marone
Nathaniel Weir
Dawn J Lawrie
Daniel Khashabi
Benjamin Van Durme
HILM
70
44
0
22 May 2023
Can ChatGPT Defend its Belief in Truth? Evaluating LLM Reasoning via Debate
Boshi Wang
Xiang Yue
Huan Sun
ELM
LRM
24
59
0
22 May 2023