Constitutional AI: Harmlessness from AI Feedback


15 December 2022
Yuntao Bai
Saurav Kadavath
Sandipan Kundu
Amanda Askell
John Kernion
Andy Jones
A. Chen
Anna Goldie
Azalia Mirhoseini
C. McKinnon
Carol Chen
Catherine Olsson
C. Olah
Danny Hernandez
Dawn Drain
Deep Ganguli
Dustin Li
Eli Tran-Johnson
E. Perez
Jamie Kerr
J. Mueller
Jeff Ladish
J. Landau
Kamal Ndousse
Kamilė Lukošiūtė
Liane Lovitt
Michael Sellitto
Nelson Elhage
Nicholas Schiefer
Noemí Mercado
Nova Dassarma
R. Lasenby
Robin Larson
Sam Ringer
Scott R. Johnston
Shauna Kravec
S. E. Showk
Stanislav Fort
Tamera Lanham
Timothy Telleen-Lawton
Tom Conerly
T. Henighan
Tristan Hume
Sam Bowman
Zac Hatfield-Dodds
Benjamin Mann
Dario Amodei
Nicholas Joseph
Sam McCandlish
Tom B. Brown
Jared Kaplan
    SyDa, MoMe
arXiv: 2212.08073

Papers citing "Constitutional AI: Harmlessness from AI Feedback"

50 / 1,202 papers shown
Title
CONSCENDI: A Contrastive and Scenario-Guided Distillation Approach to Guardrail Models for Virtual Assistants
A. Sun
Varun Nair
Elliot Schumacher
Anitha Kannan
80
3
0
27 Apr 2023
Harnessing the Power of LLMs in Practice: A Survey on ChatGPT and Beyond
Jingfeng Yang
Hongye Jin
Ruixiang Tang
Xiaotian Han
Qizhang Feng
Haoming Jiang
Bing Yin
Helen Zhou
LM&MA
214
682
0
26 Apr 2023
Towards Explainable and Safe Conversational Agents for Mental Health: A Survey
Surjodeep Sarkar
Manas Gaur
L. Chen
Muskan Garg
Biplav Srivastava
B. Dongaonkar
AI4MH
54
2
0
25 Apr 2023
A Comparative Study between Full-Parameter and LoRA-based Fine-Tuning on Chinese Instruction Data for Instruction Following Large Language Model
Xianghui Sun
Yunjie Ji
Baochang Ma
Xiangang Li
ALM
85
19
0
17 Apr 2023
Towards Better Instruction Following Language Models for Chinese: Investigating the Impact of Training Data and Evaluation
Yunjie Ji
Yan Gong
Yong Deng
Yiping Peng
Qiang Niu
Baochang Ma
Xiangang Li
ALM, ELM
102
25
0
16 Apr 2023
RAFT: Reward rAnked FineTuning for Generative Foundation Model Alignment
Hanze Dong
Wei Xiong
Deepanshu Goyal
Yihan Zhang
Winnie Chow
Boyao Wang
Shizhe Diao
Jipeng Zhang
Kashun Shum
Tong Zhang
ALM
151
470
0
13 Apr 2023
ImageReward: Learning and Evaluating Human Preferences for Text-to-Image Generation
Jiazheng Xu
Xiao Liu
Yuchen Wu
Yuxuan Tong
Qinkai Li
Ming Ding
Jie Tang
Yuxiao Dong
159
413
0
12 Apr 2023
Boosted Prompt Ensembles for Large Language Models
Silviu Pitis
Michael Ruogu Zhang
Andrew Wang
Jimmy Ba
LRM, LLMAG
70
43
0
12 Apr 2023
Positive AI: Key Challenges in Designing Artificial Intelligence for Wellbeing
Willem van der Maden
Derek Lomas
Malak Sadek
P. Hekkert
85
2
0
12 Apr 2023
Teaching Large Language Models to Self-Debug
Xinyun Chen
Maxwell Lin
Nathanael Scharli
Denny Zhou
LRM
142
711
0
11 Apr 2023
Instruction Tuning with GPT-4
Baolin Peng
Chunyuan Li
Pengcheng He
Michel Galley
Jianfeng Gao
SyDa, ALM, LM&MA
236
625
0
06 Apr 2023
REFINER: Reasoning Feedback on Intermediate Representations
Debjit Paul
Mete Ismayilzada
Maxime Peyrard
Beatriz Borges
Antoine Bosselut
Robert West
Boi Faltings
ReLM, LRM
134
182
0
04 Apr 2023
Eight Things to Know about Large Language Models
Sam Bowman
ALM
103
116
0
02 Apr 2023
Towards Healthy AI: Large Language Models Need Therapists Too
Baihan Lin
Djallel Bouneffouf
Guillermo Cecchi
Kush R. Varshney
AI4MH
87
19
0
02 Apr 2023
CAMEL: Communicative Agents for "Mind" Exploration of Large Language Model Society
Ge Li
Hasan Hammoud
Hani Itani
Dmitrii Khizbullin
Guohao Li
SyDa, ALM
175
520
0
31 Mar 2023
Whose Opinions Do Language Models Reflect?
Shibani Santurkar
Esin Durmus
Faisal Ladhak
Cinoo Lee
Percy Liang
Tatsunori Hashimoto
115
448
0
30 Mar 2023
Language Models can Solve Computer Tasks
Geunwoo Kim
Pierre Baldi
Stephen Marcus McAleer
LLMAG, LM&Ro
170
374
0
30 Mar 2023
Training Language Models with Language Feedback at Scale
Jérémy Scheurer
Jon Ander Campos
Tomasz Korbak
Jun Shern Chan
Angelica Chen
Kyunghyun Cho
Ethan Perez
ALM
105
107
0
28 Mar 2023
Foundation Models and Fair Use
Peter Henderson
Xuechen Li
Dan Jurafsky
Tatsunori Hashimoto
Mark A. Lemley
Percy Liang
85
126
0
28 Mar 2023
Exploring the Impact of Instruction Data Scaling on Large Language Models: An Empirical Study on Real-World Use Cases
Yunjie Ji
Yong Deng
Yan Gong
Yiping Peng
Qiang Niu
Lefei Zhang
Baochang Ma
Xiangang Li
ALM
70
97
0
26 Mar 2023
Fundamentals of Generative Large Language Models and Perspectives in Cyber-Defense
Andrei Kucharavy
Z. Schillaci
Loic Maréchal
Maxime Wursch
Ljiljana Dolamic
Remi Sabonnadiere
Dimitri Percia David
Alain Mermoud
Vincent Lenders
ELM, AI4CE
83
32
0
21 Mar 2023
Unit Scaling: Out-of-the-Box Low-Precision Training
Charlie Blake
Douglas Orr
Carlo Luschi
MQ
66
7
0
20 Mar 2023
Protecting Society from AI Misuse: When are Restrictions on Capabilities Warranted?
Markus Anderljung
Julian Hazell
94
32
0
16 Mar 2023
Exploring ChatGPT's Ability to Rank Content: A Preliminary Study on Consistency with Human Preferences
Yunjie Ji
Yan Gong
Yiping Peng
Chao Ni
Peiyan Sun
Dongyu Pan
Baochang Ma
Xiangang Li
ELM, ALM, AI4MH
76
38
0
14 Mar 2023
Personalisation within bounds: A risk taxonomy and policy framework for the alignment of large language models with personalised feedback
Hannah Rose Kirk
Bertie Vidgen
Paul Röttger
Scott A. Hale
106
107
0
09 Mar 2023
A Comprehensive Survey of AI-Generated Content (AIGC): A History of Generative AI from GAN to ChatGPT
Yihan Cao
Siyu Li
Yixin Liu
Zhiling Yan
Yutong Dai
Philip S. Yu
Lichao Sun
105
555
0
07 Mar 2023
Perspectives on the Social Impacts of Reinforcement Learning with Human Feedback
Gabrielle K. Liu
OffRL
116
21
0
06 Mar 2023
Almanac: Retrieval-Augmented Language Models for Clinical Medicine
C. Zakka
Akash Chaurasia
R. Shad
Alex R. Dalal
Jennifer L. Kim
...
Kathleen Boyd
Karen Hirsch
C. Langlotz
Joanna Nelson
W. Hiesinger
LM&MA
169
163
0
01 Mar 2023
Goal Driven Discovery of Distributional Differences via Language Descriptions
Ruiqi Zhong
Peter Zhang
Steve Li
Jinwoo Ahn
Dan Klein
Jacob Steinhardt
115
53
0
28 Feb 2023
Aligning Text-to-Image Models using Human Feedback
Kimin Lee
Hao Liu
Moonkyung Ryu
Olivia Watkins
Yuqing Du
Craig Boutilier
Pieter Abbeel
Mohammad Ghavamzadeh
S. Gu
EGVM
130
285
0
23 Feb 2023
Towards Safer Generative Language Models: A Survey on Safety Risks, Evaluations, and Improvements
Jiawen Deng
Jiale Cheng
Hao Sun
Zhexin Zhang
Minlie Huang
LM&MA, ELM
95
17
0
18 Feb 2023
Machine Love
Joel Lehman
125
5
0
18 Feb 2023
Complex QA and language models hybrid architectures, Survey
Xavier Daull
P. Bellot
Emmanuel Bruno
Vincent Martin
Elisabeth Murisasco
ELM
206
16
0
17 Feb 2023
Augmented Language Models: a Survey
Grégoire Mialon
Roberto Dessì
Maria Lomeli
Christoforos Nalmpantis
Ramakanth Pasunuru
...
Jane Dwivedi-Yu
Asli Celikyilmaz
Edouard Grave
Yann LeCun
Thomas Scialom
LRM, KELM
99
391
0
15 Feb 2023
Transformer models: an introduction and catalog
X. Amatriain
Ananth Sankar
Jie Bing
Praveen Kumar Bodigutla
Timothy J. Hazen
Michaeel Kazi
113
53
0
12 Feb 2023
Chain of Hindsight Aligns Language Models with Feedback
Hao Liu
Carmelo Sferrazza
Pieter Abbeel
ALM
139
124
0
06 Feb 2023
Regulating ChatGPT and other Large Generative AI Models
P. Hacker
A. Engel
M. Mauer
AILaw
145
354
0
05 Feb 2023
Using In-Context Learning to Improve Dialogue Safety
Nicholas Meade
Spandana Gella
Devamanyu Hazarika
Prakhar Gupta
Di Jin
Siva Reddy
Yang Liu
Dilek Z. Hakkani-Tür
121
39
0
02 Feb 2023
The Flan Collection: Designing Data and Methods for Effective Instruction Tuning
Shayne Longpre
Le Hou
Tu Vu
Albert Webson
Hyung Won Chung
...
Denny Zhou
Quoc V. Le
Barret Zoph
Jason W. Wei
Adam Roberts
ALM
122
678
0
31 Jan 2023
Principled Reinforcement Learning with Human Feedback from Pairwise or $K$-wise Comparisons
Banghua Zhu
Jiantao Jiao
Michael I. Jordan
OffRL
145
209
0
26 Jan 2023
Second Thoughts are Best: Learning to Re-Align With Human Values from Text Edits
Ruibo Liu
Chenyan Jia
Ge Zhang
Ziyu Zhuang
Tony X. Liu
Soroush Vosoughi
189
36
0
01 Jan 2023
Inclusive Artificial Intelligence
Dilip Arumugam
Shi Dong
Benjamin Van Roy
67
1
0
24 Dec 2022
Self-Prompting Large Language Models for Zero-Shot Open-Domain QA
Junlong Li
Jinyuan Wang
Zhuosheng Zhang
Hai Zhao
LRM
97
38
0
16 Dec 2022
Reward Gaming in Conditional Text Generation
Richard Yuanzhe Pang
Vishakh Padmakumar
Thibault Sellam
Ankur P. Parikh
He He
101
26
0
16 Nov 2022
Relative Behavioral Attributes: Filling the Gap between Symbolic Goal Specification and Reward Learning from Human Preferences
L. Guan
Karthik Valmeekam
Subbarao Kambhampati
97
8
0
28 Oct 2022
Reinforcement Learning and Bandits for Speech and Language Processing: Tutorial, Review and Outlook
Baihan Lin
OffRL, AI4TS
127
27
0
24 Oct 2022
Mind's Eye: Grounded Language Model Reasoning through Simulation
Ruibo Liu
Jason W. Wei
S. Gu
Te-Yen Wu
Soroush Vosoughi
Claire Cui
Denny Zhou
Andrew M. Dai
ReLM, LRM
217
83
0
11 Oct 2022
The Alignment Problem from a Deep Learning Perspective
Richard Ngo
Lawrence Chan
Sören Mindermann
139
192
0
30 Aug 2022
What Do NLP Researchers Believe? Results of the NLP Community Metasurvey
Julian Michael
Ari Holtzman
Alicia Parrish
Aaron Mueller
Alex Jinpeng Wang
...
Divyam Madaan
Nikita Nangia
Richard Yuanzhe Pang
Jason Phang
Sam Bowman
71
39
0
26 Aug 2022
Non-Programmers Can Label Programs Indirectly via Active Examples: A Case Study with Text-to-SQL
Ruiqi Zhong
Charles Burton Snell
Dan Klein
Jason Eisner
115
9
0
25 May 2022