arXiv:2212.08073
Constitutional AI: Harmlessness from AI Feedback
15 December 2022
Yuntao Bai, Saurav Kadavath, Sandipan Kundu, Amanda Askell, John Kernion, Andy Jones, A. Chen, Anna Goldie, Azalia Mirhoseini, C. McKinnon, Carol Chen, Catherine Olsson, C. Olah, Danny Hernandez, Dawn Drain, Deep Ganguli, Dustin Li, Eli Tran-Johnson, E. Perez, Jamie Kerr, J. Mueller, Jeff Ladish, J. Landau, Kamal Ndousse, Kamilė Lukošiūtė, Liane Lovitt, Michael Sellitto, Nelson Elhage, Nicholas Schiefer, Noemí Mercado, Nova Dassarma, R. Lasenby, Robin Larson, Sam Ringer, Scott R. Johnston, Shauna Kravec, S. E. Showk, Stanislav Fort, Tamera Lanham, Timothy Telleen-Lawton, Tom Conerly, T. Henighan, Tristan Hume, Sam Bowman, Zac Hatfield-Dodds, Benjamin Mann, Dario Amodei, Nicholas Joseph, Sam McCandlish, Tom B. Brown, Jared Kaplan
Tags: SyDa, MoMe
Papers citing "Constitutional AI: Harmlessness from AI Feedback" (50 of 1,202 shown)
- Building Resilient SMEs: Harnessing Large Language Models for Cyber Security in Australia. Benjamin Kereopa-Yorke. 05 Jun 2023.
- Fine-Tuning Language Models with Advantage-Induced Policy Alignment. Banghua Zhu, Hiteshi Sharma, Felipe Vieira Frujeri, Shi Dong, Chenguang Zhu, Michael I. Jordan, Jiantao Jiao. 04 Jun 2023. [OSLM]
- Auto-GPT for Online Decision Making: Benchmarks and Additional Opinions. Hui Yang, Sifu Yue, Yunzhong He. 04 Jun 2023. [RALM]
- AI Transparency in the Age of LLMs: A Human-Centered Research Roadmap. Q. V. Liao, J. Vaughan. 02 Jun 2023.
- Fine-Grained Human Feedback Gives Better Rewards for Language Model Training. Zeqiu Wu, Yushi Hu, Weijia Shi, Nouha Dziri, Alane Suhr, Prithviraj Ammanabrolu, Noah A. Smith, Mari Ostendorf, Hannaneh Hajishirzi. 02 Jun 2023. [ALM]
- Layout and Task Aware Instruction Prompt for Zero-shot Document Image Question Answering. Wenjin Wang, Yunhao Li, Yixin Ou, Yin Zhang. 01 Jun 2023. [VLM]
- Preference-grounded Token-level Guidance for Language Model Fine-tuning. Shentao Yang, Shujian Zhang, Congying Xia, Yihao Feng, Caiming Xiong, Mi Zhou. 01 Jun 2023.
- Factually Consistent Summarization via Reinforcement Learning with Textual Entailment Feedback. Paul Roit, Johan Ferret, Lior Shani, Roee Aharoni, Geoffrey Cideron, ..., Olivier Bachem, G. Elidan, Avinatan Hassidim, Olivier Pietquin, Idan Szpektor. 31 May 2023. [HILM]
- Direct Preference Optimization: Your Language Model is Secretly a Reward Model. Rafael Rafailov, Archit Sharma, E. Mitchell, Stefano Ermon, Christopher D. Manning, Chelsea Finn. 29 May 2023. [ALM]
- SQuARe: A Large-Scale Dataset of Sensitive Questions and Acceptable Responses Created Through Human-Machine Collaboration. Hwaran Lee, Seokhee Hong, Joonsuk Park, Takyoung Kim, M. Cha, ..., Eun-Ju Lee, Yong Lim, Alice Oh, San-hee Park, Jung-Woo Ha. 28 May 2023.
- Reward Collapse in Aligning Large Language Models. Ziang Song, Tianle Cai, Jason D. Lee, Weijie J. Su. 28 May 2023. [ALM]
- Chain-of-Thought Hub: A Continuous Effort to Measure Large Language Models' Reasoning Performance. Yao Fu, Litu Ou, Mingyu Chen, Yuhao Wan, Hao-Chun Peng, Tushar Khot. 26 May 2023. [LLMAG, ELM, LRM, ReLM]
- BiomedGPT: A Unified and Generalist Biomedical Generative Pre-trained Transformer for Vision, Language, and Multimodal Tasks. Kai Zhang, Jun Yu, Eashan Adhikarla, Rong Zhou, Zhilin Yan, ..., Xun Chen, Yong Chen, Quanzheng Li, Hongfang Liu, Lichao Sun. 26 May 2023. [LM&MA, MedIm]
- Training Socially Aligned Language Models on Simulated Social Interactions. Ruibo Liu, Ruixin Yang, Chenyan Jia, Ge Zhang, Denny Zhou, Andrew M. Dai, Diyi Yang, Soroush Vosoughi. 26 May 2023. [ALM]
- Heterogeneous Value Alignment Evaluation for Large Language Models. Zhaowei Zhang, Ceyao Zhang, N. Liu, Siyuan Qi, Ziqi Rong, Song-Chun Zhu, Shuguang Cui, Yaodong Yang. 26 May 2023.
- On the Tool Manipulation Capability of Open-source Large Language Models. Qiantong Xu, Fenglu Hong, Yangqiu Song, Changran Hu, Zheng Chen, Jian Zhang. 25 May 2023. [LLMAG]
- Role-Play with Large Language Models. Murray Shanahan, Kyle McDonell, Laria Reynolds. 25 May 2023. [LLMAG]
- The False Promise of Imitating Proprietary LLMs. Arnav Gudibande, Eric Wallace, Charles Burton Snell, Xinyang Geng, Hao Liu, Pieter Abbeel, Sergey Levine, Dawn Song. 25 May 2023. [ALM]
- HuatuoGPT, towards Taming Language Model to Be a Doctor. Hongbo Zhang, Junying Chen, Feng Jiang, Fei Yu, Zhihong Chen, ..., Zhiyi Zhang, Qingying Xiao, Xiang Wan, Benyou Wang, Haizhou Li. 24 May 2023. [LM&MA, AI4MH, ELM]
- Just Ask for Calibration: Strategies for Eliciting Calibrated Confidence Scores from Language Models Fine-Tuned with Human Feedback. Katherine Tian, E. Mitchell, Allan Zhou, Archit Sharma, Rafael Rafailov, Huaxiu Yao, Chelsea Finn, Christopher D. Manning. 24 May 2023.
- PURR: Efficiently Editing Language Model Hallucinations by Denoising Language Model Corruptions. Anthony Chen, Panupong Pasupat, Sameer Singh, Hongrae Lee, Kelvin Guu. 24 May 2023.
- ClusterLLM: Large Language Models as a Guide for Text Clustering. Yuwei Zhang, Zihan Wang, Jingbo Shang. 24 May 2023.
- Psychological Metrics for Dialog System Evaluation. Salvatore Giorgi, Shreya Havaldar, Farhan S. Ahmed, Zuhaib Akhtar, Shalaka Vaidya, Gary Pan, Pallavi V. Kulkarni, H. Andrew Schwartz, Joao Sedoc. 24 May 2023.
- Mixture-of-Experts Meets Instruction Tuning: A Winning Combination for Large Language Models. Sheng Shen, Le Hou, Yan-Quan Zhou, Nan Du, Shayne Longpre, ..., Vincent Zhao, Hongkun Yu, Kurt Keutzer, Trevor Darrell, Denny Zhou. 24 May 2023. [ALM, MoE]
- Language Model Self-improvement by Reinforcement Learning Contemplation. Jing-Cheng Pang, Pengyuan Wang, Kaiyuan Li, Xiong-Hui Chen, Jiacheng Xu, Zongzhang Zhang, Yang Yu. 23 May 2023. [LRM, KELM]
- QLoRA: Efficient Finetuning of Quantized LLMs. Tim Dettmers, Artidoro Pagnoni, Ari Holtzman, Luke Zettlemoyer. 23 May 2023. [ALM]
- Memory-Efficient Fine-Tuning of Compressed Large Language Models via sub-4-bit Integer Quantization. Jeonghoon Kim, J. H. Lee, Sungdong Kim, Joonsuk Park, Kang Min Yoo, S. Kwon, Dongsoo Lee. 23 May 2023. [MQ]
- Learning from Mistakes via Cooperative Study Assistant for Large Language Models. Danqing Wang, Lei Li. 23 May 2023.
- Goal-Driven Explainable Clustering via Language Descriptions. Zihan Wang, Jingbo Shang, Ruiqi Zhong. 23 May 2023.
- Aligning Large Language Models through Synthetic Feedback. Sungdong Kim, Sanghwan Bae, Jamin Shin, Soyoung Kang, Donghyun Kwak, Kang Min Yoo, Minjoon Seo. 23 May 2023. [ALM, SyDa]
- Enhancing Large Language Models Against Inductive Instructions with Dual-critique Prompting. Rui Wang, Hongru Wang, Fei Mi, Yi Chen, Boyang Xue, Kam-Fai Wong, Rui-Lan Xu. 23 May 2023.
- Training Diffusion Models with Reinforcement Learning. Kevin Black, Michael Janner, Yilun Du, Ilya Kostrikov, Sergey Levine. 22 May 2023. [EGVM]
- AlpacaFarm: A Simulation Framework for Methods that Learn from Human Feedback. Yann Dubois, Xuechen Li, Rohan Taori, Tianyi Zhang, Ishaan Gulrajani, Jimmy Ba, Carlos Guestrin, Percy Liang, Tatsunori B. Hashimoto. 22 May 2023. [ALM]
- A Pretrainer's Guide to Training Data: Measuring the Effects of Data Age, Domain Coverage, Quality, & Toxicity. Shayne Longpre, Gregory Yauney, Emily Reif, Katherine Lee, Adam Roberts, ..., Denny Zhou, Jason W. Wei, Kevin Robinson, David M. Mimno, Daphne Ippolito. 22 May 2023.
- TheoremQA: A Theorem-driven Question Answering dataset. Wenhu Chen, Ming Yin, Max Ku, Pan Lu, Yixin Wan, Xueguang Ma, Jianyu Xu, Xinyi Wang, Tony Xia. 21 May 2023. [AIMat]
- Collaborative Development of NLP models. Fereshte Khani, Marco Tulio Ribeiro. 20 May 2023.
- CRITIC: Large Language Models Can Self-Correct with Tool-Interactive Critiquing. Zhibin Gou, Zhihong Shao, Yeyun Gong, Yelong Shen, Yujiu Yang, Nan Duan, Weizhu Chen. 19 May 2023. [KELM, LRM]
- LIMA: Less Is More for Alignment. Chunting Zhou, Pengfei Liu, Puxin Xu, Srini Iyer, Jiao Sun, ..., Susan Zhang, Gargi Ghosh, M. Lewis, Luke Zettlemoyer, Omer Levy. 18 May 2023. [ALM]
- SLiC-HF: Sequence Likelihood Calibration with Human Feedback. Yao-Min Zhao, Rishabh Joshi, Tianqi Liu, Misha Khalman, Mohammad Saleh, Peter J. Liu. 17 May 2023.
- Improving Language Model Negotiation with Self-Play and In-Context Learning from AI Feedback. Yao Fu, Hao-Chun Peng, Tushar Khot, Mirella Lapata. 17 May 2023. [LLMAG]
- RL4F: Generating Natural Language Feedback with Reinforcement Learning for Repairing Model Outputs. Afra Feyza Akyürek, Ekin Akyürek, Aman Madaan, Ashwin Kalyan, Peter Clark, Derry Wijaya, Niket Tandon. 15 May 2023. [ALM, KELM]
- C-Eval: A Multi-Level Multi-Discipline Chinese Evaluation Suite for Foundation Models. Yuzhen Huang, Yuzhuo Bai, Zhihao Zhu, Junlei Zhang, Jinghan Zhang, ..., Yikai Zhang, Jiayi Lei, Yao Fu, Maosong Sun, Junxian He. 15 May 2023. [ELM, LRM]
- Is ChatGPT Fair for Recommendation? Evaluating Fairness in Large Language Model Recommendation. Jizhi Zhang, Keqin Bao, Yang Zhang, Wenjie Wang, Fuli Feng, Xiangnan He. 12 May 2023. [LRM, ALM]
- Fine-tuning Language Models with Generative Adversarial Reward Modelling. Z. Yu, Lau Jia Jaw, Zhang Hui, Bryan Kian Hsiang Low. 09 May 2023. [ALM]
- Language Models Don't Always Say What They Think: Unfaithful Explanations in Chain-of-Thought Prompting. Miles Turpin, Julian Michael, Ethan Perez, Sam Bowman. 07 May 2023. [ReLM, LRM]
- An Overview of AI and Blockchain Integration for Privacy-Preserving. Zongwei Li, D. Kong, Yuanzheng Niu, Hongli Peng, Xiaoqi Li, Wenkai Li. 06 May 2023.
- Principle-Driven Self-Alignment of Language Models from Scratch with Minimal Human Supervision. Zhiqing Sun, Songlin Yang, Qinhong Zhou, Hongxin Zhang, Zhenfang Chen, David D. Cox, Yiming Yang, Chuang Gan. 04 May 2023. [SyDa, ALM]
- Should ChatGPT and Bard Share Revenue with Their Data Providers? A New Business Model for the AI Era. Dong Zhang. 04 May 2023.
- Can Large Language Models Be an Alternative to Human Evaluations? Cheng-Han Chiang, Hung-yi Lee. 03 May 2023. [ALM, LM&MA]
- Improving Contrastive Learning of Sentence Embeddings from AI Feedback. M. Abouheaf, W. Gueaieb, Md. Suruz Miah, D. Spinello, Xipeng Qiu. 03 May 2023.