Constitutional AI: Harmlessness from AI Feedback

15 December 2022
Yuntao Bai
Saurav Kadavath
Sandipan Kundu
Amanda Askell
Jackson Kernion
Andy Jones
A. Chen
Anna Goldie
Azalia Mirhoseini
C. McKinnon
Carol Chen
Catherine Olsson
C. Olah
Danny Hernandez
Dawn Drain
Deep Ganguli
Dustin Li
Eli Tran-Johnson
E. Perez
Jamie Kerr
J. Mueller
Jeff Ladish
J. Landau
Kamal Ndousse
Kamilė Lukošiūtė
Liane Lovitt
Michael Sellitto
Nelson Elhage
Nicholas Schiefer
Noemí Mercado
Nova Dassarma
R. Lasenby
Robin Larson
Sam Ringer
Scott R. Johnston
Shauna Kravec
S. E. Showk
Stanislav Fort
Tamera Lanham
Timothy Telleen-Lawton
Tom Conerly
T. Henighan
Tristan Hume
Sam Bowman
Zac Hatfield-Dodds
Benjamin Mann
Dario Amodei
Nicholas Joseph
Sam McCandlish
Tom B. Brown
Jared Kaplan
SyDa, MoMe
arXiv: 2212.08073 (abs · PDF · HTML)

Papers citing "Constitutional AI: Harmlessness from AI Feedback"

50 / 1,202 papers shown
CycleAlign: Iterative Distillation from Black-box LLM to White-box Models for Better Human Alignment
Jixiang Hong
Quan Tu
Cai Chen
Xing Gao
Ji Zhang
Rui Yan
ALM
85
11
0
25 Oct 2023
Can You Follow Me? Testing Situational Understanding in ChatGPT
Chenghao Yang
Allyson Ettinger
LRM, LLMAG, ELM
140
4
0
24 Oct 2023
SoK: Memorization in General-Purpose Large Language Models
Valentin Hartmann
Anshuman Suri
Vincent Bindschaedler
David Evans
Shruti Tople
Robert West
KELM, LLMAG
92
24
0
24 Oct 2023
Branch-Solve-Merge Improves Large Language Model Evaluation and Generation
Swarnadeep Saha
Omer Levy
Asli Celikyilmaz
Mohit Bansal
Jason Weston
Xian Li
MoMe
99
77
0
23 Oct 2023
Ensemble-Instruct: Generating Instruction-Tuning Data with a Heterogeneous Mixture of LMs
Young-Suk Lee
Md Arafat Sultan
Yousef El-Kurdi
Tahira Naseem
Asim Munawar
Radu Florian
Salim Roukos
Ramón Fernández Astudillo
SyDa
59
6
0
21 Oct 2023
Teaching Language Models to Self-Improve through Interactive Demonstrations
Xiao Yu
Baolin Peng
Michel Galley
Jianfeng Gao
Zhou Yu
LRM, ReLM
104
22
0
20 Oct 2023
The Opaque Law of Artificial Intelligence
Vincenzo Calderonio
AILaw
90
1
0
19 Oct 2023
Vision-Language Models are Zero-Shot Reward Models for Reinforcement Learning
Juan Rocamonde
Victoriano Montesinos
Elvis Nava
Ethan Perez
David Lindner
VLM
101
92
0
19 Oct 2023
GestureGPT: Toward Zero-shot Interactive Gesture Understanding and Grounding with Large Language Model Agents
Xin Zeng
Xiaoyu Wang
Tengxiang Zhang
Chun Yu
Shengdong Zhao
Yiqiang Chen
LLMAG, LM&Ro, SLR
84
3
0
19 Oct 2023
GraphGPT: Graph Instruction Tuning for Large Language Models
Jiabin Tang
Yuhao Yang
Wei Wei
Lei Shi
Lixin Su
Suqi Cheng
Dawei Yin
Chao Huang
153
148
0
19 Oct 2023
Know Where to Go: Make LLM a Relevant, Responsible, and Trustworthy Searcher
Xiang Shi
Jiawei Liu
Yinpeng Liu
Qikai Cheng
Wei Lu
RALM, HILM, KELM
73
6
0
19 Oct 2023
Getting aligned on representational alignment
Ilia Sucholutsky
Lukas Muttenthaler
Adrian Weller
Andi Peng
Andreea Bobu
...
Thomas Unterthiner
Andrew Kyle Lampinen
Klaus-Robert Müller
M. Toneva
Thomas Griffiths
158
91
0
18 Oct 2023
Reflection-Tuning: Data Recycling Improves LLM Instruction-Tuning
Ming Li
Lichang Chen
Jiuhai Chen
Shwai He
Heng-Chiao Huang
Jiuxiang Gu
Dinesh Manocha
152
24
0
18 Oct 2023
Group Preference Optimization: Few-Shot Alignment of Large Language Models
Siyan Zhao
John Dang
Aditya Grover
81
30
0
17 Oct 2023
Compositional preference models for aligning LMs
Dongyoung Go
Tomasz Korbak
Germán Kruszewski
Jos Rozen
Marc Dymetman
93
20
0
17 Oct 2023
ReMax: A Simple, Effective, and Efficient Reinforcement Learning Method for Aligning Large Language Models
Ziniu Li
Tian Xu
Yushun Zhang
Zhihang Lin
Yang Yu
Ruoyu Sun
Zhimin Luo
135
79
0
16 Oct 2023
NeMo Guardrails: A Toolkit for Controllable and Safe LLM Applications with Programmable Rails
Traian Rebedea
R. Dinu
Makesh Narsimhan Sreedhar
Christopher Parisien
Jonathan Cohen
KELM
99
152
0
16 Oct 2023
Gaining Wisdom from Setbacks: Aligning Large Language Models via Mistake Analysis
Kai Chen
Chunwei Wang
Kuo Yang
Jianhua Han
Lanqing Hong
...
Zhenguo Li
Dit-Yan Yeung
Lifeng Shang
Xin Jiang
Qun Liu
183
36
0
16 Oct 2023
Privacy in Large Language Models: Attacks, Defenses and Future Directions
Haoran Li
Yulin Chen
Jinglong Luo
Yan Kang
Xiaojin Zhang
Qi Hu
Chunkit Chan
Yangqiu Song
PILM
116
44
0
16 Oct 2023
Let's reward step by step: Step-Level reward model as the Navigators for Reasoning
Qianli Ma
Haotian Zhou
Tingkai Liu
Jianbo Yuan
Pengfei Liu
Yang You
Hongxia Yang
LRM
92
54
0
16 Oct 2023
Prompt Packer: Deceiving LLMs through Compositional Instruction with Hidden Attacks
Shuyu Jiang
Xingshu Chen
Rui Tang
93
25
0
16 Oct 2023
Verbosity Bias in Preference Labeling by Large Language Models
Keita Saito
Akifumi Wachi
Koki Wataoka
Youhei Akimoto
ALM
100
38
0
16 Oct 2023
CarExpert: Leveraging Large Language Models for In-Car Conversational Question Answering
Md. Rony
Christian Suess
Sinchana Ramakanth Bhat
Viju Sudhi
Julia Schneider
Maximilian Vogel
Roman Teucher
Ken E. Friedl
S. Sahoo
71
11
0
14 Oct 2023
Large Language Model Unlearning
Yuanshun Yao
Xiaojun Xu
Yang Liu
MU
137
148
0
14 Oct 2023
xDial-Eval: A Multilingual Open-Domain Dialogue Evaluation Benchmark
Chen Zhang
L. F. D’Haro
Chengguang Tang
Ke Shi
Guohua Tang
Haizhou Li
ELM
72
11
0
13 Oct 2023
Welfare Diplomacy: Benchmarking Language Model Cooperation
Gabriel Mukobi
Hannah Erlebach
Niklas Lauffer
Lewis Hammond
Alan Chan
Jesse Clifton
LM&Ro
92
27
0
13 Oct 2023
Exploration with Principles for Diverse AI Supervision
Hao Liu
Matei A. Zaharia
Pieter Abbeel
94
2
0
13 Oct 2023
HoneyBee: Progressive Instruction Finetuning of Large Language Models for Materials Science
Yu Song
Santiago Miret
Huan Zhang
Bang Liu
ALM
55
24
0
12 Oct 2023
A Confederacy of Models: a Comprehensive Evaluation of LLMs on Creative Writing
Carlos Gómez-Rodríguez
Paul Williams
88
82
0
12 Oct 2023
Jailbreaking Black Box Large Language Models in Twenty Queries
Patrick Chao
Alexander Robey
Yan Sun
Hamed Hassani
George J. Pappas
Eric Wong
AAML
165
710
0
12 Oct 2023
Improving Factual Consistency for Knowledge-Grounded Dialogue Systems via Knowledge Enhancement and Alignment
Boyang Xue
Weichao Wang
Hongru Wang
Fei Mi
Rui Wang
Yasheng Wang
Lifeng Shang
Xin Jiang
Qun Liu
Kam-Fai Wong
KELM, HILM
299
18
0
12 Oct 2023
Understanding and Controlling a Maze-Solving Policy Network
Ulisse Mini
Peli Grietzer
Mrinank Sharma
Austin Meek
M. MacDiarmid
Alexander Matt Turner
51
18
0
12 Oct 2023
Receive, Reason, and React: Drive as You Say with Large Language Models in Autonomous Vehicles
Can Cui
Yunsheng Ma
Xu Cao
Wenqian Ye
Ziran Wang
94
91
0
12 Oct 2023
Beyond Memorization: Violating Privacy Via Inference with Large Language Models
Robin Staab
Mark Vero
Mislav Balunović
Martin Vechev
PILM
75
94
0
11 Oct 2023
Case Law Grounding: Aligning Judgments of Humans and AI on Socially-Constructed Concepts
Quan Ze Chen
Amy X. Zhang
ELM
128
6
0
10 Oct 2023
Catastrophic Jailbreak of Open-source LLMs via Exploiting Generation
Yangsibo Huang
Samyak Gupta
Mengzhou Xia
Kai Li
Danqi Chen
AAML
84
312
0
10 Oct 2023
Violation of Expectation via Metacognitive Prompting Reduces Theory of Mind Prediction Error in Large Language Models
Courtland Leer
Vincent Trost
Vineeth Voruganti
51
4
0
10 Oct 2023
Exploring Memorization in Fine-tuned Language Models
Shenglai Zeng
Yaxin Li
Jie Ren
Yiding Liu
Han Xu
Pengfei He
Yue Xing
Shuaiqiang Wang
Jiliang Tang
Dawei Yin
PILM
100
26
0
10 Oct 2023
Constructive Large Language Models Alignment with Diverse Feedback
Tianshu Yu
Ting-En Lin
Yuchuan Wu
Min Yang
Fei Huang
Yongbin Li
ALM
104
9
0
10 Oct 2023
Jailbreak and Guard Aligned Language Models with Only Few In-Context Demonstrations
Zeming Wei
Yifei Wang
Ang Li
Yichuan Mo
Yisen Wang
125
279
0
10 Oct 2023
Let Models Speak Ciphers: Multiagent Debate through Embeddings
Chau Pham
Boyi Liu
Yingxiang Yang
Zhengyu Chen
Tianyi Liu
Jianbo Yuan
Bryan A. Plummer
Zhaoran Wang
Hongxia Yang
LLMAG
102
19
0
10 Oct 2023
Factual and Personalized Recommendations using Language Models and Reinforcement Learning
Jihwan Jeong
Yinlam Chow
Guy Tennenholtz
Chih-Wei Hsu
Azamat Tulepbergenov
Mohammad Ghavamzadeh
Craig Boutilier
88
4
0
09 Oct 2023
SALMON: Self-Alignment with Instructable Reward Models
Zhiqing Sun
Songlin Yang
Hongxin Zhang
Qinhong Zhou
Zhenfang Chen
David D. Cox
Yiming Yang
Chuang Gan
ALM, SyDa
127
41
0
09 Oct 2023
STREAM: Social data and knowledge collective intelligence platform for TRaining Ethical AI Models
Yuwei Wang
Enmeng Lu
Zizhe Ruan
Yao Liang
Yi Zeng
AI4TS
79
4
0
09 Oct 2023
Generative Judge for Evaluating Alignment
Junlong Li
Shichao Sun
Weizhe Yuan
Run-Ze Fan
Hai Zhao
Pengfei Liu
ELM, ALM
112
91
0
09 Oct 2023
Are Personalized Stochastic Parrots More Dangerous? Evaluating Persona Biases in Dialogue Systems
Yixin Wan
Jieyu Zhao
Aman Chadha
Nanyun Peng
Kai-Wei Chang
115
27
0
08 Oct 2023
Loose lips sink ships: Mitigating Length Bias in Reinforcement Learning from Human Feedback
Wei Shen
Rui Zheng
Wenyu Zhan
Jun Zhao
Shihan Dou
Tao Gui
Qi Zhang
Xuanjing Huang
ALM
116
52
0
08 Oct 2023
Towards Better Chain-of-Thought Prompting Strategies: A Survey
Zihan Yu
Liang He
Zhen Wu
Xinyu Dai
Jiajun Chen
LRM
167
55
0
08 Oct 2023
Critique Ability of Large Language Models
Liangchen Luo
Zi Lin
Yinxiao Liu
Lei Shu
Yun Zhu
Jingbo Shang
Lei Meng
AI4MH, LRM, ELM
62
16
0
07 Oct 2023
Fine-tuning Aligned Language Models Compromises Safety, Even When Users Do Not Intend To!
Xiangyu Qi
Yi Zeng
Tinghao Xie
Pin-Yu Chen
Ruoxi Jia
Prateek Mittal
Peter Henderson
SILM
158
635
0
05 Oct 2023