ResearchTrend.AI

arXiv:2212.08073
Constitutional AI: Harmlessness from AI Feedback

15 December 2022
Yuntao Bai
Saurav Kadavath
Sandipan Kundu
Amanda Askell
John Kernion
Andy Jones
A. Chen
Anna Goldie
Azalia Mirhoseini
C. McKinnon
Carol Chen
Catherine Olsson
C. Olah
Danny Hernandez
Dawn Drain
Deep Ganguli
Dustin Li
Eli Tran-Johnson
E. Perez
Jamie Kerr
J. Mueller
Jeff Ladish
J. Landau
Kamal Ndousse
Kamilė Lukošiūtė
Liane Lovitt
Michael Sellitto
Nelson Elhage
Nicholas Schiefer
Noemí Mercado
Nova Dassarma
R. Lasenby
Robin Larson
Sam Ringer
Scott R. Johnston
Shauna Kravec
S. E. Showk
Stanislav Fort
Tamera Lanham
Timothy Telleen-Lawton
Tom Conerly
T. Henighan
Tristan Hume
Sam Bowman
Zac Hatfield-Dodds
Benjamin Mann
Dario Amodei
Nicholas Joseph
Sam McCandlish
Tom B. Brown
Jared Kaplan
SyDa, MoMe
ArXiv (abs) · PDF · HTML

Papers citing "Constitutional AI: Harmlessness from AI Feedback"

50 / 1,202 papers shown
Aligning Language Models to Explicitly Handle Ambiguity
Sungmin Cho
Youna Kim
Cheonbok Park
Junyeob Kim
Choonghyun Park
Kang Min Yoo
Sang-goo Lee
Taeuk Kim
104
22
0
18 Apr 2024
MiniCheck: Efficient Fact-Checking of LLMs on Grounding Documents
Liyan Tang
Philippe Laban
Greg Durrett
HILM, SyDa
86
103
0
16 Apr 2024
Crossing the principle-practice gap in AI ethics with ethical problem-solving
N. Corrêa
James William Santos
Camila Galvão
Marcelo Pasetti
Dieine Schiavon
Faizah Naqvi
Robayet Hossain
N. D. Oliveira
81
5
0
16 Apr 2024
Self-Supervised Visual Preference Alignment
Ke Zhu
Liang Zhao
Zheng Ge
Xiangyu Zhang
75
17
0
16 Apr 2024
Social Choice Should Guide AI Alignment in Dealing with Diverse Human Feedback
Vincent Conitzer
Rachel Freedman
J. Heitzig
Wesley H. Holliday
Bob M. Jacobs
...
Eric Pacuit
Stuart Russell
Hailey Schoelkopf
Emanuel Tewolde
W. Zwicker
118
40
0
16 Apr 2024
Improving the Capabilities of Large Language Model Based Marketing Analytics Copilots With Semantic Search And Fine-Tuning
Yilin Gao
Arava Sai Kumar
Yancheng Li
James W. Snyder
AI4MH
104
2
0
16 Apr 2024
Reinforcement Learning from Multi-role Debates as Feedback for Bias Mitigation in LLMs
Ruoxi Cheng
Haoxuan Ma
Shuirong Cao
Jiaqi Li
Aihua Pei
Zhiqiang Wang
Pengliang Ji
Haoyu Wang
Jiaqi Huo
AI4CE
90
9
0
15 Apr 2024
LLM Evaluators Recognize and Favor Their Own Generations
Arjun Panickssery
Samuel R. Bowman
Shi Feng
122
197
0
15 Apr 2024
Impact of Preference Noise on the Alignment Performance of Generative Language Models
Yang Gao
Dana Alon
Donald Metzler
102
21
0
15 Apr 2024
When Hindsight is Not 20/20: Testing Limits on Reflective Thinking in Large Language Models
Yanhong Li
Chenghao Yang
Allyson Ettinger
ReLM, LRM, LLMAG
84
11
0
14 Apr 2024
Do LLMs Play Dice? Exploring Probability Distribution Sampling in Large Language Models for Behavioral Simulation
Jia Gu
Liang Pang
Huawei Shen
Xueqi Cheng
81
6
0
13 Apr 2024
LLM-Seg: Bridging Image Segmentation and Large Language Model Reasoning
Junchi Wang
Lei Ke
MLLM, LRM, VLM
81
29
0
12 Apr 2024
RLHF Deciphered: A Critical Analysis of Reinforcement Learning from Human Feedback for LLMs
Shreyas Chaudhari
Pranjal Aggarwal
Vishvak Murahari
Tanmay Rajpurohit
Ashwin Kalyan
Karthik Narasimhan
Ameet Deshpande
Bruno Castro da Silva
91
38
0
12 Apr 2024
Dataset Reset Policy Optimization for RLHF
Jonathan D. Chang
Wenhao Zhan
Owen Oertell
Kianté Brantley
Dipendra Kumar Misra
Jason D. Lee
Wen Sun
OffRL
117
24
0
12 Apr 2024
Best Practices and Lessons Learned on Synthetic Data for Language Models
Ruibo Liu
Jerry W. Wei
Fangyu Liu
Chenglei Si
Yanzhe Zhang
...
Steven Zheng
Daiyi Peng
Diyi Yang
Denny Zhou
Andrew M. Dai
SyDa, EgoV
126
96
0
11 Apr 2024
Interactive Prompt Debugging with Sequence Salience
Ian Tenney
Ryan Mullins
Bin Du
Shree Pandya
Minsuk Kahng
Lucas Dixon
LRM
67
2
0
11 Apr 2024
High-Dimension Human Value Representation in Large Language Models
Samuel Cahyawijaya
Delong Chen
Yejin Bang
Leila Khalatbari
Bryan Wilie
Ziwei Ji
Etsuko Ishii
Pascale Fung
206
6
0
11 Apr 2024
Frontier AI Ethics: Anticipating and Evaluating the Societal Impacts of Generative Agents
Seth Lazar
SILM
67
1
0
10 Apr 2024
Rethinking How to Evaluate Language Model Jailbreak
Hongyu Cai
Arjun Arunasalam
Leo Y. Lin
Antonio Bianchi
Z. Berkay Celik
ALM
65
8
0
09 Apr 2024
Cendol: Open Instruction-tuned Generative Large Language Models for Indonesian Languages
Samuel Cahyawijaya
Holy Lovenia
Fajri Koto
Rifki Afina Putri
Emmanuel Dave
...
Bryan Wilie
Genta Indra Winata
Alham Fikri Aji
Ayu Purwarianti
Pascale Fung
133
18
0
09 Apr 2024
AEGIS: Online Adaptive AI Content Safety Moderation with Ensemble of LLM Experts
Shaona Ghosh
Prasoon Varshney
Erick Galinkin
Christopher Parisien
ELM
93
52
0
09 Apr 2024
SafetyPrompts: a Systematic Review of Open Datasets for Evaluating and Improving Large Language Model Safety
Paul Röttger
Fabio Pernisi
Bertie Vidgen
Dirk Hovy
ELM, KELM
167
39
0
08 Apr 2024
FGAIF: Aligning Large Vision-Language Models with Fine-grained AI Feedback
Liqiang Jing
Xinya Du
185
18
0
07 Apr 2024
PoLLMgraph: Unraveling Hallucinations in Large Language Models via State Transition Dynamics
Derui Zhu
Dingfan Chen
Qing Li
Zongxiong Chen
Lei Ma
Jens Grossklags
Mario Fritz
HILM
89
14
0
06 Apr 2024
JRDB-Social: A Multifaceted Robotic Dataset for Understanding of Context and Dynamics of Human Interactions Within Social Groups
Simindokht Jahangard
Zhixi Cai
Shiki Wen
Hamid Rezatofighi
56
6
0
06 Apr 2024
Social Skill Training with Large Language Models
Diyi Yang
Caleb Ziems
William B. Held
Omar Shaikh
Michael S. Bernstein
John C. Mitchell
LLMAG
80
11
0
05 Apr 2024
Verifiable by Design: Aligning Language Models to Quote from Pre-Training Data
Jingyu Zhang
Marc Marone
Tianjian Li
Benjamin Van Durme
Daniel Khashabi
193
9
0
05 Apr 2024
Direct Nash Optimization: Teaching Language Models to Self-Improve with General Preferences
Corby Rosset
Ching-An Cheng
Arindam Mitra
Michael Santacroce
Ahmed Hassan Awadallah
Tengyang Xie
202
132
0
04 Apr 2024
Evaluating LLMs at Detecting Errors in LLM Responses
Ryo Kamoi
Sarkar Snigdha Sarathi Das
Renze Lou
Jihyun Janice Ahn
Yilun Zhao
...
Salika Dave
Shaobo Qin
Arman Cohan
Wenpeng Yin
Rui Zhang
86
25
0
04 Apr 2024
Do Large Language Models Rank Fairly? An Empirical Study on the Fairness of LLMs as Rankers
Yuan Wang
Xuyang Wu
Hsin-Tai Wu
Zhiqiang Tao
Yi Fang
ALM
78
10
0
04 Apr 2024
Designing for Human-Agent Alignment: Understanding what humans want from their agents
Nitesh Goyal
Minsuk Chang
Michael Terry
56
16
0
04 Apr 2024
ChatGLM-Math: Improving Math Problem-Solving in Large Language Models with a Self-Critique Pipeline
Yifan Xu
Xiao Liu
Xinghan Liu
Zhenyu Hou
Yueyan Li
...
Aohan Zeng
Zhengxiao Du
Wenyi Zhao
Jie Tang
Yuxiao Dong
LRM
99
42
0
03 Apr 2024
Calibrating the Confidence of Large Language Models by Eliciting Fidelity
Mozhi Zhang
Mianqiu Huang
Rundong Shi
Linsen Guo
Chong Peng
Peng Yan
Yaqian Zhou
Xipeng Qiu
86
13
0
03 Apr 2024
Deconstructing In-Context Learning: Understanding Prompts via Corruption
Namrata Shivagunde
Vladislav Lialin
Sherin Muckatira
Anna Rumshisky
89
3
0
02 Apr 2024
Risks from Language Models for Automated Mental Healthcare: Ethics and Structure for Implementation
Declan Grabb
Max Lamparth
N. Vasan
89
17
0
02 Apr 2024
Direct Preference Optimization of Video Large Multimodal Models from Language Model Reward
Ruohong Zhang
Liangke Gui
Zhiqing Sun
Yihao Feng
Keyang Xu
...
Di Fu
Chunyuan Li
Alexander G. Hauptmann
Yonatan Bisk
Yiming Yang
MLLM
131
78
0
01 Apr 2024
Survey of Bias In Text-to-Image Generation: Definition, Evaluation, and Mitigation
Yixin Wan
Arjun Subramonian
Anaelia Ovalle
Zongyu Lin
Ashima Suvarna
Christina Chance
Hritik Bansal
Rebecca Pattichis
Kai-Wei Chang
EGVM
166
36
0
01 Apr 2024
ChatGLM-RLHF: Practices of Aligning Large Language Models with Human Feedback
Zhenyu Hou
Yiin Niu
Zhengxiao Du
Xiaohan Zhang
Xiao Liu
...
Qinkai Zheng
Minlie Huang
Hongning Wang
Jie Tang
Yuxiao Dong
ALM
107
19
0
01 Apr 2024
Comparing Bad Apples to Good Oranges: Aligning Large Language Models via Joint Preference Optimization
Hritik Bansal
Ashima Suvarna
Gantavya Bhatt
Nanyun Peng
Kai-Wei Chang
Aditya Grover
ALM
155
11
0
31 Mar 2024
Algorithmic Collusion by Large Language Models
Sara Fish
Yannai A. Gonczarowski
Ran I. Shorrer
140
13
0
31 Mar 2024
Configurable Safety Tuning of Language Models with Synthetic Preference Data
Víctor Gallego
67
7
0
30 Mar 2024
Is Factuality Decoding a Free Lunch for LLMs? Evaluation on Knowledge Editing Benchmark
Baolong Bi
Shenghua Liu
Yiwei Wang
Lingrui Mei
Xueqi Cheng
KELM
51
13
0
30 Mar 2024
Fine-Tuning Language Models with Reward Learning on Policy
Hao Lang
Fei Huang
Yongbin Li
ALM
65
7
0
28 Mar 2024
sDPO: Don't Use Your Data All at Once
Dahyun Kim
Yungi Kim
Wonho Song
Hyeonwoo Kim
Yunsu Kim
Sanghoon Kim
Chanjun Park
76
35
0
28 Mar 2024
STaR-GATE: Teaching Language Models to Ask Clarifying Questions
Chinmaya Andukuri
Jan-Philipp Fränken
Tobias Gerstenberg
Noah D. Goodman
SyDa, LRM
102
44
0
28 Mar 2024
Learning From Correctness Without Prompting Makes LLM Efficient Reasoner
Yuxuan Yao
Han Wu
Zhijiang Guo
Biyan Zhou
Jiahui Gao
Sichun Luo
Hanxu Hou
Xiaojin Fu
Linqi Song
LLMAG, LRM
125
10
0
28 Mar 2024
Understanding the Learning Dynamics of Alignment with Human Feedback
Shawn Im
Yixuan Li
ALM
107
14
0
27 Mar 2024
IterAlign: Iterative Constitutional Alignment of Large Language Models
Xiusi Chen
Hongzhi Wen
Sreyashi Nag
Chen Luo
Qingyu Yin
Ruirui Li
Zheng Li
Wei Wang
AILaw
36
6
0
27 Mar 2024
Dual Instruction Tuning with Large Language Models for Mathematical Reasoning
Yongwei Zhou
Tiejun Zhao
LRM
85
7
0
27 Mar 2024
Don't Listen To Me: Understanding and Exploring Jailbreak Prompts of Large Language Models
Zhiyuan Yu
Xiaogeng Liu
Shunning Liang
Zach Cameron
Chaowei Xiao
Ning Zhang
94
53
0
26 Mar 2024