arXiv:2212.08073
Constitutional AI: Harmlessness from AI Feedback

15 December 2022
Yuntao Bai
Saurav Kadavath
Sandipan Kundu
Amanda Askell
John Kernion
Andy Jones
A. Chen
Anna Goldie
Azalia Mirhoseini
C. McKinnon
Carol Chen
Catherine Olsson
C. Olah
Danny Hernandez
Dawn Drain
Deep Ganguli
Dustin Li
Eli Tran-Johnson
E. Perez
Jamie Kerr
J. Mueller
Jeff Ladish
J. Landau
Kamal Ndousse
Kamilė Lukošiūtė
Liane Lovitt
Michael Sellitto
Nelson Elhage
Nicholas Schiefer
Noemí Mercado
Nova Dassarma
R. Lasenby
Robin Larson
Sam Ringer
Scott R. Johnston
Shauna Kravec
S. E. Showk
Stanislav Fort
Tamera Lanham
Timothy Telleen-Lawton
Tom Conerly
T. Henighan
Tristan Hume
Sam Bowman
Zac Hatfield-Dodds
Benjamin Mann
Dario Amodei
Nicholas Joseph
Sam McCandlish
Tom B. Brown
Jared Kaplan
    SyDa
    MoMe

Papers citing "Constitutional AI: Harmlessness from AI Feedback"

50 / 1,123 papers shown
Distributional Preference Learning: Understanding and Accounting for Hidden Context in RLHF
Anand Siththaranjan
Cassidy Laidlaw
Dylan Hadfield-Menell
34
58
0
13 Dec 2023
LDM$^2$: A Large Decision Model Imitating Human Cognition with Dynamic Memory Enhancement
Xingjin Wang
Linjing Li
D. Zeng
38
0
0
13 Dec 2023
On Diversified Preferences of Large Language Model Alignment
Dun Zeng
Yong Dai
Pengyu Cheng
Longyue Wang
Tianhao Hu
Wanshun Chen
Nan Du
Zenglin Xu
ALM
38
16
0
12 Dec 2023
Alignment for Honesty
Yuqing Yang
Ethan Chern
Xipeng Qiu
Graham Neubig
Pengfei Liu
44
32
0
12 Dec 2023
Exchange-of-Thought: Enhancing Large Language Model Capabilities through Cross-Model Communication
Zhangyue Yin
Qiushi Sun
Cheng Chang
Qipeng Guo
Junqi Dai
Xuanjing Huang
Xipeng Qiu
LRM
56
50
0
04 Dec 2023
Tackling Bias in Pre-trained Language Models: Current Trends and Under-represented Societies
Vithya Yogarajan
Gillian Dobbie
Te Taka Keegan
R. Neuwirth
ALM
54
11
0
03 Dec 2023
Axiomatic Preference Modeling for Longform Question Answering
Corby Rosset
Guoqing Zheng
Victor C. Dibia
Ahmed Hassan Awadallah
Paul Bennett
SyDa
32
3
0
02 Dec 2023
RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback
M. Steyvers
Yuan Yao
Haoye Zhang
Taiwen He
Yifeng Han
...
Xinyue Hu
Zhiyuan Liu
Hai-Tao Zheng
Maosong Sun
Tat-Seng Chua
MLLM
VLM
150
182
0
01 Dec 2023
The Case for Scalable, Data-Driven Theory: A Paradigm for Scientific Progress in NLP
Julian Michael
19
1
0
01 Dec 2023
AlignBench: Benchmarking Chinese Alignment of Large Language Models
Xiao Liu
Xuanyu Lei
Sheng-Ping Wang
Yue Huang
Zhuoer Feng
...
Hongning Wang
Jing Zhang
Minlie Huang
Yuxiao Dong
Jie Tang
ELM
LM&MA
ALM
125
43
0
30 Nov 2023
CritiqueLLM: Towards an Informative Critique Generation Model for Evaluation of Large Language Model Generation
Pei Ke
Bosi Wen
Andrew Feng
Xiao-Yang Liu
Xuanyu Lei
...
Aohan Zeng
Yuxiao Dong
Hongning Wang
Jie Tang
Minlie Huang
ELM
ALM
50
25
0
30 Nov 2023
Foundational Moral Values for AI Alignment
Betty Hou
Brian Patrick Green
35
0
0
28 Nov 2023
ChatGPT's One-year Anniversary: Are Open-Source Large Language Models Catching up?
Hailin Chen
Fangkai Jiao
Xingxuan Li
Chengwei Qin
Mathieu Ravaut
Ruochen Zhao
Caiming Xiong
Chenyu You
ELM
CLL
AI4MH
LRM
ALM
85
27
0
28 Nov 2023
A Survey of the Evolution of Language Model-Based Dialogue Systems
Hongru Wang
Lingzhi Wang
Yiming Du
Liang Chen
Jing Zhou
Yufei Wang
Kam-Fai Wong
LRM
72
21
0
28 Nov 2023
DUnE: Dataset for Unified Editing
Afra Feyza Akyürek
Eric Pan
Garry Kuwanto
Derry Wijaya
KELM
32
17
0
27 Nov 2023
Exploring the Robustness of Model-Graded Evaluations and Automated Interpretability
Simon Lermen
Ondřej Kvapil
ELM
AAML
23
3
0
26 Nov 2023
Large Language Models as Automated Aligners for benchmarking Vision-Language Models
Yuanfeng Ji
Chongjian Ge
Weikai Kong
Enze Xie
Zhengying Liu
Zhengguo Li
Ping Luo
MLLM
ELM
42
7
0
24 Nov 2023
Data-Efficient Alignment of Large Language Models with Human Feedback Through Natural Language
Di Jin
Shikib Mehri
Devamanyu Hazarika
Aishwarya Padmakumar
Sungjin Lee
Yang Liu
Mahdi Namazifar
ALM
26
16
0
24 Nov 2023
Scalable AI Safety via Doubly-Efficient Debate
Jonah Brown-Cohen
Geoffrey Irving
Georgios Piliouras
32
15
0
23 Nov 2023
Data Diversity Matters for Robust Instruction Tuning
Alexander Bukharin
Tuo Zhao
84
36
0
21 Nov 2023
Diffusion Model Alignment Using Direct Preference Optimization
Bram Wallace
Meihua Dang
Rafael Rafailov
Linqi Zhou
Aaron Lou
Senthil Purushwalkam
Stefano Ermon
Caiming Xiong
Chenyu You
Nikhil Naik
EGVM
50
229
0
21 Nov 2023
Applications of Large Scale Foundation Models for Autonomous Driving
Yu Huang
Yue Chen
Zhu Li
ELM
AI4CE
LRM
ALM
LM&Ro
63
15
0
20 Nov 2023
FinanceBench: A New Benchmark for Financial Question Answering
Pranab Islam
Anand Kannappan
Douwe Kiela
Rebecca Qian
Nino Scherrer
Bertie Vidgen
RALM
31
72
0
20 Nov 2023
System 2 Attention (is something you might need too)
Jason Weston
Sainbayar Sukhbaatar
RALM
OffRL
LRM
35
58
0
20 Nov 2023
Case Repositories: Towards Case-Based Reasoning for AI Alignment
K. J. Kevin Feng
Quan Ze Chen
Inyoung Cheong
King Xia
Amy X. Zhang
32
10
0
18 Nov 2023
Examining LLMs' Uncertainty Expression Towards Questions Outside Parametric Knowledge
Genglin Liu
Xingyao Wang
Lifan Yuan
Yangyi Chen
Hao Peng
34
16
0
16 Nov 2023
Trustworthy Large Models in Vision: A Survey
Ziyan Guo
Li Xu
Jun Liu
MU
66
0
0
16 Nov 2023
RLHFPoison: Reward Poisoning Attack for Reinforcement Learning with Human Feedback in Large Language Models
Jiong Wang
Junlin Wu
Muhao Chen
Yevgeniy Vorobeychik
Chaowei Xiao
AAML
29
13
0
16 Nov 2023
JAB: Joint Adversarial Prompting and Belief Augmentation
Ninareh Mehrabi
Palash Goyal
Anil Ramakrishna
Jwala Dhamala
Shalini Ghosh
Richard Zemel
Kai-Wei Chang
Aram Galstyan
Rahul Gupta
AAML
41
7
0
16 Nov 2023
Stealthy and Persistent Unalignment on Large Language Models via Backdoor Injections
Yuanpu Cao
Bochuan Cao
Jinghui Chen
34
24
0
15 Nov 2023
How Trustworthy are Open-Source LLMs? An Assessment under Malicious Demonstrations Shows their Vulnerabilities
Lingbo Mo
Boshi Wang
Muhao Chen
Huan Sun
34
27
0
15 Nov 2023
When does In-context Learning Fall Short and Why? A Study on Specification-Heavy Tasks
Hao Peng
Xiaozhi Wang
Jianhui Chen
Weikai Li
Y. Qi
...
Zhili Wu
Kaisheng Zeng
Bin Xu
Lei Hou
Juanzi Li
39
28
0
15 Nov 2023
An Empathetic User-Centric Chatbot for Emotional Support
Yanting Pan
Yixuan Tang
Yuchen Niu
26
3
0
15 Nov 2023
Value FULCRA: Mapping Large Language Models to the Multidimensional Spectrum of Basic Human Values
Jing Yao
Xiaoyuan Yi
Xiting Wang
Yifan Gong
Xing Xie
46
23
0
15 Nov 2023
Are You Sure? Challenging LLMs Leads to Performance Drops in The FlipFlop Experiment
Philippe Laban
Lidiya Murakhovs'ka
Caiming Xiong
Chien-Sheng Wu
LRM
26
19
0
14 Nov 2023
AART: AI-Assisted Red-Teaming with Diverse Data Generation for New LLM-powered Applications
Bhaktipriya Radharapu
Kevin Robinson
Lora Aroyo
Preethi Lahoti
36
37
0
14 Nov 2023
LLMs cannot find reasoning errors, but can correct them given the error location
Gladys Tyen
Hassan Mansoor
Victor Carbune
Peter Chen
Tony Mak
LRM
19
77
0
14 Nov 2023
Functionality learning through specification instructions
Pedro Henrique Luz de Araujo
Benjamin Roth
ELM
41
0
0
14 Nov 2023
Extrinsically-Focused Evaluation of Omissions in Medical Summarization
Elliot Schumacher
Daniel Rosenthal
Varun Nair
Luladay Price
Geoffrey Tso
Anitha Kannan
19
2
0
14 Nov 2023
A Closer Look at the Self-Verification Abilities of Large Language Models in Logical Reasoning
Ruixin Hong
Hongming Zhang
Xinyu Pang
Dong Yu
Changshui Zhang
LRM
52
24
0
14 Nov 2023
MART: Improving LLM Safety with Multi-round Automatic Red-Teaming
Suyu Ge
Chunting Zhou
Rui Hou
Madian Khabsa
Yi-Chia Wang
Qifan Wang
Jiawei Han
Yuning Mao
AAML
LRM
44
93
0
13 Nov 2023
A Step Closer to Comprehensive Answers: Constrained Multi-Stage Question Decomposition with Large Language Models
He Cao
Zhenwei An
Jiazhan Feng
Kun Xu
Liwei Chen
Dongyan Zhao
HILM
33
2
0
13 Nov 2023
Past as a Guide: Leveraging Retrospective Learning for Python Code Completion
Seunggyoon Shin
Seunggyu Chang
Sungjoon Choi
KELM
40
1
0
13 Nov 2023
Flames: Benchmarking Value Alignment of LLMs in Chinese
Kexin Huang
Xiangyang Liu
Qianyu Guo
Tianxiang Sun
Jiawei Sun
...
Yixu Wang
Yan Teng
Xipeng Qiu
Yingchun Wang
Dahua Lin
ALM
35
10
0
12 Nov 2023
Fake Alignment: Are LLMs Really Aligned Well?
Yixu Wang
Yan Teng
Kexin Huang
Chengqi Lyu
Songyang Zhang
Wenwei Zhang
Xingjun Ma
Yu-Gang Jiang
Yu Qiao
Yingchun Wang
43
16
0
10 Nov 2023
Hallucination-minimized Data-to-answer Framework for Financial Decision-makers
Sohini Roychowdhury
Andres Alvarez
Brian Moore
Marko Krema
Maria Paz Gelpi
...
Angel Rodriguez
Jose Ramon Cabrejas
Pablo Martinez Serrano
Punit Agrawal
Arijit Mukherjee
44
8
0
09 Nov 2023
Black-Box Prompt Optimization: Aligning Large Language Models without Model Training
Jiale Cheng
Xiao Liu
Kehan Zheng
Pei Ke
Hongning Wang
Yuxiao Dong
Jie Tang
Minlie Huang
29
79
0
07 Nov 2023
Can LLMs Follow Simple Rules?
Norman Mu
Sarah Chen
Zifan Wang
Sizhe Chen
David Karamardian
Lulwa Aljeraisy
Basel Alomair
Dan Hendrycks
David Wagner
ALM
31
27
0
06 Nov 2023
LLMs grasp morality in concept
Mark Pock
Andre Ye
Jared Moore
FaML
31
2
0
04 Nov 2023
Conditions on Preference Relations that Guarantee the Existence of Optimal Policies
Jonathan Colaco Carr
Prakash Panangaden
Doina Precup
31
2
0
03 Nov 2023