ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2212.08073
  4. Cited By
Constitutional AI: Harmlessness from AI Feedback

Constitutional AI: Harmlessness from AI Feedback

15 December 2022
Yuntao Bai
Saurav Kadavath
Sandipan Kundu
Amanda Askell
John Kernion
Andy Jones
A. Chen
Anna Goldie
Azalia Mirhoseini
C. McKinnon
Carol Chen
Catherine Olsson
C. Olah
Danny Hernandez
Dawn Drain
Deep Ganguli
Dustin Li
Eli Tran-Johnson
E. Perez
Jamie Kerr
J. Mueller
Jeff Ladish
J. Landau
Kamal Ndousse
Kamilė Lukošiūtė
Liane Lovitt
Michael Sellitto
Nelson Elhage
Nicholas Schiefer
Noemí Mercado
Nova Dassarma
R. Lasenby
Robin Larson
Sam Ringer
Scott R. Johnston
Shauna Kravec
S. E. Showk
Stanislav Fort
Tamera Lanham
Timothy Telleen-Lawton
Tom Conerly
T. Henighan
Tristan Hume
Sam Bowman
Zac Hatfield-Dodds
Benjamin Mann
Dario Amodei
Nicholas Joseph
Sam McCandlish
Tom B. Brown
Jared Kaplan
    SyDa
    MoMe
ArXivPDFHTML

Papers citing "Constitutional AI: Harmlessness from AI Feedback"

50 / 1,106 papers shown
Title
Margin Matching Preference Optimization: Enhanced Model Alignment with
  Granular Feedback
Margin Matching Preference Optimization: Enhanced Model Alignment with Granular Feedback
Kyuyoung Kim
Ah Jeong Seo
Hao Liu
Jinwoo Shin
Kimin Lee
24
2
0
04 Oct 2024
Justice or Prejudice? Quantifying Biases in LLM-as-a-Judge
Justice or Prejudice? Quantifying Biases in LLM-as-a-Judge
Jiayi Ye
Yanbo Wang
Yue Huang
Dongping Chen
Qihui Zhang
...
Werner Geyer
Chao Huang
Pin-Yu Chen
Nitesh V. Chawla
Xiangliang Zhang
ELM
37
45
0
03 Oct 2024
CodePMP: Scalable Preference Model Pretraining for Large Language Model
  Reasoning
CodePMP: Scalable Preference Model Pretraining for Large Language Model Reasoning
Huimu Yu
Xing Wu
Weidong Yin
Debing Zhang
Songlin Hu
LRM
31
5
0
03 Oct 2024
Generative Reward Models
Generative Reward Models
Dakota Mahan
Duy Phung
Rafael Rafailov
Chase Blagden
Nathan Lile
Louis Castricato
Jan-Philipp Fränken
Chelsea Finn
Alon Albalak
VLM
SyDa
OffRL
27
26
0
02 Oct 2024
FactAlign: Long-form Factuality Alignment of Large Language Models
FactAlign: Long-form Factuality Alignment of Large Language Models
Chao-Wei Huang
Yun-Nung Chen
HILM
25
2
0
02 Oct 2024
Seeing Eye to AI: Human Alignment via Gaze-Based Response Rewards for Large Language Models
Seeing Eye to AI: Human Alignment via Gaze-Based Response Rewards for Large Language Models
Angela Lopez-Cardona
Carlos Segura
Alexandros Karatzoglou
Sergi Abadal
Ioannis Arapakis
ALM
54
2
0
02 Oct 2024
Moral Alignment for LLM Agents
Moral Alignment for LLM Agents
Elizaveta Tennant
Stephen Hailes
Mirco Musolesi
45
1
0
02 Oct 2024
TypedThinker: Diversify Large Language Model Reasoning with Typed Thinking
TypedThinker: Diversify Large Language Model Reasoning with Typed Thinking
Danqing Wang
Jianxin Ma
Fei Fang
Lei Li
LLMAG
LRM
154
0
0
02 Oct 2024
Unsupervised Human Preference Learning
Unsupervised Human Preference Learning
Sumuk Shashidhar
Abhinav Chinta
Vaibhav Sahai
Dilek Hakkani Tur
LRM
35
0
0
30 Sep 2024
'Simulacrum of Stories': Examining Large Language Models as Qualitative
  Research Participants
'Simulacrum of Stories': Examining Large Language Models as Qualitative Research Participants
Shivani Kapania
William Agnew
Motahhare Eslami
Hoda Heidari
Sarah E Fox
42
4
0
28 Sep 2024
Multimodal Pragmatic Jailbreak on Text-to-image Models
Multimodal Pragmatic Jailbreak on Text-to-image Models
Tong Liu
Zhixin Lai
Gengyuan Zhang
Philip H. S. Torr
Vera Demberg
Volker Tresp
Jindong Gu
35
4
0
27 Sep 2024
LLMs4Synthesis: Leveraging Large Language Models for Scientific
  Synthesis
LLMs4Synthesis: Leveraging Large Language Models for Scientific Synthesis
Hamed Babaei Giglou
Jennifer D'Souza
Sören Auer
23
3
0
27 Sep 2024
AI Policy Projector: Grounding LLM Policy Design in Iterative Mapmaking
AI Policy Projector: Grounding LLM Policy Design in Iterative Mapmaking
Michelle S. Lam
Fred Hohman
Dominik Moritz
Jeffrey P. Bigham
Kenneth Holstein
Mary Beth Kery
35
1
0
26 Sep 2024
Inference-Time Language Model Alignment via Integrated Value Guidance
Inference-Time Language Model Alignment via Integrated Value Guidance
Zhixuan Liu
Zhanhui Zhou
Yuanfu Wang
Chao Yang
Yu Qiao
35
7
0
26 Sep 2024
MoJE: Mixture of Jailbreak Experts, Naive Tabular Classifiers as Guard
  for Prompt Attacks
MoJE: Mixture of Jailbreak Experts, Naive Tabular Classifiers as Guard for Prompt Attacks
Giandomenico Cornacchia
Giulio Zizzo
Kieran Fraser
Muhammad Zaid Hameed
Ambrish Rawat
Mark Purcell
26
1
0
26 Sep 2024
Modulated Intervention Preference Optimization (MIPO): Keep the Easy,
  Refine the Difficult
Modulated Intervention Preference Optimization (MIPO): Keep the Easy, Refine the Difficult
Cheolhun Jang
28
0
0
26 Sep 2024
SECURE: Semantics-aware Embodied Conversation under Unawareness for Lifelong Robot Learning
SECURE: Semantics-aware Embodied Conversation under Unawareness for Lifelong Robot Learning
Rimvydas Rubavicius
Peter David Fagan
A. Lascarides
Subramanian Ramamoorthy
LM&Ro
139
0
0
26 Sep 2024
Data-Centric AI Governance: Addressing the Limitations of Model-Focused
  Policies
Data-Centric AI Governance: Addressing the Limitations of Model-Focused Policies
Ritwik Gupta
Leah Walker
Rodolfo Corona
Stephanie Fu
Suzanne Petryk
Janet Napolitano
Trevor Darrell
Andrew W. Reddie
ELM
40
3
0
25 Sep 2024
Enhancing Guardrails for Safe and Secure Healthcare AI
Enhancing Guardrails for Safe and Secure Healthcare AI
Ananya Gangavarapu
21
0
0
25 Sep 2024
Archon: An Architecture Search Framework for Inference-Time Techniques
Archon: An Architecture Search Framework for Inference-Time Techniques
Jon Saad-Falcon
Adrian Gamarra Lafuente
Shlok Natarajan
Nahum Maru
Hristo Todorov
...
E. Kelly Buchanan
Mayee Chen
Neel Guha
Christopher Ré
Azalia Mirhoseini
AI4CE
33
18
0
23 Sep 2024
MemeCLIP: Leveraging CLIP Representations for Multimodal Meme
  Classification
MemeCLIP: Leveraging CLIP Representations for Multimodal Meme Classification
Siddhant Bikram Shah
Shuvam Shiwakoti
Maheep Chaudhary
Haohan Wang
VLM
25
9
0
23 Sep 2024
PROMPTFUZZ: Harnessing Fuzzing Techniques for Robust Testing of Prompt Injection in LLMs
PROMPTFUZZ: Harnessing Fuzzing Techniques for Robust Testing of Prompt Injection in LLMs
Jiahao Yu
Yangguang Shao
Hanwen Miao
Junzheng Shi
SILM
AAML
69
4
0
23 Sep 2024
Backtracking Improves Generation Safety
Backtracking Improves Generation Safety
Yiming Zhang
Jianfeng Chi
Hailey Nguyen
Kartikeya Upasani
Daniel M. Bikel
Jason Weston
Eric Michael Smith
SILM
48
7
0
22 Sep 2024
RRM: Robust Reward Model Training Mitigates Reward Hacking
RRM: Robust Reward Model Training Mitigates Reward Hacking
Tianqi Liu
Wei Xiong
Jie Jessie Ren
Lichang Chen
Junru Wu
...
Yuan Liu
Bilal Piot
Abe Ittycheriah
Aviral Kumar
Mohammad Saleh
AAML
56
13
0
20 Sep 2024
Edu-Values: Towards Evaluating the Chinese Education Values of Large Language Models
Edu-Values: Towards Evaluating the Chinese Education Values of Large Language Models
Peiyi Zhang
Yazhou Zhang
Bo Wang
Lu Rong
Jing Qin
Jing Qin
AI4Ed
ELM
49
1
0
19 Sep 2024
Multitask Mayhem: Unveiling and Mitigating Safety Gaps in LLMs
  Fine-tuning
Multitask Mayhem: Unveiling and Mitigating Safety Gaps in LLMs Fine-tuning
Essa Jan
Nouar Aldahoul
Moiz Ali
Faizan Ahmad
Fareed Zaffar
Yasir Zaki
31
3
0
18 Sep 2024
CoCA: Regaining Safety-awareness of Multimodal Large Language Models
  with Constitutional Calibration
CoCA: Regaining Safety-awareness of Multimodal Large Language Models with Constitutional Calibration
Jiahui Gao
Renjie Pi
Tianyang Han
Han Wu
Lanqing Hong
Lingpeng Kong
Xin Jiang
Zhenguo Li
41
5
0
17 Sep 2024
Model-in-the-Loop (MILO): Accelerating Multimodal AI Data Annotation
  with LLMs
Model-in-the-Loop (MILO): Accelerating Multimodal AI Data Annotation with LLMs
Yifan Wang
David Stevens
Pranay Shah
Wenwen Jiang
Miao Liu
...
Boying Gong
Daniel Lee
Jiabo Hu
Ning Zhang
Bob Kamma
40
1
0
16 Sep 2024
ValueCompass: A Framework for Measuring Contextual Value Alignment Between Human and LLMs
ValueCompass: A Framework for Measuring Contextual Value Alignment Between Human and LLMs
Hua Shen
Tiffany Knearem
Reshmi Ghosh
Yu-Ju Yang
Tanushree Mitra
Yun Huang
Yun Huang
61
0
0
15 Sep 2024
Policy Prototyping for LLMs: Pluralistic Alignment via Interactive and
  Collaborative Policymaking
Policy Prototyping for LLMs: Pluralistic Alignment via Interactive and Collaborative Policymaking
K. J. Kevin Feng
Inyoung Cheong
Quan Ze Chen
Amy X. Zhang
44
2
0
13 Sep 2024
Securing Vision-Language Models with a Robust Encoder Against Jailbreak
  and Adversarial Attacks
Securing Vision-Language Models with a Robust Encoder Against Jailbreak and Adversarial Attacks
Md Zarif Hossain
Ahmed Imteaj
AAML
VLM
43
3
0
11 Sep 2024
Alignment of Diffusion Models: Fundamentals, Challenges, and Future
Alignment of Diffusion Models: Fundamentals, Challenges, and Future
Buhua Liu
Shitong Shao
Bao Li
Lichen Bai
Zhiqiang Xu
Haoyi Xiong
James Kwok
Sumi Helal
Zeke Xie
45
12
0
11 Sep 2024
Semi-Supervised Reward Modeling via Iterative Self-Training
Semi-Supervised Reward Modeling via Iterative Self-Training
Yifei He
Haoxiang Wang
Ziyan Jiang
Alexandros Papangelis
Han Zhao
OffRL
44
2
0
10 Sep 2024
What is the Role of Small Models in the LLM Era: A Survey
What is the Role of Small Models in the LLM Era: A Survey
Lihu Chen
Gaël Varoquaux
ALM
63
23
0
10 Sep 2024
From Calculation to Adjudication: Examining LLM judges on Mathematical Reasoning Tasks
From Calculation to Adjudication: Examining LLM judges on Mathematical Reasoning Tasks
Andreas Stephan
D. Zhu
Matthias Aßenmacher
Xiaoyu Shen
Benjamin Roth
ELM
47
4
0
06 Sep 2024
Programming Refusal with Conditional Activation Steering
Programming Refusal with Conditional Activation Steering
Bruce W. Lee
Inkit Padhi
K. Ramamurthy
Erik Miehling
Pierre L. Dognin
Manish Nagireddy
Amit Dhurandhar
LLMSV
105
13
0
06 Sep 2024
E2CL: Exploration-based Error Correction Learning for Embodied Agents
E2CL: Exploration-based Error Correction Learning for Embodied Agents
Hanlin Wang
Chak Tou Leong
Jian Wang
Wenjie Li
37
1
0
05 Sep 2024
Self-Instructed Derived Prompt Generation Meets In-Context Learning:
  Unlocking New Potential of Black-Box LLMs
Self-Instructed Derived Prompt Generation Meets In-Context Learning: Unlocking New Potential of Black-Box LLMs
Zhuo Li
Yuhao Du
Jinpeng Hu
Xiang Wan
Anningzhe Gao
34
2
0
03 Sep 2024
Conversational Complexity for Assessing Risk in Large Language Models
Conversational Complexity for Assessing Risk in Large Language Models
John Burden
Manuel Cebrian
José Hernández-Orallo
42
0
0
02 Sep 2024
Self-Judge: Selective Instruction Following with Alignment
  Self-Evaluation
Self-Judge: Selective Instruction Following with Alignment Self-Evaluation
Hai Ye
Hwee Tou Ng
ELM
ALM
33
5
0
02 Sep 2024
User-Driven Value Alignment: Understanding Users' Perceptions and
  Strategies for Addressing Biased and Discriminatory Statements in AI
  Companions
User-Driven Value Alignment: Understanding Users' Perceptions and Strategies for Addressing Biased and Discriminatory Statements in AI Companions
Xianzhe Fan
Qing Xiao
Xuhui Zhou
Jiaxin Pei
Maarten Sap
Zhicong Lu
Hong Shen
55
6
0
01 Sep 2024
Iterative Graph Alignment
Iterative Graph Alignment
Fangyuan Yu
H. S. Arora
Matt Johnson
33
2
0
29 Aug 2024
A Survey for Large Language Models in Biomedicine
A Survey for Large Language Models in Biomedicine
Chong Wang
Mengyao Li
Junjun He
Zhongruo Wang
Erfan Darzi
...
Yi Yu
Pietro Liò
Tianyun Wang
Yu Guang Wang
Yiqing Shen
LM&MA
34
9
0
29 Aug 2024
Acceptable Use Policies for Foundation Models
Acceptable Use Policies for Foundation Models
Kevin Klyman
31
14
0
29 Aug 2024
CBF-LLM: Safe Control for LLM Alignment
CBF-LLM: Safe Control for LLM Alignment
Yuya Miyaoka
Masaki Inoue
28
2
0
28 Aug 2024
SIaM: Self-Improving Code-Assisted Mathematical Reasoning of Large
  Language Models
SIaM: Self-Improving Code-Assisted Mathematical Reasoning of Large Language Models
Dian Yu
Baolin Peng
Ye Tian
Linfeng Song
Haitao Mi
Dong Yu
ALM
LRM
46
1
0
28 Aug 2024
ConsistencyTrack: A Robust Multi-Object Tracker with a Generation
  Strategy of Consistency Model
ConsistencyTrack: A Robust Multi-Object Tracker with a Generation Strategy of Consistency Model
Lifan Jiang
Zhihui Wang
Siqi Yin
Guangxiao Ma
Peng Zhang
Boxi Wu
DiffM
59
0
0
28 Aug 2024
A Statistical Framework for Data-dependent Retrieval-Augmented Models
A Statistical Framework for Data-dependent Retrieval-Augmented Models
Soumya Basu
A. S. Rawat
Manzil Zaheer
RALM
46
0
0
27 Aug 2024
Into the Unknown Unknowns: Engaged Human Learning through Participation
  in Language Model Agent Conversations
Into the Unknown Unknowns: Engaged Human Learning through Participation in Language Model Agent Conversations
Yucheng Jiang
Yijia Shao
Dekun Ma
Sina J. Semnani
Monica S. Lam
LLMAG
37
15
0
27 Aug 2024
LLM Defenses Are Not Robust to Multi-Turn Human Jailbreaks Yet
LLM Defenses Are Not Robust to Multi-Turn Human Jailbreaks Yet
Nathaniel Li
Ziwen Han
Ian Steneker
Willow Primack
Riley Goodside
Hugh Zhang
Zifan Wang
Cristina Menghini
Summer Yue
AAML
MU
46
40
0
27 Aug 2024
Previous
123...567...212223
Next