ResearchTrend.AI

© 2025 ResearchTrend.AI, All rights reserved.

Constitutional AI: Harmlessness from AI Feedback

15 December 2022
Yuntao Bai, Saurav Kadavath, Sandipan Kundu, Amanda Askell, John Kernion, Andy Jones, A. Chen, Anna Goldie, Azalia Mirhoseini, C. McKinnon, Carol Chen, Catherine Olsson, C. Olah, Danny Hernandez, Dawn Drain, Deep Ganguli, Dustin Li, Eli Tran-Johnson, E. Perez, Jamie Kerr, J. Mueller, Jeff Ladish, J. Landau, Kamal Ndousse, Kamilė Lukošiūtė, Liane Lovitt, Michael Sellitto, Nelson Elhage, Nicholas Schiefer, Noemí Mercado, Nova Dassarma, R. Lasenby, Robin Larson, Sam Ringer, Scott R. Johnston, Shauna Kravec, S. E. Showk, Stanislav Fort, Tamera Lanham, Timothy Telleen-Lawton, Tom Conerly, T. Henighan, Tristan Hume, Sam Bowman, Zac Hatfield-Dodds, Benjamin Mann, Dario Amodei, Nicholas Joseph, Sam McCandlish, Tom B. Brown, Jared Kaplan

Papers citing "Constitutional AI: Harmlessness from AI Feedback"

50 / 1,202 papers shown
Self-Critique-Guided Curiosity Refinement: Enhancing Honesty and Helpfulness in Large Language Models via In-Context Learning
Duc Hieu Ho, Chenglin Fan
19 Jun 2025

Flexible Hardware-Enabled Guarantees for AI Compute
James Petrie, Onni Aarne, Nora Ammann, David Dalrymple
18 Jun 2025

Language Models can perform Single-Utterance Self-Correction of Perturbed Reasoning
Sam Silver, Jimin Sun, Ivan Zhang, Sara Hooker, Eddie Kim
18 Jun 2025

Adaptive Accompaniment with ReaLchords
Yusong Wu, Tim Cooijmans, Kyle Kastner, Adam Roberts, Ian Simon, ..., Shayegan Omidshafiei, Aaron Courville, Pablo Samuel Castro, Natasha Jaques, Cheng-Zhi Anna Huang
17 Jun 2025
Probabilistic Aggregation and Targeted Embedding Optimization for Collective Moral Reasoning in Large Language Models
Chenchen Yuan, Zheyu Zhang, Shuo Yang, Bardh Prenkaj, Gjergji Kasneci
17 Jun 2025

A Practical Guide for Evaluating LLMs and LLM-Reliant Systems
Ethan M. Rudd, Christopher Andrews, Philip Tully
16 Jun 2025

SoK: The Privacy Paradox of Large Language Models: Advancements, Privacy Risks, and Mitigation
Yashothara Shanmugarasa, Ming Ding, M. Chamikara, Thierry Rakotoarivelo
15 Jun 2025

Bridging the Digital Divide: Small Language Models as a Pathway for Physics and Photonics Education in Underdeveloped Regions
Asghar Ghorbani, Hanieh Fattahi
14 Jun 2025

Language Surgery in Multilingual Large Language Models
Joanito Agili Lopo, Muhammad Ravi Shulthan Habibi, Tack Hwa Wong, Muhammad Ilham Ghozali, Fajri Koto, Genta Indra Winata, Peerat Limkonchotiwat, Alham Fikri Aji, Samuel Cahyawijaya
14 Jun 2025
InfoFlood: Jailbreaking Large Language Models with Information Overload
Advait Yadav, Haibo Jin, Man Luo, Jun Zhuang, Haohan Wang
13 Jun 2025

Because we have LLMs, we Can and Should Pursue Agentic Interpretability
Been Kim, John Hewitt, Neel Nanda, Noah Fiedel, Oyvind Tafjord
13 Jun 2025

Spurious Rewards: Rethinking Training Signals in RLVR
Rulin Shao, Shuyue Stella Li, Rui Xin, Scott Geng, Yiping Wang, ..., Ranjay Krishna, Yulia Tsvetkov, Hannaneh Hajishirzi, Pang Wei Koh, Luke Zettlemoyer
12 Jun 2025

AnimateAnyMesh: A Feed-Forward 4D Foundation Model for Text-Driven Universal Mesh Animation
Zijie Wu, Chaohui Yu, Fan Wang, Xiang Bai
11 Jun 2025
AsFT: Anchoring Safety During LLM Fine-Tuning Within Narrow Safety Basin
Shuo Yang, Qihui Zhang, Yuyang Liu, Yue Huang, Xiaojun Jia, ..., Jiayu Yao, Jigang Wang, Hailiang Dai, Yibing Song, Li Yuan
10 Jun 2025

Explicit Preference Optimization: No Need for an Implicit Reward Model
Xiangkun Hu, Lemin Kong, Tong He, David Wipf
09 Jun 2025

Chasing Moving Targets with Online Self-Play Reinforcement Learning for Safer Language Models
Mickel Liu, L. Jiang, Yancheng Liang, S. Du, Yejin Choi, Tim Althoff, Natasha Jaques
09 Jun 2025

GradEscape: A Gradient-Based Evader Against AI-Generated Text Detectors
Wenlong Meng, Shuguo Fan, Chengkun Wei, Min Chen, Yuwei Li, Yuanchao Zhang, Zhikun Zhang, Wenzhi Chen
09 Jun 2025
From Tool Calling to Symbolic Thinking: LLMs in a Persistent Lisp Metaprogramming Loop
Jordi de la Torre
08 Jun 2025

Chain-of-Code Collapse: Reasoning Failures in LLMs via Adversarial Prompting in Code Generation
Jaechul Roh, Varun Gandhi, Shivani Anilkumar, Arin Garg
08 Jun 2025

Evaluating and Improving Robustness in Large Language Models: A Survey and Future Directions
Kun Zhang, Le Wu, Kui Yu, Guangyi Lv, Dacao Zhang
08 Jun 2025

Quality-Diversity Red-Teaming: Automated Generation of High-Quality and Diverse Attackers for Large Language Models
Ren-Jian Wang, Ke Xue, Zeyu Qin, Ziniu Li, Sheng Tang, Hao-Tian Li, Shengcai Liu, Chao Qian
08 Jun 2025

Vision-EKIPL: External Knowledge-Infused Policy Learning for Visual Reasoning
Chaoyang Wang, Zeyu Zhang, Haiyun Jiang
07 Jun 2025
Efficient Online RFT with Plug-and-Play LLM Judges: Unlocking State-of-the-Art Performance
Rudransh Agnihotri, Ananya Pandey
06 Jun 2025

Truly Self-Improving Agents Require Intrinsic Metacognitive Learning
Tennison Liu, M. Schaar
05 Jun 2025

Customizing Speech Recognition Model with Large Language Model Feedback
Shaoshi Ling, Guoli Ye
05 Jun 2025

Towards provable probabilistic safety for scalable embodied AI systems
Linxuan He, Qing-Shan Jia, Ang Li, Hongyan Sang, Ling Wang, ..., Yisen Wang, Peng Wei, Zhongyuan Wang, Henry X. Liu, Shuo Feng
05 Jun 2025

RewardAnything: Generalizable Principle-Following Reward Models
Zhuohao Yu, Jiali Zeng, Weizheng Gu, Yidong Wang, Jindong Wang, Fandong Meng, Jie Zhou, Yue Zhang, Shikun Zhang, Wei Ye
04 Jun 2025
Exchange of Perspective Prompting Enhances Reasoning in Large Language Models
Lin Sun, Can Zhang
04 Jun 2025

Misalignment or misuse? The AGI alignment tradeoff
Max Hellrigel-Holderbaum, Leonard Dung
04 Jun 2025

Matching Markets Meet LLMs: Algorithmic Reasoning with Ranked Preferences
Hadi Hosseini, Samarth Khanna, Ronak Singh
04 Jun 2025

Corrigibility as a Singular Target: A Vision for Inherently Reliable Foundation Models
Ram Potham, Max Harms
03 Jun 2025

Beyond the Surface: Measuring Self-Preference in LLM Judgments
Zhi-Yuan Chen, Hao Wang, Xinyu Zhang, Enrui Hu, Yankai Lin
03 Jun 2025
RACE-Align: Retrieval-Augmented and Chain-of-Thought Enhanced Preference Alignment for Large Language Models
Qihang Yan, Xinyu Zhang, Luming Guo, Qi Zhang, Feifan Liu
03 Jun 2025

AUTOCIRCUIT-RL: Reinforcement Learning-Driven LLM for Automated Circuit Topology Generation
Prashanth Vijayaraghavan, Luyao Shi, Ehsan Degan, Vandana Mukherjee, Xin Zhang
03 Jun 2025

TO-GATE: Clarifying Questions and Summarizing Responses with Trajectory Optimization for Eliciting Human Preference
Yulin Dou, Jiangming Liu
03 Jun 2025

Incentivizing LLMs to Self-Verify Their Answers
Fuxiang Zhang, Jiacheng Xu, Chaojie Wang, Ce Cui, Yang Liu, Bo An
02 Jun 2025

ReGA: Representation-Guided Abstraction for Model-based Safeguarding of LLMs
Zeming Wei, Chengcan Wu, Meng Sun
02 Jun 2025
Fodor and Pylyshyn's Legacy - Still No Human-like Systematic Compositionality in Neural Networks
Tim Woydt, Moritz Willig, Antonia Wüst, Lukas Helff, Wolfgang Stammer, Constantin Rothkopf, Kristian Kersting
02 Jun 2025

CoBRA: Quantifying Strategic Language Use and LLM Pragmatics
Anshun Asher Zheng, Junyi Jessy Li, David Beaver
01 Jun 2025

MIRROR: Cognitive Inner Monologue Between Conversational Turns for Persistent Reflection and Reasoning in Conversational LLMs
Nicole Hsing
31 May 2025

MoDoMoDo: Multi-Domain Data Mixtures for Multimodal LLM Reinforcement Learning
Yiqing Liang, Jielin Qiu, Wenhao Ding, Zuxin Liu, James Tompkin, Mengdi Xu, Mengzhou Xia, Zhengzhong Tu, Laixi Shi, Jiacheng Zhu
30 May 2025

On the Emergence of Weak-to-Strong Generalization: A Bias-Variance Perspective
Gengze Xu, Wei Yao, Ziqiao Wang, Yong Liu
30 May 2025
Bootstrapping LLM Robustness for VLM Safety via Reducing the Pretraining Modality Gap
Wenhan Yang, Spencer Stice, Ali Payani, Baharan Mirzasoleiman
30 May 2025

Intuitionistic Fuzzy Sets for Large Language Model Data Annotation: A Novel Approach to Side-by-Side Preference Labeling
Yimin Du
30 May 2025

Let Them Down Easy! Contextual Effects of LLM Guardrails on User Perceptions and Preferences
Mingqian Zheng, Wenjia Hu, Patrick Zhao, Motahhare Eslami, Jena D. Hwang, Faeze Brahman, Carolyn Rose, Maarten Sap
30 May 2025

Proxy Target: Bridging the Gap Between Discrete Spiking Neural Networks and Continuous Control
Zijie Xu, Tong Bu, Zecheng Hao, Jianhao Ding, Zhaofei Yu
30 May 2025

Distortion of AI Alignment: Does Preference Optimization Optimize for Preferences?
Paul Gölz, Nika Haghtalab, Kunhe Yang
29 May 2025
MCTSr-Zero: Self-Reflective Psychological Counseling Dialogues Generation via Principles and Adaptive Exploration
Hao Lu, Yanchi Gu, Haoyuan Huang, Yulin Zhou, Ningxin Zhu, Chen Li
29 May 2025

Learning Parametric Distributions from Samples and Preferences
Marc Jourdan, Gizem Yüce, Nicolas Flammarion
29 May 2025

A Survey of Generative Categories and Techniques in Multimodal Large Language Models
Longzhen Han, Awes Mubarak, Almas Baimagambetov, Nikolaos Polatidis, Thar Baker
29 May 2025