ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2212.08073
  4. Cited By
Constitutional AI: Harmlessness from AI Feedback

Constitutional AI: Harmlessness from AI Feedback

15 December 2022
Yuntao Bai
Saurav Kadavath
Sandipan Kundu
Amanda Askell
John Kernion
Andy Jones
A. Chen
Anna Goldie
Azalia Mirhoseini
C. McKinnon
Carol Chen
Catherine Olsson
C. Olah
Danny Hernandez
Dawn Drain
Deep Ganguli
Dustin Li
Eli Tran-Johnson
E. Perez
Jamie Kerr
J. Mueller
Jeff Ladish
J. Landau
Kamal Ndousse
Kamilė Lukošiūtė
Liane Lovitt
Michael Sellitto
Nelson Elhage
Nicholas Schiefer
Noemí Mercado
Nova Dassarma
R. Lasenby
Robin Larson
Sam Ringer
Scott R. Johnston
Shauna Kravec
S. E. Showk
Stanislav Fort
Tamera Lanham
Timothy Telleen-Lawton
Tom Conerly
T. Henighan
Tristan Hume
Sam Bowman
Zac Hatfield-Dodds
Benjamin Mann
Dario Amodei
Nicholas Joseph
Sam McCandlish
Tom B. Brown
Jared Kaplan
    SyDaMoMe
ArXiv (abs)PDFHTML

Papers citing "Constitutional AI: Harmlessness from AI Feedback"

50 / 1,202 papers shown
Title
Optimized Couplings for Watermarking Large Language Models
Optimized Couplings for Watermarking Large Language Models
Dor Tsur
Carol Xuan Long
C. M. Verdun
Hsiang Hsu
Haim Permuter
Flavio du Pin Calmon
WaLM
90
1
0
13 May 2025
Towards Artificial General or Personalized Intelligence? A Survey on Foundation Models for Personalized Federated Intelligence
Towards Artificial General or Personalized Intelligence? A Survey on Foundation Models for Personalized Federated Intelligence
Yu Qiao
Huy Q. Le
Avi Deb Raha
Phuong-Nam Tran
Apurba Adhikary
Mengchun Zhang
Loc X. Nguyen
Eui-nam Huh
Dusit Niyato
Choong Seon Hong
AI4CE
161
1
0
11 May 2025
Multi-Agent Systems for Robotic Autonomy with LLMs
Multi-Agent Systems for Robotic Autonomy with LLMs
Junhong Chen
Ziqi Yang
Haoyuan G Xu
Dandan Zhang
George Mylonas
LLMAG
110
1
0
09 May 2025
G-FOCUS: Towards a Robust Method for Assessing UI Design Persuasiveness
G-FOCUS: Towards a Robust Method for Assessing UI Design Persuasiveness
Jaehyun Jeon
Janghan Yoon
Minsoo Kim
Sumin Shim
Yejin Choi
Hanbin Kim
Youngjae Yu
AAML
156
0
0
08 May 2025
Optimization Problem Solving Can Transition to Evolutionary Agentic Workflows
Optimization Problem Solving Can Transition to Evolutionary Agentic Workflows
Wenhao Li
Bo Jin
Mingyi Hong
Changhong Lu
Xiangfeng Wang
156
0
0
07 May 2025
RM-R1: Reward Modeling as Reasoning
RM-R1: Reward Modeling as Reasoning
Xiusi Chen
Gaotang Li
Zehua Wang
Bowen Jin
Cheng Qian
...
Yu Zhang
D. Zhang
Tong Zhang
Hanghang Tong
Heng Ji
ReLMOffRLLRM
391
21
0
05 May 2025
Sailing by the Stars: A Survey on Reward Models and Learning Strategies for Learning from Rewards
Sailing by the Stars: A Survey on Reward Models and Learning Strategies for Learning from Rewards
Xiaobao Wu
LRM
225
5
0
05 May 2025
A Survey on Progress in LLM Alignment from the Perspective of Reward Design
A Survey on Progress in LLM Alignment from the Perspective of Reward Design
Miaomiao Ji
Yanqiu Wu
Zhibin Wu
Shoujin Wang
Jian Yang
Mark Dras
Usman Naseem
76
2
0
05 May 2025
SIMPLEMIX: Frustratingly Simple Mixing of Off- and On-policy Data in Language Model Preference Learning
SIMPLEMIX: Frustratingly Simple Mixing of Off- and On-policy Data in Language Model Preference Learning
Tianjian Li
Daniel Khashabi
135
0
0
05 May 2025
Cannot See the Forest for the Trees: Invoking Heuristics and Biases to Elicit Irrational Choices of LLMs
Cannot See the Forest for the Trees: Invoking Heuristics and Biases to Elicit Irrational Choices of LLMs
Haoming Yang
Ke Ma
Xiaojun Jia
Yingfei Sun
Qianqian Xu
Qingming Huang
AAML
435
0
0
03 May 2025
Unlearning Sensitive Information in Multimodal LLMs: Benchmark and Attack-Defense Evaluation
Unlearning Sensitive Information in Multimodal LLMs: Benchmark and Attack-Defense Evaluation
Vaidehi Patil
Yi-Lin Sung
Peter Hase
Jie Peng
Jen-tse Huang
Joey Tianyi Zhou
AAMLMU
283
4
0
01 May 2025
Real-World Gaps in AI Governance Research
Real-World Gaps in AI Governance Research
Ilan Strauss
Isobel Moure
Tim O'Reilly
Sruly Rosenblat
158
1
0
30 Apr 2025
Iterative Tool Usage Exploration for Multimodal Agents via Step-wise Preference Tuning
Iterative Tool Usage Exploration for Multimodal Agents via Step-wise Preference Tuning
Pengxiang Li
Zhi Gao
Bofei Zhang
Yapeng Mi
Xiaojian Ma
...
Tao Yuan
Yuwei Wu
Yunde Jia
Song-Chun Zhu
Qing Li
LLMAG
144
0
0
30 Apr 2025
PRISM: Projection-based Reward Integration for Scene-Aware Real-to-Sim-to-Real Transfer with Few Demonstrations
PRISM: Projection-based Reward Integration for Scene-Aware Real-to-Sim-to-Real Transfer with Few Demonstrations
Haowen Sun
Haoran Wang
Chengzhong Ma
Shaolong Zhang
Jiawei Ye
Xingyu Chen
Xuguang Lan
OffRL
121
1
0
29 Apr 2025
Adaptive Helpfulness-Harmlessness Alignment with Preference Vectors
Adaptive Helpfulness-Harmlessness Alignment with Preference Vectors
Ren-Wei Liang
Chin-Ting Hsu
Chan-Hung Yu
Saransh Agrawal
Shih-Cheng Huang
Shang-Tse Chen
Kuan-Hao Huang
Shao-Hua Sun
167
0
0
27 Apr 2025
Super Co-alignment of Human and AI for Sustainable Symbiotic Society
Super Co-alignment of Human and AI for Sustainable Symbiotic Society
Yi Zeng
Yijiao Wang
Enmeng Lu
Dongcheng Zhao
Bing Han
...
Chao Liu
Yaodong Yang
Yi Zeng
Boyuan Chen
Jinyu Fan
185
0
0
24 Apr 2025
Cognitive Silicon: An Architectural Blueprint for Post-Industrial Computing Systems
Christoforus Yoga Haryanto
Emily Lomempow
72
0
0
23 Apr 2025
Safety Pretraining: Toward the Next Generation of Safe AI
Safety Pretraining: Toward the Next Generation of Safe AI
Pratyush Maini
Sachin Goyal
Dylan Sam
Alex Robey
Yash Savani
Yiding Jiang
Andy Zou
Zacharcy C. Lipton
J. Zico Kolter
218
5
0
23 Apr 2025
Steering the CensorShip: Uncovering Representation Vectors for LLM "Thought" Control
Steering the CensorShip: Uncovering Representation Vectors for LLM "Thought" Control
Hannah Cyberey
David Evans
LLMSV
159
3
0
23 Apr 2025
Honey, I Shrunk the Language Model: Impact of Knowledge Distillation Methods on Performance and Explainability
Honey, I Shrunk the Language Model: Impact of Knowledge Distillation Methods on Performance and Explainability
Daniel Hendriks
Philipp Spitzer
Niklas Kühl
G. Satzger
129
2
0
22 Apr 2025
The Bitter Lesson Learned from 2,000+ Multilingual Benchmarks
The Bitter Lesson Learned from 2,000+ Multilingual Benchmarks
Minghao Wu
Weixuan Wang
Sinuo Liu
Huifeng Yin
Xintong Wang
Yu Zhao
Chenyang Lyu
Longyue Wang
Weihua Luo
Kaifu Zhang
ELM
152
5
0
22 Apr 2025
LoRe: Personalizing LLMs via Low-Rank Reward Modeling
LoRe: Personalizing LLMs via Low-Rank Reward Modeling
Avinandan Bose
Zhihan Xiong
Yuejie Chi
Simon S. Du
Lin Xiao
Maryam Fazel
84
2
0
20 Apr 2025
SUDO: Enhancing Text-to-Image Diffusion Models with Self-Supervised Direct Preference Optimization
SUDO: Enhancing Text-to-Image Diffusion Models with Self-Supervised Direct Preference Optimization
Liang Peng
Boxi Wu
Haoran Cheng
Yibo Zhao
Xiaofei He
61
0
0
20 Apr 2025
Harnessing Generative LLMs for Enhanced Financial Event Entity Extraction Performance
Harnessing Generative LLMs for Enhanced Financial Event Entity Extraction Performance
Soo-joon Choi
Ji-jun Park
85
0
0
20 Apr 2025
Remedy: Learning Machine Translation Evaluation from Human Preferences with Reward Modeling
Remedy: Learning Machine Translation Evaluation from Human Preferences with Reward Modeling
Shaomu Tan
Christof Monz
108
0
0
18 Apr 2025
Image-Editing Specialists: An RLAIF Approach for Diffusion Models
Image-Editing Specialists: An RLAIF Approach for Diffusion Models
Elior Benarous
Yilun Du
Heng Yang
60
0
0
17 Apr 2025
Syntactic and Semantic Control of Large Language Models via Sequential Monte Carlo
Syntactic and Semantic Control of Large Language Models via Sequential Monte Carlo
João Loula
Benjamin LeBrun
Li Du
Ben Lipkin
Clemente Pasti
...
Ryan Cotterel
Vikash K. Mansinghka
Alexander K. Lew
Tim Vieira
Timothy J. O'Donnell
161
8
0
17 Apr 2025
Aligning Constraint Generation with Design Intent in Parametric CAD
Aligning Constraint Generation with Design Intent in Parametric CAD
Evan Casey
Tianyu Zhang
Shu Ishida
John Roger Thompson
Amir Hosein Khasahmadi
Joseph George Lambourne
P. Jayaraman
K. Willis
92
0
0
17 Apr 2025
Integrating Structural and Semantic Signals in Text-Attributed Graphs with BiGTex
Integrating Structural and Semantic Signals in Text-Attributed Graphs with BiGTex
Azadeh Beiranvand
Seyed Mehdi Vahidipour
81
0
0
16 Apr 2025
REWARD CONSISTENCY: Improving Multi-Objective Alignment from a Data-Centric Perspective
REWARD CONSISTENCY: Improving Multi-Objective Alignment from a Data-Centric Perspective
Zhihao Xu
Yongqi Tong
Xin Zhang
Jun Zhou
Xiting Wang
74
0
0
15 Apr 2025
Teaching Large Language Models to Reason through Learning and Forgetting
Teaching Large Language Models to Reason through Learning and Forgetting
Tianwei Ni
Allen Nie
Sapana Chaudhary
Yao Liu
Huzefa Rangwala
Rasool Fakoor
ReLMCLLLRM
476
0
0
15 Apr 2025
AdaSteer: Your Aligned LLM is Inherently an Adaptive Jailbreak Defender
AdaSteer: Your Aligned LLM is Inherently an Adaptive Jailbreak Defender
Weixiang Zhao
Jiahe Guo
Yulin Hu
Yang Deng
An Zhang
...
Xinyang Han
Yanyan Zhao
Bing Qin
Tat-Seng Chua
Ting Liu
AAMLLLMSV
103
4
0
13 Apr 2025
QM-ToT: A Medical Tree of Thoughts Reasoning Framework for Quantized Model
QM-ToT: A Medical Tree of Thoughts Reasoning Framework for Quantized Model
Zongxian Yang
Jiayu Qian
Z. Huang
Kay Chen Tan
LM&MALRM
162
0
0
13 Apr 2025
SaRO: Enhancing LLM Safety through Reasoning-based Alignment
SaRO: Enhancing LLM Safety through Reasoning-based Alignment
Yutao Mou
Yuxiao Luo
Shikun Zhang
Wei Ye
LLMSVLRM
61
2
0
13 Apr 2025
A Short Survey on Small Reasoning Models: Training, Inference, Applications and Research Directions
A Short Survey on Small Reasoning Models: Training, Inference, Applications and Research Directions
Chengyu Wang
Taolin Zhang
Richang Hong
Jun Huang
ReLMLRM
105
2
0
12 Apr 2025
PathVLM-R1: A Reinforcement Learning-Driven Reasoning Model for Pathology Visual-Language Tasks
PathVLM-R1: A Reinforcement Learning-Driven Reasoning Model for Pathology Visual-Language Tasks
Jian Wu
Hao Yang
Xinhua Zeng
Guibing He
Zhe Chen
Zhu Li
Xinming Zhang
Yangyang Ma
Run Fang
Yang Liu
LRM
385
1
0
12 Apr 2025
AI-Slop to AI-Polish? Aligning Language Models through Edit-Based Writing Rewards and Test-time Computation
AI-Slop to AI-Polish? Aligning Language Models through Edit-Based Writing Rewards and Test-time Computation
Tuhin Chakrabarty
Philippe Laban
Chien-Sheng Wu
105
4
0
10 Apr 2025
HalluciNot: Hallucination Detection Through Context and Common Knowledge Verification
HalluciNot: Hallucination Detection Through Context and Common Knowledge Verification
Bibek Paudel
Alexander Lyzhov
Preetam Joshi
Puneet Anand
HILM
88
2
0
09 Apr 2025
CAReDiO: Cultural Alignment of LLM via Representativeness and Distinctiveness Guided Data Optimization
CAReDiO: Cultural Alignment of LLM via Representativeness and Distinctiveness Guided Data Optimization
Jing Yao
Xiaoyuan Yi
Jindong Wang
Zhicheng Dou
Xing Xie
64
2
0
09 Apr 2025
Bypassing Safety Guardrails in LLMs Using Humor
Bypassing Safety Guardrails in LLMs Using Humor
Pedro Cisneros-Velarde
128
1
0
09 Apr 2025
DEL: Context-Aware Dynamic Exit Layer for Efficient Self-Speculative Decoding
DEL: Context-Aware Dynamic Exit Layer for Efficient Self-Speculative Decoding
Hossein Entezari Zarch
Lei Gao
Chaoyi Jiang
Murali Annavaram
LRM
81
0
0
08 Apr 2025
Lightweight and Direct Document Relevance Optimization for Generative Information Retrieval
Lightweight and Direct Document Relevance Optimization for Generative Information Retrieval
Kidist Amde Mekonnen
Yubao Tang
Maarten de Rijke
119
0
0
07 Apr 2025
Truthful or Fabricated? Using Causal Attribution to Mitigate Reward Hacking in Explanations
Truthful or Fabricated? Using Causal Attribution to Mitigate Reward Hacking in Explanations
Pedro Ferreira
Wilker Aziz
Ivan Titov
LRM
94
0
0
07 Apr 2025
Revealing the Intrinsic Ethical Vulnerability of Aligned Large Language Models
Revealing the Intrinsic Ethical Vulnerability of Aligned Large Language Models
Jiawei Lian
Jianhong Pan
L. Wang
Yi Wang
Shaohui Mei
Lap-Pui Chau
AAML
137
0
0
07 Apr 2025
Do LLM Evaluators Prefer Themselves for a Reason?
Do LLM Evaluators Prefer Themselves for a Reason?
Wei-Lin Chen
Zhepei Wei
Xinyu Zhu
Shi Feng
Yu Meng
ELMLRM
89
3
0
04 Apr 2025
Reciprocity-Aware Convolutional Neural Networks for Map-Based Path Loss Prediction
Reciprocity-Aware Convolutional Neural Networks for Map-Based Path Loss Prediction
Ryan Dempsey
Jonathan Ethier
Halim Yanikomeroglu
62
2
0
04 Apr 2025
On the Connection Between Diffusion Models and Molecular Dynamics
On the Connection Between Diffusion Models and Molecular Dynamics
Liam Harcombe
Timothy T. Duignan
DiffM
107
1
0
04 Apr 2025
What Large Language Models Do Not Talk About: An Empirical Study of Moderation and Censorship Practices
What Large Language Models Do Not Talk About: An Empirical Study of Moderation and Censorship Practices
Sander Noels
Guillaume Bied
Maarten Buyl
Alexander Rogiers
Yousra Fettach
Jefrey Lijffijt
Tijl De Bie
103
1
0
04 Apr 2025
LLM Social Simulations Are a Promising Research Method
LLM Social Simulations Are a Promising Research Method
Jacy Reese Anthis
Ryan Liu
Sean M. Richardson
Austin C. Kozlowski
Bernard Koch
James A. Evans
Erik Brynjolfsson
Michael S. Bernstein
ALM
111
15
0
03 Apr 2025
Prompt Optimization with Logged Bandit Data
Prompt Optimization with Logged Bandit Data
Haruka Kiyohara
Daniel Yiming Cao
Yuta Saito
Thorsten Joachims
234
0
0
03 Apr 2025
Previous
123456...232425
Next