ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2212.08073
  4. Cited By
Constitutional AI: Harmlessness from AI Feedback

Constitutional AI: Harmlessness from AI Feedback

15 December 2022
Yuntao Bai
Saurav Kadavath
Sandipan Kundu
Amanda Askell
John Kernion
Andy Jones
A. Chen
Anna Goldie
Azalia Mirhoseini
C. McKinnon
Carol Chen
Catherine Olsson
C. Olah
Danny Hernandez
Dawn Drain
Deep Ganguli
Dustin Li
Eli Tran-Johnson
E. Perez
Jamie Kerr
J. Mueller
Jeff Ladish
J. Landau
Kamal Ndousse
Kamilė Lukošiūtė
Liane Lovitt
Michael Sellitto
Nelson Elhage
Nicholas Schiefer
Noemí Mercado
Nova Dassarma
R. Lasenby
Robin Larson
Sam Ringer
Scott R. Johnston
Shauna Kravec
S. E. Showk
Stanislav Fort
Tamera Lanham
Timothy Telleen-Lawton
Tom Conerly
T. Henighan
Tristan Hume
Sam Bowman
Zac Hatfield-Dodds
Benjamin Mann
Dario Amodei
Nicholas Joseph
Sam McCandlish
Tom B. Brown
Jared Kaplan
    SyDa
    MoMe
ArXivPDFHTML

Papers citing "Constitutional AI: Harmlessness from AI Feedback"

50 / 1,106 papers shown
Title
Scopes of Alignment
Scopes of Alignment
Kush R. Varshney
Zahra Ashktorab
Djallel Bouneffouf
Matthew D Riemer
Justin D. Weisz
36
0
0
15 Jan 2025
Text-Diffusion Red-Teaming of Large Language Models: Unveiling Harmful Behaviors with Proximity Constraints
Text-Diffusion Red-Teaming of Large Language Models: Unveiling Harmful Behaviors with Proximity Constraints
Jonathan Nöther
Adish Singla
Goran Radanović
AAML
57
0
0
14 Jan 2025
Predictable Artificial Intelligence
Predictable Artificial Intelligence
Lexin Zhou
Pablo Antonio Moreno Casares
Fernando Martínez-Plumed
John Burden
Ryan Burnell
...
Seán Ó hÉigeartaigh
Danaja Rutar
Wout Schellaert
Konstantinos Voudouris
José Hernández-Orallo
51
2
0
08 Jan 2025
Feedback-Driven Vision-Language Alignment with Minimal Human Supervision
Feedback-Driven Vision-Language Alignment with Minimal Human Supervision
Giorgio Giannone
Ruoteng Li
Qianli Feng
Evgeny Perevodchikov
Rui Chen
Aleix M. Martinez
VLM
66
0
0
08 Jan 2025
Segmenting Text and Learning Their Rewards for Improved RLHF in Language Model
Segmenting Text and Learning Their Rewards for Improved RLHF in Language Model
Yueqin Yin
Shentao Yang
Yujia Xie
Ziyi Yang
Yuting Sun
Hany Awadalla
Weizhu Chen
Mingyuan Zhou
50
0
0
07 Jan 2025
Explicit vs. Implicit: Investigating Social Bias in Large Language Models through Self-Reflection
Explicit vs. Implicit: Investigating Social Bias in Large Language Models through Self-Reflection
Yachao Zhao
Bo Wang
Yan Wang
50
2
0
04 Jan 2025
WizardMath: Empowering Mathematical Reasoning for Large Language Models via Reinforced Evol-Instruct
WizardMath: Empowering Mathematical Reasoning for Large Language Models via Reinforced Evol-Instruct
Haipeng Luo
Qingfeng Sun
Can Xu
Pu Zhao
Jian-Guang Lou
...
Xiubo Geng
Qingwei Lin
Shifeng Chen
Yansong Tang
Dongmei Zhang
OSLM
LRM
110
408
0
03 Jan 2025
From Generalist to Specialist: A Survey of Large Language Models for Chemistry
From Generalist to Specialist: A Survey of Large Language Models for Chemistry
Yang Han
Ziping Wan
Lu Chen
Kai Yu
Xin Chen
LM&MA
35
1
0
31 Dec 2024
MLLM-as-a-Judge for Image Safety without Human Labeling
MLLM-as-a-Judge for Image Safety without Human Labeling
Zhenting Wang
Shuming Hu
Shiyu Zhao
Xiaowen Lin
F. Xu
...
Nan Jiang
Lingjuan Lyu
Shiqing Ma
Dimitris N. Metaxas
Ankit Jain
164
1
0
31 Dec 2024
A Comprehensive Survey of Large Language Models and Multimodal Large Language Models in Medicine
A Comprehensive Survey of Large Language Models and Multimodal Large Language Models in Medicine
Hanguang Xiao
Feizhong Zhou
X. Liu
Tianqi Liu
Zhipeng Li
Xin Liu
Xiaoxuan Huang
AILaw
LM&MA
LRM
61
18
0
31 Dec 2024
ConTrans: Weak-to-Strong Alignment Engineering via Concept Transplantation
ConTrans: Weak-to-Strong Alignment Engineering via Concept Transplantation
Weilong Dong
Xinwei Wu
Renren Jin
Shaoyang Xu
Deyi Xiong
65
7
0
31 Dec 2024
Geometric-Averaged Preference Optimization for Soft Preference Labels
Geometric-Averaged Preference Optimization for Soft Preference Labels
Hiroki Furuta
Kuang-Huei Lee
Shixiang Shane Gu
Y. Matsuo
Aleksandra Faust
Heiga Zen
Izzeddin Gur
58
7
0
31 Dec 2024
Malware Classification using a Hybrid Hidden Markov Model-Convolutional
  Neural Network
Malware Classification using a Hybrid Hidden Markov Model-Convolutional Neural Network
Ritik Mehta
Olha Jurecková
Mark Stamp
59
0
0
25 Dec 2024
Multimodal Preference Data Synthetic Alignment with Reward Model
Multimodal Preference Data Synthetic Alignment with Reward Model
Robert Wijaya
Ngoc-Bao Nguyen
Ngai-man Cheung
MLLM
SyDa
62
2
0
23 Dec 2024
Cannot or Should Not? Automatic Analysis of Refusal Composition in
  IFT/RLHF Datasets and Refusal Behavior of Black-Box LLMs
Cannot or Should Not? Automatic Analysis of Refusal Composition in IFT/RLHF Datasets and Refusal Behavior of Black-Box LLMs
Alexander von Recum
Christoph Schnabl
Gabor Hollbeck
Silas Alberti
Philip Blinde
Marvin von Hagen
92
2
0
22 Dec 2024
Towards Safe and Honest AI Agents with Neural Self-Other Overlap
Towards Safe and Honest AI Agents with Neural Self-Other Overlap
Marc Carauleanu
Michael Vaiana
Judd Rosenblatt
Cameron Berg
Diogo Schwerz de Lucena
68
0
0
20 Dec 2024
Gradual Vigilance and Interval Communication: Enhancing Value Alignment
  in Multi-Agent Debates
Gradual Vigilance and Interval Communication: Enhancing Value Alignment in Multi-Agent Debates
Rui Zou
Mengqi Wei
Jintian Feng
Qian Wan
Jianwen Sun
Sannyuya Liu
77
0
0
18 Dec 2024
UAlign: Leveraging Uncertainty Estimations for Factuality Alignment on
  Large Language Models
UAlign: Leveraging Uncertainty Estimations for Factuality Alignment on Large Language Models
Boyang Xue
Fei Mi
Qi Zhu
Hongru Wang
Rui Wang
Sheng Wang
Erxin Yu
Xuming Hu
Kam-Fai Wong
HILM
74
0
0
16 Dec 2024
A Method for Detecting Legal Article Competition for Korean Criminal Law
  Using a Case-augmented Mention Graph
A Method for Detecting Legal Article Competition for Korean Criminal Law Using a Case-augmented Mention Graph
Seonho An
Young Yik Rhim
Min-Soo Kim
AILaw
84
0
0
16 Dec 2024
The Superalignment of Superhuman Intelligence with Large Language Models
The Superalignment of Superhuman Intelligence with Large Language Models
Minlie Huang
Yingkang Wang
Shiyao Cui
Pei Ke
J. Tang
113
1
0
15 Dec 2024
TapeAgents: a Holistic Framework for Agent Development and Optimization
TapeAgents: a Holistic Framework for Agent Development and Optimization
Dzmitry Bahdanau
Nicolas Angelard-Gontier
Gabriel Huang
Ehsan Kamalloo
Rafael Pardinas
...
Jordan Prince Tremblay
Karam Ghanem
S. Parikh
Mitul Tiwari
Quaizar Vohra
70
2
0
11 Dec 2024
Coverage-based Fairness in Multi-document Summarization
Coverage-based Fairness in Multi-document Summarization
Haoyuan Li
Yusen Zhang
Rui Zhang
Snigdha Chaturvedi
80
0
0
11 Dec 2024
Evolutionary Pre-Prompt Optimization for Mathematical Reasoning
Evolutionary Pre-Prompt Optimization for Mathematical Reasoning
Mathurin Videau
Alessandro Leite
Marc Schoenauer
O. Teytaud
ReLM
LRM
79
0
0
05 Dec 2024
Reinforcement Learning Enhanced LLMs: A Survey
Reinforcement Learning Enhanced LLMs: A Survey
Shuhe Wang
Shengyu Zhang
Jingyang Zhang
Runyi Hu
Xiaoya Li
Tianwei Zhang
Jiwei Li
Fei Wu
G. Wang
Eduard H. Hovy
OffRL
134
7
0
05 Dec 2024
Query Performance Explanation through Large Language Model for HTAP
  Systems
Query Performance Explanation through Large Language Model for HTAP Systems
Haibo Xiu
Li Zhang
Tieying Zhang
Jun Yang
Jianjun Chen
74
0
0
02 Dec 2024
PEFT-as-an-Attack! Jailbreaking Language Models during Federated
  Parameter-Efficient Fine-Tuning
PEFT-as-an-Attack! Jailbreaking Language Models during Federated Parameter-Efficient Fine-Tuning
Shenghui Li
Edith C. H. Ngai
Fanghua Ye
Thiemo Voigt
SILM
90
6
0
28 Nov 2024
From Generation to Judgment: Opportunities and Challenges of LLM-as-a-judge
From Generation to Judgment: Opportunities and Challenges of LLM-as-a-judge
Dawei Li
Bohan Jiang
Liangjie Huang
Alimohammad Beigi
Chengshuai Zhao
...
Canyu Chen
Tianhao Wu
Kai Shu
Lu Cheng
Huan Liu
ELM
AILaw
120
67
0
25 Nov 2024
From Jack of All Trades to Master of One: Specializing LLM-based
  Autoraters to a Test Set
From Jack of All Trades to Master of One: Specializing LLM-based Autoraters to a Test Set
M. Finkelstein
Dan Deutsch
Parker Riley
Juraj Juraska
Geza Kovacs
Markus Freitag
76
0
0
23 Nov 2024
UnifiedCrawl: Aggregated Common Crawl for Affordable Adaptation of LLMs
  on Low-Resource Languages
UnifiedCrawl: Aggregated Common Crawl for Affordable Adaptation of LLMs on Low-Resource Languages
Bethel Melesse Tessema
Akhil Kedia
Tae-Sun Chung
72
0
0
21 Nov 2024
Search, Verify and Feedback: Towards Next Generation Post-training Paradigm of Foundation Models via Verifier Engineering
Xinyan Guan
Yanjiang Liu
Xinyu Lu
Boxi Cao
Xianpei Han
...
Le Sun
Jie Lou
Bowen Yu
Yaojie Lu
Hongyu Lin
ALM
83
2
0
18 Nov 2024
The Dark Side of Trust: Authority Citation-Driven Jailbreak Attacks on Large Language Models
Xikang Yang
Xuehai Tang
Jizhong Han
Songlin Hu
68
0
0
18 Nov 2024
CROW: Eliminating Backdoors from Large Language Models via Internal
  Consistency Regularization
CROW: Eliminating Backdoors from Large Language Models via Internal Consistency Regularization
Nay Myat Min
Long H. Pham
Yige Li
Tianlong Chen
AAML
64
4
0
18 Nov 2024
Steering Language Model Refusal with Sparse Autoencoders
Kyle O'Brien
David Majercak
Xavier Fernandes
Richard Edgar
Jingya Chen
Harsha Nori
Dean Carignan
Eric Horvitz
Forough Poursabzi-Sangde
LLMSV
67
10
0
18 Nov 2024
SymDPO: Boosting In-Context Learning of Large Multimodal Models with
  Symbol Demonstration Direct Preference Optimization
SymDPO: Boosting In-Context Learning of Large Multimodal Models with Symbol Demonstration Direct Preference Optimization
Hongrui Jia
Chaoya Jiang
Haiyang Xu
Wei Ye
Mengfan Dong
Ming Yan
Ji Zhang
Fei Huang
Shikun Zhang
MLLM
89
2
0
17 Nov 2024
SPICA: Retrieving Scenarios for Pluralistic In-Context Alignment
Quan Ze Chen
K. J. Kevin Feng
Chan Young Park
Amy X. Zhang
26
0
0
16 Nov 2024
Dynamic Rewarding with Prompt Optimization Enables Tuning-free
  Self-Alignment of Language Models
Dynamic Rewarding with Prompt Optimization Enables Tuning-free Self-Alignment of Language Models
Somanshu Singla
Zhen Wang
Tianyang Liu
Abdullah Ashfaq
Zhiting Hu
Eric P. Xing
28
0
0
13 Nov 2024
PyGen: A Collaborative Human-AI Approach to Python Package Creation
PyGen: A Collaborative Human-AI Approach to Python Package Creation
Saikat Barua
Mostafizur Rahman
Md Jafor Sadek
Rafiul Islam
Shehnaz Khaled
Md. Shohrab Hossain
44
1
0
13 Nov 2024
Mitigating Metric Bias in Minimum Bayes Risk Decoding
Mitigating Metric Bias in Minimum Bayes Risk Decoding
Geza Kovacs
Daniel Deutsch
Markus Freitag
34
6
0
05 Nov 2024
Align-SLM: Textless Spoken Language Models with Reinforcement Learning
  from AI Feedback
Align-SLM: Textless Spoken Language Models with Reinforcement Learning from AI Feedback
Guan-Ting Lin
Prashanth Gurunath Shivakumar
Aditya Gourav
Yile Gu
Ankur Gandhe
Hung-yi Lee
I. Bulyko
29
8
0
04 Nov 2024
Rule Based Rewards for Language Model Safety
Rule Based Rewards for Language Model Safety
Tong Mu
Alec Helyar
Johannes Heidecke
Joshua Achiam
Andrea Vallone
Ian Kivlichan
Molly Lin
Alex Beutel
John Schulman
Lilian Weng
ALM
44
35
0
02 Nov 2024
Token-level Proximal Policy Optimization for Query Generation
Token-level Proximal Policy Optimization for Query Generation
Yichen Ouyang
Lu Wang
Fangkai Yang
Pu Zhao
Chenghua Huang
...
Saravan Rajmohan
Weiwei Deng
Dongmei Zhang
Feng Sun
Qi Zhang
OffRL
134
3
0
01 Nov 2024
Active Preference-based Learning for Multi-dimensional Personalization
Active Preference-based Learning for Multi-dimensional Personalization
Minhyeon Oh
Seungjoon Lee
Jungseul Ok
28
1
0
01 Nov 2024
Adapting While Learning: Grounding LLMs for Scientific Problems with Intelligent Tool Usage Adaptation
Adapting While Learning: Grounding LLMs for Scientific Problems with Intelligent Tool Usage Adaptation
Bohan Lyu
Yadi Cao
Duncan Watson-Parris
Leon Bergen
Taylor Berg-Kirkpatrick
Rose Yu
61
3
0
01 Nov 2024
Representative Social Choice: From Learning Theory to AI Alignment
Representative Social Choice: From Learning Theory to AI Alignment
Tianyi Qiu
FedML
38
2
0
31 Oct 2024
Exploring the Knowledge Mismatch Hypothesis: Hallucination Propensity in
  Small Models Fine-tuned on Data from Larger Models
Exploring the Knowledge Mismatch Hypothesis: Hallucination Propensity in Small Models Fine-tuned on Data from Larger Models
Phil Wee
Riyadh Baghdadi
HILM
32
0
0
31 Oct 2024
OCEAN: Offline Chain-of-thought Evaluation and Alignment in Large
  Language Models
OCEAN: Offline Chain-of-thought Evaluation and Alignment in Large Language Models
Junda Wu
Xintong Li
Ruoyu Wang
Yu Xia
Yuxin Xiong
...
Xiang Chen
B. Kveton
Lina Yao
Jingbo Shang
Julian McAuley
OffRL
LRM
29
0
0
31 Oct 2024
MDCure: A Scalable Pipeline for Multi-Document Instruction-Following
MDCure: A Scalable Pipeline for Multi-Document Instruction-Following
Gabrielle Kaili-May Liu
Bowen Shi
Avi Caciularu
Idan Szpektor
Arman Cohan
72
4
0
30 Oct 2024
SG-Bench: Evaluating LLM Safety Generalization Across Diverse Tasks and
  Prompt Types
SG-Bench: Evaluating LLM Safety Generalization Across Diverse Tasks and Prompt Types
Yutao Mou
Shikun Zhang
Wei Ye
ELM
40
8
0
29 Oct 2024
Fast Best-of-N Decoding via Speculative Rejection
Fast Best-of-N Decoding via Speculative Rejection
Hanshi Sun
Momin Haider
Ruiqi Zhang
Huitao Yang
Jiahao Qiu
Ming Yin
Mengdi Wang
Peter L. Bartlett
Andrea Zanette
BDL
45
28
0
26 Oct 2024
Parameter-Efficient Fine-Tuning in Large Models: A Survey of Methodologies
Parameter-Efficient Fine-Tuning in Large Models: A Survey of Methodologies
L. Wang
Sheng Chen
Linnan Jiang
Shu Pan
Runze Cai
Sen Yang
Fei Yang
49
3
0
24 Oct 2024
Previous
12345...212223
Next