ResearchTrend.AI

© 2025 ResearchTrend.AI, All rights reserved.

Aligning AI With Shared Human Values

5 August 2020
Dan Hendrycks
Collin Burns
Steven Basart
Andrew Critch
Jingkai Li
D. Song
Jacob Steinhardt
arXiv:2008.02275

Papers citing "Aligning AI With Shared Human Values"

50 / 347 papers shown
Dynamic Intelligence Assessment: Benchmarking LLMs on the Road to AGI with a Focus on Model Confidence
Norbert Tihanyi
Tamás Bisztray
Richard A. Dubniczky
Rebeka Tóth
B. Borsos
...
Ryan Marinelli
Lucas C. Cordeiro
Merouane Debbah
Vasileios Mavroeidis
Audun Josang
23
4
0
20 Oct 2024
Speciesism in Natural Language Processing Research
Masashi Takeshita
Rafal Rzepka
24
1
0
18 Oct 2024
Ethics Whitepaper: Whitepaper on Ethical Research into Large Language Models
Eddie L. Ungless
Nikolas Vitsakis
Zeerak Talat
James Garforth
Bjorn Ross
Arno Onken
Atoosa Kasirzadeh
Alexandra Birch
33
1
0
17 Oct 2024
BenTo: Benchmark Task Reduction with In-Context Transferability
Hongyu Zhao
Ming Li
Lichao Sun
Tianyi Zhou
35
0
0
17 Oct 2024
Learning to Route LLMs with Confidence Tokens
Yu-Neng Chuang
Helen Zhou
Prathusha Kameswara Sarma
Parikshit Gopalan
John Boccio
Sara Bolouki
Xia Hu
35
0
0
17 Oct 2024
LLM-Human Pipeline for Cultural Context Grounding of Conversations
Rajkumar Pujari
Dan Goldwasser
38
1
0
17 Oct 2024
Adapt-∞: Scalable Continual Multimodal Instruction Tuning via Dynamic Data Selection
A. Maharana
Jaehong Yoon
Tianlong Chen
Joey Tianyi Zhou
34
0
0
14 Oct 2024
Evaluating Gender Bias of LLMs in Making Morality Judgements
Divij Bajaj
Yuanyuan Lei
Jonathan Tong
Ruihong Huang
37
3
0
13 Oct 2024
SocialGaze: Improving the Integration of Human Social Norms in Large Language Models
Anvesh Rao Vijjini
Rakesh R Menon
Jiayi Fu
Shashank Srivastava
Snigdha Chaturvedi
ALM
34
0
0
11 Oct 2024
Do Unlearning Methods Remove Information from Language Model Weights?
Aghyad Deeb
Fabien Roger
AAML
MU
47
14
0
11 Oct 2024
TRIAGE: Ethical Benchmarking of AI Models Through Mass Casualty Simulations
Nathalie Maria Kirch
Konstantin Hebenstreit
Matthias Samwald
30
1
0
10 Oct 2024
Fine-Tuning Language Models for Ethical Ambiguity: A Comparative Study of Alignment with Human Responses
Pranav Senthilkumar
Visshwa Balasubramanian
Prisha Jain
Aneesa Maity
Jonathan Lu
Kevin Zhu
14
1
0
10 Oct 2024
The Moral Turing Test: Evaluating Human-LLM Alignment in Moral Decision-Making
Basile Garcia
Crystal Qian
Stefano Palminteri
ELM
52
1
0
09 Oct 2024
Scaling Laws for Mixed quantization in Large Language Models
Zeyu Cao
Cheng Zhang
Pedro Gimenes
Jianqiao Lu
Jianyi Cheng
Yiren Zhao
MQ
33
1
0
09 Oct 2024
Intuitions of Compromise: Utilitarianism vs. Contractualism
Jared Moore
Yejin Choi
Sydney Levine
33
0
0
07 Oct 2024
Unlocking Structured Thinking in Language Models with Cognitive Prompting
Oliver Kramer
Jill Baumann
ReLM
LRM
29
3
0
03 Oct 2024
DailyDilemmas: Revealing Value Preferences of LLMs with Quandaries of Daily Life
Yu Ying Chiu
Liwei Jiang
Yejin Choi
62
3
0
03 Oct 2024
LASeR: Learning to Adaptively Select Reward Models with Multi-Armed Bandits
Duy Nguyen
Archiki Prasad
Elias Stengel-Eskin
Joey Tianyi Zhou
23
3
0
02 Oct 2024
Examining the Role of Relationship Alignment in Large Language Models
Kristen M. Altenburger
Hongda Jiang
Robert E. Kraut
Yi-Chia Wang
Jane Dwivedi-Yu
29
0
0
02 Oct 2024
Truth or Deceit? A Bayesian Decoding Game Enhances Consistency and Reliability
Weitong Zhang
Chengqi Zang
Bernhard Kainz
31
0
0
01 Oct 2024
Predicting and analyzing memorization within fine-tuned Large Language Models
Jérémie Dentan
Davide Buscaldi
A. Shabou
Sonia Vanier
37
0
0
27 Sep 2024
Post-hoc Reward Calibration: A Case Study on Length Bias
Zeyu Huang
Zihan Qiu
Zili Wang
Edoardo M. Ponti
Ivan Titov
40
5
0
25 Sep 2024
JMedBench: A Benchmark for Evaluating Japanese Biomedical Large Language Models
Junfeng Jiang
Jiahao Huang
Akiko Aizawa
LM&MA
35
4
0
20 Sep 2024
Edu-Values: Towards Evaluating the Chinese Education Values of Large Language Models
Peiyi Zhang
Yazhou Zhang
Bo Wang
Lu Rong
Jing Qin
AI4Ed
ELM
49
1
0
19 Sep 2024
ValueCompass: A Framework for Measuring Contextual Value Alignment Between Human and LLMs
Hua Shen
Tiffany Knearem
Reshmi Ghosh
Yu-Ju Yang
Tanushree Mitra
Yun Huang
61
0
0
15 Sep 2024
DataSculpt: Crafting Data Landscapes for Long-Context LLMs through Multi-Objective Partitioning
Keer Lu
Xiaonan Nie
Zhuoran Zhang
Zheng Liang
Da Pan
...
Weipeng Chen
Guosheng Dong
Bin Cui
Wentao Zhang
32
0
0
02 Sep 2024
ToolACE: Winning the Points of LLM Function Calling
Weiwen Liu
X. Huang
Xingshan Zeng
Xinlong Hao
Shuai Yu
...
Xin Jiang
Ruiming Tang
Defu Lian
Qun Liu
Enhong Chen
LLMAG
40
27
0
02 Sep 2024
Novel-WD: Exploring acquisition of Novel World Knowledge in LLMs Using Prefix-Tuning
Maxime Méloux
Christophe Cerisara
KELM
CLL
29
0
0
30 Aug 2024
Bi-Factorial Preference Optimization: Balancing Safety-Helpfulness in Language Models
Wenxuan Zhang
Philip Torr
Mohamed Elhoseiny
Adel Bibi
85
9
0
27 Aug 2024
Investigating LLM Applications in E-Commerce
Chester Palen-Michel
Ruixiang Wang
Yipeng Zhang
David Yu
Canran Xu
Zhe Wu
16
3
0
23 Aug 2024
Beyond Labels: Aligning Large Language Models with Human-like Reasoning
Muhammad Rafsan Kabir
Rafeed Mohammad Sultan
Ihsanul Haque Asif
Jawad Ibn Ahad
Fuad Rahman
Mohammad Ruhul Amin
Nabeel Mohammed
Shafin Rahman
LRM
40
2
0
20 Aug 2024
Promoting Equality in Large Language Models: Identifying and Mitigating the Implicit Bias based on Bayesian Theory
Yongxin Deng
Xihe Qiu
Xiaoyu Tan
Jing Pan
Chen Jue
Zhijun Fang
Yinghui Xu
Wei Chu
Yuan Qi
34
3
0
20 Aug 2024
Value Alignment from Unstructured Text
Inkit Padhi
K. Ramamurthy
P. Sattigeri
Manish Nagireddy
Pierre L. Dognin
Kush R. Varshney
34
0
0
19 Aug 2024
CMoralEval: A Moral Evaluation Benchmark for Chinese Large Language Models
Linhao Yu
Yongqi Leng
Yufei Huang
Shang Wu
Haixin Liu
...
Jinwang Song
Tingting Cui
Xiaoqing Cheng
Tao Liu
Deyi Xiong
ELM
16
2
0
19 Aug 2024
How Well Do LLMs Identify Cultural Unity in Diversity?
Jialin Li
Junli Wang
Junjie Hu
Ming Jiang
37
4
0
09 Aug 2024
Prompt and Prejudice
Lorenzo Berlincioni
Luca Cultrera
Federico Becattini
Marco Bertini
A. Bimbo
43
0
0
07 Aug 2024
Pula: Training Large Language Models for Setswana
Nathan Brown
Vukosi Marivate
OSLM
40
0
0
05 Aug 2024
CFBench: A Comprehensive Constraints-Following Benchmark for LLMs
Leo Micklem
Yan-Bin Shen
Wenjing Luo
Yan Zhang
Hao Liang
...
Weipeng Chen
Bin Cui
Blair Thornton
Wentao Zhang
Guosheng Dong
ELM
84
16
0
02 Aug 2024
Safetywashing: Do AI Safety Benchmarks Actually Measure Safety Progress?
Richard Ren
Steven Basart
Adam Khoja
Alice Gatti
Long Phan
...
Alexander Pan
Gabriel Mukobi
Ryan H. Kim
Stephen Fitz
Dan Hendrycks
ELM
26
21
0
31 Jul 2024
Legal Minds, Algorithmic Decisions: How LLMs Apply Constitutional Principles in Complex Scenarios
Camilla Bignotti
C. Camassa
AILaw
ELM
48
1
0
29 Jul 2024
Blockchain for Large Language Model Security and Safety: A Holistic Survey
Caleb Geren
Amanda Board
Gaby G. Dagher
Tim Andersen
Jun Zhuang
46
6
0
26 Jul 2024
The Dark Side of Function Calling: Pathways to Jailbreaking Large Language Models
Zihui Wu
Haichang Gao
Jianping He
Ping Wang
32
6
0
25 Jul 2024
Course-Correction: Safety Alignment Using Synthetic Preferences
Rongwu Xu
Yishuo Cai
Zhenhong Zhou
Renjie Gu
Haiqin Weng
Yan Liu
Lei Bai
Wei Xu
Han Qiu
31
4
0
23 Jul 2024
Virtue Ethics For Ethically Tunable Robotic Assistants
Rajitha Ramanayake
Vivek Nallur
21
0
0
23 Jul 2024
ALLaM: Large Language Models for Arabic and English
M Saiful Bari
Yazeed Alnumay
Norah A. Alzahrani
Nouf M. Alotaibi
H. A. Alyahya
...
Jeril Kuriakose
Abdalghani Abujabal
Nora Al-Twairesh
Areeb Alowisheq
Haidar Khan
42
11
0
22 Jul 2024
Internal Consistency and Self-Feedback in Large Language Models: A Survey
Xun Liang
Shichao Song
Zifan Zheng
Hanyu Wang
Qingchen Yu
...
Rong-Hua Li
Peng Cheng
Zhonghao Wang
Feiyu Xiong
Zhiyu Li
HILM
LRM
68
25
0
19 Jul 2024
BadRobot: Jailbreaking Embodied LLMs in the Physical World
Hangtao Zhang
Chenyu Zhu
Xianlong Wang
Ziqi Zhou
Yichen Wang
...
Shengshan Hu
Leo Yu Zhang
Aishan Liu
Peijin Guo
LM&Ro
53
7
0
16 Jul 2024
Hey, That's My Model! Introducing Chain & Hash, An LLM Fingerprinting Technique
M. Russinovich
Ahmed Salem
51
12
0
15 Jul 2024
CLAVE: An Adaptive Framework for Evaluating Values of LLM Generated Responses
Jing Yao
Xiaoyuan Yi
Xing Xie
ELM
ALM
38
7
0
15 Jul 2024
The Sociolinguistic Foundations of Language Modeling
Jack Grieve
Sara Bartl
Matteo Fuoli
Jason Grafmiller
Weihang Huang
A. Jawerbaum
Akira Murakami
Marcus Perlman
Dana Roemling
Bodo Winter
41
7
0
12 Jul 2024