Aligning AI With Shared Human Values

5 August 2020
Dan Hendrycks
Collin Burns
Steven Basart
Andrew Critch
Jerry Li
Dawn Song
Jacob Steinhardt

Papers citing "Aligning AI With Shared Human Values"

50 / 347 papers shown
LLaMAX: Scaling Linguistic Horizons of LLM by Enhancing Translation Capabilities Beyond 100 Languages
Yinquan Lu
Wenhao Zhu
Lei Li
Yu Qiao
Fei Yuan
42
24
0
08 Jul 2024
Some Issues in Predictive Ethics Modeling: An Annotated Contrast Set of "Moral Stories"
Ben Fitzgerald
21
0
0
07 Jul 2024
AI Safety in Generative AI Large Language Models: A Survey
Jaymari Chua
Yun Yvonna Li
Shiyi Yang
Chen Wang
Lina Yao
LM&MA
39
12
0
06 Jul 2024
A Systematic Survey and Critical Review on Evaluating Large Language Models: Challenges, Limitations, and Recommendations
Md Tahmid Rahman Laskar
Sawsan Alqahtani
M Saiful Bari
Mizanur Rahman
Mohammad Abdullah Matin Khan
...
Chee Wei Tan
Md. Rizwan Parvez
Enamul Hoque
Chenyu You
Jimmy Huang
ELM
ALM
31
28
0
04 Jul 2024
Multilingual Trolley Problems for Language Models
Zhijing Jin
Sydney Levine
Max Kleiman-Weiner
Giorgio Piatti
Jiarui Liu
...
András Strausz
Mrinmaya Sachan
Rada Mihalcea
Yejin Choi
Bernhard Schölkopf
LRM
50
5
0
02 Jul 2024
Is Your Large Language Model Knowledgeable or a Choices-Only Cheater?
Nishant Balepur
Rachel Rudinger
50
6
0
02 Jul 2024
ProgressGym: Alignment with a Millennium of Moral Progress
Tianyi Qiu
Yang Zhang
Xuchuan Huang
Jasmine Xinze Li
Yalan Qin
Yaodong Yang
AI4TS
38
4
0
28 Jun 2024
Improving Weak-to-Strong Generalization with Reliability-Aware Alignment
Yue Guo
Yi Yang
31
8
0
27 Jun 2024
The Multilingual Alignment Prism: Aligning Global and Local Preferences to Reduce Harm
Aakanksha
Arash Ahmadian
Beyza Ermis
Seraphina Goldfarb-Tarrant
Julia Kreutzer
Marzieh Fadaee
Sara Hooker
40
28
0
26 Jun 2024
DARG: Dynamic Evaluation of Large Language Models via Adaptive Reasoning Graph
Zhehao Zhang
Jiaao Chen
Diyi Yang
LRM
37
8
0
25 Jun 2024
Does Cross-Cultural Alignment Change the Commonsense Morality of Language Models?
Yuu Jinnai
49
1
0
24 Jun 2024
Model Merging and Safety Alignment: One Bad Model Spoils the Bunch
Hasan Hammoud
Umberto Michieli
Fabio Pizzati
Philip Torr
Adel Bibi
Guohao Li
Mete Ozay
MoMe
31
15
0
20 Jun 2024
LiveMind: Low-latency Large Language Models with Simultaneous Inference
Chuangtao Chen
Grace Li Zhang
Xunzhao Yin
Cheng Zhuo
Ulf Schlichtmann
Bing Li
LRM
45
3
0
20 Jun 2024
Raising the Bar: Investigating the Values of Large Language Models via Generative Evolving Testing
Han Jiang
Xiaoyuan Yi
Zhihua Wei
Shu Wang
Xing Xie
ALM
ELM
52
5
0
20 Jun 2024
Cultural Conditioning or Placebo? On the Effectiveness of Socio-Demographic Prompting
Sagnik Mukherjee
Muhammad Farid Adilazuarda
Sunayana Sitaram
Kalika Bali
Alham Fikri Aji
Monojit Choudhury
43
5
0
17 Jun 2024
The Potential and Challenges of Evaluating Attitudes, Opinions, and Values in Large Language Models
Bolei Ma
Xinpeng Wang
Tiancheng Hu
Anna Haensch
Michael A. Hedderich
Barbara Plank
Frauke Kreuter
ALM
37
2
0
16 Jun 2024
RUPBench: Benchmarking Reasoning Under Perturbations for Robustness Evaluation in Large Language Models
Yuqing Wang
Yun Zhao
LRM
AAML
ELM
27
1
0
16 Jun 2024
Toward Optimal LLM Alignments Using Two-Player Games
Rui Zheng
Hongyi Guo
Zhihan Liu
Xiaoying Zhang
Yuanshun Yao
...
Tao Gui
Qi Zhang
Xuanjing Huang
Hang Li
Yang Liu
62
5
0
16 Jun 2024
Ollabench: Evaluating LLMs' Reasoning for Human-centric Interdependent Cybersecurity
Tam N. Nguyen
ELM
47
2
0
11 Jun 2024
Language Models are Alignable Decision-Makers: Dataset and Application to the Medical Triage Domain
Brian Hu
Bill Ray
Alice Leung
Amy Summerville
David Joy
Christopher Funk
Arslan Basharat
25
2
0
10 Jun 2024
MoralBench: Moral Evaluation of LLMs
Jianchao Ji
Yutong Chen
Mingyu Jin
Wujiang Xu
Wenyue Hua
Yongfeng Zhang
ELM
49
6
0
06 Jun 2024
Scaling and evaluating sparse autoencoders
Leo Gao
Tom Dupré la Tour
Henk Tillman
Gabriel Goh
Rajan Troll
Alec Radford
Ilya Sutskever
Jan Leike
Jeffrey Wu
38
118
0
06 Jun 2024
Exploring Human-AI Perception Alignment in Sensory Experiences: Do LLMs Understand Textile Hand?
Shu Zhong
Elia Gatti
Youngjun Cho
Marianna Obrist
57
3
0
05 Jun 2024
FinerCut: Finer-grained Interpretable Layer Pruning for Large Language Models
Yang Zhang
Yawei Li
Xinpeng Wang
Qianli Shen
Barbara Plank
Bernd Bischl
Mina Rezaei
Kenji Kawaguchi
60
7
0
28 May 2024
BWArea Model: Learning World Model, Inverse Dynamics, and Policy for Controllable Language Generation
Chengxing Jia
Pengyuan Wang
Ziniu Li
Yi-Chen Li
Zhilong Zhang
Nan Tang
Yang Yu
OffRL
39
1
0
27 May 2024
On Bits and Bandits: Quantifying the Regret-Information Trade-off
Itai Shufaro
Nadav Merlis
Nir Weinberger
Shie Mannor
38
0
0
26 May 2024
SPP: Sparsity-Preserved Parameter-Efficient Fine-Tuning for Large Language Models
Xudong Lu
Aojun Zhou
Yuhui Xu
Renrui Zhang
Peng Gao
Hongsheng Li
37
7
0
25 May 2024
Instruction Tuning With Loss Over Instructions
Zhengyan Shi
Adam X. Yang
Bin Wu
Laurence Aitchison
Emine Yilmaz
Aldo Lipani
ALM
24
20
0
23 May 2024
ALI-Agent: Assessing LLMs' Alignment with Human Values via Agent-based Evaluation
Jingnan Zheng
Han Wang
An Zhang
Tai D. Nguyen
Jun Sun
Tat-Seng Chua
LLMAG
40
14
0
23 May 2024
CIVICS: Building a Dataset for Examining Culturally-Informed Values in Large Language Models
Giada Pistilli
Alina Leidinger
Yacine Jernite
Atoosa Kasirzadeh
A. Luccioni
Margaret Mitchell
26
2
0
22 May 2024
Metabook: An Automatically Generated Augmented Reality Storybook Interaction System to Improve Children's Engagement in Storytelling
Yibo Wang
Yuanyuan Mao
Shi-ting Ni
43
0
0
22 May 2024
FlashRAG: A Modular Toolkit for Efficient Retrieval-Augmented Generation Research
Jiajie Jin
Yutao Zhu
Xinyu Yang
Chenghao Zhang
Zhicheng Dou
Tong Zhao
Zhao Yang
Ji-Rong Wen
VLM
85
49
0
22 May 2024
Skin-in-the-Game: Decision Making via Multi-Stakeholder Alignment in LLMs
Bilgehan Sel
Priya Shanmugasundaram
Mohammad Kachuee
Kun Zhou
Ruoxi Jia
Ming Jin
LRM
40
2
0
21 May 2024
LLM-based Multi-Agent Reinforcement Learning: Current and Future Directions
Chuanneng Sun
Songjun Huang
D. Pompili
LLMAG
45
29
0
17 May 2024
Facilitating Opinion Diversity through Hybrid NLP Approaches
Michiel van der Meer
47
0
0
15 May 2024
New Textual Corpora for Serbian Language Modeling
Mihailo Škorić
Nikola Janković
32
0
0
15 May 2024
SpeechGuard: Exploring the Adversarial Robustness of Multimodal Large Language Models
Raghuveer Peri
Sai Muralidhar Jayanthi
S. Ronanki
Anshu Bhatia
Karel Mundnich
...
Srikanth Vishnubhotla
Daniel Garcia-Romero
S. Srinivasan
Kyu J. Han
Katrin Kirchhoff
AAML
34
3
0
14 May 2024
LMD3: Language Model Data Density Dependence
John Kirchenbauer
Garrett Honke
Gowthami Somepalli
Jonas Geiping
Daphne Ippolito
Katherine Lee
Tom Goldstein
David Andre
35
6
0
10 May 2024
Assessing and Verifying Task Utility in LLM-Powered Applications
Negar Arabzadeh
Siqing Huo
Nikhil Mehta
Qingyun Wu
Chi Wang
Ahmed Hassan Awadallah
Charles L. A. Clarke
Julia Kiseleva
38
10
0
03 May 2024
Aloe: A Family of Fine-tuned Open Healthcare LLMs
Ashwin Kumar Gururajan
Enrique Lopez-Cuena
Jordi Bayarri-Planas
Adrián Tormos
Daniel Hinjos
...
Lucia Urcelay-Ganzabal
Marta Gonzalez-Mallo
Sergio Alvarez-Napagao
Eduard Ayguadé-Parra
Ulises Cortés
Dario Garcia-Gasulla
ELM
LM&MA
35
14
0
03 May 2024
More RLHF, More Trust? On The Impact of Human Preference Alignment On Language Model Trustworthiness
Aaron Jiaxun Li
Satyapriya Krishna
Himabindu Lakkaraju
45
3
0
29 Apr 2024
Ethical Reasoning and Moral Value Alignment of LLMs Depend on the Language we Prompt them in
Utkarsh Agarwal
Kumar Tanmay
Aditi Khandelwal
Monojit Choudhury
LRM
31
7
0
29 Apr 2024
Continual Learning of Large Language Models: A Comprehensive Survey
Haizhou Shi
Zihao Xu
Hengyi Wang
Weiyi Qin
Wenyuan Wang
Yibin Wang
Zifeng Wang
Sayna Ebrahimi
Hao Wang
CLL
KELM
LRM
52
64
0
25 Apr 2024
Cooperate or Collapse: Emergence of Sustainable Cooperation in a Society of LLM Agents
Giorgio Piatti
Zhijing Jin
Max Kleiman-Weiner
Bernhard Schölkopf
Mrinmaya Sachan
Rada Mihalcea
LLMAG
60
15
0
25 Apr 2024
Beyond Human Norms: Unveiling Unique Values of Large Language Models through Interdisciplinary Approaches
Pablo Biedma
Xiaoyuan Yi
Linus Huang
Maosong Sun
Xing Xie
PILM
40
3
0
19 Apr 2024
AdvisorQA: Towards Helpful and Harmless Advice-seeking Question Answering with Collective Intelligence
Minbeom Kim
Hwanhee Lee
Joonsuk Park
Hwaran Lee
Kyomin Jung
40
1
0
18 Apr 2024
Procedural Dilemma Generation for Evaluating Moral Reasoning in Humans and Language Models
Jan-Philipp Fränken
Kanishk Gandhi
Tori Qiu
Ayesha Khawaja
Noah D. Goodman
Tobias Gerstenberg
ELM
40
1
0
17 Apr 2024
HLAT: High-quality Large Language Model Pre-trained on AWS Trainium
Haozheng Fan
Hao Zhou
Guangtai Huang
Parameswaran Raman
Xinwei Fu
Gaurav Gupta
Dhananjay Ram
Yida Wang
Jun Huan
48
5
0
16 Apr 2024
Prepacking: A Simple Method for Fast Prefilling and Increased Throughput in Large Language Models
Siyan Zhao
Daniel Israel
Mathias Niepert
Aditya Grover
KELM
VLM
36
5
0
15 Apr 2024
Online Safety Analysis for LLMs: a Benchmark, an Assessment, and a Path Forward
Xuan Xie
Jiayang Song
Zhehua Zhou
Yuheng Huang
Da Song
Lei Ma
OffRL
53
6
0
12 Apr 2024