ResearchTrend.AI
arXiv: 2407.12344
The Better Angels of Machine Personality: How Personality Relates to LLM Safety

17 July 2024
Jie M. Zhang
Dongrui Liu
Chao Qian
Ziyue Gan
Yong-jin Liu
Yu Qiao
Jing Shao
    LLMAG
    PILM

Papers citing "The Better Angels of Machine Personality: How Personality Relates to LLM Safety"

10 / 10 papers shown
EmoAgent: Assessing and Safeguarding Human-AI Interaction for Mental Health Safety
Jiahao Qiu
Yinghui He
Xinzhe Juan
Yuxiang Wang
Lingjuan Lyu
Zixin Yao
Yue Wu
Xun Jiang
L. Yang
Mengdi Wang
AI4MH
70
0
0
13 Apr 2025
Human-Centric Community Detection in Hybrid Metaverse Networks with Integrated AI Entities
Shih-Hsuan Chiu
Ya-Wen Teng
De-Nian Yang
Ming-Syan Chen
38
0
0
15 Feb 2025
SEER: Self-Explainability Enhancement of Large Language Models' Representations
Guanxu Chen
Dongrui Liu
Tao Luo
Jing Shao
LRM
MILM
67
1
0
07 Feb 2025
Programming Refusal with Conditional Activation Steering
Bruce W. Lee
Inkit Padhi
K. Ramamurthy
Erik Miehling
Pierre L. Dognin
Manish Nagireddy
Amit Dhurandhar
LLMSV
102
13
0
06 Sep 2024
EasyJailbreak: A Unified Framework for Jailbreaking Large Language Models
Weikang Zhou
Xiao Wang
Limao Xiong
Han Xia
Yingshuang Gu
...
Lijun Li
Jing Shao
Tao Gui
Qi Zhang
Xuanjing Huang
75
32
0
18 Mar 2024
Illuminating the Black Box: A Psychometric Investigation into the Multifaceted Nature of Large Language Models
Yang Lu
Jordan Yu
Shou-Hsuan Stephen Huang
38
2
0
21 Dec 2023
Learning to Model Editing Processes
Machel Reid
Graham Neubig
KELM
BDL
108
35
0
24 May 2022
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
313
11,953
0
04 Mar 2022
Trustworthy AI: A Computational Perspective
Haochen Liu
Yiqi Wang
Wenqi Fan
Xiaorui Liu
Yaxin Li
Shaili Jain
Yunhao Liu
Anil K. Jain
Jiliang Tang
FaML
101
196
0
12 Jul 2021
Fine-Tuning Language Models from Human Preferences
Daniel M. Ziegler
Nisan Stiennon
Jeff Wu
Tom B. Brown
Alec Radford
Dario Amodei
Paul Christiano
G. Irving
ALM
280
1,595
0
18 Sep 2019