Trustworthy LLMs: a Survey and Guideline for Evaluating Large Language Models' Alignment (arXiv:2308.05374)

10 August 2023
Yang Liu
Yuanshun Yao
Jean-François Ton
Xiaoying Zhang
Ruocheng Guo
Hao Cheng
Yegor Klochkov
Muhammad Faaiz Taufiq
Hang Li
    ALM

Papers citing "Trustworthy LLMs: a Survey and Guideline for Evaluating Large Language Models' Alignment"

29 / 29 papers shown
SafeVid: Toward Safety Aligned Video Large Multimodal Models
Yixu Wang
Jiaxin Song
Yifeng Gao
Xin Wang
Yang Yao
Yan Teng
Xingjun Ma
Yingchun Wang
Yu-Gang Jiang
113
0
0
17 May 2025
Large Language Models Could Be Rote Learners
Yuyang Xu
Renjun Hu
Haochao Ying
Jian Wu
Xing Shi
Wei Lin
ELM
348
0
0
11 Apr 2025
Why Do Multi-Agent LLM Systems Fail?
Mert Cemri
Melissa Z. Pan
Shuyi Yang
Lakshya A Agrawal
Bhavya Chopra
...
Dan Klein
Kannan Ramchandran
Matei A. Zaharia
Joseph E. Gonzalez
Ion Stoica
LLMAG
Presented at ResearchTrend Connect | LLMAG on 23 Apr 2025
192
30
0
17 Mar 2025
How to Protect Yourself from 5G Radiation? Investigating LLM Responses to Implicit Misinformation
Ruohao Guo
Wei Xu
Alan Ritter
68
3
0
12 Mar 2025
Beyond Surface-Level Patterns: An Essence-Driven Defense Framework Against Jailbreak Attacks in LLMs
Shiyu Xiang
Ansen Zhang
Yanfei Cao
Yang Fan
Ronghao Chen
AAML
88
2
0
26 Feb 2025
Fraud-R1 : A Multi-Round Benchmark for Assessing the Robustness of LLM Against Augmented Fraud and Phishing Inducements
Shu Yang
Shenzhe Zhu
Zeyu Wu
Keyu Wang
Junchi Yao
Junchao Wu
Lijie Hu
Mengdi Li
Derek F. Wong
Di Wang
43
8
0
18 Feb 2025
MindAligner: Explicit Brain Functional Alignment for Cross-Subject Visual Decoding from Limited fMRI Data
Yuqin Dai
Zhouheng Yao
Chunfeng Song
Qihao Zheng
Weijian Mai
Kunyu Peng
Shuai Lu
Wanli Ouyang
Jian Yang
Jiamin Wu
373
2
0
07 Feb 2025
Calling a Spade a Heart: Gaslighting Multimodal Large Language Models via Negation
Bin Zhu
Hui yan Qi
Yinxuan Gui
Jingjing Chen
Chong-Wah Ngo
Ee-Peng Lim
324
2
0
31 Jan 2025
INFELM: In-depth Fairness Evaluation of Large Text-To-Image Models
Di Jin
Xing Liu
Yu Liu
Jia Qing Yap
Andrea Wong
Adriana Crespo
Qi Lin
Zhiyuan Yin
Qiang Yan
Ryan Ye
EGVM
VLM
409
0
0
10 Jan 2025
LLM-Virus: Evolutionary Jailbreak Attack on Large Language Models
Miao Yu
Sihang Li
Yingjie Zhou
Xing Fan
Kun Wang
Shirui Pan
Qingsong Wen
AAML
102
1
0
03 Jan 2025
ValuesRAG: Enhancing Cultural Alignment Through Retrieval-Augmented Contextual Learning
Wonduk Seo
Hyunjin An
Yi Bu
VLM
112
2
0
02 Jan 2025
REFA: Reference Free Alignment for multi-preference optimization
Taneesh Gupta
Rahul Madhavan
Xuchao Zhang
Chetan Bansal
Saravan Rajmohan
125
1
0
20 Dec 2024
JailbreakLens: Interpreting Jailbreak Mechanism in the Lens of Representation and Circuit
Zeqing He
Peng Kuang
Zhixuan Chu
Huiyu Xu
Rui Zheng
Kui Ren
Chun Chen
97
7
0
17 Nov 2024
Beyond the Safety Bundle: Auditing the Helpful and Harmless Dataset
Khaoula Chehbouni
Jonathan Colaço-Carr
Yash More
Jackie CK Cheung
G. Farnadi
130
1
0
12 Nov 2024
MEG: Medical Knowledge-Augmented Large Language Models for Question Answering
Laura Cabello
Carmen Martin-Turrero
Uchenna Akujuobi
Anders Søgaard
Carlos Bobed
AI4MH
251
1
0
06 Nov 2024
Are LLM-Judges Robust to Expressions of Uncertainty? Investigating the effect of Epistemic Markers on LLM-based Evaluation
Dongryeol Lee
Yerin Hwang
Yongil Kim
Joonsuk Park
Kyomin Jung
ELM
112
10
0
28 Oct 2024
Towards Unifying Evaluation of Counterfactual Explanations: Leveraging Large Language Models for Human-Centric Assessments
M. Domnich
Julius Valja
Rasmus Moorits Veski
Giacomo Magnifico
Kadi Tulver
Eduard Barbu
Raul Vicente
LRM
ELM
69
4
0
28 Oct 2024
Preference Optimization with Multi-Sample Comparisons
Chaoqi Wang
Zhuokai Zhao
Chen Zhu
Karthik Abinav Sankararaman
Michal Valko
...
Zhaorun Chen
Madian Khabsa
Yuxin Chen
Hao Ma
Sinong Wang
105
10
0
16 Oct 2024
Bridging Today and the Future of Humanity: AI Safety in 2024 and Beyond
Shanshan Han
139
1
0
09 Oct 2024
DailyDilemmas: Revealing Value Preferences of LLMs with Quandaries of Daily Life
Yu Ying Chiu
Liwei Jiang
Yejin Choi
99
7
0
03 Oct 2024
Causal Inference with Large Language Model: A Survey
Jing Ma
CML
LRM
199
9
0
15 Sep 2024
PersLLM: A Personified Training Approach for Large Language Models
Zheni Zeng
Jiayi Chen
Haotian Chen
Yukun Yan
Yuxuan Chen
Zhenghao Liu
Zhiyuan Liu
Maosong Sun
LLMAG
68
2
0
17 Jul 2024
Teaching LLMs to Abstain across Languages via Multilingual Feedback
Shangbin Feng
Weijia Shi
Yike Wang
Wenxuan Ding
Orevaoghene Ahia
Shuyue Stella Li
Vidhisha Balachandran
Sunayana Sitaram
Yulia Tsvetkov
104
7
0
22 Jun 2024
Logicbreaks: A Framework for Understanding Subversion of Rule-based Inference
Anton Xue
Avishree Khare
Rajeev Alur
Surbhi Goel
Eric Wong
94
2
0
21 Jun 2024
Raising the Bar: Investigating the Values of Large Language Models via Generative Evolving Testing
Han Jiang
Xiaoyuan Yi
Zhihua Wei
Ziang Xiao
Shu Wang
Xing Xie
ELM
ALM
124
8
0
20 Jun 2024
Self-Tuning: Instructing LLMs to Effectively Acquire New Knowledge through Self-Teaching
Xiaoying Zhang
Baolin Peng
Ye Tian
Jingyan Zhou
Yipeng Zhang
Haitao Mi
Helen Meng
CLL
KELM
105
7
0
10 Jun 2024
High-Dimension Human Value Representation in Large Language Models
Samuel Cahyawijaya
Delong Chen
Yejin Bang
Leila Khalatbari
Bryan Wilie
Ziwei Ji
Etsuko Ishii
Pascale Fung
146
6
0
11 Apr 2024
The Impact of Unstated Norms in Bias Analysis of Language Models
Farnaz Kohankhaki
D. B. Emerson
David B. Emerson
Laleh Seyyed-Kalantari
Faiza Khan Khattak
73
1
0
04 Apr 2024
LLMs with Industrial Lens: Deciphering the Challenges and Prospects -- A Survey
Ashok Urlana
Charaka Vinayak Kumar
Ajeet Kumar Singh
B. Garlapati
S. Chalamala
Rahul Mishra
82
7
0
22 Feb 2024