arXiv:2412.18551
Libra-Leaderboard: Towards Responsible AI through a Balanced Leaderboard of Safety and Capability
24 December 2024
Haoyang Li
Xudong Han
Zenan Zhai
Honglin Mu
Hao Wang
Zhenxuan Zhang
Yilin Geng
Shom Lin
R. Wang
Artem Shelmanov
Xiangyu Qi
Yanjie Wang
Chongye Guo
Youliang Yuan
Meng Chen
Haoqin Tu
Fajri Koto
Tatsuki Kuribayashi
Cong Zeng
Rishabh Bhardwaj
Bingchen Zhao
Yawen Duan
Yixiao Liu
Emad A. Alghamdi
Yue Yang
Yinpeng Dong
Soujanya Poria
Pengfei Liu
Zhengzhong Liu
Xuguang Ren
Eduard H. Hovy
Iryna Gurevych
Preslav Nakov
Monojit Choudhury
Timothy Baldwin
ALM
Papers citing "Libra-Leaderboard: Towards Responsible AI through a Balanced Leaderboard of Safety and Capability"
10 / 10 papers shown
Title
Tensor Trust: Interpretable Prompt Injection Attacks from an Online Game
Sam Toyer
Olivia Watkins
Ethan Mendes
Justin Svegliato
Luke Bailey
...
Karim Elmaaroufi
Pieter Abbeel
Trevor Darrell
Alan Ritter
Stuart J. Russell
96
79
0
02 Nov 2023
Towards Understanding Sycophancy in Language Models
Mrinank Sharma
Meg Tong
Tomasz Korbak
David Duvenaud
Amanda Askell
...
Oliver Rausch
Nicholas Schiefer
Da Yan
Miranda Zhang
Ethan Perez
351
244
0
20 Oct 2023
Survey of Vulnerabilities in Large Language Models Revealed by Adversarial Attacks
Erfan Shayegani
Md Abdullah Al Mamun
Yu Fu
Pedram Zaree
Yue Dong
Nael B. Abu-Ghazaleh
AAML
229
163
0
16 Oct 2023
Can Large Language Models Provide Security & Privacy Advice? Measuring the Ability of LLMs to Refute Misconceptions
Yufan Chen
Arjun Arunasalam
Z. Berkay Celik
71
38
0
03 Oct 2023
DecodingTrust: A Comprehensive Assessment of Trustworthiness in GPT Models
Wei Ping
Weixin Chen
Hengzhi Pei
Chulin Xie
Mintong Kang
...
Zinan Lin
Yuk-Kit Cheng
Sanmi Koyejo
Basel Alomair
Yue Liu
119
430
0
20 Jun 2023
Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena
Lianmin Zheng
Wei-Lin Chiang
Ying Sheng
Siyuan Zhuang
Zhanghao Wu
...
Dacheng Li
Eric Xing
Haotong Zhang
Joseph E. Gonzalez
Ion Stoica
ALM
OSLM
ELM
441
4,444
0
09 Jun 2023
On Second Thought, Let's Not Think Step by Step! Bias and Toxicity in Zero-Shot Reasoning
Omar Shaikh
Hongxin Zhang
William B. Held
Michael S. Bernstein
Diyi Yang
ReLM
LRM
149
200
0
15 Dec 2022
Are Large Pre-Trained Language Models Leaking Your Personal Information?
Jie Huang
Hanyin Shao
Kevin Chen-Chuan Chang
PILM
98
201
0
25 May 2022
Ethical and social risks of harm from Language Models
Laura Weidinger
John F. J. Mellor
Maribeth Rauh
Conor Griffin
J. Uesato
...
Lisa Anne Hendricks
William S. Isaac
Sean Legassick
G. Irving
Iason Gabriel
PILM
122
1,044
0
08 Dec 2021
Build it Break it Fix it for Dialogue Safety: Robustness from Adversarial Human Attack
Emily Dinan
Samuel Humeau
Bharath Chintagunta
Jason Weston
88
248
0
17 Aug 2019