Libra-Leaderboard: Towards Responsible AI through a Balanced Leaderboard of Safety and Capability

24 December 2024
Haoyang Li
Xudong Han
Zenan Zhai
Honglin Mu
Hao Wang
Zhenxuan Zhang
Yilin Geng
Shom Lin
R. Wang
Artem Shelmanov
Xiangyu Qi
Yanjie Wang
Chongye Guo
Youliang Yuan
Meng Chen
Haoqin Tu
Fajri Koto
Tatsuki Kuribayashi
Cong Zeng
Rishabh Bhardwaj
Bingchen Zhao
Yawen Duan
Yixiao Liu
Emad A. Alghamdi
Yue Yang
Yinpeng Dong
Soujanya Poria
Pengfei Liu
Zhengzhong Liu
Xuguang Ren
Eduard H. Hovy
Iryna Gurevych
Preslav Nakov
Monojit Choudhury
Timothy Baldwin
    ALM
arXiv: 2412.18551

Papers citing "Libra-Leaderboard: Towards Responsible AI through a Balanced Leaderboard of Safety and Capability"

10 / 10 papers shown
Tensor Trust: Interpretable Prompt Injection Attacks from an Online Game
Sam Toyer
Olivia Watkins
Ethan Mendes
Justin Svegliato
Luke Bailey
...
Karim Elmaaroufi
Pieter Abbeel
Trevor Darrell
Alan Ritter
Stuart J. Russell
96 · 79 · 0 · 02 Nov 2023
Towards Understanding Sycophancy in Language Models
Mrinank Sharma
Meg Tong
Tomasz Korbak
David Duvenaud
Amanda Askell
...
Oliver Rausch
Nicholas Schiefer
Da Yan
Miranda Zhang
Ethan Perez
351 · 244 · 0 · 20 Oct 2023
Survey of Vulnerabilities in Large Language Models Revealed by Adversarial Attacks
Erfan Shayegani
Md Abdullah Al Mamun
Yu Fu
Pedram Zaree
Yue Dong
Nael B. Abu-Ghazaleh
AAML
229 · 163 · 0 · 16 Oct 2023
Can Large Language Models Provide Security & Privacy Advice? Measuring the Ability of LLMs to Refute Misconceptions
Yufan Chen
Arjun Arunasalam
Z. Berkay Celik
71 · 38 · 0 · 03 Oct 2023
DecodingTrust: A Comprehensive Assessment of Trustworthiness in GPT Models
Wei Ping
Weixin Chen
Hengzhi Pei
Chulin Xie
Mintong Kang
...
Zinan Lin
Yuk-Kit Cheng
Sanmi Koyejo
Basel Alomair
Yue Liu
119 · 430 · 0 · 20 Jun 2023
Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena
Lianmin Zheng
Wei-Lin Chiang
Ying Sheng
Siyuan Zhuang
Zhanghao Wu
...
Dacheng Li
Eric Xing
Haotong Zhang
Joseph E. Gonzalez
Ion Stoica
ALM, OSLM, ELM
441 · 4,444 · 0 · 09 Jun 2023
On Second Thought, Let's Not Think Step by Step! Bias and Toxicity in Zero-Shot Reasoning
Omar Shaikh
Hongxin Zhang
William B. Held
Michael S. Bernstein
Diyi Yang
ReLM, LRM
149 · 200 · 0 · 15 Dec 2022
Are Large Pre-Trained Language Models Leaking Your Personal Information?
Jie Huang
Hanyin Shao
Kevin Chen-Chuan Chang
PILM
98 · 201 · 0 · 25 May 2022
Ethical and social risks of harm from Language Models
Laura Weidinger
John F. J. Mellor
Maribeth Rauh
Conor Griffin
J. Uesato
...
Lisa Anne Hendricks
William S. Isaac
Sean Legassick
G. Irving
Iason Gabriel
PILM
122 · 1,044 · 0 · 08 Dec 2021
Build it Break it Fix it for Dialogue Safety: Robustness from Adversarial Human Attack
Emily Dinan
Samuel Humeau
Bharath Chintagunta
Jason Weston
88 · 248 · 0 · 17 Aug 2019