Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2406.10950
Cited By
E-Bench: Towards Evaluating the Ease-of-Use of Large Language Models
16 June 2024
Zhenyu Zhang
Bingguang Hao
Jinpeng Li
Zekai Zhang
Dongyan Zhao
Re-assign community
ArXiv (abs)
PDF
HTML
Papers citing
"E-Bench: Towards Evaluating the Ease-of-Use of Large Language Models"
13 / 13 papers shown
Title
SUGARCREPE++ Dataset: Vision-Language Model Sensitivity to Semantic and Lexical Alterations
Sri Harsha Dumpala
Aman Jaiswal
Chandramouli Shama Sastry
E. Milios
Sageev Oore
Hassan Sajjad
CoGe
83
12
0
17 Jun 2024
How are Prompts Different in Terms of Sensitivity?
Sheng Lu
Hendrik Schuff
Iryna Gurevych
71
19
0
13 Nov 2023
DecodingTrust: A Comprehensive Assessment of Trustworthiness in GPT Models
Wei Ping
Weixin Chen
Hengzhi Pei
Chulin Xie
Mintong Kang
...
Zinan Lin
Yuk-Kit Cheng
Sanmi Koyejo
Basel Alomair
Yue Liu
119
430
0
20 Jun 2023
GPT-4 Technical Report
OpenAI OpenAI
OpenAI Josh Achiam
Steven Adler
Sandhini Agarwal
Lama Ahmad
...
Shengjia Zhao
Tianhao Zheng
Juntang Zhuang
William Zhuk
Barret Zoph
LLMAG
MLLM
1.5K
14,748
0
15 Mar 2023
On the Robustness of ChatGPT: An Adversarial and Out-of-distribution Perspective
Jindong Wang
Xixu Hu
Wenxin Hou
Hao Chen
Runkai Zheng
...
Weirong Ye
Xiubo Geng
Binxing Jiao
Yue Zhang
Xingxu Xie
AI4MH
139
236
0
22 Feb 2023
GLUE-X: Evaluating Natural Language Understanding Models from an Out-of-distribution Generalization Perspective
Linyi Yang
Shuibai Zhang
Libo Qin
Yafu Li
Yidong Wang
Hanmeng Liu
Jindong Wang
Xingxu Xie
Yue Zhang
ELM
118
82
0
15 Nov 2022
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
888
13,207
0
04 Mar 2022
Adversarial GLUE: A Multi-Task Benchmark for Robustness Evaluation of Language Models
Wei Ping
Chejian Xu
Shuohang Wang
Zhe Gan
Yu Cheng
Jianfeng Gao
Ahmed Hassan Awadallah
Yangqiu Song
VLM
ELM
AAML
72
225
0
04 Nov 2021
RADDLE: An Evaluation Benchmark and Analysis Platform for Robust Task-oriented Dialog Systems
Baolin Peng
Chunyuan Li
Zhu Zhang
Chenguang Zhu
Jinchao Li
Jianfeng Gao
61
50
0
29 Dec 2020
OpenAttack: An Open-source Textual Adversarial Attack Toolkit
Guoyang Zeng
Fanchao Qi
Qianrui Zhou
Ting Zhang
Zixian Ma
Bairu Hou
Yuan Zang
Zhiyuan Liu
Maosong Sun
AAML
174
124
0
19 Sep 2020
TextBugger: Generating Adversarial Text Against Real-world Applications
Jinfeng Li
S. Ji
Tianyu Du
Bo Li
Ting Wang
SILM
AAML
216
747
0
13 Dec 2018
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Alex Jinpeng Wang
Amanpreet Singh
Julian Michael
Felix Hill
Omer Levy
Samuel R. Bowman
ELM
1.1K
7,201
0
20 Apr 2018
Generating Natural Adversarial Examples
Zhengli Zhao
Dheeru Dua
Sameer Singh
GAN
AAML
186
601
0
31 Oct 2017
1