2307.03109
Cited By
A Survey on Evaluation of Large Language Models
6 July 2023
Yupeng Chang
Xu Wang
Jindong Wang
Yuan Wu
Linyi Yang
Kaijie Zhu
Hao Chen
Xiaoyuan Yi
Cunxiang Wang
Yidong Wang
Wei Ye
Yue Zhang
Yi Chang
Philip S. Yu
Qiang Yang
Xing Xie
ELM
LM&MA
ALM
Papers citing
"A Survey on Evaluation of Large Language Models"
50 / 171 papers shown
Title
A Misleading Gallery of Fluid Motion by Generative Artificial Intelligence
Ali Kashefi
VGen
51
5
0
24 May 2024
Intelligent Go-Explore: Standing on the Shoulders of Giant Foundation Models
Cong Lu
Shengran Hu
Jeff Clune
LLMAG
47
10
0
24 May 2024
Evaluation of Retrieval-Augmented Generation: A Survey
Hao Yu
Aoran Gan
Kai Zhang
Shiwei Tong
Qi Liu
Zhaofeng Liu
3DV
62
82
0
13 May 2024
Traffic Performance GPT (TP-GPT): Real-Time Data Informed Intelligent ChatBot for Transportation Surveillance and Management
Bingzhang Wang
Zhiyu Cai
Muhammad Monjurul Karim
Chenxi Liu
Yinhai Wang
44
6
0
05 May 2024
NegativePrompt: Leveraging Psychology for Large Language Models Enhancement via Negative Emotional Stimuli
Xu Wang
Cheng-rong Li
Yi-Ju Chang
Jindong Wang
Yuan Wu
37
7
0
05 May 2024
From Matching to Generation: A Survey on Generative Information Retrieval
Xiaoxi Li
Jiajie Jin
Yujia Zhou
Yuyao Zhang
Peitian Zhang
Yutao Zhu
Zhao Cao
3DV
84
46
0
23 Apr 2024
Can LLMs Understand Computer Networks? Towards a Virtual System Administrator
Denis Donadel
Francesco Marchiori
Luca Pajola
Mauro Conti
34
7
0
19 Apr 2024
Unveiling LLM Evaluation Focused on Metrics: Challenges and Solutions
Taojun Hu
Xiao-Hua Zhou
ELM
41
12
0
14 Apr 2024
Online Safety Analysis for LLMs: a Benchmark, an Assessment, and a Path Forward
Xuan Xie
Jiayang Song
Zhehua Zhou
Yuheng Huang
Da Song
Lei Ma
OffRL
53
6
0
12 Apr 2024
Automating Research Synthesis with Domain-Specific Large Language Model Fine-Tuning
Teo Susnjak
Peter Hwang
N. Reyes
A. Barczak
Timothy R. McIntosh
Surangika Ranathunga
70
22
0
08 Apr 2024
Multicalibration for Confidence Scoring in LLMs
Gianluca Detommaso
Martín Bertrán
Riccardo Fogliato
Aaron Roth
29
12
0
06 Apr 2024
A Survey on Large Language Model-Based Game Agents
Sihao Hu
Tiansheng Huang
Gaowen Liu
Ramana Rao Kompella
Selim Furkan Tekin
Yichang Xu
Zachary Yahn
Ling Liu
LLMAG
LM&Ro
AI4CE
LM&MA
71
51
0
02 Apr 2024
Evaluating the Factuality of Large Language Models using Large-Scale Knowledge Graphs
Xiaoze Liu
Feijie Wu
Tianyang Xu
Zhuo Chen
Yichi Zhang
Xiaoqian Wang
Jing Gao
HILM
42
8
0
01 Apr 2024
GenAI Detection Tools, Adversarial Techniques and Implications for Inclusivity in Higher Education
Mike Perkins
Jasper Roe
Binh H. Vu
Darius Postma
Don Hickerson
James McGaughran
Huy Q. Khuat
DeLMO
43
19
0
28 Mar 2024
Content Knowledge Identification with Multi-Agent Large Language Models (LLMs)
Kaiqi Yang
Yucheng Chu
Taylor Darwin
Ahreum Han
Hang Li
Hongzhi Wen
Yasemin Copur-Gencturk
Jiliang Tang
Hui Liu
32
11
0
22 Mar 2024
Large Language Models for Blockchain Security: A Systematic Literature Review
Zheyuan He
Zihao Li
Sen Yang
Ao Qiao
Xiaosong Zhang
Xiapu Luo
Ting Chen
PILM
42
14
0
21 Mar 2024
GPT-4 as Evaluator: Evaluating Large Language Models on Pest Management in Agriculture
Shanglong Yang
Zhipeng Yuan
Shunbao Li
Ruoling Peng
Kang Liu
Po Yang
ELM
LM&MA
45
6
0
18 Mar 2024
Evaluating LLMs for Gender Disparities in Notable Persons
L. Rhue
Sofie Goethals
Arun Sundararajan
52
4
0
14 Mar 2024
Exploring the Potential of Large Language Models for Improving Digital Forensic Investigation Efficiency
Akila Wickramasekara
F. Breitinger
Mark Scanlon
52
8
0
29 Feb 2024
Large Language Models (LLMs) on Tabular Data: Prediction, Generation, and Understanding -- A Survey
Xi Fang
Weijie Xu
Fiona Anting Tan
Jiani Zhang
Ziqing Hu
Yanjun Qi
Scott Nickleach
Diego Socolinsky
Srinivasan H. Sengamedu
Christos Faloutsos
LMTD
ALM
37
66
0
27 Feb 2024
Prediction-Powered Ranking of Large Language Models
Ivi Chatzi
Eleni Straitouri
Suhas Thejaswi
Manuel Gomez Rodriguez
ALM
29
5
0
27 Feb 2024
COMPASS: Computational Mapping of Patient-Therapist Alliance Strategies with Language Modeling
Baihan Lin
Djallel Bouneffouf
Yulia Landa
Rachel Jespersen
Cheryl Corcoran
Guillermo Cecchi
51
1
0
22 Feb 2024
Large Language Models for Stemming: Promises, Pitfalls and Failures
Shuai Wang
Shengyao Zhuang
Guido Zuccon
39
1
0
19 Feb 2024
API Pack: A Massive Multi-Programming Language Dataset for API Call Generation
Zhen Guo
Adriana Meza Soria
Wei Sun
Yikang Shen
Rameswar Panda
ELM
ALM
55
1
0
14 Feb 2024
Utilizing Large Language Models to Detect Privacy Leaks in Mini-App Code
Liming Jiang
29
1
0
12 Feb 2024
Large Language Models: A Survey
Shervin Minaee
Tomáš Mikolov
Narjes Nikzad
M. Asgari-Chenaghlu
R. Socher
Xavier Amatriain
Jianfeng Gao
ALM
LM&MA
ELM
134
371
0
09 Feb 2024
Enhancing Zero-shot Counting via Language-guided Exemplar Learning
Mingjie Wang
Jun Zhou
Yong Dai
Eric Buys
Minglun Gong
38
0
0
08 Feb 2024
Adversarial Text Purification: A Large Language Model Approach for Defense
Raha Moraffah
Shubh Khandelwal
Amrita Bhattacharjee
Huan Liu
DeLMO
AAML
36
5
0
05 Feb 2024
The RL/LLM Taxonomy Tree: Reviewing Synergies Between Reinforcement Learning and Large Language Models
M. Pternea
Prerna Singh
Abir Chakraborty
Y. Oruganti
M. Milletarí
Sayli Bapat
Kebei Jiang
OffRL
24
7
0
02 Feb 2024
Mathematical Algorithm Design for Deep Learning under Societal and Judicial Constraints: The Algorithmic Transparency Requirement
Holger Boche
Adalbert Fono
Gitta Kutyniok
FaML
31
4
0
18 Jan 2024
AttackEval: How to Evaluate the Effectiveness of Jailbreak Attacking on Large Language Models
Dong Shu
Mingyu Jin
Suiyuan Zhu
Beichen Wang
Zihao Zhou
Chong Zhang
Yongfeng Zhang
ELM
47
12
0
17 Jan 2024
Mutual Enhancement of Large Language and Reinforcement Learning Models through Bi-Directional Feedback Mechanisms: A Planning Case Study
Shangding Gu
LLMAG
43
0
0
12 Jan 2024
Exploiting Novel GPT-4 APIs
Kellin Pelrine
Mohammad Taufeeque
Michał Zając
Euan McLean
Adam Gleave
SILM
23
20
0
21 Dec 2023
NLLG Quarterly arXiv Report 09/23: What are the most influential current AI Papers?
Ran Zhang
Aida Kostikova
Christoph Leiter
Jonas Belouadi
Daniil Larionov
Yanran Chen
Vivian Fresen
Steffen Eger
39
0
0
09 Dec 2023
Enhancing Robustness of Foundation Model Representations under Provenance-related Distribution Shifts
Xiruo Ding
Zhecheng Sheng
Brian Hur
Feng Chen
Serguei V. S. Pakhomov
Trevor Cohen
OOD
20
0
0
09 Dec 2023
Large Language Models Are Zero-Shot Text Classifiers
Zhiqiang Wang
Yiran Pang
Yanbin Lin
13
29
0
02 Dec 2023
Large Language Models as Topological Structure Enhancers for Text-Attributed Graphs
Shengyin Sun
Yuxiang Ren
Chen Ma
Xuecang Zhang
113
20
0
24 Nov 2023
Igniting Language Intelligence: The Hitchhiker's Guide From Chain-of-Thought Reasoning to Language Agents
Zhuosheng Zhang
Yao Yao
Aston Zhang
Xiangru Tang
Xinbei Ma
...
Yiming Wang
Mark B. Gerstein
Rui Wang
Gongshen Liu
Hai Zhao
LLMAG
LM&Ro
LRM
42
53
0
20 Nov 2023
Rethinking Benchmark and Contamination for Language Models with Rephrased Samples
Shuo Yang
Wei-Lin Chiang
Lianmin Zheng
Joseph E. Gonzalez
Ion Stoica
ALM
27
110
0
08 Nov 2023
Evaluating General-Purpose AI with Psychometrics
Xiting Wang
Liming Jiang
Jose Hernandez-Orallo
David Stillwell
Luning Sun
Fang Luo
Xing Xie
AI4MH
ELM
30
12
0
25 Oct 2023
BC4LLM: Trusted Artificial Intelligence When Blockchain Meets Large Language Models
Haoxiang Luo
Jian Luo
Athanasios V. Vasilakos
34
9
0
10 Oct 2023
SCAR: Power Side-Channel Analysis at RTL-Level
Amisha Srivastava
Sanjay Das
Navnil Choudhury
Rafail Psiakis
Pedro Henrique Silva
Debjit Pal
Kanad Basu
29
8
0
10 Oct 2023
Beyond Task Performance: Evaluating and Reducing the Flaws of Large Multimodal Models with In-Context Learning
Mustafa Shukor
Alexandre Ramé
Corentin Dancette
Matthieu Cord
LRM
MLLM
43
20
0
01 Oct 2023
Foundation Metrics for Evaluating Effectiveness of Healthcare Conversations Powered by Generative AI
Mahyar Abbasian
Elahe Khatibi
Iman Azimi
David Oniani
Zahra Shakeri Hossein Abad
...
Bryant Lin
Olivier Gevaert
Li-Jia Li
Ramesh C. Jain
Amir M. Rahmani
LM&MA
ELM
AI4MH
40
66
0
21 Sep 2023
Are Large Language Models Really Robust to Word-Level Perturbations?
Haoyu Wang
Guozheng Ma
Cong Yu
Ning Gui
Linrui Zhang
...
Sen Zhang
Li Shen
Xueqian Wang
Peilin Zhao
Dacheng Tao
KELM
26
22
0
20 Sep 2023
MINT: Evaluating LLMs in Multi-turn Interaction with Tools and Language Feedback
Xingyao Wang
Zihan Wang
Jiateng Liu
Yangyi Chen
Lifan Yuan
Hao Peng
Heng Ji
LRM
133
141
0
19 Sep 2023
Can Large Language Models Understand Real-World Complex Instructions?
Qi He
Jie Zeng
Wenhao Huang
Lina Chen
Jin Xiao
...
Shisong Chen
Yikai Zhang
Zhouhong Gu
Jiaqing Liang
Yanghua Xiao
ALM
LRM
ELM
98
52
0
17 Sep 2023
Beyond Static Datasets: A Deep Interaction Approach to LLM Evaluation
Jiatong Li
Rui Li
Qi Liu
31
15
0
08 Sep 2023
FLM-101B: An Open LLM and How to Train It with $100K Budget
Xiang Li
Yiqun Yao
Xin Jiang
Xuezhi Fang
Xuying Meng
...
LI DU
Bowen Qin
Zheng-Wei Zhang
Aixin Sun
Yequan Wang
60
21
0
07 Sep 2023
Supervised Learning and Large Language Model Benchmarks on Mental Health Datasets: Cognitive Distortions and Suicidal Risks in Chinese Social Media
Hongzhi Qi
Qing Zhao
Jianqiang Li
Changwei Song
Wei-dong Zhai
...
Y. Yu
Fan Wang
Huijing Zou
Bing Xiang Yang
Guanghui Fu
AI4MH
26
12
0
07 Sep 2023