2307.03109
Cited By
A Survey on Evaluation of Large Language Models
6 July 2023
Yu-Chu Chang
Xu Wang
Jindong Wang
Yuanyi Wu
Linyi Yang
Kaijie Zhu
Hao Chen
Xiaoyuan Yi
Cunxiang Wang
Yidong Wang
Weirong Ye
Yue Zhang
Yi-Ju Chang
Philip S. Yu
Qian Yang
Xingxu Xie
ELM
LM&MA
ALM
Papers citing "A Survey on Evaluation of Large Language Models"
50 / 169 papers shown
Title
Seamless Optical Cloud Computing across Edge-Metro Network for Generative AI
Sizhe Xing
Aolong Sun
Chengxi Wang
Yizhi Wang
Boyu Dong
...
Xi Xiao
R. Penty
Qixiang Cheng
Nan Chi
Junwen Zhang
113
0
0
04 Dec 2024
Lifelong Knowledge Editing for Vision Language Models with Low-Rank Mixture-of-Experts
Qizhou Chen
Chengyu Wang
Dakan Wang
Taolin Zhang
Wangyue Li
Xiaofeng He
KELM
80
1
0
23 Nov 2024
CorrSynth -- A Correlated Sampling Method for Diverse Dataset Generation from LLMs
Suhas S Kowshik
Abhishek Divekar
Vijit Malik
SyDa
37
0
0
13 Nov 2024
Zeroth-Order Adaptive Neuron Alignment Based Pruning without Re-Training
Elia Cunegatti
Leonardo Lucio Custode
Giovanni Iacca
49
0
0
11 Nov 2024
A Deep Dive Into Large Language Model Code Generation Mistakes: What and Why?
QiHong Chen
Jiawei Li
Jiecheng Deng
Jiachen Yu
Justin Tian Jin Chen
Iftekhar Ahmed
56
0
0
03 Nov 2024
Interacting Large Language Model Agents. Interpretable Models and Social Learning
Adit Jain
Vikram Krishnamurthy
LLMAG
37
0
0
02 Nov 2024
PViT: Prior-augmented Vision Transformer for Out-of-distribution Detection
Tianhao Zhang
Zhixiang Chen
Lyudmila Mihaylova
134
0
0
27 Oct 2024
Can We Trust AI Agents? A Case Study of an LLM-Based Multi-Agent System for Ethical AI
José Antonio Siqueira de Cerqueira
Mamia Agbese
Rebekah Rousi
Nannan Xi
Juho Hamari
Pekka Abrahamsson
LLMAG
41
4
0
25 Oct 2024
ChineseSafe: A Chinese Benchmark for Evaluating Safety in Large Language Models
H. Zhang
Hongfu Gao
Qiang Hu
Guanhua Chen
L. Yang
Bingyi Jing
Hongxin Wei
Bing Wang
Haifeng Bai
Lei Yang
AILaw
ELM
49
2
0
24 Oct 2024
FIOVA: A Multi-Annotator Benchmark for Human-Aligned Video Captioning
Shiyu Hu
Xuchen Li
Xuzhao Li
Jing Zhang
Yipei Wang
Xin Zhao
Kang Hao Cheong
VLM
26
1
0
20 Oct 2024
Unveiling Large Language Models Generated Texts: A Multi-Level Fine-Grained Detection Framework
Zhen Tao
Zhiyu Li
Runyu Chen
Dinghao Xi
Wei Xu
DeLMO
26
1
0
18 Oct 2024
TestAgent: A Framework for Domain-Adaptive Evaluation of LLMs via Dynamic Benchmark Construction and Exploratory Interaction
Wanying Wang
Zeyu Ma
Pengfei Liu
Mingang Chen
LLMAG
47
1
0
15 Oct 2024
Functional-level Uncertainty Quantification for Calibrated Fine-tuning on LLMs
Ruijia Niu
D. Wu
Rose Yu
Yi Ma
33
1
0
09 Oct 2024
Mixture Compressor for Mixture-of-Experts LLMs Gains More
Wei Huang
Yue Liao
Jianhui Liu
Ruifei He
Haoru Tan
Shiming Zhang
Hongsheng Li
Si Liu
Xiaojuan Qi
MoE
39
3
0
08 Oct 2024
RevisEval: Improving LLM-as-a-Judge via Response-Adapted References
Qiyuan Zhang
Yufei Wang
Tiezheng YU
Yuxin Jiang
Chuhan Wu
...
Xin Jiang
Lifeng Shang
Ruiming Tang
Fuyuan Lyu
Chen Ma
31
4
0
07 Oct 2024
Undesirable Memorization in Large Language Models: A Survey
Ali Satvaty
Suzan Verberne
Fatih Turkmen
ELM
PILM
74
7
0
03 Oct 2024
A Survey on Point-of-Interest Recommendation: Models, Architectures, and Security
Qianru Zhang
Peng Yang
Junliang Yu
Haixin Wang
Xingwei He
S. Yiu
Hongzhi Yin
41
1
0
03 Oct 2024
Knowledge-Driven Feature Selection and Engineering for Genotype Data with Large Language Models
Joseph Lee
Shu Yang
Jae Young Baik
Xiaoxi Liu
Zhen Tan
...
Zixuan Wen
Bojian Hou
D. Duong-Tran
Tianlong Chen
Li Shen
44
1
0
02 Oct 2024
Reasoning Elicitation in Language Models via Counterfactual Feedback
Alihan Hüyük
Xinnuo Xu
Jacqueline Maasch
Aditya V. Nori
Javier González
ReLM
LRM
151
1
0
02 Oct 2024
Assessment and manipulation of latent constructs in pre-trained language models using psychometric scales
Maor Reuben
Ortal Slobodin
Aviad Elyshar
Idan-Chaim Cohen
Orna Braun-Lewensohn
Odeya Cohen
Rami Puzis
38
0
0
29 Sep 2024
Recent Advances in OOD Detection: Problems and Approaches
Shuo Lu
YingSheng Wang
Lijun Sheng
Aihua Zheng
Lingxiao He
Jian Liang
OODD
68
3
0
18 Sep 2024
Measuring Human and AI Values Based on Generative Psychometrics with Large Language Models
Haoran Ye
Yuhang Xie
Yuanyi Ren
Hanjun Fang
Xin Zhang
Guojie Song
LM&MA
37
1
0
18 Sep 2024
Recent advances in deep learning and language models for studying the microbiome
Binghao Yan
Yunbi Nam
Lingyao Li
Rebecca A Deek
Hongzhe Li
Siyuan Ma
23
1
0
15 Sep 2024
Political DEBATE: Efficient Zero-shot and Few-shot Classifiers for Political Text
Michael Burnham
Kayla Kahn
Ryan Yank Wang
Rachel X. Peng
34
5
0
03 Sep 2024
Exploring the Feasibility of Automated Data Standardization using Large Language Models for Seamless Positioning
M. Lee
Ju Lin
Li-Ta Hsu
18
0
0
22 Aug 2024
Target Prompting for Information Extraction with Vision Language Model
Dipankar Medhi
VLM
40
0
0
07 Aug 2024
TrustNavGPT: Modeling Uncertainty to Improve Trustworthiness of Audio-Guided LLM-Based Robot Navigation
Xingpeng Sun
Yiran Zhang
Xindi Tang
Amrit Singh Bedi
Aniket Bera
47
4
0
03 Aug 2024
Speech-Guided Sequential Planning for Autonomous Navigation using Large Language Model Meta AI 3 (Llama3)
Alkesh K. Srivastava
Philip Dames
LLMAG
LM&Ro
48
1
0
13 Jul 2024
Are Large Language Models Really Bias-Free? Jailbreak Prompts for Assessing Adversarial Robustness to Bias Elicitation
Riccardo Cantini
Giada Cosenza
A. Orsino
Domenico Talia
AAML
57
5
0
11 Jul 2024
Mobile Edge Intelligence for Large Language Models: A Contemporary Survey
Guanqiao Qu
Qiyuan Chen
Wei Wei
Zheng Lin
Xianhao Chen
Kaibin Huang
42
43
0
09 Jul 2024
On Speeding Up Language Model Evaluation
Jin Peng Zhou
Christian K. Belardi
Ruihan Wu
Travis Zhang
Carla P. Gomes
Wen Sun
Kilian Q. Weinberger
58
1
0
08 Jul 2024
Leveraging Large Language Models for Integrated Satellite-Aerial-Terrestrial Networks: Recent Advances and Future Directions
Shumaila Javaid
R. A. Khalil
Nasir Saeed
Bin He
Mohamed-Slim Alouini
39
9
0
05 Jul 2024
On the Workflows and Smells of Leaderboard Operations (LBOps): An Exploratory Study of Foundation Model Leaderboards
Zhimin Zhao
A. A. Bangash
F. Côgo
Bram Adams
Ahmed E. Hassan
59
1
0
04 Jul 2024
CEB: Compositional Evaluation Benchmark for Fairness in Large Language Models
Song Wang
Peng Wang
Tong Zhou
Yushun Dong
Zhen Tan
Jundong Li
CoGe
56
7
0
02 Jul 2024
A Practical Review of Mechanistic Interpretability for Transformer-Based Language Models
Daking Rai
Yilun Zhou
Shi Feng
Abulhair Saparov
Ziyu Yao
82
19
0
02 Jul 2024
LLM2FEA: Discover Novel Designs with Generative Evolutionary Multitasking
Melvin Wong
Jiao Liu
Thiago Rios
Stefan Menzel
Yew-Soon Ong
55
2
0
21 Jun 2024
Judging the Judges: Evaluating Alignment and Vulnerabilities in LLMs-as-Judges
Aman Singh Thakur
Kartik Choudhary
Venkat Srinik Ramayapally
Sankaran Vaidyanathan
Dieuwke Hupkes
ELM
ALM
61
55
0
18 Jun 2024
MoE-RBench: Towards Building Reliable Language Models with Sparse Mixture-of-Experts
Guanjie Chen
Xinyu Zhao
Tianlong Chen
Yu Cheng
MoE
76
5
0
17 Jun 2024
SPA-VL: A Comprehensive Safety Preference Alignment Dataset for Vision Language Model
Yongting Zhang
Lu Chen
Guodong Zheng
Yifeng Gao
Rui Zheng
...
Yu Qiao
Xuanjing Huang
Feng Zhao
Tao Gui
Jing Shao
VLM
85
24
0
17 Jun 2024
MFC-Bench: Benchmarking Multimodal Fact-Checking with Large Vision-Language Models
Shengkang Wang
Hongzhan Lin
Ziyang Luo
Zhen Ye
Guang Chen
Jing Ma
68
3
0
17 Jun 2024
Ontology Embedding: A Survey of Methods, Applications and Resources
Jiaoyan Chen
Olga Mashkova
Fernando Zhapa-Camacho
R. Hoehndorf
Yuan He
Ian Horrocks
50
4
0
16 Jun 2024
A Survey on Large Language Models from General Purpose to Medical Applications: Datasets, Methodologies, and Evaluations
Jinqiang Wang
Huansheng Ning
Yi Peng
Qikai Wei
Daniel Tesfai
Wenwei Mao
Tao Zhu
Runhe Huang
LM&MA
AI4MH
ELM
44
5
0
14 Jun 2024
Adversarial Evasion Attack Efficiency against Large Language Models
João Vitorino
Eva Maia
Isabel Praça
AAML
43
2
0
12 Jun 2024
Are Large Language Models Good Statisticians?
Yizhang Zhu
Shiyin Du
Boyan Li
Yuyu Luo
Nan Tang
ELM
40
15
0
12 Jun 2024
CS-Bench: A Comprehensive Benchmark for Large Language Models towards Computer Science Mastery
Xiaoshuai Song
Muxi Diao
Guanting Dong
Zhengyang Wang
Yujia Fu
...
Yejie Wang
Zhuoma Gongque
Jianing Yu
Qiuna Tan
Weiran Xu
ELM
55
11
0
12 Jun 2024
Cycles of Thought: Measuring LLM Confidence through Stable Explanations
Evan Becker
Stefano Soatto
45
6
0
05 Jun 2024
Large Language Models as Evaluators for Recommendation Explanations
Xiaoyu Zhang
Yishan Li
Jiayin Wang
Bowen Sun
Weizhi Ma
Peijie Sun
Min Zhang
LRM
ELM
48
12
0
05 Jun 2024
How Ready Are Generative Pre-trained Large Language Models for Explaining Bengali Grammatical Errors?
Subhankar Maity
Aniket Deroy
Sudeshna Sarkar
29
10
0
27 May 2024
A Misleading Gallery of Fluid Motion by Generative Artificial Intelligence
Ali Kashefi
VGen
48
5
0
24 May 2024
Intelligent Go-Explore: Standing on the Shoulders of Giant Foundation Models
Cong Lu
Shengran Hu
Jeff Clune
LLMAG
47
10
0
24 May 2024