Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2307.03109
Cited By
A Survey on Evaluation of Large Language Models
6 July 2023
Yu-Chu Chang
Xu Wang
Jindong Wang
Yuanyi Wu
Linyi Yang
Kaijie Zhu
Hao Chen
Xiaoyuan Yi
Cunxiang Wang
Yidong Wang
Weirong Ye
Yue Zhang
Yi-Ju Chang
Philip S. Yu
Qian Yang
Xingxu Xie
ELM
LM&MA
ALM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"A Survey on Evaluation of Large Language Models"
50 / 160 papers shown
Title
VTLA: Vision-Tactile-Language-Action Model with Preference Learning for Insertion Manipulation
Chaofan Zhang
Peng Hao
Xiaoge Cao
Xiaoshuai Hao
Shaowei Cui
Shuo Wang
29
0
0
14 May 2025
DanceGRPO: Unleashing GRPO on Visual Generation
Zeyue Xue
Jie Wu
Yu Gao
Fangyuan Kong
Lingting Zhu
...
Zhiheng Liu
Wei Liu
Qiushan Guo
Weilin Huang
Ping Luo
EGVM
VGen
52
0
0
12 May 2025
Semantic Retention and Extreme Compression in LLMs: Can We Have Both?
Stanislas Laborde
Martin Cousseau
Antoun Yaacoub
Lionel Prevost
MQ
23
0
0
12 May 2025
Large Language Models and Arabic Content: A Review
Haneh Rhel
Dmitri Roussinov
29
0
0
12 May 2025
WaterDrum: Watermarking for Data-centric Unlearning Metric
Xinyang Lu
Xinyuan Niu
Gregory Kang Ruey Lau
Bui Thi Cam Nhung
Rachael Hwee Ling Sim
Fanyu Wen
Chuan-Sheng Foo
S. Ng
Bryan Kian Hsiang Low
MU
59
0
0
08 May 2025
GenAI in Entrepreneurship: a systematic review of generative artificial intelligence in entrepreneurship research: current issues and future directions
Anna Kusetogullari
Huseyin Kusetogullari
Martin Andersson
Tony Gorschek
32
0
0
08 May 2025
Fine-Tuning Large Language Models and Evaluating Retrieval Methods for Improved Question Answering on Building Codes
Mohammad Aqib
Mohd Hamza
Qipei Mei
Ying Hei Chui
RALM
ELM
52
0
0
07 May 2025
CDE-Mapper: Using Retrieval-Augmented Language Models for Linking Clinical Data Elements to Controlled Vocabularies
Komal Gilani
Marlo Verket
Christof Peters
Michel Dumontier
Hans-Peter Brunner-La Rocca
V. Urovi
35
0
0
07 May 2025
Unified Multimodal Understanding and Generation Models: Advances, Challenges, and Opportunities
X. Zhang
Jintao Guo
Shanshan Zhao
Minghao Fu
Lunhao Duan
Guo-Hua Wang
Qing-Guo Chen
Zhao Xu
Weihua Luo
Kaifu Zhang
DiffM
74
0
0
05 May 2025
ReLI: A Language-Agnostic Approach to Human-Robot Interaction
Linus Nwankwo
Bjoern Ellensohn
Ozan Özdenizci
Elmar Rueckert
LM&Ro
52
0
0
03 May 2025
Efficient Fine-Tuning of Quantized Models via Adaptive Rank and Bitwidth
Changhai Zhou
Yuhua Zhou
Qian Qiao
Weizhong Zhang
Cheng Jin
MQ
27
0
0
02 May 2025
Cer-Eval: Certifiable and Cost-Efficient Evaluation Framework for LLMs
G. Wang
Z. Chen
Bo Li
Haifeng Xu
112
0
0
02 May 2025
UserCentrix: An Agentic Memory-augmented AI Framework for Smart Spaces
Alaa Saleh
Sasu Tarkoma
Praveen Kumar Donta
Naser Hossein Motlagh
Schahram Dustdar
Susanna Pirttikangas
Lauri Lovén
48
0
0
01 May 2025
Robotic Visual Instruction
Y. Li
Ziyang Gong
H. Li
Xiaoqi Huang
Haolan Kang
Guangping Bai
Xianzheng Ma
LM&Ro
76
0
0
01 May 2025
An Automated Reinforcement Learning Reward Design Framework with Large Language Model for Cooperative Platoon Coordination
Dixiao Wei
Peng Yi
Jinlong Lei
Yiguang Hong
Yuchuan Du
128
0
0
28 Apr 2025
BRIDGE: Benchmarking Large Language Models for Understanding Real-world Clinical Practice Text
Jiageng Wu
Bowen Gu
Ren Zhou
Kevin Xie
Doug Snyder
...
S.
Jonathan H. Chen
Santiago Romero-Brufau
K. J. Lin
Jie Yang
LM&MA
ELM
92
0
0
28 Apr 2025
Comparing Uncertainty Measurement and Mitigation Methods for Large Language Models: A Systematic Review
Toghrul Abbasli
Kentaroh Toyoda
Yuan Wang
Leon Witt
Muhammad Asif Ali
Yukai Miao
Dan Li
Qingsong Wei
UQCV
92
0
0
25 Apr 2025
Think2SQL: Reinforce LLM Reasoning Capabilities for Text2SQL
Simone Papicchio
Simone Rossi
Luca Cagliero
Paolo Papotti
ReLM
LMTD
AI4TS
LRM
53
0
0
21 Apr 2025
CPG-EVAL: A Multi-Tiered Benchmark for Evaluating the Chinese Pedagogical Grammar Competence of Large Language Models
Dong Wang
ELM
28
0
0
17 Apr 2025
Evaluation Under Imperfect Benchmarks and Ratings: A Case Study in Text Simplification
Joseph Liu
Yoonsoo Nam
Xinyue Cui
Swabha Swayamdipta
53
0
0
13 Apr 2025
DeepGreen: Effective LLM-Driven Green-washing Monitoring System Designed for Empirical Testing -- Evidence from China
Congluo Xu
Yu Miao
Yiling Xiao
Chengmengjia Lin
24
0
0
10 Apr 2025
Navigating the Rabbit Hole: Emergent Biases in LLM-Generated Attack Narratives Targeting Mental Health Groups
Rijul Magu
Arka Dutta
Sean Kim
Ashiqur R. KhudaBukhsh
Munmun De Choudhury
19
0
0
08 Apr 2025
Exploring Generative AI Techniques in Government: A Case Study
Sunyi Liu
Mengzhe Geng
Rebecca Hart
LLMAG
36
0
0
06 Apr 2025
Engineering Artificial Intelligence: Framework, Challenges, and Future Direction
Jay Lee
Hanqi Su
Dai-Yan Ji
Takanobu Minami
AI4CE
48
0
0
03 Apr 2025
Rethinking industrial artificial intelligence: a unified foundation framework
Jay Lee
Hanqi Su
AI4CE
41
1
0
02 Apr 2025
Communication-Efficient and Personalized Federated Foundation Model Fine-Tuning via Tri-Matrix Adaptation
Y. Li
Bo Liu
Sheng Huang
Z. Zhang
Xiaotong Yuan
Richang Hong
46
0
0
31 Mar 2025
A Scalable Framework for Evaluating Health Language Models
Neil Mallinar
A. Heydari
Xin Liu
Anthony Z. Faranesh
Brent Winslow
...
Mark Malhotra
Shwetak N. Patel
Javier L. Prieto
Daniel J. McDuff
Ahmed A. Metwally
LM&MA
56
2
0
30 Mar 2025
An evaluation of LLMs and Google Translate for translation of selected Indian languages via sentiment and semantic analyses
Rohitash Chandra
Aryan Chaudhary
Yeshwanth Rayavarapu
44
0
0
27 Mar 2025
StreamMind: Unlocking Full Frame Rate Streaming Video Dialogue through Event-Gated Cognition
Xin Ding
Hao Wu
Y. Yang
Shiqi Jiang
Donglin Bai
Zhibo Chen
Ting Cao
133
0
0
08 Mar 2025
The Effectiveness of Large Language Models in Transforming Unstructured Text to Standardized Formats
William Brach
Kristián Košťál
Michal Ries
180
0
0
04 Mar 2025
Towards Effective and Efficient Context-aware Nucleus Detection in Histopathology Whole Slide Images
Zhongyi Shui
Ruizhe Guo
Honglin Li
Yuxuan Sun
Yunlong Zhang
Chenglu Zhu
Jiatong Cai
Pingyi Chen
Yanzhou Su
Lin Yang
46
0
0
04 Mar 2025
Mapping Trustworthiness in Large Language Models: A Bibliometric Analysis Bridging Theory to Practice
José Antonio Siqueira de Cerqueira
Kai-Kristian Kemell
Muhammad Waseem
Rebekah A. Rousi
Nannan Xi
Juho Hamari
99
3
0
27 Feb 2025
Generative Artificial Intelligence: Evolving Technology, Growing Societal Impact, and Opportunities for Information Systems Research
Veda C. Storey
Wei Thoo Yue
J. Leon Zhao
Roman Lukyanenko
43
0
0
25 Feb 2025
Optimizing Estimators of Squared Calibration Errors in Classification
Sebastian G. Gruber
Francis Bach
71
1
0
24 Feb 2025
PiCO: Peer Review in LLMs based on the Consistency Optimization
Kun-Peng Ning
Shuo Yang
Yu-Yang Liu
Jia-Yu Yao
Zhen-Hui Liu
Yu Wang
Ming Pang
Li Yuan
ALM
69
8
0
24 Feb 2025
Evaluating Large Language Models for Public Health Classification and Extraction Tasks
Joshua Harris
Timothy Laurence
Leo Loman
Fan Grayson
Toby Nonnenmacher
...
Hamish Mohammed
Thomas Finnie
Luke Hounsome
Michael Borowitz
Steven Riley
LM&MA
AI4MH
81
5
0
20 Feb 2025
Forget the Data and Fine-Tuning! Just Fold the Network to Compress
Dong Wang
Haris Šikić
Lothar Thiele
O. Saukh
48
0
0
17 Feb 2025
The Representation and Recall of Interwoven Structured Knowledge in LLMs: A Geometric and Layered Analysis
Ge Lei
Samuel J. Cooper
KELM
47
0
0
15 Feb 2025
Efficient Evaluation of Multi-Task Robot Policies With Active Experiment Selection
Abrar Anwar
Rohan Gupta
Zain Merchant
Sayan Ghosh
Willie Neiswanger
Jesse Thomason
OffRL
67
1
0
14 Feb 2025
The Stochastic Parrot on LLM's Shoulder: A Summative Assessment of Physical Concept Understanding
Mo Yu
Lemao Liu
J. Wu
Tsz Ting Chung
Shunchi Zhang
JiangNan Li
Dit-Yan Yeung
Jie Zhou
85
1
0
13 Feb 2025
APT-LLM: Embedding-Based Anomaly Detection of Cyber Advanced Persistent Threats Using Large Language Models
Sidahmed Benabderrahmane
Petko Valtchev
James Cheney
Talal Rahwan
68
0
0
13 Feb 2025
Assessing the Impact of the Quality of Textual Data on Feature Representation and Machine Learning Models
Tabinda Sarwar
Antonio Jose Jimeno Yepes
Lawrence Cavedon
65
0
0
12 Feb 2025
Rank Also Matters: Hierarchical Configuration for Mixture of Adapter Experts in LLM Fine-Tuning
Peizhuang Cong
Wenpu Liu
Wenhan Yu
Haochen Zhao
Tong Yang
ALM
MoE
78
0
0
06 Feb 2025
Large Language Models as Common-Sense Heuristics
Andrey Borro
Patricia J. Riddle
Michael W Barley
Michael Witbrock
LRM
LM&Ro
136
1
0
31 Jan 2025
Explaining Decisions of Agents in Mixed-Motive Games
Maayan Orner
Oleg Maksimov
Akiva Kleinerman
Charles Ortiz
Sarit Kraus
90
0
0
28 Jan 2025
Turbulence: Systematically and Automatically Testing Instruction-Tuned Large Language Models for Code
Shahin Honarvar
Mark van der Wilk
Alastair Donaldson
78
6
0
28 Jan 2025
Cognitive Kernel: An Open-source Agent System towards Generalist Autopilots
H. Zhang
Xiaoman Pan
Hongwei Wang
Kaixin Ma
W. Yu
Dong Yu
LLMAG
61
3
0
03 Jan 2025
Simulating Human-like Daily Activities with Desire-driven Autonomy
Yiding Wang
Yuxuan Chen
Fangwei Zhong
Long Ma
Yizhou Wang
72
2
0
09 Dec 2024
Seamless Optical Cloud Computing across Edge-Metro Network for Generative AI
Sizhe Xing
Aolong Sun
Chengxi Wang
Yizhi Wang
Boyu Dong
...
Xi Xiao
R. Penty
Qixiang Cheng
Nan Chi
Junwen Zhang
111
0
0
04 Dec 2024
Lifelong Knowledge Editing for Vision Language Models with Low-Rank Mixture-of-Experts
Qizhou Chen
Chengyu Wang
Dakan Wang
Taolin Zhang
Wangyue Li
Xiaofeng He
KELM
80
1
0
23 Nov 2024
1
2
3
4
Next