Papers
Communities
Organizations
Events
Blog
Pricing
Search
Open menu
Home
Papers
2306.04757
Cited By
v1
v2
v3 (latest)
INSTRUCTEVAL: Towards Holistic Evaluation of Instruction-Tuned Large Language Models
7 June 2023
Yew Ken Chia
Pengfei Hong
Lidong Bing
Soujanya Poria
ELM
Re-assign community
ArXiv (abs)
PDF
HTML
Github (546★)
Papers citing
"INSTRUCTEVAL: Towards Holistic Evaluation of Instruction-Tuned Large Language Models"
48 / 48 papers shown
Title
Focus, Merge, Rank: Improved Question Answering Based on Semi-structured Knowledge Bases
Derian Boer
Stephen Roth
Stefan Kramer
KELM
82
0
0
14 May 2025
Toward Generalizable Evaluation in the LLM Era: A Survey Beyond Benchmarks
Yixin Cao
Shibo Hong
Xuzhao Li
Jiahao Ying
Yubo Ma
...
Juanzi Li
Aixin Sun
Xuanjing Huang
Tat-Seng Chua
Tianwei Zhang
ALM
ELM
264
7
0
26 Apr 2025
Cat, Rat, Meow: On the Alignment of Language Model and Human Term-Similarity Judgments
Lorenz Linhardt
Tom Neuhäuser
Lenka Tětková
Oliver Eberle
ALM
AI4TS
70
1
0
10 Apr 2025
MSPLoRA: A Multi-Scale Pyramid Low-Rank Adaptation for Efficient Model Fine-Tuning
Jiancheng Zhao
Xingda Yu
Zhen Yang
MoE
98
3
0
27 Mar 2025
FlanEC: Exploring Flan-T5 for Post-ASR Error Correction
Moreno La Quatra
Valerio Mario Salerno
Yu Tsao
Sabato Marco Siniscalchi
181
2
0
22 Jan 2025
TabVer: Tabular Fact Verification with Natural Logic
Rami Aly
Andreas Vlachos
LMTD
126
0
0
02 Nov 2024
LoLDU: Low-Rank Adaptation via Lower-Diag-Upper Decomposition for Parameter-Efficient Fine-Tuning
Yiming Shi
Jiwei Wei
Yujia Wu
Ran Ran
Chengwei Sun
Shiyuan He
Yang Yang
ALM
104
1
0
17 Oct 2024
DataEnvGym: Data Generation Agents in Teacher Environments with Student Feedback
Zaid Khan
Elias Stengel-Eskin
Jaemin Cho
Joey Tianyi Zhou
VGen
188
3
0
08 Oct 2024
TICKing All the Boxes: Generated Checklists Improve LLM Evaluation and Generation
Jonathan Cook
Tim Rocktaschel
Jakob Foerster
Dennis Aumiller
Alex Wang
ALM
111
16
0
04 Oct 2024
Improving Unsupervised Constituency Parsing via Maximizing Semantic Information
Junjie Chen
Xiangheng He
Yusuke Miyao
Danushka Bollegala
119
0
0
03 Oct 2024
MMMT-IF: A Challenging Multimodal Multi-Turn Instruction Following Benchmark
Elliot L. Epstein
Kaisheng Yao
Jing Li
Xinyi Bai
Hamid Palangi
LRM
67
2
0
26 Sep 2024
Harnessing the Power of Semi-Structured Knowledge and LLMs with Triplet-Based Prefiltering for Question Answering
Derian Boer
Fabian Koch
Stefan Kramer
KELM
113
4
0
01 Sep 2024
Beyond Metrics: A Critical Analysis of the Variability in Large Language Model Evaluation Frameworks
Marco AF Pimentel
Clément Christophe
Tathagata Raha
Prateek Munjal
Praveen K Kanithi
Shadab Khan
ELM
82
3
0
29 Jul 2024
Look Within, Why LLMs Hallucinate: A Causal Perspective
He Li
Haoang Chi
Mingyu Liu
Wenjing Yang
LRM
75
6
0
14 Jul 2024
MIA-Bench: Towards Better Instruction Following Evaluation of Multimodal LLMs
Yusu Qian
Hanrong Ye
J. Fauconnier
Peter Grasch
Yinfei Yang
Zhe Gan
258
18
0
01 Jul 2024
The BiGGen Bench: A Principled Benchmark for Fine-grained Evaluation of Language Models with Language Models
Seungone Kim
Juyoung Suk
Ji Yong Cho
Shayne Longpre
Chaeeun Kim
...
Sean Welleck
Graham Neubig
Moontae Lee
Kyungjae Lee
Minjoon Seo
ELM
ALM
LM&MA
208
44
0
09 Jun 2024
FedLLM-Bench: Realistic Benchmarks for Federated Learning of Large Language Models
Guangyi Liu
Rui Ge
Xinyu Zhu
Jingyi Chai
Yaxin Du
Yang Liu
Yanfeng Wang
Siheng Chen
FedML
113
19
0
07 Jun 2024
Binary Hypothesis Testing for Softmax Models and Leverage Score Models
Yeqi Gao
Yuzhou Gu
Zhao Song
81
0
0
09 May 2024
Sample-Efficient Human Evaluation of Large Language Models via Maximum Discrepancy Competition
Kehua Feng
Keyan Ding
Hongzhi Tan
Kede Ma
Zhihua Wang
...
Yuzhou Cheng
Ge Sun
Guozhou Zheng
Qiang Zhang
H. Chen
133
13
0
10 Apr 2024
Evaluating Text-to-Visual Generation with Image-to-Text Generation
Zhiqiu Lin
Deepak Pathak
Baiqi Li
Jiayao Li
Xide Xia
Graham Neubig
Pengchuan Zhang
Deva Ramanan
EGVM
161
171
0
01 Apr 2024
Comprehensive Reassessment of Large-Scale Evaluation Outcomes in LLMs: A Multifaceted Statistical Approach
Kun Sun
Rong Wang
Anders Sogaard
52
3
0
22 Mar 2024
Exploring Advanced Methodologies in Security Evaluation for LLMs
Junming Huang
Jiawei Zhang
Qi Wang
Weihong Han
Yanchun Zhang
104
0
0
28 Feb 2024
OpenFedLLM: Training Large Language Models on Decentralized Private Data via Federated Learning
Rui Ye
Wenhao Wang
Jingyi Chai
Dihan Li
Zexi Li
Yinda Xu
Yaxin Du
Yanfeng Wang
Siheng Chen
ALM
FedML
AIFin
106
98
0
10 Feb 2024
TinyLlama: An Open-Source Small Language Model
Peiyuan Zhang
Guangtao Zeng
Tianduo Wang
Wei Lu
ALM
LRM
235
409
0
04 Jan 2024
Fine-Tuning or Retrieval? Comparing Knowledge Injection in LLMs
O. Ovadia
Menachem Brief
Moshik Mishaeli
Oren Elisha
RALM
116
153
0
10 Dec 2023
Instruction-tuning Aligns LLMs to the Human Brain
Khai Loong Aw
Syrielle Montariol
Badr AlKhamissi
Martin Schrimpf
Antoine Bosselut
154
22
0
01 Dec 2023
Data Diversity Matters for Robust Instruction Tuning
Alexander Bukharin
Tuo Zhao
178
44
0
21 Nov 2023
Instructive Decoding: Instruction-Tuned Large Language Models are Self-Refiner from Noisy Instructions
Taehyeon Kim
Joonkee Kim
Gihun Lee
Se-Young Yun
105
14
0
01 Nov 2023
AlpaCare:Instruction-tuned Large Language Models for Medical Application
Xinlu Zhang
Chenxin Tian
Xianjun Yang
Lichang Chen
Zekun Li
Linda R. Petzold
LM&MA
120
65
0
23 Oct 2023
Prometheus: Inducing Fine-grained Evaluation Capability in Language Models
Seungone Kim
Jamin Shin
Yejin Cho
Joel Jang
Shayne Longpre
...
Sangdoo Yun
Seongjin Shin
Sungdong Kim
James Thorne
Minjoon Seo
ALM
LM&MA
ELM
115
240
0
12 Oct 2023
Compresso: Structured Pruning with Collaborative Prompting Learns Compact Large Language Models
Song Guo
Jiahang Xu
Li Zhang
Mao Yang
87
15
0
08 Oct 2023
GPT-Fathom: Benchmarking Large Language Models to Decipher the Evolutionary Path towards GPT-4 and Beyond
Timothée Darcet
Yuyu Zhang
Yijie Zhu
Chenguang Xi
Pengyang Gao
Piotr Bojanowski
Kevin Chen-Chuan Chang
ELM
68
24
0
28 Sep 2023
ModuLoRA: Finetuning 2-Bit LLMs on Consumer GPUs by Integrating with Modular Quantizers
Junjie Yin
Jiahao Dong
Yingheng Wang
Christopher De Sa
Volodymyr Kuleshov
MQ
68
6
0
28 Sep 2023
LMSYS-Chat-1M: A Large-Scale Real-World LLM Conversation Dataset
Lianmin Zheng
Wei-Lin Chiang
Ying Sheng
Tianle Li
Siyuan Zhuang
...
Zi Lin
Eric P. Xing
Joseph E. Gonzalez
Ion Stoica
Haotong Zhang
133
221
0
21 Sep 2023
Can Large Language Models Understand Real-World Complex Instructions?
Qi He
Jie Zeng
Wenhao Huang
Lina Chen
Jin Xiao
...
Shisong Chen
Yikai Zhang
Zhouhong Gu
Jiaqing Liang
Yanghua Xiao
ALM
LRM
ELM
155
59
0
17 Sep 2023
A Fast Optimization View: Reformulating Single Layer Attention in LLM Based on Tensor and SVM Trick, and Solving It in Matrix Multiplication Time
Yeqi Gao
Zhao Song
Weixin Wang
Junze Yin
114
29
0
14 Sep 2023
The Poison of Alignment
Aibek Bekbayev
Sungbae Chun
Yerzat Dulat
James Yamazaki
44
9
0
25 Aug 2023
Efficient Benchmarking of Language Models
Yotam Perlitz
Elron Bandel
Ariel Gera
Ofir Arviv
L. Ein-Dor
Eyal Shnarch
Noam Slonim
Michal Shmueli-Scheuer
Leshem Choshen
ALM
118
28
0
22 Aug 2023
Dataset Quantization
Daquan Zhou
Kaixin Wang
Jianyang Gu
Xiang Peng
Dongze Lian
Yifan Zhang
Yang You
Jiashi Feng
DD
94
41
0
21 Aug 2023
Evaluating Instruction-Tuned Large Language Models on Code Comprehension and Generation
Zhiqiang Yuan
Junwei Liu
Qiancheng Zi
Wentai Deng
Xin Peng
Xin Peng
ALM
ELM
LRM
98
83
0
02 Aug 2023
FLASK: Fine-grained Language Model Evaluation based on Alignment Skill Sets
Seonghyeon Ye
Doyoung Kim
Sungdong Kim
Hyeonbin Hwang
Seungone Kim
Yongrae Jo
James Thorne
Juho Kim
Minjoon Seo
ALM
146
108
0
20 Jul 2023
Instruction-following Evaluation through Verbalizer Manipulation
Shiyang Li
Jun Yan
Hai Wang
Zheng Tang
Xiang Ren
Vijay Srinivasan
Hongxia Jin
114
27
0
20 Jul 2023
DialogStudio: Towards Richest and Most Diverse Unified Dataset Collection for Conversational AI
Jianguo Zhang
Kun Qian
Zhiwei Liu
Shelby Heinecke
Rui Meng
Ye Liu
Zhou Yu
Huan Wang
Silvio Savarese
Caiming Xiong
123
22
0
19 Jul 2023
AlpaGasus: Training A Better Alpaca with Fewer Data
Lichang Chen
Shiyang Li
Jun Yan
Hai Wang
Kalpa Gunaratna
...
Zheng Tang
Vijay Srinivasan
Dinesh Manocha
Heng-Chiao Huang
Hongxia Jin
ALM
139
0
0
17 Jul 2023
Effective Prompt Extraction from Language Models
Yiming Zhang
Nicholas Carlini
Daphne Ippolito
MIACV
SILM
113
43
0
13 Jul 2023
A Survey on Evaluation of Large Language Models
Yu-Chu Chang
Xu Wang
Jindong Wang
Yuanyi Wu
Linyi Yang
...
Yue Zhang
Yi-Ju Chang
Philip S. Yu
Qian Yang
Xingxu Xie
ELM
LM&MA
ALM
241
1,773
0
06 Jul 2023
Flacuna: Unleashing the Problem Solving Power of Vicuna using FLAN Fine-Tuning
Deepanway Ghosal
Yew Ken Chia
Navonil Majumder
Soujanya Poria
ALM
LRM
63
19
0
05 Jul 2023
Latency Adjustable Transformer Encoder for Language Understanding
Sajjad Kachuee
M. Sharifkhani
113
0
0
10 Jan 2022
1