Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2306.04757
Cited By
INSTRUCTEVAL: Towards Holistic Evaluation of Instruction-Tuned Large Language Models
7 June 2023
Yew Ken Chia
Pengfei Hong
Lidong Bing
Soujanya Poria
ELM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"INSTRUCTEVAL: Towards Holistic Evaluation of Instruction-Tuned Large Language Models"
50 / 54 papers shown
Title
Focus, Merge, Rank: Improved Question Answering Based on Semi-structured Knowledge Bases
Derian Boer
Stephen Roth
Stefan Kramer
KELM
32
0
0
14 May 2025
Toward Generalizable Evaluation in the LLM Era: A Survey Beyond Benchmarks
Yixin Cao
Shibo Hong
Xuzhao Li
Jiahao Ying
Yubo Ma
...
Juanzi Li
Aixin Sun
Xuanjing Huang
Tat-Seng Chua
Tianwei Zhang
ALM
ELM
96
2
0
26 Apr 2025
Cat, Rat, Meow: On the Alignment of Language Model and Human Term-Similarity Judgments
Lorenz Linhardt
Tom Neuhäuser
Lenka Tětková
Oliver Eberle
ALM
AI4TS
42
0
0
10 Apr 2025
MSPLoRA: A Multi-Scale Pyramid Low-Rank Adaptation for Efficient Model Fine-Tuning
Jiancheng Zhao
Xingda Yu
Zhen Yang
MoE
62
1
0
27 Mar 2025
FlanEC: Exploring Flan-T5 for Post-ASR Error Correction
Moreno La Quatra
Valerio Mario Salerno
Yu Tsao
Sabato Marco Siniscalchi
99
0
0
22 Jan 2025
TabVer: Tabular Fact Verification with Natural Logic
Rami Aly
Andreas Vlachos
LMTD
33
0
0
02 Nov 2024
LoLDU: Low-Rank Adaptation via Lower-Diag-Upper Decomposition for Parameter-Efficient Fine-Tuning
Yiming Shi
Jiwei Wei
Yujia Wu
Ran Ran
Chengwei Sun
Shiyuan He
Yang Yang
ALM
43
1
0
17 Oct 2024
DataEnvGym: Data Generation Agents in Teacher Environments with Student Feedback
Zaid Khan
Elias Stengel-Eskin
Jaemin Cho
Joey Tianyi Zhou
VGen
46
1
0
08 Oct 2024
TICKing All the Boxes: Generated Checklists Improve LLM Evaluation and Generation
Jonathan Cook
Tim Rocktaschel
Jakob Foerster
Dennis Aumiller
Alex Wang
ALM
37
10
0
04 Oct 2024
Improving Unsupervised Constituency Parsing via Maximizing Semantic Information
Junjie Chen
Xiangheng He
Yusuke Miyao
Danushka Bollegala
43
0
0
03 Oct 2024
MMMT-IF: A Challenging Multimodal Multi-Turn Instruction Following Benchmark
Elliot L. Epstein
Kaisheng Yao
Jing Li
Xinyi Bai
Hamid Palangi
LRM
47
0
0
26 Sep 2024
Harnessing the Power of Semi-Structured Knowledge and LLMs with Triplet-Based Prefiltering for Question Answering
Derian Boer
Fabian Koch
Stefan Kramer
KELM
44
4
0
01 Sep 2024
Beyond Metrics: A Critical Analysis of the Variability in Large Language Model Evaluation Frameworks
Marco AF Pimentel
Clément Christophe
Tathagata Raha
Prateek Munjal
Praveen K Kanithi
Shadab Khan
ELM
42
2
0
29 Jul 2024
Look Within, Why LLMs Hallucinate: A Causal Perspective
He Li
Haoang Chi
Mingyu Liu
Wenjing Yang
LRM
37
5
0
14 Jul 2024
MIA-Bench: Towards Better Instruction Following Evaluation of Multimodal LLMs
Yusu Qian
Hanrong Ye
J. Fauconnier
Peter Grasch
Yinfei Yang
Zhe Gan
108
13
0
01 Jul 2024
The BiGGen Bench: A Principled Benchmark for Fine-grained Evaluation of Language Models with Language Models
Seungone Kim
Juyoung Suk
Ji Yong Cho
Shayne Longpre
Chaeeun Kim
...
Sean Welleck
Graham Neubig
Moontae Lee
Kyungjae Lee
Minjoon Seo
ELM
ALM
LM&MA
105
33
0
09 Jun 2024
FedLLM-Bench: Realistic Benchmarks for Federated Learning of Large Language Models
Guangyi Liu
Rui Ge
Xinyu Zhu
Jingyi Chai
Yaxin Du
Yang Liu
Yanfeng Wang
Siheng Chen
FedML
41
14
0
07 Jun 2024
Sample-Efficient Human Evaluation of Large Language Models via Maximum Discrepancy Competition
Kehua Feng
Keyan Ding
Kede Ma
Zhihua Wang
Qiang Zhang
Huajun Chen
42
10
0
10 Apr 2024
Evaluating Text-to-Visual Generation with Image-to-Text Generation
Zhiqiu Lin
Deepak Pathak
Baiqi Li
Jiayao Li
Xide Xia
Graham Neubig
Pengchuan Zhang
Deva Ramanan
EGVM
53
131
0
01 Apr 2024
Comprehensive Reassessment of Large-Scale Evaluation Outcomes in LLMs: A Multifaceted Statistical Approach
Kun Sun
Rong Wang
Anders Sogaard
37
3
0
22 Mar 2024
Exploring Advanced Methodologies in Security Evaluation for LLMs
Junming Huang
Jiawei Zhang
Qi Wang
Weihong Han
Yanchun Zhang
48
0
0
28 Feb 2024
OpenFedLLM: Training Large Language Models on Decentralized Private Data via Federated Learning
Rui Ye
Wenhao Wang
Jingyi Chai
Dihan Li
Zexi Li
Yinda Xu
Yaxin Du
Yanfeng Wang
Siheng Chen
ALM
FedML
AIFin
11
78
0
10 Feb 2024
TinyLlama: An Open-Source Small Language Model
Peiyuan Zhang
Guangtao Zeng
Tianduo Wang
Wei Lu
ALM
LRM
54
359
0
04 Jan 2024
Fine-Tuning or Retrieval? Comparing Knowledge Injection in LLMs
O. Ovadia
Menachem Brief
Moshik Mishaeli
Oren Elisha
RALM
40
134
0
10 Dec 2023
Instruction-tuning Aligns LLMs to the Human Brain
Khai Loong Aw
Syrielle Montariol
Badr AlKhamissi
Martin Schrimpf
Antoine Bosselut
36
18
0
01 Dec 2023
Data Diversity Matters for Robust Instruction Tuning
Alexander Bukharin
Tuo Zhao
81
36
0
21 Nov 2023
Instructive Decoding: Instruction-Tuned Large Language Models are Self-Refiner from Noisy Instructions
Taehyeon Kim
Joonkee Kim
Gihun Lee
Se-Young Yun
33
11
0
01 Nov 2023
AlpaCare:Instruction-tuned Large Language Models for Medical Application
Xinlu Zhang
Chenxin Tian
Xianjun Yang
Lichang Chen
Zekun Li
Linda R. Petzold
LM&MA
32
59
0
23 Oct 2023
Prometheus: Inducing Fine-grained Evaluation Capability in Language Models
Seungone Kim
Jamin Shin
Yejin Cho
Joel Jang
Shayne Longpre
...
Sangdoo Yun
Seongjin Shin
Sungdong Kim
James Thorne
Minjoon Seo
ALM
LM&MA
ELM
37
213
0
12 Oct 2023
Compresso: Structured Pruning with Collaborative Prompting Learns Compact Large Language Models
Song Guo
Jiahang Xu
Li Zhang
Mao Yang
25
14
0
08 Oct 2023
GPT-Fathom: Benchmarking Large Language Models to Decipher the Evolutionary Path towards GPT-4 and Beyond
Timothée Darcet
Yuyu Zhang
Yijie Zhu
Chenguang Xi
Pengyang Gao
Piotr Bojanowski
Kevin Chen-Chuan Chang
ELM
35
24
0
28 Sep 2023
ModuLoRA: Finetuning 2-Bit LLMs on Consumer GPUs by Integrating with Modular Quantizers
Junjie Yin
Jiahao Dong
Yingheng Wang
Christopher De Sa
Volodymyr Kuleshov
MQ
31
4
0
28 Sep 2023
LMSYS-Chat-1M: A Large-Scale Real-World LLM Conversation Dataset
Lianmin Zheng
Wei-Lin Chiang
Ying Sheng
Tianle Li
Siyuan Zhuang
...
Zi Lin
Eric P. Xing
Joseph E. Gonzalez
Ion Stoica
Haotong Zhang
29
180
0
21 Sep 2023
Can Large Language Models Understand Real-World Complex Instructions?
Qi He
Jie Zeng
Wenhao Huang
Lina Chen
Jin Xiao
...
Shisong Chen
Yikai Zhang
Zhouhong Gu
Jiaqing Liang
Yanghua Xiao
ALM
LRM
ELM
98
52
0
17 Sep 2023
A Fast Optimization View: Reformulating Single Layer Attention in LLM Based on Tensor and SVM Trick, and Solving It in Matrix Multiplication Time
Yeqi Gao
Zhao Song
Weixin Wang
Junze Yin
24
25
0
14 Sep 2023
The Poison of Alignment
Aibek Bekbayev
Sungbae Chun
Yerzat Dulat
James Yamazaki
28
9
0
25 Aug 2023
Efficient Benchmarking of Language Models
Yotam Perlitz
Elron Bandel
Ariel Gera
Ofir Arviv
L. Ein-Dor
Eyal Shnarch
Noam Slonim
Michal Shmueli-Scheuer
Leshem Choshen
ALM
24
24
0
22 Aug 2023
Dataset Quantization
Daquan Zhou
Kaixin Wang
Jianyang Gu
Xiang Peng
Dongze Lian
Yifan Zhang
Yang You
Jiashi Feng
DD
37
38
0
21 Aug 2023
Evaluating Instruction-Tuned Large Language Models on Code Comprehension and Generation
Zhiqiang Yuan
Junwei Liu
Qiancheng Zi
Mingwei Liu
Xin Peng
Yiling Lou
ALM
ELM
LRM
22
73
0
02 Aug 2023
FLASK: Fine-grained Language Model Evaluation based on Alignment Skill Sets
Seonghyeon Ye
Doyoung Kim
Sungdong Kim
Hyeonbin Hwang
Seungone Kim
Yongrae Jo
James Thorne
Juho Kim
Minjoon Seo
ALM
46
99
0
20 Jul 2023
Instruction-following Evaluation through Verbalizer Manipulation
Shiyang Li
Jun Yan
Hai Wang
Zheng Tang
Xiang Ren
Vijay Srinivasan
Hongxia Jin
36
25
0
20 Jul 2023
DialogStudio: Towards Richest and Most Diverse Unified Dataset Collection for Conversational AI
Jianguo Zhang
Kun Qian
Zhiwei Liu
Shelby Heinecke
Rui Meng
Ye Liu
Zhou Yu
Huan Wang
Silvio Savarese
Caiming Xiong
39
22
0
19 Jul 2023
AlpaGasus: Training A Better Alpaca with Fewer Data
Lichang Chen
Shiyang Li
Jun Yan
Hai Wang
Kalpa Gunaratna
...
Zheng Tang
Vijay Srinivasan
Dinesh Manocha
Heng-Chiao Huang
Hongxia Jin
ALM
54
0
0
17 Jul 2023
Effective Prompt Extraction from Language Models
Yiming Zhang
Nicholas Carlini
Daphne Ippolito
MIACV
SILM
38
36
0
13 Jul 2023
A Survey on Evaluation of Large Language Models
Yu-Chu Chang
Xu Wang
Jindong Wang
Yuanyi Wu
Linyi Yang
...
Yue Zhang
Yi-Ju Chang
Philip S. Yu
Qian Yang
Xingxu Xie
ELM
LM&MA
ALM
75
1,517
0
06 Jul 2023
Flacuna: Unleashing the Problem Solving Power of Vicuna using FLAN Fine-Tuning
Deepanway Ghosal
Yew Ken Chia
Navonil Majumder
Soujanya Poria
ALM
LRM
38
17
0
05 Jul 2023
Can Large Language Models Be an Alternative to Human Evaluations?
Cheng-Han Chiang
Hung-yi Lee
ALM
LM&MA
229
574
0
03 May 2023
GLM-130B: An Open Bilingual Pre-trained Model
Aohan Zeng
Xiao Liu
Zhengxiao Du
Zihan Wang
Hanyu Lai
...
Jidong Zhai
Wenguang Chen
Peng Zhang
Yuxiao Dong
Jie Tang
BDL
LRM
253
1,073
0
05 Oct 2022
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
369
12,003
0
04 Mar 2022
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Jason W. Wei
Xuezhi Wang
Dale Schuurmans
Maarten Bosma
Brian Ichter
F. Xia
Ed H. Chi
Quoc Le
Denny Zhou
LM&Ro
LRM
AI4CE
ReLM
416
8,559
0
28 Jan 2022
1
2
Next