PandaLM: An Automatic Evaluation Benchmark for LLM Instruction Tuning Optimization
arXiv:2306.05087 (v2, latest)

8 June 2023
Yidong Wang
Zhuohao Yu
Zhengran Zeng
Linyi Yang
Cunxiang Wang
Hao Chen
Chaoya Jiang
Rui Xie
Jindong Wang
Xing Xie
Wei Ye
Shikun Zhang
Yue Zhang
    ALM, ELM
ArXiv (abs) · PDF · HTML · GitHub (914★)

Papers citing "PandaLM: An Automatic Evaluation Benchmark for LLM Instruction Tuning Optimization"

50 / 184 papers shown
Title
SED: Self-Evaluation Decoding Enhances Large Language Models for Better Generation
Ziqin Luo
Haixia Han
Haokun Zhao
Guochao Jiang
Chengyu Du
Tingyun Li
Jiaqing Liang
Deqing Yang
Yanghua Xiao
82
4
0
26 May 2024
Lessons from the Trenches on Reproducible Evaluation of Language Models
Stella Biderman
Hailey Schoelkopf
Lintang Sutawika
Leo Gao
J. Tow
...
Xiangru Tang
Kevin A. Wang
Genta Indra Winata
François Yvon
Andy Zou
ELM, ALM
198
63
3
23 May 2024
Distilling Instruction-following Abilities of Large Language Models with Task-aware Curriculum Planning
Yuanhao Yue
Chengyu Wang
Jun Huang
Peng Wang
ALM
54
9
0
22 May 2024
Fennec: Fine-grained Language Model Evaluation and Correction Extended through Branching and Bridging
Xiaobo Liang
Haoke Zhang
Helan Hu
Juntao Li
Jun Xu
Min Zhang
ALM
77
3
0
20 May 2024
Realistic Evaluation of Toxicity in Large Language Models
Tinh Son Luong
Thanh-Thien Le
Linh Ngo Van
Thien Huu Nguyen
LM&MA
69
6
0
17 May 2024
Language Models can Evaluate Themselves via Probability Discrepancy
Tingyu Xia
Bowen Yu
Yuan Wu
Yi-Ju Chang
Chang Zhou
ELM
112
5
0
17 May 2024
SpeechGuard: Exploring the Adversarial Robustness of Multimodal Large Language Models
Raghuveer Peri
Sai Muralidhar Jayanthi
S. Ronanki
Anshu Bhatia
Karel Mundnich
...
Srikanth Vishnubhotla
Daniel Garcia-Romero
S. Srinivasan
Kyu J. Han
Katrin Kirchhoff
AAML
80
3
0
14 May 2024
Open Source Language Models Can Provide Feedback: Evaluating LLMs' Ability to Help Students Using GPT-4-As-A-Judge
Charles Koutcheme
Nicola Dainese
Sami Sarsa
Arto Hellas
Juho Leinonen
Paul Denny
ELM, ALM
77
24
0
08 May 2024
Prometheus 2: An Open Source Language Model Specialized in Evaluating Other Language Models
Seungone Kim
Juyoung Suk
Shayne Longpre
Bill Yuchen Lin
Jamin Shin
Sean Welleck
Graham Neubig
Moontae Lee
Kyungjae Lee
Minjoon Seo
MoMe, ALM, ELM
147
205
0
02 May 2024
Advances and Open Challenges in Federated Learning with Foundation Models
Chao Ren
Han Yu
Hongyi Peng
Xiaoli Tang
Anran Li
...
A. Tan
Bo Zhao
Xiaoxiao Li
Zengxiang Li
Qiang Yang
FedML, AIFin, AI4CE
152
11
0
23 Apr 2024
FedEval-LLM: Federated Evaluation of Large Language Models on Downstream Tasks with Collective Wisdom
Yuanqin He
Yan Kang
Lixin Fan
Qiang Yang
62
3
0
18 Apr 2024
TEL'M: Test and Evaluation of Language Models
G. Cybenko
Joshua Ackerman
Paul Lintilhac
ALM, ELM
81
0
0
16 Apr 2024
Unveiling Imitation Learning: Exploring the Impact of Data Falsity to Large Language Model
Hyunsoo Cho
ALM
31
0
0
15 Apr 2024
Sample-Efficient Human Evaluation of Large Language Models via Maximum Discrepancy Competition
Kehua Feng
Keyan Ding
Hongzhi Tan
Kede Ma
Zhihua Wang
...
Yuzhou Cheng
Ge Sun
Guozhou Zheng
Qiang Zhang
H. Chen
128
13
0
10 Apr 2024
FreeEval: A Modular Framework for Trustworthy and Efficient Evaluation of Large Language Models
Zhuohao Yu
Chang Gao
Wenjin Yao
Yidong Wang
Zhengran Zeng
Wei Ye
Jindong Wang
Yue Zhang
Shikun Zhang
61
3
0
09 Apr 2024
Evaluating LLMs at Detecting Errors in LLM Responses
Ryo Kamoi
Sarkar Snigdha Sarathi Das
Renze Lou
Jihyun Janice Ahn
Yilun Zhao
...
Salika Dave
Shaobo Qin
Arman Cohan
Wenpeng Yin
Rui Zhang
86
25
0
04 Apr 2024
Prior Constraints-based Reward Model Training for Aligning Large Language Models
Hang Zhou
Chenglong Wang
Yimin Hu
Tong Xiao
Chunliang Zhang
Jingbo Zhu
ALM
89
2
0
01 Apr 2024
Optimization-based Prompt Injection Attack to LLM-as-a-Judge
Jiawen Shi
Zenghui Yuan
Yinuo Liu
Yue Huang
Pan Zhou
Lichao Sun
Neil Zhenqiang Gong
AAML
146
57
0
26 Mar 2024
NovelQA: Benchmarking Question Answering on Documents Exceeding 200K Tokens
Cunxiang Wang
Ruoxi Ning
Boqi Pan
Tonghui Wu
Qipeng Guo
...
Guangsheng Bao
Xiangkun Hu
Zheng Zhang
Qian Wang
Yue Zhang
RALM
237
11
0
18 Mar 2024
Self-Evaluation of Large Language Model based on Glass-box Features
Hui Huang
Yingqi Qu
Jing Liu
Muyun Yang
Tiejun Zhao
39
2
0
07 Mar 2024
On the Essence and Prospect: An Investigation of Alignment Approaches for Big Models
Xinpeng Wang
Shitong Duan
Xiaoyuan Yi
Jing Yao
Shanlin Zhou
Zhihua Wei
Peng Zhang
Dongkuan Xu
Maosong Sun
Xing Xie
OffRL
122
17
0
07 Mar 2024
Enhancing Instructional Quality: Leveraging Computer-Assisted Textual Analysis to Generate In-Depth Insights from Educational Artifacts
Zewei Tian
Min Sun
Alex Liu
Shawon Sarkar
Jing Liu
71
5
0
06 Mar 2024
DMoERM: Recipes of Mixture-of-Experts for Effective Reward Modeling
Shanghaoran Quan
MoE, OffRL
80
10
0
02 Mar 2024
LLMCRIT: Teaching Large Language Models to Use Criteria
Weizhe Yuan
Pengfei Liu
Matthias Gallé
ALM
43
9
0
02 Mar 2024
Clustering and Ranking: Diversity-preserved Instruction Selection through Expert-aligned Quality Estimation
Yuan Ge
Yilun Liu
Chi Hu
Weibin Meng
Shimin Tao
Xiaofeng Zhao
Hongxia Ma
Li Zhang
Hao Yang
Tong Xiao
ALM
77
35
0
28 Feb 2024
Prediction-Powered Ranking of Large Language Models
Ivi Chatzi
Eleni Straitouri
Suhas Thejaswi
Manuel Gomez Rodriguez
ALM
127
9
0
27 Feb 2024
KIEval: A Knowledge-grounded Interactive Evaluation Framework for Large Language Models
Zhuohao Yu
Chang Gao
Wenjin Yao
Yidong Wang
Wei Ye
Jindong Wang
Xing Xie
Yue Zhang
Shikun Zhang
90
28
0
23 Feb 2024
MT-Bench-101: A Fine-Grained Benchmark for Evaluating Large Language Models in Multi-Turn Dialogues
Ge Bai
Jie Liu
Xingyuan Bu
Yancheng He
Jiaheng Liu
...
Zhuoran Lin
Wenbo Su
Tiezheng Ge
Bo Zheng
Wanli Ouyang
ELM, LM&MA
125
94
0
22 Feb 2024
Dynamic Evaluation of Large Language Models by Meta Probing Agents
Kaijie Zhu
Jindong Wang
Qinlin Zhao
Ruochen Xu
Xing Xie
108
42
0
21 Feb 2024
Ranking Large Language Models without Ground Truth
Amit Dhurandhar
Rahul Nair
Moninder Singh
Elizabeth M. Daly
Karthikeyan N. Ramamurthy
HILM, ALM, ELM
110
7
0
21 Feb 2024
TreeEval: Benchmark-Free Evaluation of Large Language Models through Tree Planning
Xiang Li
Yunshi Lan
Chao Yang
ELM
63
11
0
20 Feb 2024
A Survey on Knowledge Distillation of Large Language Models
Xiaohan Xu
Ming Li
Chongyang Tao
Tao Shen
Reynold Cheng
Jinyang Li
Can Xu
Dacheng Tao
Dinesh Manocha
KELM, VLM
173
135
0
20 Feb 2024
T-RAG: Lessons from the LLM Trenches
M. Fatehkia
J. Lucas
Sanjay Chawla
LLMAG
87
22
0
12 Feb 2024
Natural Language Reinforcement Learning
Xidong Feng
Bo Liu
Mengyue Yang
Ziyan Wang
Girish A. Koushik
Yali Du
Ying Wen
Jun Wang
OffRL
102
5
0
11 Feb 2024
The Generative AI Paradox on Evaluation: What It Can Solve, It May Not Evaluate
Juhyun Oh
Eunsu Kim
Inha Cha
Alice Oh
ELM
94
9
0
09 Feb 2024
LLM-based NLG Evaluation: Current Status and Challenges
Mingqi Gao
Xinyu Hu
Jie Ruan
Xiao Pu
Xiaojun Wan
ELM, LM&MA
224
41
0
02 Feb 2024
Weaver: Foundation Models for Creative Writing
Tiannan Wang
Jiamin Chen
Qingrui Jia
Shuai Wang
Ruoyu Fang
...
Xiaohua Xu
Ningyu Zhang
Huajun Chen
Yuchen Eleanor Jiang
Wangchunshu Zhou
99
20
0
30 Jan 2024
PRE: A Peer Review Based Large Language Model Evaluator
Zhumin Chu
Qingyao Ai
Yiteng Tu
Haitao Li
Yiqun Liu
LRM, ALM
112
21
0
28 Jan 2024
An Empirical Study on Large Language Models in Accuracy and Robustness under Chinese Industrial Scenarios
Zongjie Li
Wenying Qiu
Pingchuan Ma
Yichen Li
You Li
Sijia He
Baozheng Jiang
Shuai Wang
Weixi Gu
109
2
0
27 Jan 2024
Instruction Fine-Tuning: Does Prompt Loss Matter?
Mathew Huerta-Enochian
Seung Yong Ko
71
7
0
24 Jan 2024
Leveraging Large Language Models for NLG Evaluation: Advances and Challenges
Zhen Li
Xiaohan Xu
Tao Shen
Can Xu
Jia-Chen Gu
Yuxuan Lai
Chongyang Tao
Shuai Ma
LM&MA, ELM
134
15
0
13 Jan 2024
The Critique of Critique
Shichao Sun
Junlong Li
Weizhe Yuan
Ruifeng Yuan
Wenjie Li
Pengfei Liu
ELM
75
0
0
09 Jan 2024
InFoBench: Evaluating Instruction Following Ability in Large Language Models
Yiwei Qin
Kaiqiang Song
Yebowen Hu
Wenlin Yao
Sangwoo Cho
Xiaoyang Wang
Xuansheng Wu
Fei Liu
Pengfei Liu
Dong Yu
ELM
104
52
0
07 Jan 2024
Supervised Knowledge Makes Large Language Models Better In-context Learners
Linyi Yang
Shuibai Zhang
Zhuohao Yu
Guangsheng Bao
Yidong Wang
...
Ruochen Xu
Weirong Ye
Xing Xie
Weizhu Chen
Yue Zhang
152
19
0
26 Dec 2023
"What's important here?": Opportunities and Challenges of Using LLMs in
  Retrieving Information from Web Interfaces
Faria Huq
Jeffrey P. Bigham
Nikolas Martelaro
56
7
0
11 Dec 2023
Inherent limitations of LLMs regarding spatial information
He Yan
Xinyao Hu
Xiangpeng Wan
Chengyu Huang
Kai Zou
Shiqi Xu
LRM
69
5
0
05 Dec 2023
Building Trustworthy NeuroSymbolic AI Systems: Consistency, Reliability, Explainability, and Safety
Manas Gaur
Amit P. Sheth
67
17
0
05 Dec 2023
New Evaluation Metrics Capture Quality Degradation due to LLM Watermarking
Karanpartap Singh
James Zou
WaLM
166
9
0
04 Dec 2023
AlignBench: Benchmarking Chinese Alignment of Large Language Models
Xiao Liu
Xuanyu Lei
Sheng-Ping Wang
Yue Huang
Zhuoer Feng
...
Hongning Wang
Jing Zhang
Minlie Huang
Yuxiao Dong
Jie Tang
ELM, LM&MA, ALM
187
50
0
30 Nov 2023
CritiqueLLM: Towards an Informative Critique Generation Model for Evaluation of Large Language Model Generation
Pei Ke
Bosi Wen
Andrew Feng
Xiao-Yang Liu
Xuanyu Lei
...
Aohan Zeng
Yuxiao Dong
Hongning Wang
Jie Tang
Minlie Huang
ELM, ALM
134
35
0
30 Nov 2023