PandaLM: An Automatic Evaluation Benchmark for LLM Instruction Tuning Optimization

8 June 2023
Yidong Wang
Zhuohao Yu
Zhengran Zeng
Linyi Yang
Cunxiang Wang
Hao Chen
Chaoya Jiang
Rui Xie
Jindong Wang
Xingxu Xie
Wei Ye
Shi-Bo Zhang
Yue Zhang
ALM, ELM
ArXiv (abs) | PDF | HTML | Github (914★)

Papers citing "PandaLM: An Automatic Evaluation Benchmark for LLM Instruction Tuning Optimization"

34 / 184 papers shown
UHGEval: Benchmarking the Hallucination of Chinese Large Language Models via Unconstrained Generation
Xun Liang
Shichao Song
Pengnian Qi
Zhiyu Li
Feiyu Xiong
...
Zhaohui Wy
Dawei He
Peng Cheng
Zhonghao Wang
Haiying Deng
HILM
84
22
0
26 Nov 2023
CoachLM: Automatic Instruction Revisions Improve the Data Quality in LLM Instruction Tuning
Yilun Liu
Shimin Tao
Xiaofeng Zhao
Ming Zhu
Wenbing Ma
...
Min Zhang
Hongxia Ma
Li Zhang
Hao Yang
Yanfei Jiang
88
13
0
22 Nov 2023
Benchmarking Generation and Evaluation Capabilities of Large Language Models for Instruction Controllable Summarization
Yixin Liu
Alexander R. Fabbri
Jiawen Chen
Yilun Zhao
Simeng Han
Shafiq Joty
Pengfei Liu
Dragomir R. Radev
Chien-Sheng Wu
Arman Cohan
ELM
109
64
0
15 Nov 2023
TencentLLMEval: A Hierarchical Evaluation of Real-World Capabilities for Human-Aligned LLMs
Shuyi Xie
Wenlin Yao
Yong Dai
Shaobo Wang
Donlin Zhou
...
Zhichao Hu
Dong Yu
Zhengyou Zhang
Jing Nie
Yuhong Liu
ELM, ALM
98
4
0
09 Nov 2023
Black-Box Prompt Optimization: Aligning Large Language Models without Model Training
Jiale Cheng
Xiao Liu
Kehan Zheng
Pei Ke
Hongning Wang
Yuxiao Dong
Jie Tang
Minlie Huang
79
88
0
07 Nov 2023
JudgeLM: Fine-tuned Large Language Models are Scalable Judges
Lianghui Zhu
Xinggang Wang
Xinlong Wang
ELM, ALM
184
143
0
26 Oct 2023
Evaluating, Understanding, and Improving Constrained Text Generation for Large Language Models
Xiang Chen
Xiaojun Wan
55
0
0
25 Oct 2023
Unleashing the potential of prompt engineering in Large Language Models: a comprehensive review
Banghao Chen
Zhaofeng Zhang
Nicolas Langrené
Shengxin Zhu
LLMAG
116
89
0
23 Oct 2023
Automated Evaluation of Personalized Text Generation using Large Language Models
Yaqing Wang
Jiepu Jiang
Mingyang Zhang
Cheng-rong Li
Yi Liang
Qiaozhu Mei
Michael Bendersky
44
6
0
17 Oct 2023
Fine-tuning ChatGPT for Automatic Scoring
Ehsan Latif
Xiaoming Zhai
AI4MH
116
108
0
16 Oct 2023
Instruction Tuning with Human Curriculum
Bruce W. Lee
Hyunsoo Cho
Kang Min Yoo
89
4
0
14 Oct 2023
Prometheus: Inducing Fine-grained Evaluation Capability in Language Models
Seungone Kim
Jamin Shin
Yejin Cho
Joel Jang
Shayne Longpre
...
Sangdoo Yun
Seongjin Shin
Sungdong Kim
James Thorne
Minjoon Seo
ALM, LM&MA, ELM
113
240
0
12 Oct 2023
Language Models are Universal Embedders
Xin Zhang
Zehan Li
Yanzhao Zhang
Dingkun Long
Pengjun Xie
Meishan Zhang
Min Zhang
KELM, ELM
290
9
0
12 Oct 2023
Evaluating Large Language Models at Evaluating Instruction Following
Zhiyuan Zeng
Jiatong Yu
Tianyu Gao
Yu Meng
Tanya Goyal
Danqi Chen
ELM, ALM
148
192
0
11 Oct 2023
Parrot: Enhancing Multi-Turn Instruction Following for Large Language Models
Yuchong Sun
Che Liu
Kun Zhou
Jinwen Huang
Ruihua Song
Xin Zhao
Fuzheng Zhang
Di Zhang
Kun Gai
LRM
76
11
0
11 Oct 2023
Generative Judge for Evaluating Alignment
Junlong Li
Shichao Sun
Weizhe Yuan
Run-Ze Fan
Hai Zhao
Pengfei Liu
ELM, ALM
112
91
0
09 Oct 2023
EMO: Earth Mover Distance Optimization for Auto-Regressive Language Modeling
Siyu Ren
Zhiyong Wu
Kenny Q. Zhu
72
4
0
07 Oct 2023
TIGERScore: Towards Building Explainable Metric for All Text Generation Tasks
Dongfu Jiang
Yishan Li
Ge Zhang
Wenhao Huang
Bill Yuchen Lin
Wenhu Chen
ALM
111
69
0
01 Oct 2023
HyPoradise: An Open Baseline for Generative Speech Recognition with Large Language Models
Cheng Chen
Yuchen Hu
Chao-Han Huck Yang
Sabato Marco Siniscalchi
Pin-Yu Chen
Eng Siong Chng
99
48
0
27 Sep 2023
Foundation Metrics for Evaluating Effectiveness of Healthcare Conversations Powered by Generative AI
Mahyar Abbasian
Elahe Khatibi
Iman Azimi
David Oniani
Zahra Shakeri Hossein Abad
...
Bryant Lin
Olivier Gevaert
Li-Jia Li
Ramesh C. Jain
Amir M. Rahmani
LM&MA, ELM, AI4MH
139
78
0
21 Sep 2023
Sorted LLaMA: Unlocking the Potential of Intermediate Layers of Large Language Models for Dynamic Inference
Parsa Kavehzadeh
Mojtaba Valipour
Marzieh S. Tahaei
Ali Ghodsi
Boxing Chen
Mehdi Rezagholizadeh
87
6
0
16 Sep 2023
Beyond Static Datasets: A Deep Interaction Approach to LLM Evaluation
Jiatong Li
Rui Li
Qi Liu
98
16
0
08 Sep 2023
ZhuJiu: A Multi-dimensional, Multi-faceted Chinese Benchmark for Large Language Models
Baolin Zhang
Hai-Yong Xie
Pengfan Du
Junhao Chen
Pengfei Cao
Yubo Chen
Shengping Liu
Kang Liu
Jun Zhao
ELM, ALM
53
2
0
28 Aug 2023
LLM-Mini-CEX: Automatic Evaluation of Large Language Model for Diagnostic Conversation
Xiaoming Shi
Jinfeng Xu
Jinru Ding
Jiali Pang
Sichen Liu
...
Lu Lu
Haihong Yang
Mingtao Hu
Tong Ruan
Shaoting Zhang
LM&MA, ELM
58
13
0
15 Aug 2023
ChatEval: Towards Better LLM-based Evaluators through Multi-Agent Debate
Chi-Min Chan
Weize Chen
Yusheng Su
Jianxuan Yu
Wei Xue
Shan Zhang
Jie Fu
Zhiyuan Liu
ELM, LLMAG, ALM
99
504
0
14 Aug 2023
A Preliminary Study of the Intrinsic Relationship between Complexity and Alignment
Ying Zhao
Yu Bowen
Binyuan Hui
Haiyang Yu
Fei Huang
Yongbin Li
N. Zhang
125
25
0
10 Aug 2023
Wider and Deeper LLM Networks are Fairer LLM Evaluators
Xinghua Zhang
Yu Bowen
Haiyang Yu
Yangyu Lv
Tingwen Liu
Fei Huang
Hongbo Xu
Yongbin Li
ALM
146
90
0
03 Aug 2023
CValues: Measuring the Values of Chinese Large Language Models from Safety to Responsibility
Guohai Xu
Jiayi Liu
Mingshi Yan
Haotian Xu
Jinghui Si
...
Rong Zhang
Ji Zhang
Chao Peng
Feiyan Huang
Jingren Zhou
ALM, ELM
98
83
0
19 Jul 2023
A Survey on Evaluation of Large Language Models
Yu-Chu Chang
Xu Wang
Jindong Wang
Yuanyi Wu
Linyi Yang
...
Yue Zhang
Yi-Ju Chang
Philip S. Yu
Qian Yang
Xingxu Xie
ELM, LM&MA, ALM
223
1,769
0
06 Jul 2023
Style Over Substance: Evaluation Biases for Large Language Models
Minghao Wu
Alham Fikri Aji
ALM, ELM
147
47
0
06 Jul 2023
Interactive Molecular Discovery with Natural Language
Zheni Zeng
Bangchen Yin
Shipeng Wang
Jia-Rou Liu
Cheng Yang
Haishen Yao
Xingzhi Sun
Maosong Sun
Guotong Xie
Zhiyuan Liu
AI4CE
77
15
0
21 Jun 2023
Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena
Lianmin Zheng
Wei-Lin Chiang
Ying Sheng
Siyuan Zhuang
Zhanghao Wu
...
Dacheng Li
Eric Xing
Haotong Zhang
Joseph E. Gonzalez
Ion Stoica
ALM, OSLM, ELM
613
4,459
0
09 Jun 2023
Large Language Models are not Fair Evaluators
Peiyi Wang
Lei Li
Liang Chen
Zefan Cai
Dawei Zhu
Binghuai Lin
Yunbo Cao
Qi Liu
Tianyu Liu
Zhifang Sui
ALM
164
575
0
29 May 2023
Automatic Model Selection with Large Language Models for Reasoning
Xu Zhao
Yuxi Xie
Kenji Kawaguchi
Junxian He
Qizhe Xie
ReLM, LRM
84
40
0
23 May 2023