ResearchTrend.AI
A Survey of Useful LLM Evaluation

3 June 2024
Ji-Lun Peng
Sijia Cheng
Egil Diau
Yung-Yu Shih
Po-Heng Chen
Yen-Ting Lin
Yun-Nung Chen
LLMAG
ELM
arXiv: 2406.00936

Papers citing "A Survey of Useful LLM Evaluation"

23 / 23 papers shown
Why Do Multi-Agent LLM Systems Fail?
Mert Cemri
Melissa Z. Pan
Shuyi Yang
Lakshya A Agrawal
Bhavya Chopra
...
Dan Klein
Kannan Ramchandran
Matei A. Zaharia
Joseph E. Gonzalez
Ion Stoica
LLMAG
Presented at ResearchTrend Connect | LLMAG on 23 Apr 2025
129
8
0
17 Mar 2025
Can LLM Assist in the Evaluation of the Quality of Machine Learning Explanations?
Bo Wang
Yiqiao Li
Jianlong Zhou
Fang Chen
XAI
ELM
42
0
0
28 Feb 2025
Multi-Agent Collaboration Mechanisms: A Survey of LLMs
Khanh-Tung Tran
Dung Dao
Minh-Duong Nguyen
Quoc-Viet Pham
Barry O’Sullivan
Hoang D. Nguyen
LLMAG
95
27
0
10 Jan 2025
QUENCH: Measuring the gap between Indic and Non-Indic Contextual General Reasoning in LLMs
Mohammad Aflah Khan
Neemesh Yadav
Sarah Masud
Md. Shad Akhtar
74
0
0
16 Dec 2024
Open Ko-LLM Leaderboard2: Bridging Foundational and Practical Evaluation for Korean LLMs
Hyeonwoo Kim
Dahyun Kim
Jihoo Kim
Sukyung Lee
Y. Kim
Chanjun Park
44
0
0
16 Oct 2024
TestAgent: A Framework for Domain-Adaptive Evaluation of LLMs via Dynamic Benchmark Construction and Exploratory Interaction
Wanying Wang
Zeyu Ma
Pengfei Liu
Mingang Chen
LLMAG
47
1
0
15 Oct 2024
Representing the Under-Represented: Cultural and Core Capability Benchmarks for Developing Thai Large Language Models
Dahyun Kim
Sukyung Lee
Yungi Kim
Attapol Rutherford
Chanjun Park
ELM
31
1
0
07 Oct 2024
A Survey on Complex Tasks for Goal-Directed Interactive Agents
Mareike Hartmann
Alexander Koller
LM&Ro
LLMAG
34
0
0
27 Sep 2024
What Makes a Good Story and How Can We Measure It? A Comprehensive Survey of Story Evaluation
Dingyi Yang
Qin Jin
44
5
0
26 Aug 2024
On Protecting the Data Privacy of Large Language Models (LLMs): A Survey
Biwei Yan
Kun Li
Minghui Xu
Yueyan Dong
Yue Zhang
Zhaochun Ren
Xiuzhen Cheng
AILaw
PILM
70
76
0
08 Mar 2024
InjecAgent: Benchmarking Indirect Prompt Injections in Tool-Integrated Large Language Model Agents
Qiusi Zhan
Zhixiang Liang
Zifan Ying
Daniel Kang
LLMAG
46
73
0
05 Mar 2024
Measuring and Reducing LLM Hallucination without Gold-Standard Answers
Jiaheng Wei
Yuanshun Yao
Jean-François Ton
Hongyi Guo
Andrew Estornell
Yang Liu
HILM
55
18
0
16 Feb 2024
Self-Alignment for Factuality: Mitigating Hallucinations in LLMs via Self-Evaluation
Xiaoying Zhang
Baolin Peng
Ye Tian
Jingyan Zhou
Lifeng Jin
Linfeng Song
Haitao Mi
Helen Meng
HILM
42
43
0
14 Feb 2024
ReAct: Synergizing Reasoning and Acting in Language Models
Shunyu Yao
Jeffrey Zhao
Dian Yu
Nan Du
Izhak Shafran
Karthik Narasimhan
Yuan Cao
LLMAG
ReLM
LRM
240
2,494
0
06 Oct 2022
Multiple-Choice Question Generation: Towards an Automated Assessment Framework
Vatsal Raina
Mark J. F. Gales
AI4Ed
ELM
26
32
0
23 Sep 2022
LM-Nav: Robotic Navigation with Large Pre-Trained Models of Language, Vision, and Action
Dhruv Shah
B. Osinski
Brian Ichter
Sergey Levine
LM&Ro
158
436
0
10 Jul 2022
Large Language Models are Few-Shot Clinical Information Extractors
Monica Agrawal
S. Hegselmann
Hunter Lang
Yoon Kim
David Sontag
BDL
LM&MA
162
334
0
25 May 2022
Teaching language models to support answers with verified quotes
Jacob Menick
Maja Trebacz
Vladimir Mikulik
John Aslanides
Francis Song
...
Mia Glaese
Susannah Young
Lucy Campbell-Gillingham
G. Irving
Nat McAleese
ELM
RALM
243
257
0
21 Mar 2022
BBQ: A Hand-Built Bias Benchmark for Question Answering
Alicia Parrish
Angelica Chen
Nikita Nangia
Vishakh Padmakumar
Jason Phang
Jana Thompson
Phu Mon Htut
Sam Bowman
217
367
0
15 Oct 2021
ALL-IN-ONE: Multi-Task Learning BERT models for Evaluating Peer Assessments
Qinjin Jia
Jiali Cui
Yunkai Xiao
Chengyuan Liu
Parvez Rashid
E. Gehringer
32
43
0
08 Oct 2021
Explaining Answers with Entailment Trees
Bhavana Dalvi
Peter Alexander Jansen
Oyvind Tafjord
Zhengnan Xie
Hannah Smith
Leighanna Pipatanangkura
Peter Clark
ReLM
FAtt
LRM
239
184
0
17 Apr 2021
Did Aristotle Use a Laptop? A Question Answering Benchmark with Implicit Reasoning Strategies
Mor Geva
Daniel Khashabi
Elad Segal
Tushar Khot
Dan Roth
Jonathan Berant
RALM
250
673
0
06 Jan 2021
Language Models as Knowledge Bases?
Fabio Petroni
Tim Rocktaschel
Patrick Lewis
A. Bakhtin
Yuxiang Wu
Alexander H. Miller
Sebastian Riedel
KELM
AI4MH
415
2,586
0
03 Sep 2019