ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2408.08808
  4. Cited By
Constructing Domain-Specific Evaluation Sets for LLM-as-a-judge

Constructing Domain-Specific Evaluation Sets for LLM-as-a-judge

16 August 2024
Ravi Raju
Swayambhoo Jain
Bo Li
Jonathan Li
Urmish Thakker
    ALM
    ELM
ArXivPDFHTML

Papers citing "Constructing Domain-Specific Evaluation Sets for LLM-as-a-judge"

13 / 13 papers shown
Title
UPME: An Unsupervised Peer Review Framework for Multimodal Large Language Model Evaluation
UPME: An Unsupervised Peer Review Framework for Multimodal Large Language Model Evaluation
Qihui Zhang
Munan Ning
Zheyuan Liu
Yanbo Wang
Jiayi Ye
Yue Huang
Shuo Yang
Xiao Chen
Y. Song
Li Yuan
LRM
63
0
0
19 Mar 2025
Can LLM Assist in the Evaluation of the Quality of Machine Learning Explanations?
Can LLM Assist in the Evaluation of the Quality of Machine Learning Explanations?
Bo Wang
Yiqiao Li
Jianlong Zhou
Fang Chen
XAI
ELM
42
0
0
28 Feb 2025
MMSciBench: Benchmarking Language Models on Multimodal Scientific Problems
Xinwu Ye
Chengfan Li
Siming Chen
Xiangru Tang
Wei Wei
LRM
44
1
0
27 Feb 2025
AI Alignment at Your Discretion
AI Alignment at Your Discretion
Maarten Buyl
Hadi Khalaf
C. M. Verdun
Lucas Monteiro Paes
Caio Vieira Machado
Flavio du Pin Calmon
45
0
0
10 Feb 2025
LLM Content Moderation and User Satisfaction: Evidence from Response Refusals in Chatbot Arena
LLM Content Moderation and User Satisfaction: Evidence from Response Refusals in Chatbot Arena
Stefan Pasch
40
0
0
04 Jan 2025
From Generation to Judgment: Opportunities and Challenges of LLM-as-a-judge
From Generation to Judgment: Opportunities and Challenges of LLM-as-a-judge
Dawei Li
Bohan Jiang
Liangjie Huang
Alimohammad Beigi
Chengshuai Zhao
...
Canyu Chen
Tianhao Wu
Kai Shu
Lu Cheng
Huan Liu
ELM
AILaw
126
70
0
25 Nov 2024
Limits to scalable evaluation at the frontier: LLM as Judge won't beat twice the data
Limits to scalable evaluation at the frontier: LLM as Judge won't beat twice the data
Florian E. Dorner
Vivian Y. Nastl
Moritz Hardt
ELM
ALM
50
6
0
17 Oct 2024
MCQG-SRefine: Multiple Choice Question Generation and Evaluation with Iterative Self-Critique, Correction, and Comparison Feedback
MCQG-SRefine: Multiple Choice Question Generation and Evaluation with Iterative Self-Critique, Correction, and Comparison Feedback
Zonghai Yao
Aditya Parashar
Huixue Zhou
Won Seok Jang
Feiyun Ouyang
Zhichao Yang
Hong-ye Yu
ELM
53
2
0
17 Oct 2024
TestAgent: A Framework for Domain-Adaptive Evaluation of LLMs via Dynamic Benchmark Construction and Exploratory Interaction
TestAgent: A Framework for Domain-Adaptive Evaluation of LLMs via Dynamic Benchmark Construction and Exploratory Interaction
Wanying Wang
Zeyu Ma
Pengfei Liu
Mingang Chen
LLMAG
50
1
0
15 Oct 2024
FactCheckmate: Preemptively Detecting and Mitigating Hallucinations in
  LMs
FactCheckmate: Preemptively Detecting and Mitigating Hallucinations in LMs
Deema Alnuhait
Neeraja Kirtane
Muhammad Khalifa
Hao Peng
HILM
LRM
42
2
0
03 Oct 2024
Aligning Human and LLM Judgments: Insights from EvalAssist on
  Task-Specific Evaluations and AI-assisted Assessment Strategy Preferences
Aligning Human and LLM Judgments: Insights from EvalAssist on Task-Specific Evaluations and AI-assisted Assessment Strategy Preferences
Zahra Ashktorab
Michael Desmond
Qian Pan
James M. Johnson
Martin Santillan Cooper
Elizabeth M. Daly
Rahul Nair
Tejaswini Pedapati
Swapnaja Achintalwar
Werner Geyer
ELM
54
5
0
01 Oct 2024
BEIR: A Heterogenous Benchmark for Zero-shot Evaluation of Information
  Retrieval Models
BEIR: A Heterogenous Benchmark for Zero-shot Evaluation of Information Retrieval Models
Nandan Thakur
Nils Reimers
Andreas Rucklé
Abhishek Srivastava
Iryna Gurevych
VLM
261
975
0
17 Apr 2021
PubMedQA: A Dataset for Biomedical Research Question Answering
PubMedQA: A Dataset for Biomedical Research Question Answering
Qiao Jin
Bhuwan Dhingra
Zhengping Liu
William W. Cohen
Xinghua Lu
243
815
0
13 Sep 2019
1