
Calibrating LLM-Based Evaluator


23 September 2023
Yuxuan Liu
Tianchi Yang
Shaohan Huang
Zihan Zhang
Haizhen Huang
Furu Wei
Weiwei Deng
Feng Sun
Qi Zhang
arXiv:2309.13308

Papers citing "Calibrating LLM-Based Evaluator"

10 / 10 papers shown
SAS-Bench: A Fine-Grained Benchmark for Evaluating Short Answer Scoring with Large Language Models
Peichao Lai
Kaipeng Zhang
Yi Lin
L. Zhang
Feiyang Ye
...
Yanwei Xu
Conghui He
Yixuan Wang
Wentao Zhang
Bin Cui
ELM
LRM
44
0
0
12 May 2025
Toward Generalizable Evaluation in the LLM Era: A Survey Beyond Benchmarks
Yixin Cao
Shibo Hong
Xuzhao Li
Jiahao Ying
Yubo Ma
...
Juanzi Li
Aixin Sun
Xuanjing Huang
Tat-Seng Chua
Tianwei Zhang
ALM
ELM
86
1
0
26 Apr 2025
Aligning with Human Judgement: The Role of Pairwise Preference in Large Language Model Evaluators
Yinhong Liu
Han Zhou
Zhijiang Guo
Ehsan Shareghi
Ivan Vulić
Anna Korhonen
Nigel Collier
ALM
132
69
0
20 Jan 2025
4-LEGS: 4D Language Embedded Gaussian Splatting
Gal Fiebelman
Tamir Cohen
Ayellet Morgenstern
Peter Hedman
Hadar Averbuch-Elor
3DGS
46
3
0
14 Oct 2024
Automated radiotherapy treatment planning guided by GPT-4Vision
Sheng Liu
O. Pastor-Serrano
Yizheng Chen
Matthew Gopaulchan
Weixing Liang
...
Michael Gensheimer
P. Dong
Yong Yang
James Zou
Lei Xing
39
6
0
21 Jun 2024
Inverse Constitutional AI: Compressing Preferences into Principles
Arduin Findeis
Timo Kaufmann
Eyke Hüllermeier
Samuel Albanie
Robert Mullins
SyDa
46
9
0
02 Jun 2024
Multicalibration for Confidence Scoring in LLMs
Gianluca Detommaso
Martín Bertrán
Riccardo Fogliato
Aaron Roth
29
12
0
06 Apr 2024
LLM-based NLG Evaluation: Current Status and Challenges
Mingqi Gao
Xinyu Hu
Jie Ruan
Xiao Pu
Xiaojun Wan
ELM
LM&MA
60
29
0
02 Feb 2024
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
313
11,953
0
04 Mar 2022
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Jason W. Wei
Xuezhi Wang
Dale Schuurmans
Maarten Bosma
Brian Ichter
F. Xia
Ed H. Chi
Quoc Le
Denny Zhou
LM&Ro
LRM
AI4CE
ReLM
367
8,495
0
28 Jan 2022