ResearchTrend.AI
G-Eval: NLG Evaluation using GPT-4 with Better Human Alignment

29 March 2023
Yang Liu
Dan Iter
Yichong Xu
Shuohang Wang
Ruochen Xu
Chenguang Zhu
ELM, ALM, LM&MA
ArXiv (abs) | PDF | HTML | GitHub (344★)

Papers citing "G-Eval: NLG Evaluation using GPT-4 with Better Human Alignment"

50 / 264 papers shown
clem:todd: A Framework for the Systematic Benchmarking of LLM-Based Task-Oriented Dialogue System Realisations
Chalamalasetti Kranti
Sherzod Hakimov
David Schlangen
LLMAG
116
0
0
08 May 2025
SEval-Ex: A Statement-Level Framework for Explainable Summarization Evaluation
Tanguy Herserant
Vincent Guigue
ELM
60
0
0
04 May 2025
LookAlike: Consistent Distractor Generation in Math MCQs
Nisarg Parikh
Nigel Fernandez
Alexander Scarlatos
Simon Woodhead
Andrew Lan
127
0
0
03 May 2025
Which Agent Causes Task Failures and When? On Automated Failure Attribution of LLM Multi-Agent Systems
Shaokun Zhang
Ming Yin
Jieyu Zhang
Jing Liu
Zhiguang Han
...
Beibin Li
Chi Wang
Hongru Wang
Yuxiao Chen
Qingyun Wu
200
7
0
30 Apr 2025
TF1-EN-3M: Three Million Synthetic Moral Fables for Training Small, Open Language Models
Mihai Nadas
Laura Diosan
Andrei Piscoran
Andreea Tomescu
VGen
138
0
0
29 Apr 2025
KETCHUP: K-Step Return Estimation for Sequential Knowledge Distillation
Jiabin Fan
Guoqing Luo
Michael Bowling
Lili Mou
OffRL
149
0
0
26 Apr 2025
Paper2Code: Automating Code Generation from Scientific Papers in Machine Learning
Minju Seo
Jinheon Baek
Seongyun Lee
Sung Ju Hwang
AI4CE
151
5
0
24 Apr 2025
Process Reward Models That Think
Muhammad Khalifa
Rishabh Agarwal
Lajanugen Logeswaran
Jaekyeom Kim
Hao Peng
Moontae Lee
Honglak Lee
Lu Wang
OffRL, ALM, LRM
148
9
0
23 Apr 2025
DocAgent: A Multi-Agent System for Automated Code Documentation Generation
Dayu Yang
Antoine Simoulin
Xin Qian
Xiaoyi Liu
Yuwei Cao
Zhaopu Teng
Grey Yang
LLMAG
149
0
0
11 Apr 2025
Large Language Models as Span Annotators
Zdeněk Kasner
Vilém Zouhar
Patrícia Schmidtová
Ivan Kartáč
Kristýna Onderková
Ondřej Plátek
Dimitra Gkatzia
Saad Mahamood
Ondrej Dusek
Simone Balloccu
ALM
129
0
0
11 Apr 2025
TALE: A Tool-Augmented Framework for Reference-Free Evaluation of Large Language Models
Sher Badshah
Ali Emami
Hassan Sajjad
LLMAG, ELM
107
0
0
10 Apr 2025
Summarizing Speech: A Comprehensive Survey
Fabian Retkowski
Maike Züfle
Andreas Sudmann
Dinah Pfau
Jan Niehues
Alexander Waibel
Alexander H. Waibel
119
0
0
10 Apr 2025
HypoEval: Hypothesis-Guided Evaluation for Natural Language Generation
Mingxuan Li
Hanchen Li
Chenhao Tan
ALM, ELM
134
0
0
09 Apr 2025
ARLO: A Tailorable Approach for Transforming Natural Language Software Requirements into Architecture using LLMs
Tooraj Helmi
56
0
0
08 Apr 2025
Brains vs. Bytes: Evaluating LLM Proficiency in Olympiad Mathematics
Hamed Mahdavi
Alireza Hashemi
Majid Daliri
Pegah Mohammadipour
Alireza Farhadi
Samira Malek
Yekta Yazdanifard
Amir Khasahmadi
V. Honavar
ELM, LRM
168
4
0
01 Apr 2025
Zero-shot Benchmarking: A Framework for Flexible and Scalable Automatic Evaluation of Language Models
José P. Pombal
Nuno M. Guerreiro
Ricardo Rei
André F. T. Martins
ALM
152
2
0
01 Apr 2025
JEEM: Vision-Language Understanding in Four Arabic Dialects
Karima Kadaoui
Hanin Atwany
Hamdan Al-Ali
Abdelrahman Mohamed
Ali Mekky
Sergei Tilga
Natalia Fedorova
Ekaterina Artemova
Hanan Aldarmaki
Yova Kementchedjhieva
VLM
96
4
0
27 Mar 2025
A Multilingual, Culture-First Approach to Addressing Misgendering in LLM Applications
Sunayana Sitaram
Adrian de Wynter
Isobel McCrum
Qilong Gu
Si-Qing Chen
AILaw
163
0
0
26 Mar 2025
DomainCQA: Crafting Expert-Level QA from Domain-Specific Charts
Ling Zhong
Yujing Lu
Jing Yang
Weiming Li
Peng Wei
Yongheng Wang
Manni Duan
Qing Zhang
158
2
0
25 Mar 2025
Evaluating Bias in LLMs for Job-Resume Matching: Gender, Race, and Education
Hayate Iso
Pouya Pezeshkpour
Nikita Bhutani
Estevam R. Hruschka
114
1
0
24 Mar 2025
ConvoGen: Enhancing Conversational AI with Synthetic Data: A Multi-Agent Approach
Reem Gody
Mahmoud Goudy
Ahmed Tawfik
SyDa
457
0
0
21 Mar 2025
A Survey on Transformer Context Extension: Approaches and Evaluation
Yijun Liu
Jinzheng Yu
Yang Xu
Zhongyang Li
Qingfu Zhu
LLMAG
132
3
0
17 Mar 2025
Unequal Opportunities: Examining the Bias in Geographical Recommendations by Large Language Models
Shiran Dudy
Thulasi Tholeti
R. Ramachandranpillai
Muhammad Ali
Toby Jia-Jun Li
Ricardo Baeza-Yates
117
2
0
16 Mar 2025
Interpretation Gaps in LLM-Assisted Comprehension of Privacy Documents
Rinku Dewri
75
0
0
15 Mar 2025
MentalChat16K: A Benchmark Dataset for Conversational Mental Health Assistance
Jia Xu
Tianyi Wei
Bojian Hou
Patryk Orzechowski
Shu Yang
Ruochen Jin
Rachael Paulbeck
Joost B. Wagenaar
George Demiris
Li Shen
AI4MH
81
1
0
13 Mar 2025
Take Off the Training Wheels! Progressive In-Context Learning for Effective Alignment
Zhenyu Liu
Dongfang Li
Xinshuo Hu
X. Zhao
Yibin Chen
Baotian Hu
Min Zhang
115
1
0
13 Mar 2025
How to Protect Yourself from 5G Radiation? Investigating LLM Responses to Implicit Misinformation
Ruohao Guo
Wei Xu
Alan Ritter
111
3
0
12 Mar 2025
WildIFEval: Instruction Following in the Wild
Gili Lior
Asaf Yehudai
Ariel Gera
L. Ein-Dor
148
0
0
09 Mar 2025
StreamMind: Unlocking Full Frame Rate Streaming Video Dialogue through Event-Gated Cognition
Xin Ding
Hao Wu
Yue Yang
Shiqi Jiang
Donglin Bai
Zhibo Chen
Ting Cao
403
1
0
08 Mar 2025
QG-SMS: Enhancing Test Item Analysis via Student Modeling and Simulation
Bang Nguyen
Tingting Du
Mengxia Yu
Lawrence Angrave
Meng Jiang
AI4Ed
111
0
0
07 Mar 2025
Development and Enhancement of Text-to-Image Diffusion Models
Rajdeep Roshan Sahu
VLM
162
44
0
07 Mar 2025
Topology-Aware Conformal Prediction for Stream Networks
Jifan Zhang
Fangxin Wang
Philip S. Yu
Kaize Ding
Shixiang Zhu
AI4TS
192
9
0
06 Mar 2025
Personalized Generation In Large Model Era: A Survey
Yiyan Xu
Jinghao Zhang
Alireza Salemi
Xinting Hu
Wenjie Wang
Fuli Feng
Hamed Zamani
Xiangnan He
Tat-Seng Chua
3DV
197
8
0
04 Mar 2025
Measuring What Makes You Unique: Difference-Aware User Modeling for Enhancing LLM Personalization
Yilun Qiu
Xiaoyan Zhao
Yang Zhang
Yimeng Bai
Wenjie Wang
Hong Cheng
Fuli Feng
Tat-Seng Chua
136
3
0
04 Mar 2025
SePer: Measure Retrieval Utility Through The Lens Of Semantic Perplexity Reduction
Lu Dai
Yijie Xu
Jinhui Ye
Hao Liu
Hui Xiong
3DV, RALM
224
3
0
03 Mar 2025
Argument Summarization and its Evaluation in the Era of Large Language Models
Moritz Altemeyer
Steffen Eger
Johannes Daxenberger
Yanran Chen
Tim Altendorf
Philipp Cimiano
Benjamin Schiller
LM&MA, ELM, LRM
126
1
0
02 Mar 2025
How Diversely Can Language Models Solve Problems? Exploring the Algorithmic Diversity of Model-Generated Code
Seonghyeon Lee
Heejae Chon
Joonwon Jang
Dongha Lee
Hwanjo Yu
ALM
109
0
0
02 Mar 2025
A Pilot Empirical Study on When and How to Use Knowledge Graphs as Retrieval Augmented Generation
Xujie Yuan
Yongxu Liu
Shimin Di
Shiwen Wu
Libin Zheng
Rui Meng
Lei Chen
Xiaofang Zhou
Jian Yin
148
0
0
28 Feb 2025
Learning to Align Multi-Faceted Evaluation: A Unified and Robust Framework
Kaishuai Xu
Tiezheng YU
Wenjun Hou
Yi Cheng
Liangyou Li
Xin Jiang
Lifeng Shang
Qiang Liu
Wenjie Li
ELM
159
0
0
26 Feb 2025
Independent Mobility GPT (IDM-GPT): A Self-Supervised Multi-Agent Large Language Model Framework for Customized Traffic Mobility Analysis Using Machine Learning Models
Fengze Yang
Xiaoyue Cathy Liu
Lingjiu Lu
Bingzhang Wang
Chenxi
90
1
0
25 Feb 2025
REGen: A Reliable Evaluation Framework for Generative Event Argument Extraction
Omar Sharif
Joseph Gatto
Madhusudan Basak
S. Preum
94
0
0
24 Feb 2025
PiCO: Peer Review in LLMs based on the Consistency Optimization
Kun-Peng Ning
Shuo Yang
Yu-Yang Liu
Jia-Yu Yao
Zhen-Hui Liu
Yu Wang
Ming Pang
Li Yuan
ALM
217
9
0
24 Feb 2025
Multimodal Inconsistency Reasoning (MMIR): A New Benchmark for Multimodal Reasoning Models
Qianqi Yan
Yue Fan
Hongquan Li
Shan Jiang
Yang Zhao
Xinze Guan
Ching-Chen Kuo
Xinze Wang
VLM, LRM
234
2
0
22 Feb 2025
Think Together and Work Better: Combining Humans' and LLMs' Think-Aloud Outcomes for Effective Text Evaluation
SeongYeub Chu
JongWoo Kim
MunYong Yi
142
4
0
21 Feb 2025
HPSS: Heuristic Prompting Strategy Search for LLM Evaluators
Bosi Wen
Pei Ke
Yufei Sun
C. Wang
Xiaotao Gu
Jinfeng Zhou
Jie Tang
Hongning Wang
Minlie Huang
57
0
0
18 Feb 2025
Q-STRUM Debate: Query-Driven Contrastive Summarization for Recommendation Comparison
George Saad
Scott Sanner
38
0
0
18 Feb 2025
You need to MIMIC to get FAME: Solving Meeting Transcript Scarcity with a Multi-Agent Conversations
Frederic Kirstein
Muneeb Khan
Jan Philip Wahle
Terry Ruas
Bela Gipp
58
0
0
18 Feb 2025
CMQCIC-Bench: A Chinese Benchmark for Evaluating Large Language Models in Medical Quality Control Indicator Calculation
Guangya Yu
Yanhao Li
Zongying Jiang
Yuxiong Jin
Li Dai
...
Weiyan Zhang
Yongqi Fan
Qi Ye
Jingping Liu
Tong Ruan
LM&MA, ELM
135
0
0
17 Feb 2025
An Empirical Analysis of Uncertainty in Large Language Model Evaluations
Qiujie Xie
Qingqiu Li
Zhuohao Yu
Yuejie Zhang
Yue Zhang
Linyi Yang
ELM
138
5
0
15 Feb 2025
Image Embedding Sampling Method for Diverse Captioning
Sania Waheed
Na Min An
95
0
0
14 Feb 2025