G-Eval: NLG Evaluation using GPT-4 with Better Human Alignment


29 March 2023
Yang Liu
Dan Iter
Yichong Xu
Shuohang Wang
Ruochen Xu
Chenguang Zhu
    ELM, ALM, LM&MA

Papers citing "G-Eval: NLG Evaluation using GPT-4 with Better Human Alignment"

50 / 264 papers shown
DistillNote: LLM-based clinical note summaries improve heart failure diagnosis
Heloisa Oss Boll
Antonio Oss Boll
Leticia Puttlitz Boll
Ameen Abu-Hanna
Iacer Calixto
24
0
0
20 Jun 2025
Reranking-based Generation for Unbiased Perspective Summarization
Narutatsu Ri
Nicholas Deas
Kathleen McKeown
OffRL
24
0
0
19 Jun 2025
MinosEval: Distinguishing Factoid and Non-Factoid for Tailored Open-Ended QA Evaluation with LLMs
Yongqi Fan
Yating Wang
Guandong Wang
Jie Zhai
Jingping Liu
Qi Ye
Tong Ruan
26
0
0
18 Jun 2025
Evaluation Should Not Ignore Variation: On the Impact of Reference Set Choice on Summarization Metrics
Silvia Casola
Yang Liu
Siyao Peng
Oliver Kraus
Albert Gatt
Barbara Plank
23
0
0
17 Jun 2025
Doppelganger Method: Breaking Role Consistency in LLM Agent via Prompt-based Transferable Adversarial Attack
Daewon Kang
YeongHwan Shin
Doyeon Kim
Kyu-Hwan Jung
Meong Hi Son
AAML, SILM
75
0
0
17 Jun 2025
GenerationPrograms: Fine-grained Attribution with Executable Programs
David Wan
Eran Hirsch
Elias Stengel-Eskin
Ido Dagan
Mohit Bansal
30
0
0
17 Jun 2025
From What to Respond to When to Respond: Timely Response Generation for Open-domain Dialogue Agents
Seongbo Jang
Minjin Jeon
Jaehoon Lee
Seonghyeon Lee
Dongha Lee
Hwanjo Yu
39
0
0
17 Jun 2025
RealHiTBench: A Comprehensive Realistic Hierarchical Table Benchmark for Evaluating LLM-Based Table Analysis
Pengzuo Wu
Yuhang Yang
Guangcheng Zhu
Chao Ye
Hong Gu
...
Y. He
Liangyu Zha
Wentao Ye
Junbo Zhao
Haobo Wang
LMTD
18
0
0
16 Jun 2025
WGSR-Bench: Wargame-based Game-theoretic Strategic Reasoning Benchmark for Large Language Models
Qiyue Yin
Pei Xu
Qiaozhe Li
Shengda Liu
S. Shen
...
Lei Cui
Chengxin Yan
Jie Sun
Xiangquan Tang
K. Huang
LLMAG, ELM, LRM
121
0
0
12 Jun 2025
Latent Multi-Head Attention for Small Language Models
Sushant Mehta
Raj Abhijit Dandekar
Rajat Dandekar
Sreedath Panat
RALM
51
0
0
11 Jun 2025
OmniDRCA: Parallel Speech-Text Foundation Model via Dual-Resolution Speech Representations and Contrastive Alignment
Chao-Hong Tan
Qian Chen
Wen Wang
Chong Deng
Qinglin Zhang
...
Yukun Ma
Yafeng Chen
Hui Wang
Jiaqing Liu
Jieping Ye
AuLLM
91
0
0
11 Jun 2025
DRAGged into Conflicts: Detecting and Addressing Conflicting Sources in Search-Augmented LLMs
Arie Cattan
Alon Jacovi
Ori Ram
Jonathan Herzig
Roee Aharoni
Sasha Goldshtein
E. Ofek
Idan Szpektor
Avi Caciularu
42
0
0
10 Jun 2025
Exploring the Impact of Temperature on Large Language Models: Hot or Cold?
Lujun Li
Lama Sleem
Niccolo Gentile
Geoffrey Nichil
Radu State
21
0
0
08 Jun 2025
Generating Grounded Responses to Counter Misinformation via Learning Efficient Fine-Grained Critiques
Xiaofei Xu
Xiuzhen Zhang
Ke Deng
HILM
56
0
0
06 Jun 2025
Proactive Assistant Dialogue Generation from Streaming Egocentric Videos
Yichi Zhang
Xin Luna Dong
Zhaojiang Lin
Andrea Madotto
Anuj Kumar
Babak Damavandi
J. Chai
Seungwhan Moon
72
0
0
06 Jun 2025
Elementary Math Word Problem Generation using Large Language Models
Nimesh Ariyarathne
Harshani Bandara
Yasith Heshan
Omega Gamage
Surangika Ranathunga
...
Gayathri Lihinikaduarachchi
Tharoosha Vihidun
Meenambika Chandirakumar
Sanujen Premakumar
Sanjula Gathsara
AI4Ed
76
0
0
06 Jun 2025
What Is Seen Cannot Be Unseen: The Disruptive Effect of Knowledge Conflict on Large Language Models
Kaiser Sun
Fan Bai
Mark Dredze
21
0
0
06 Jun 2025
ProRefine: Inference-time Prompt Refinement with Textual Feedback
Deepak Pandita
Tharindu Cyril Weerasooriya
A. Shah
Christopher Homan
Wei Wei
LLMAG, ReLM, LRM
153
0
0
05 Jun 2025
Do Large Language Models Judge Error Severity Like Humans?
Diege Sun
Guanyi Chen
Zhao Fan
Xiaorong Cheng
Tingting He
190
0
0
05 Jun 2025
QQSUM: A Novel Task and Model of Quantitative Query-Focused Summarization for Review-based Product Question Answering
A. Tang
Xiuzhen Zhang
M. Dinh
Zhuang Li
RALM
69
0
0
04 Jun 2025
EssayBench: Evaluating Large Language Models in Multi-Genre Chinese Essay Writing
Fan Gao
Dongyuan Li
Ding Xia
Fei Mi
Yasheng Wang
Lifeng Shang
Baojun Wang
ELM
42
0
0
03 Jun 2025
Multimodal DeepResearcher: Generating Text-Chart Interleaved Reports From Scratch with Agentic Framework
Zhaorui Yang
Bo Pan
Han Wang
Yiyao Wang
Xingyu Liu
Minfeng Zhu
Bo Zhang
Wei Chen
58
0
0
03 Jun 2025
Labelling Data with Unknown References
Adrian de Wynter
87
0
0
03 Jun 2025
PAKTON: A Multi-Agent Framework for Question Answering in Long Legal Agreements
Petros Raptopoulos
Giorgos Filandrianos
Maria Lymperaiou
Giorgos Stamou
AILaw
62
0
0
31 May 2025
Towards Multi-dimensional Evaluation of LLM Summarization across Domains and Languages
Hyangsuk Min
Yuho Lee
Minjeong Ban
Jiaqi Deng
Nicole Hee-Yeon Kim
Taewon Yun
Hang Su
Jason (Jinglun) Cai
Hwanjun Song
ELM
31
0
0
31 May 2025
AnnaAgent: Dynamic Evolution Agent System with Multi-Session Memory for Realistic Seeker Simulation
Ming Wang
Peidong Wang
Lin Wu
Xiaocui Yang
Daling Wang
Shi Feng
Yuxin Chen
B. Wang
Yifei Zhang
50
0
0
31 May 2025
LegalEval-Q: A New Benchmark for The Quality Evaluation of LLM-Generated Legal Text
Li yunhan
Wu gengshen
AILaw, ELM, ALM
34
0
0
30 May 2025
LaMP-QA: A Benchmark for Personalized Long-form Question Answering
Alireza Salemi
Hamed Zamani
24
0
0
30 May 2025
ARC: Argument Representation and Coverage Analysis for Zero-Shot Long Document Summarization with Instruction Following LLMs
Mohamed S. Elaraby
Diane Litman
LLMAG
40
0
0
29 May 2025
Data-to-Dashboard: Multi-Agent LLM Framework for Insightful Visualization in Enterprise Analytics
Ran Zhang
Mohannad Elhamod
87
0
0
29 May 2025
MEF: A Capability-Aware Multi-Encryption Framework for Evaluating Vulnerabilities in Black-Box Large Language Models
Mingyu Yu
Wei Wang
Y. X. Wei
Sujuan Qin
Fei Gao
Wenmin Li
AAML
42
0
0
29 May 2025
StrucSum: Graph-Structured Reasoning for Long Document Extractive Summarization with LLMs
Haohan Yuan
Sukhwa Hong
Haopeng Zhang
RALM, ReLM, LRM
60
0
0
29 May 2025
Does Johnny Get the Message? Evaluating Cybersecurity Notifications for Everyday Users
V. Jüttner
Erik Buchmann
38
0
0
28 May 2025
MEDAL: A Framework for Benchmarking LLMs as Multilingual Open-Domain Chatbots and Dialogue Evaluators
John Mendonça
A. Lavie
Isabel Trancoso
56
0
0
28 May 2025
Pre-Training Curriculum for Multi-Token Prediction in Language Models
Ansar Aynetdinov
Alan Akbik
LRM
57
0
0
28 May 2025
BiasFilter: An Inference-Time Debiasing Framework for Large Language Models
Xiaoqing Cheng
Ruizhe Chen
Hongying Zan
Yuxiang Jia
Min Peng
43
1
0
28 May 2025
Simple and Effective Baselines for Code Summarisation Evaluation
Jade Robinson
Jonathan K. Kummerfeld
103
0
0
26 May 2025
DeepResearchGym: A Free, Transparent, and Reproducible Evaluation Sandbox for Deep Research
Joao Coelho
Jingjie Ning
Jingyuan He
Kangrui Mao
Abhijay Paladugu
...
Jiahe Jin
Jamie Callan
João Magalhães
Bruno Martins
Chenyan Xiong
90
2
0
25 May 2025
Assistant-Guided Mitigation of Teacher Preference Bias in LLM-as-a-Judge
Zhuo Liu
Moxin Li
Xun Deng
Qifan Wang
Fuli Feng
ELM
74
0
0
25 May 2025
REGen: Multimodal Retrieval-Embedded Generation for Long-to-Short Video Editing
Weihan Xu
Yimeng Ma
Jingyue Huang
Yang Li
Wenye Ma
Taylor Berg-Kirkpatrick
Julian McAuley
Paul Pu Liang
Hao-Wen Dong
DiffM, VGen
184
0
0
24 May 2025
Understanding How Value Neurons Shape the Generation of Specified Values in LLMs
Yi Su
Jiayi Zhang
Shu Yang
Xinhai Wang
Lijie Hu
Di Wang
OffRL
206
2
0
23 May 2025
Two-way Evidence self-Alignment based Dual-Gated Reasoning Enhancement
Kexin Zhang
Junlan Chen
Daifeng Li
Yuxuan Zhang
Yangyang Feng
Bowen Deng
Weixu Chen
LRM
82
0
0
22 May 2025
Long-Form Information Alignment Evaluation Beyond Atomic Facts
Danna Zheng
Mirella Lapata
Jeff Z. Pan
HILM
72
0
0
21 May 2025
Adaptive Plan-Execute Framework for Smart Contract Security Auditing
Zhiyuan Wei
Jing Sun
Zijian Zhang
Zhe Hou
Zixiao Zhao
196
0
0
21 May 2025
Enhancing Abstractive Summarization of Scientific Papers Using Structure Information
Tong Bao
Heng Zhang
Chengzhi Zhang
212
3
0
20 May 2025
YESciEval: Robust LLM-as-a-Judge for Scientific Question Answering
Jennifer D'Souza
Hamed Babaei Giglou
Quentin Münch
ELM
116
0
0
20 May 2025
R3: Robust Rubric-Agnostic Reward Models
David Anugraha
Zilu Tang
Lester James V. Miranda
Hanyang Zhao
Mohammad Rifqi Farhansyah
Garry Kuwanto
Derry Wijaya
Genta Indra Winata
222
1
0
19 May 2025
From Recall to Reasoning: Automated Question Generation for Deeper Math Learning through Large Language Models
Yongan Yu
Alexandre Krantz
Nikki G. Lobczowski
LRM
69
0
0
17 May 2025
AutoMedEval: Harnessing Language Models for Automatic Medical Capability Evaluation
Xiechi Zhang
Zetian Ouyang
Linlin Wang
Gerard de Melo
Zhu Cao
Xiaoling Wang
Ya Zhang
Yanfeng Wang
Liang He
LM&MA, ELM
127
0
0
17 May 2025
Why Are You Wrong? Counterfactual Explanations for Language Grounding with 3D Objects
Tobias Preintner
Weixuan Yuan
Qi Huang
Adrian König
Thomas Bäck
Elena Raponi
Niki van Stein
91
0
0
09 May 2025