G-Eval: NLG Evaluation using GPT-4 with Better Human Alignment

29 March 2023
Yang Liu
Dan Iter
Yichong Xu
Shuohang Wang
Ruochen Xu
Chenguang Zhu
ELM, ALM, LM&MA
ArXiv (abs) | PDF | HTML | GitHub (344★)

Papers citing "G-Eval: NLG Evaluation using GPT-4 with Better Human Alignment"

50 / 264 papers shown
Towards Completeness-Oriented Tool Retrieval for Large Language Models
Changle Qu
Sunhao Dai
Xiaochi Wei
Hengyi Cai
Shuaiqiang Wang
D. Yin
Jun Xu
Jirong Wen
KELM
97
8
0
25 May 2024
AndroidWorld: A Dynamic Benchmarking Environment for Autonomous Agents
Christopher Rawles
Sarah Clinckemaillie
Yifan Chang
Jonathan Waltz
Gabrielle Lau
...
Daniel Toyama
Robert Berry
Divya Tyamagundlu
Timothy Lillicrap
Oriana Riva
LLMAG
177
74
0
23 May 2024
Adversarial DPO: Harnessing Harmful Data for Reducing Toxicity with Minimal Impact on Coherence and Evasiveness in Dialogue Agents
San Kim
Gary Geunbae Lee
AAML
131
3
0
21 May 2024
Presentations are not always linear! GNN meets LLM for Document-to-Presentation Transformation with Attribution
Himanshu Maheshwari
Sambaran Bandyopadhyay
Aparna Garimella
Anandhavelu Natarajan
41
5
0
21 May 2024
A Survey of Automatic Hallucination Evaluation on Natural Language Generation
Siya Qi
Yulan He
Zheng Yuan
LRM, HILM
107
1
0
18 Apr 2024
ResearchAgent: Iterative Research Idea Generation over Scientific Literature with Large Language Models
Jinheon Baek
S. Jauhar
Silviu Cucerzan
Sung Ju Hwang
AI4CE, LLMAG, LM&Ro
126
56
0
11 Apr 2024
Concept -- An Evaluation Protocol on Conversational Recommender Systems with System-centric and User-centric Factors
Chen Huang
Peixin Qin
Yang Deng
Wenqiang Lei
Jiancheng Lv
Tat-Seng Chua
122
8
0
04 Apr 2024
MATEval: A Multi-Agent Discussion Framework for Advancing Open-Ended Text Evaluation
Yu Li
Shenyu Zhang
Rui Wu
Xiutian Huang
Yongrui Chen
Wenhao Xu
Guilin Qi
Dehai Min
LLMAG
69
11
0
28 Mar 2024
Automatic Interactive Evaluation for Large Language Models with State Aware Patient Simulator
Yusheng Liao
Yutong Meng
Yuhao Wang
Hongcheng Liu
Yanfeng Wang
Yu Wang
LM&MA, ELM
69
9
0
13 Mar 2024
Improving Socratic Question Generation using Data Augmentation and Preference Optimization
Nischal Ashok Kumar
Andrew Lan
115
9
0
01 Mar 2024
Standardizing the Measurement of Text Diversity: A Tool and a Comparative Analysis of Scores
Chantal Shaib
Joe Barrow
Jiuding Sun
Alexa F. Siu
Byron C. Wallace
A. Nenkova
188
38
0
01 Mar 2024
Re-Ex: Revising after Explanation Reduces the Factual Errors in LLM Responses
Juyeon Kim
Jeongeun Lee
Yoonho Chang
Chanyeol Choi
Junseong Kim
Jy-yong Sohn
KELM, LRM
160
2
0
27 Feb 2024
Evaluating the Performance of ChatGPT for Spam Email Detection
Shijing Si
Yuwei Wu
Jiawen Gu
Yugui Zhang
Jedrek Wosik
Qinliang Su
136
9
0
23 Feb 2024
Beyond Probabilities: Unveiling the Misalignment in Evaluating Large Language Models
Chenyang Lyu
Minghao Wu
Alham Fikri Aji
ELM
66
14
0
21 Feb 2024
A Multi-Aspect Framework for Counter Narrative Evaluation using Large Language Models
Jaylen Jones
Lingbo Mo
Eric Fosler-Lussier
Huan Sun
104
4
0
18 Feb 2024
Reasoning before Comparison: LLM-Enhanced Semantic Similarity Metrics for Domain Specialized Text Analysis
Shaochen Xu
Zihao Wu
Huaqin Zhao
Peng Shu
Zheng Liu
Wenxiong Liao
Sheng Li
Andrea Sikora
Tianming Liu
Xiang Li
101
17
0
17 Feb 2024
Where is the answer? Investigating Positional Bias in Language Model Knowledge Extraction
Kuniaki Saito
Kihyuk Sohn
Chen-Yu Lee
Yoshitaka Ushiku
150
3
0
16 Feb 2024
LLM-based NLG Evaluation: Current Status and Challenges
Mingqi Gao
Xinyu Hu
Jie Ruan
Xiao Pu
Xiaojun Wan
ELM, LM&MA
224
41
0
02 Feb 2024
MT-Eval: A Multi-Turn Capabilities Evaluation Benchmark for Large Language Models
Wai-Chung Kwan
Xingshan Zeng
Yuxin Jiang
Yufei Wang
Liangyou Li
Lifeng Shang
Xin Jiang
Qun Liu
Kam-Fai Wong
LRM, ELM
51
22
0
30 Jan 2024
INACIA: Integrating Large Language Models in Brazilian Audit Courts: Opportunities and Challenges
J. Pereira
Andre Assumpcao
J. Trecenti
Luiz Airosa
Caio Lente
Jhonatan Cléto
Guilherme Dobins
Rodrigo Nogueira
Luis Mitchell
R. Lotufo
67
2
0
10 Jan 2024
Reinforcement Learning for Optimizing RAG for Domain Chatbots
Mandar Kulkarni
Praveen Tangarajan
Kyung Kim
Anusua Trivedi
OffRL, RALM, SILM
70
30
0
10 Jan 2024
DCR-Consistency: Divide-Conquer-Reasoning for Consistency Evaluation and Improvement of Large Language Models
Wendi Cui
Jiaxin Zhang
Zhuohang Li
Lopez Damien
Kamalika Das
Sricharan Kumar
Kumar Sricharan
69
2
0
04 Jan 2024
LLMEval: A Preliminary Study on How to Evaluate Large Language Models
Yue Zhang
Ming Zhang
Haipeng Yuan
Shichun Liu
Yongyao Shi
Tao Gui
Qi Zhang
Xuanjing Huang
ALM, ELM
69
15
0
12 Dec 2023
Honeybee: Locality-enhanced Projector for Multimodal LLM
Junbum Cha
Wooyoung Kang
Jonghwan Mun
Byungseok Roh
MLLM
106
133
0
11 Dec 2023
MM-Narrator: Narrating Long-form Videos with Multimodal In-Context Learning
Chaoyi Zhang
Kevin Qinghong Lin
Zhengyuan Yang
Jianfeng Wang
Linjie Li
Chung-Ching Lin
Zicheng Liu
Lijuan Wang
VGen
111
32
0
29 Nov 2023
End-to-End Breast Cancer Radiotherapy Planning via LMMs with Consistency Embedding
Kwanyoung Kim
Y. Oh
S. Park
H. Byun
Joongyo Lee
Jin Sung Kim
Yong Bae Kim
Jong Chul Ye
126
0
0
27 Nov 2023
AutoEval-Video: An Automatic Benchmark for Assessing Large Vision Language Models in Open-Ended Video Question Answering
Xiuyuan Chen
Yuan Lin
Yuchen Zhang
Weiran Huang
ELM, MLLM
100
27
0
25 Nov 2023
LIMIT: Less Is More for Instruction Tuning Across Evaluation Paradigms
Aditi Jha
Sam Havens
Jeremey Dohmann
Alex Trott
Jacob P. Portes
ALM
50
11
0
22 Nov 2023
On Evaluating the Integration of Reasoning and Action in LLM Agents with Database Question Answering
Linyong Nan
Ellen Zhang
Weijin Zou
Yilun Zhao
Wenfei Zhou
Arman Cohan
LLMAG
100
14
0
16 Nov 2023
Are NLP Models Good at Tracing Thoughts: An Overview of Narrative Understanding
Lixing Zhu
Runcong Zhao
Lin Gui
Yulan He
82
5
0
28 Oct 2023
Tuna: Instruction Tuning using Feedback from Large Language Models
Haoran Li
Yiran Liu
Xingxing Zhang
Wei Lu
Furu Wei
ALM
83
3
0
20 Oct 2023
Exploring Automatic Evaluation Methods based on a Decoder-based LLM for Text Generation
Tomohito Kasahara
Daisuke Kawahara
86
3
0
17 Oct 2023
Compositional preference models for aligning LMs
Dongyoung Go
Tomasz Korbak
Germán Kruszewski
Jos Rozen
Marc Dymetman
97
20
0
17 Oct 2023
FiLM: Fill-in Language Models for Any-Order Generation
Tianxiao Shen
Hao-Chun Peng
Ruoqi Shen
Yao Fu
Zaïd Harchaoui
Yejin Choi
95
10
0
15 Oct 2023
Prompting Large Language Models with Chain-of-Thought for Few-Shot Knowledge Base Question Generation
Yuanyuan Liang
Jianing Wang
Hanlun Zhu
Lei Wang
Weining Qian
Yunshi Lan
LRM, ReLM
88
39
0
12 Oct 2023
Automatic and Human-AI Interactive Text Generation
Yao Dou
Philippe Laban
Claire Gardent
Wei Xu
82
4
0
05 Oct 2023
Driving with LLMs: Fusing Object-Level Vector Modality for Explainable Autonomous Driving
Long Chen
Oleg Sinavski
Jan Hünermann
Alice Karnsund
Andrew James Willmott
Danny Birch
Daniel Maund
Jamie Shotton
MLLM
127
211
0
03 Oct 2023
Calibrating LLM-Based Evaluator
Yuxuan Liu
Tianchi Yang
Shaohan Huang
Zihan Zhang
Haizhen Huang
Furu Wei
Weiwei Deng
Feng Sun
Qi Zhang
122
33
0
23 Sep 2023
Foundation Metrics for Evaluating Effectiveness of Healthcare Conversations Powered by Generative AI
Mahyar Abbasian
Elahe Khatibi
Iman Azimi
David Oniani
Zahra Shakeri Hossein Abad
...
Bryant Lin
Olivier Gevaert
Li-Jia Li
Ramesh C. Jain
Amir M. Rahmani
LM&MA, ELM, AI4MH
145
78
0
21 Sep 2023
GPTFUZZER: Red Teaming Large Language Models with Auto-Generated Jailbreak Prompts
Jiahao Yu
Xingwei Lin
Zheng Yu
Xinyu Xing
SILM
232
353
0
19 Sep 2023
Investigating Answerability of LLMs for Long-Form Question Answering
Meghana Moorthy Bhat
Rui Meng
Ye Liu
Yingbo Zhou
Semih Yavuz
75
11
0
15 Sep 2023
Large Language Models on Wikipedia-Style Survey Generation: an Evaluation in NLP Concepts
Fan Gao
Hang Jiang
Rui Yang
Qingcheng Zeng
Jinghui Lu
Moritz Blum
Dairui Liu
Tianwei She
Yuang Jiang
Irene Li
ELM, ALM, LM&MA
93
9
0
21 Aug 2023
Learning Evaluation Models from Large Language Models for Sequence Generation
Chenglong Wang
Hang Zhou
Kai-Chun Chang
Tongran Liu
Chunliang Zhang
Quan Du
Tong Xiao
Yue Zhang
Jingbo Zhu
ELM
163
4
0
08 Aug 2023
Wider and Deeper LLM Networks are Fairer LLM Evaluators
Xinghua Zhang
Yu Bowen
Haiyang Yu
Yangyu Lv
Tingwen Liu
Fei Huang
Hongbo Xu
Yongbin Li
ALM
146
90
0
03 Aug 2023
Assessing the Quality of Multiple-Choice Questions Using GPT-4 and Rule-Based Methods
Steven Moore
H. A. Nguyen
Tianying Chen
John C. Stamper
ELM
66
35
0
16 Jul 2023
CMMLU: Measuring massive multitask language understanding in Chinese
Haonan Li
Yixuan Zhang
Fajri Koto
Yifei Yang
Hai Zhao
Yeyun Gong
Nan Duan
Tim Baldwin
ALM, ELM
140
274
0
15 Jun 2023
Large Language Models as Tax Attorneys: A Case Study in Legal Capabilities Emergence
John J. Nay
David Karamardian
Sarah Lawsky
Wenting Tao
Meghana Moorthy Bhat
Raghav Jain
Aaron Travis Lee
Jonathan H. Choi
Jungo Kasai
ELM, AILaw
113
60
0
12 Jun 2023
Benchmarking Foundation Models with Language-Model-as-an-Examiner
Yushi Bai
Jiahao Ying
Yixin Cao
Xin Lv
Yuze He
...
Yijia Xiao
Haozhe Lyu
Jiayin Zhang
Juanzi Li
Lei Hou
ALM, ELM
117
149
0
07 Jun 2023
On Optimal Caching and Model Multiplexing for Large Model Inference
Banghua Zhu
Ying Sheng
Lianmin Zheng
Clark W. Barrett
Michael I. Jordan
Jiantao Jiao
99
21
0
03 Jun 2023
UniChart: A Universal Vision-language Pretrained Model for Chart Comprehension and Reasoning
Ahmed Masry
P. Kavehzadeh
Do Xuan Long
Enamul Hoque
Shafiq Joty
LRM
95
113
0
24 May 2023