G-Eval: NLG Evaluation using GPT-4 with Better Human Alignment


29 March 2023
Yang Liu
Dan Iter
Yichong Xu
Shuohang Wang
Ruochen Xu
Chenguang Zhu
    ELM
    ALM
    LM&MA

Papers citing "G-Eval: NLG Evaluation using GPT-4 with Better Human Alignment"

50 / 757 papers shown
Towards More Effective Table-to-Text Generation: Assessing In-Context Learning and Self-Evaluation with Open-Source Models
Sahar Iravani
Tim O. F. Conrad
LMTD
34
0
0
15 Oct 2024
Holistic Reasoning with Long-Context LMs: A Benchmark for Database Operations on Massive Textual Data
Seiji Maekawa
Hayate Iso
Nikita Bhutani
RALM
110
1
0
15 Oct 2024
HART: Efficient Visual Generation with Hybrid Autoregressive Transformer
Haotian Tang
Yecheng Wu
Shang Yang
Enze Xie
Junsong Chen
Junyu Chen
Zhuoyang Zhang
Han Cai
Yaojie Lu
Song Han
74
34
0
14 Oct 2024
Agent-as-a-Judge: Evaluate Agents with Agents
Mingchen Zhuge
Changsheng Zhao
Dylan R. Ashley
Wenyi Wang
Dmitrii Khizbullin
...
Raghuraman Krishnamoorthi
Yuandong Tian
Yangyang Shi
Vikas Chandra
Jürgen Schmidhuber
ELM
65
35
0
14 Oct 2024
Language Model Preference Evaluation with Multiple Weak Evaluators
Zhengyu Hu
Jieyu Zhang
Zhihan Xiong
Alexander Ratner
Hui Xiong
Ranjay Krishna
51
3
0
14 Oct 2024
4-LEGS: 4D Language Embedded Gaussian Splatting
Gal Fiebelman
Tamir Cohen
Ayellet Morgenstern
Peter Hedman
Hadar Averbuch-Elor
3DGS
46
3
0
14 Oct 2024
Learning the Bitter Lesson: Empirical Evidence from 20 Years of CVPR Proceedings
Mojtaba Yousefi
Jack Collins
29
0
0
12 Oct 2024
Beyond Exact Match: Semantically Reassessing Event Extraction by Large Language Models
Yi-Fan Lu
Xian-Ling Mao
Tian Lan
Heyan Huang
Xiaoyan Gao
55
0
0
12 Oct 2024
SPORTU: A Comprehensive Sports Understanding Benchmark for Multimodal Large Language Models
H. Xia
Zhengbang Yang
Junbo Zou
Rhys Tracy
Yuqing Wang
...
Xun Shao
Zhuoqing Xie
Yuan-fang Wang
Weining Shen
Hanjie Chen
ReLM
LRM
ELM
47
2
0
11 Oct 2024
Language Imbalance Driven Rewarding for Multilingual Self-improving
Wen Yang
Junhong Wu
Chen Wang
Chengqing Zong
J.N. Zhang
ALM
LRM
74
4
0
11 Oct 2024
Do You Know What You Are Talking About? Characterizing Query-Knowledge Relevance For Reliable Retrieval Augmented Generation
Zhuohang Li
Jiaxin Zhang
Chao Yan
Kamalika Das
Sricharan Kumar
Murat Kantarcioglu
Bradley Malin
RALM
26
1
0
10 Oct 2024
Increasing the Difficulty of Automatically Generated Questions via Reinforcement Learning with Synthetic Preference
William Thorne
Ambrose Robinson
Bohua Peng
Chenghua Lin
Diana Maynard
16
2
0
10 Oct 2024
ReIFE: Re-evaluating Instruction-Following Evaluation
Yixin Liu
Kejian Shi
Alexander R. Fabbri
Yilun Zhao
Peifeng Wang
Chien-Sheng Wu
Shafiq Joty
Arman Cohan
30
6
0
09 Oct 2024
Uncovering Factor Level Preferences to Improve Human-Model Alignment
Juhyun Oh
Eunsu Kim
Jiseon Kim
Wenda Xu
Inha Cha
William Yang Wang
Alice Oh
34
0
0
09 Oct 2024
AutoFeedback: An LLM-based Framework for Efficient and Accurate API Request Generation
Huanxi Liu
Jiaqi Liao
Dawei Feng
Kele Xu
Huaimin Wang
171
0
0
09 Oct 2024
LLM Self-Correction with DeCRIM: Decompose, Critique, and Refine for Enhanced Following of Instructions with Multiple Constraints
Thomas Palmeira Ferraz
Kartik Mehta
Yu-Hsiang Lin
Haw-Shiuan Chang
Shereen Oraby
Sijia Liu
Vivek Subramanian
Tagyoung Chung
Mohit Bansal
Nanyun Peng
56
8
0
09 Oct 2024
Cheating Automatic LLM Benchmarks: Null Models Achieve High Win Rates
Xiaosen Zheng
Tianyu Pang
Chao Du
Qian Liu
Jing Jiang
Min Lin
47
8
0
09 Oct 2024
Multi-Session Client-Centered Treatment Outcome Evaluation in Psychotherapy
Hongbin Na
Tao Shen
Shumao Yu
Ling Chen
30
2
0
08 Oct 2024
A Recipe For Building a Compliant Real Estate Chatbot
Navid Madani
Anusha Bagalkotkar
Supriya Anand
Gabriel Arnson
Rohini Srihari
K. Joseph
AI4TS
16
0
0
07 Oct 2024
Rationale-Aware Answer Verification by Pairwise Self-Evaluation
Akira Kawabata
Saku Sugawara
LRM
39
3
0
07 Oct 2024
RevisEval: Improving LLM-as-a-Judge via Response-Adapted References
Qiyuan Zhang
Yufei Wang
Tiezheng YU
Yuxin Jiang
Chuhan Wu
...
Xin Jiang
Lifeng Shang
Ruiming Tang
Fuyuan Lyu
Chen Ma
31
4
0
07 Oct 2024
Realizing Video Summarization from the Path of Language-based Semantic Understanding
Kuan-Chen Mu
Zhi-Yi Chin
Wei-Chen Chiu
28
0
0
06 Oct 2024
SafeLLM: Domain-Specific Safety Monitoring for Large Language Models: A Case Study of Offshore Wind Maintenance
Connor Walker
Callum Rothon
Koorosh Aslansefat
Y. Papadopoulos
Nina Dethlefs
28
0
0
06 Oct 2024
CS4: Measuring the Creativity of Large Language Models Automatically by Controlling the Number of Story-Writing Constraints
Anirudh Atmakuru
Jatin Nainani
Rohith Siddhartha Reddy Bheemreddy
Anirudh Lakkaraju
Zonghai Yao
Hamed Zamani
Haw-Shiuan Chang
116
2
0
05 Oct 2024
Computational Modeling of Artistic Inspiration: A Framework for Predicting Aesthetic Preferences in Lyrical Lines Using Linguistic and Stylistic Features
Gaurav Sahu
Olga Vechtomova
23
0
0
03 Oct 2024
Reward-RAG: Enhancing RAG with Reward Driven Supervision
Thang Nguyen
Peter Chin
Yu-Wing Tai
RALM
42
4
0
03 Oct 2024
CodeJudge: Evaluating Code Generation with Large Language Models
Weixi Tong
Tianyi Zhang
ELM
ALM
39
8
0
03 Oct 2024
MetaMetrics: Calibrating Metrics For Generation Tasks Using Human Preferences
Genta Indra Winata
David Anugraha
Lucky Susanto
Garry Kuwanto
Derry Wijaya
42
8
0
03 Oct 2024
Comparing Criteria Development Across Domain Experts, Lay Users, and Models in Large Language Model Evaluation
Annalisa Szymanski
Simret Araya Gebreegziabher
Oghenemaro Anuyah
Ronald A Metoyer
T. Li
ALM
ELM
40
7
0
02 Oct 2024
CALF: Benchmarking Evaluation of LFQA Using Chinese Examinations
Yuchen Fan
Xin Zhong
Heng Zhou
Yuchen Zhang
Mingyu Liang
Chengxing Xie
Ermo Hua
Ning Ding
Bowen Zhou
ALM
ELM
31
0
0
02 Oct 2024
From Facts to Insights: A Study on the Generation and Evaluation of Analytical Reports for Deciphering Earnings Calls
Tomas Goldsack
Yang Wang
Chenghua Lin
Chung-Chi Chen
20
2
0
01 Oct 2024
Khattat: Enhancing Readability and Concept Representation of Semantic Typography
Ahmed Hussein
Alaa Elsetohy
Sama Hadhoud
Tameem Bakr
Yasser Rohaim
Badr AlKhamissi
VLM
34
0
0
01 Oct 2024
Creative and Context-Aware Translation of East Asian Idioms with GPT-4
Kenan Tang
Peiyang Song
Yao Qin
Xifeng Yan
33
1
0
01 Oct 2024
Aligning Human and LLM Judgments: Insights from EvalAssist on Task-Specific Evaluations and AI-assisted Assessment Strategy Preferences
Zahra Ashktorab
Michael Desmond
Qian Pan
James M. Johnson
Martin Santillan Cooper
Elizabeth M. Daly
Rahul Nair
Tejaswini Pedapati
Swapnaja Achintalwar
Werner Geyer
ELM
54
5
0
01 Oct 2024
Mixed Chain-of-Psychotherapies for Emotional Support Chatbot
Siyuan Chen
Cong Ming
Zhiling Zhang
Yanyi Chen
Kenny Q. Zhu
Mengyue Wu
AI4MH
34
0
0
29 Sep 2024
CoTKR: Chain-of-Thought Enhanced Knowledge Rewriting for Complex Knowledge Graph Question Answering
Yike Wu
Yi Huang
Nan Hu
Yuncheng Hua
Guilin Qi
Jiaoyan Chen
Jeff Z. Pan
44
7
0
29 Sep 2024
Data Analysis in the Era of Generative AI
J. Inala
Chenglong Wang
Steven Drucker
Gonzalo Ramos
Victor C. Dibia
N. Riche
Dave Brown
Dan Marshall
Jianfeng Gao
32
8
0
27 Sep 2024
AXCEL: Automated eXplainable Consistency Evaluation using LLMs
P Aditya Sreekar
Sahil Verma
Suransh Chopra
Sarik Ghazarian
Abhishek Persad
Narayanan Sadagopan
LRM
36
0
0
25 Sep 2024
Training Language Models to Win Debates with Self-Play Improves Judge Accuracy
Samuel Arnesen
David Rein
Julian Michael
ELM
41
3
0
25 Sep 2024
DiaSynth: Synthetic Dialogue Generation Framework for Low Resource Dialogue Applications
Sathya Krishnan Suresh
Wu Mengjun
Tushar Pranav
Eng Siong Chng
34
2
0
25 Sep 2024
HelloBench: Evaluating Long Text Generation Capabilities of Large Language Models
Haoran Que
Feiyu Duan
Liqun He
Yutao Mou
Wangchunshu Zhou
...
Ge Zhang
Junran Peng
Zhaoxiang Zhang
Songyang Zhang
Kai Chen
LM&MA
ELM
VLM
51
11
0
24 Sep 2024
Finetuning LLMs for Comparative Assessment Tasks
Vatsal Raina
Adian Liusie
Mark Gales
32
1
0
24 Sep 2024
Direct Judgement Preference Optimization
Peifeng Wang
Austin Xu
Yilun Zhou
Caiming Xiong
Shafiq Joty
ELM
39
12
0
23 Sep 2024
Beyond Persuasion: Towards Conversational Recommender System with Credible Explanations
Peixin Qin
Chen Huang
Yang Deng
Wenqiang Lei
Tat-Seng Chua
LRM
37
3
0
22 Sep 2024
The Ability of Large Language Models to Evaluate Constraint-satisfaction in Agent Responses to Open-ended Requests
Lior Madmoni
Amir Zait
Ilia Labzovsky
Danny Karmon
ELM
33
0
0
22 Sep 2024
MQM-APE: Toward High-Quality Error Annotation Predictors with Automatic Post-Editing in LLM Translation Evaluators
Qingyu Lu
Liang Ding
Kanjian Zhang
Jinxia Zhang
Dacheng Tao
35
3
0
22 Sep 2024
What Would You Ask When You First Saw $a^2+b^2=c^2$? Evaluating LLM on Curiosity-Driven Questioning
Shashidhar Reddy Javaji
Zining Zhu
ELM
ALM
39
0
0
19 Sep 2024
LLM-as-a-Judge & Reward Model: What They Can and Cannot Do
Guijin Son
Hyunwoo Ko
Hoyoung Lee
Yewon Kim
Seunghyeok Hong
ALM
ELM
54
7
0
17 Sep 2024
CREAM: Comparison-Based Reference-Free ELO-Ranked Automatic Evaluation for Meeting Summarization
Ziwei Gong
Lin Ai
Harshsaiprasad Deshpande
Alexander Johnson
Emmy Phung
Zehui Wu
Ahmad Emami
Julia Hirschberg
44
2
0
17 Sep 2024
Zero-resource Hallucination Detection for Text Generation via Graph-based Contextual Knowledge Triples Modeling
Xinyue Fang
Zhen Huang
Zhiliang Tian
Minghui Fang
Ziyi Pan
Quntian Fang
Zhihua Wen
Hengyue Pan
Dongsheng Li
HILM
93
2
0
17 Sep 2024