ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2303.16634
  4. Cited By
G-Eval: NLG Evaluation using GPT-4 with Better Human Alignment

G-Eval: NLG Evaluation using GPT-4 with Better Human Alignment

29 March 2023
Yang Liu
Dan Iter
Yichong Xu
Shuohang Wang
Ruochen Xu
Chenguang Zhu
    ELM
    ALM
    LM&MA
ArXivPDFHTML

Papers citing "G-Eval: NLG Evaluation using GPT-4 with Better Human Alignment"

50 / 757 papers shown
Title
How Do Data Analysts Respond to AI Assistance? A Wizard-of-Oz Study
How Do Data Analysts Respond to AI Assistance? A Wizard-of-Oz Study
Ken Gu
Madeleine Grunde-McLaughlin
Andrew M. McNutt
Jeffrey Heer
Tim Althoff
42
29
0
18 Sep 2023
Summarization is (Almost) Dead
Summarization is (Almost) Dead
Xiao Pu
Mingqi Gao
Xiaojun Wan
HILM
81
39
0
18 Sep 2023
Embrace Divergence for Richer Insights: A Multi-document Summarization
  Benchmark and a Case Study on Summarizing Diverse Information from News
  Articles
Embrace Divergence for Richer Insights: A Multi-document Summarization Benchmark and a Case Study on Summarizing Diverse Information from News Articles
Kung-Hsiang Huang
Philippe Laban
Alexander R. Fabbri
Prafulla Kumar Choubey
Chenyu You
Caiming Xiong
Chien-Sheng Wu
16
26
0
17 Sep 2023
ODSum: New Benchmarks for Open Domain Multi-Document Summarization
ODSum: New Benchmarks for Open Domain Multi-Document Summarization
Yijie Zhou
Kejian Shi
Wencai Zhang
Yixin Liu
Yilun Zhao
Arman Cohan
RALM
37
2
0
16 Sep 2023
Advancing the Evaluation of Traditional Chinese Language Models: Towards
  a Comprehensive Benchmark Suite
Advancing the Evaluation of Traditional Chinese Language Models: Towards a Comprehensive Benchmark Suite
Chan-Jan Hsu
Chang-Le Liu
Feng-Ting Liao
Po-Chun Hsu
Yi-Chang Chen
Da-shan Shiu
ELM
ALM
22
12
0
15 Sep 2023
Investigating Answerability of LLMs for Long-Form Question Answering
Investigating Answerability of LLMs for Long-Form Question Answering
Meghana Moorthy Bhat
Rui Meng
Ye Liu
Yingbo Zhou
Semih Yavuz
21
10
0
15 Sep 2023
Bias in News Summarization: Measures, Pitfalls and Corpora
Bias in News Summarization: Measures, Pitfalls and Corpora
Julius Steen
Katja Markert
28
4
0
14 Sep 2023
Are Large Language Model-based Evaluators the Solution to Scaling Up
  Multilingual Evaluation?
Are Large Language Model-based Evaluators the Solution to Scaling Up Multilingual Evaluation?
Rishav Hada
Varun Gumma
Adrian de Wynter
Harshita Diddee
Mohamed Ahmed
Monojit Choudhury
Kalika Bali
Sunayana Sitaram
ALM
LM&MA
ELM
35
63
0
14 Sep 2023
Less is More for Long Document Summary Evaluation by LLMs
Less is More for Long Document Summary Evaluation by LLMs
Yunshu Wu
Hayate Iso
Pouya Pezeshkpour
Nikita Bhutani
Estevam R. Hruschka
24
34
0
14 Sep 2023
BHASA: A Holistic Southeast Asian Linguistic and Cultural Evaluation
  Suite for Large Language Models
BHASA: A Holistic Southeast Asian Linguistic and Cultural Evaluation Suite for Large Language Models
Wei Qi Leong
Jian Gang Ngui
Yosephine Susanto
Hamsawardhini Rengarajan
Kengatharaiyer Sarveswaran
William-Chandra Tjhi
29
9
0
12 Sep 2023
FaNS: a Facet-based Narrative Similarity Metric
FaNS: a Facet-based Narrative Similarity Metric
Mousumi Akter
Shubhra (Santu) Karmaker
25
1
0
09 Sep 2023
From Sparse to Dense: GPT-4 Summarization with Chain of Density
  Prompting
From Sparse to Dense: GPT-4 Summarization with Chain of Density Prompting
Griffin Adams
Alexander R. Fabbri
Faisal Ladhak
Eric Lehman
Noémie Elhadad
32
53
0
08 Sep 2023
DoLa: Decoding by Contrasting Layers Improves Factuality in Large
  Language Models
DoLa: Decoding by Contrasting Layers Improves Factuality in Large Language Models
Yung-Sung Chuang
Yujia Xie
Hongyin Luo
Yoon Kim
James R. Glass
Pengcheng He
HILM
35
150
0
07 Sep 2023
Synthetic Text Generation using Hypergraph Representations
Synthetic Text Generation using Hypergraph Representations
Natraj Raman
Sameena Shah
12
1
0
06 Sep 2023
Automating Behavioral Testing in Machine Translation
Automating Behavioral Testing in Machine Translation
Javier Ferrando
Matthias Sperber
Hendra Setiawan
Dominic Telaar
Savsa Hasan
30
3
0
05 Sep 2023
Data-Juicer: A One-Stop Data Processing System for Large Language Models
Data-Juicer: A One-Stop Data Processing System for Large Language Models
Daoyuan Chen
Yilun Huang
Zhijian Ma
Hesen Chen
Xuchen Pan
...
Zhaoyang Liu
Jinyang Gao
Yaliang Li
Bolin Ding
Jingren Zhou
SyDa
VLM
31
30
0
05 Sep 2023
Efficient RLHF: Reducing the Memory Usage of PPO
Efficient RLHF: Reducing the Memory Usage of PPO
Michael Santacroce
Yadong Lu
Han Yu
Yuan-Fang Li
Yelong Shen
35
27
0
01 Sep 2023
Publicly Shareable Clinical Large Language Model Built on Synthetic
  Clinical Notes
Publicly Shareable Clinical Large Language Model Built on Synthetic Clinical Notes
Sunjun Kweon
Junu Kim
Jiyoun Kim
Sujeong Im
Eunbyeol Cho
...
Seungjin Baek
Chang Hoon Han
Yoon Bin Jung
Yohan Jo
Edward Choi
LM&MA
ELM
17
38
0
01 Sep 2023
Affective Visual Dialog: A Large-Scale Benchmark for Emotional Reasoning
  Based on Visually Grounded Conversations
Affective Visual Dialog: A Large-Scale Benchmark for Emotional Reasoning Based on Visually Grounded Conversations
Kilichbek Haydarov
Xiaoqian Shen
Avinash Madasu
Mahmoud Salem
Jia Li
Gamaleldin F. Elsayed
Mohamed Elhoseiny
39
4
0
30 Aug 2023
Translate Meanings, Not Just Words: IdiomKB's Role in Optimizing
  Idiomatic Translation with Language Models
Translate Meanings, Not Just Words: IdiomKB's Role in Optimizing Idiomatic Translation with Language Models
Shuang Li
Jiangjie Chen
Siyu Yuan
Xinyi Wu
Hao Yang
Shimin Tao
Yanghua Xiao
48
15
0
26 Aug 2023
GPTEval: A Survey on Assessments of ChatGPT and GPT-4
GPTEval: A Survey on Assessments of ChatGPT and GPT-4
Rui Mao
Guanyi Chen
Xulang Zhang
Frank Guerin
Min Zhang
ELM
LM&MA
38
101
0
24 Aug 2023
Knowledge Graph Prompting for Multi-Document Question Answering
Knowledge Graph Prompting for Multi-Document Question Answering
Yu-Chiang Frank Wang
Nedim Lipka
Ryan A. Rossi
Alexa F. Siu
Ruiyi Zhang
Tyler Derr
RALM
31
114
0
22 Aug 2023
Large Language Models on Wikipedia-Style Survey Generation: an
  Evaluation in NLP Concepts
Large Language Models on Wikipedia-Style Survey Generation: an Evaluation in NLP Concepts
Fan Gao
Hang Jiang
Rui Yang
Qingcheng Zeng
Jinghui Lu
Moritz Blum
Dairui Liu
Tianwei She
Yuang Jiang
Irene Z Li
ELM
ALM
LM&MA
35
8
0
21 Aug 2023
The Devil is in the Errors: Leveraging Large Language Models for
  Fine-grained Machine Translation Evaluation
The Devil is in the Errors: Leveraging Large Language Models for Fine-grained Machine Translation Evaluation
Patrick Fernandes
Daniel Deutsch
M. Finkelstein
Parker Riley
André F. T. Martins
Graham Neubig
Ankush Garg
J. Clark
Markus Freitag
Orhan Firat
LRM
36
68
0
14 Aug 2023
ChatEval: Towards Better LLM-based Evaluators through Multi-Agent Debate
ChatEval: Towards Better LLM-based Evaluators through Multi-Agent Debate
Chi-Min Chan
Weize Chen
Yusheng Su
Jianxuan Yu
Wei Xue
Shan Zhang
Jie Fu
Zhiyuan Liu
ELM
LLMAG
ALM
34
447
0
14 Aug 2023
OctoPack: Instruction Tuning Code Large Language Models
OctoPack: Instruction Tuning Code Large Language Models
Niklas Muennighoff
Qian Liu
A. Zebaze
Qinkai Zheng
Binyuan Hui
Terry Yue Zhuo
Swayam Singh
Xiangru Tang
Leandro von Werra
Shayne Longpre
VLM
ALM
71
119
0
14 Aug 2023
Three Ways of Using Large Language Models to Evaluate Chat
Three Ways of Using Large Language Models to Evaluate Chat
Ondvrej Plátek
Vojtvech Hudevcek
Patrícia Schmidtová
Mateusz Lango
Ondrej Dusek
ALM
19
6
0
12 Aug 2023
ZYN: Zero-Shot Reward Models with Yes-No Questions for RLAIF
ZYN: Zero-Shot Reward Models with Yes-No Questions for RLAIF
Víctor Gallego
SyDa
51
4
0
11 Aug 2023
Extrapolating Large Language Models to Non-English by Aligning Languages
Extrapolating Large Language Models to Non-English by Aligning Languages
Wenhao Zhu
Yunzhe Lv
Qingxiu Dong
Fei Yuan
Jingjing Xu
Shujian Huang
Lingpeng Kong
Jiajun Chen
Lei Li
45
66
0
09 Aug 2023
Shepherd: A Critic for Language Model Generation
Shepherd: A Critic for Language Model Generation
Tianlu Wang
Ping Yu
Xiaoqing Ellen Tan
Sean O'Brien
Ramakanth Pasunuru
Jane Dwivedi-Yu
O. Yu. Golovneva
Luke Zettlemoyer
Maryam Fazel-Zarandi
Asli Celikyilmaz
ALM
42
79
0
08 Aug 2023
Learning Evaluation Models from Large Language Models for Sequence Generation
Learning Evaluation Models from Large Language Models for Sequence Generation
Chenglong Wang
Hang Zhou
Kai-Chun Chang
Tongran Liu
Chunliang Zhang
Quan Du
Tong Xiao
Yue Zhang
Jingbo Zhu
ELM
46
3
0
08 Aug 2023
PaniniQA: Enhancing Patient Education Through Interactive Question
  Answering
PaniniQA: Enhancing Patient Education Through Interactive Question Answering
Pengshan Cai
Zonghai Yao
Fei Liu
Dakuo Wang
Meghan Reilly
...
Yi Cao
Alok Kapoor
Adarsha S. Bajracharya
D. Berlowitz
Hongfeng Yu
38
18
0
07 Aug 2023
MM-Vet: Evaluating Large Multimodal Models for Integrated Capabilities
MM-Vet: Evaluating Large Multimodal Models for Integrated Capabilities
Weihao Yu
Zhengyuan Yang
Linjie Li
Jianfeng Wang
Kevin Qinghong Lin
Zicheng Liu
Xinchao Wang
Lijuan Wang
MLLM
60
618
0
04 Aug 2023
Redundancy Aware Multi-Reference Based Gainwise Evaluation of Extractive
  Summarization
Redundancy Aware Multi-Reference Based Gainwise Evaluation of Extractive Summarization
Mousumi Akter
Shubhra (Santu) Karmaker
23
1
0
04 Aug 2023
Wider and Deeper LLM Networks are Fairer LLM Evaluators
Wider and Deeper LLM Networks are Fairer LLM Evaluators
Xinghua Zhang
Yu Bowen
Haiyang Yu
Yangyu Lv
Tingwen Liu
Fei Huang
Hongbo Xu
Yongbin Li
ALM
68
82
0
03 Aug 2023
Generative Models as a Complex Systems Science: How can we make sense of
  large language model behavior?
Generative Models as a Complex Systems Science: How can we make sense of large language model behavior?
Ari Holtzman
Peter West
Luke Zettlemoyer
AI4CE
32
14
0
31 Jul 2023
Evaluating Correctness and Faithfulness of Instruction-Following Models
  for Question Answering
Evaluating Correctness and Faithfulness of Instruction-Following Models for Question Answering
Vaibhav Adlakha
Parishad BehnamGhader
Xing Han Lù
Nicholas Meade
Siva Reddy
33
120
0
31 Jul 2023
ARB: Advanced Reasoning Benchmark for Large Language Models
ARB: Advanced Reasoning Benchmark for Large Language Models
Tomohiro Sawada
Daniel Paleka
Alexander Havrilla
Pranav Tadepalli
Paula Vidas
Alexander Kranias
John J. Nay
Kshitij Gupta
Aran Komatsuzaki
ELM
LRM
45
37
0
25 Jul 2023
Robust Visual Question Answering: Datasets, Methods, and Future
  Challenges
Robust Visual Question Answering: Datasets, Methods, and Future Challenges
Jie Ma
Pinghui Wang
Dechen Kong
Zewei Wang
Jun Liu
Hongbin Pei
Junzhou Zhao
OOD
32
18
0
21 Jul 2023
MediaGPT : A Large Language Model For Chinese Media
MediaGPT : A Large Language Model For Chinese Media
Zhonghao Wang
Zijia Lu
Boshen Jin
Haiying Deng
LM&MA
35
0
0
20 Jul 2023
FLASK: Fine-grained Language Model Evaluation based on Alignment Skill
  Sets
FLASK: Fine-grained Language Model Evaluation based on Alignment Skill Sets
Seonghyeon Ye
Doyoung Kim
Sungdong Kim
Hyeonbin Hwang
Seungone Kim
Yongrae Jo
James Thorne
Juho Kim
Minjoon Seo
ALM
46
99
0
20 Jul 2023
Assessing the Quality of Multiple-Choice Questions Using GPT-4 and
  Rule-Based Methods
Assessing the Quality of Multiple-Choice Questions Using GPT-4 and Rule-Based Methods
Steven Moore
H. A. Nguyen
Tianying Chen
John C. Stamper
ELM
21
33
0
16 Jul 2023
LLM Comparative Assessment: Zero-shot NLG Evaluation through Pairwise
  Comparisons using Large Language Models
LLM Comparative Assessment: Zero-shot NLG Evaluation through Pairwise Comparisons using Large Language Models
Adian Liusie
Potsawee Manakul
Mark Gales
ELM
29
35
0
15 Jul 2023
Text Alignment Is An Efficient Unified Model for Massive NLP Tasks
Text Alignment Is An Efficient Unified Model for Massive NLP Tasks
Yuheng Zha
Yichi Yang
Ruichen Li
Zhiting Hu
ALM
22
9
0
06 Jul 2023
LLaVAR: Enhanced Visual Instruction Tuning for Text-Rich Image
  Understanding
LLaVAR: Enhanced Visual Instruction Tuning for Text-Rich Image Understanding
Yanzhe Zhang
Ruiyi Zhang
Jiuxiang Gu
Yufan Zhou
Nedim Lipka
Diyi Yang
Tongfei Sun
VLM
MLLM
27
219
0
29 Jun 2023
Mitigating Hallucination in Large Multi-Modal Models via Robust
  Instruction Tuning
Mitigating Hallucination in Large Multi-Modal Models via Robust Instruction Tuning
Fuxiao Liu
Kevin Qinghong Lin
Linjie Li
Jianfeng Wang
Yaser Yacoob
Lijuan Wang
VLM
MLLM
40
244
0
26 Jun 2023
Towards Explainable Evaluation Metrics for Machine Translation
Towards Explainable Evaluation Metrics for Machine Translation
Christoph Leiter
Piyawat Lertvittayakumjorn
M. Fomicheva
Wei-Ye Zhao
Yang Gao
Steffen Eger
ELM
38
13
0
22 Jun 2023
Cross-lingual Cross-temporal Summarization: Dataset, Models, Evaluation
Cross-lingual Cross-temporal Summarization: Dataset, Models, Evaluation
Ran Zhang
Jihed Ouni
Steffen Eger
32
6
0
22 Jun 2023
Open-Domain Text Evaluation via Contrastive Distribution Methods
Open-Domain Text Evaluation via Contrastive Distribution Methods
Sidi Lu
Hongyi Liu
Asli Celikyilmaz
Tianlu Wang
Nanyun Peng
31
0
0
20 Jun 2023
Democratizing LLMs for Low-Resource Languages by Leveraging their
  English Dominant Abilities with Linguistically-Diverse Prompts
Democratizing LLMs for Low-Resource Languages by Leveraging their English Dominant Abilities with Linguistically-Diverse Prompts
Xuan-Phi Nguyen
Sharifah Mahani Aljunied
Chenyu You
Lidong Bing
23
32
0
20 Jun 2023
Previous
123...13141516
Next