2303.16634
G-Eval: NLG Evaluation using GPT-4 with Better Human Alignment
29 March 2023
Yang Liu
Dan Iter
Yichong Xu
Shuohang Wang
Ruochen Xu
Chenguang Zhu
ELM
ALM
LM&MA
Papers citing
"G-Eval: NLG Evaluation using GPT-4 with Better Human Alignment"
50 / 765 papers shown
Title
Tiny Refinements Elicit Resilience: Toward Efficient Prefix-Model Against LLM Red-Teaming
Jiaxu Liu
Xiangyu Yin
Sihao Wu
Jianhong Wang
Meng Fang
Xinping Yi
Xiaowei Huang
34
5
0
21 May 2024
Tailoring Vaccine Messaging with Common-Ground Opinions
Rickard Stureborg
Sanxing Chen
Ruoyu Xie
Aayushi Patel
Christopher Li
Chloe Qinyu Zhu
Tingnan Hu
Jun Yang
Bhuwan Dhingra
44
0
0
17 May 2024
DEBATE: Devil's Advocate-Based Assessment and Text Evaluation
Alex G. Kim
Keonwoo Kim
Sangwon Yoon
ELM
32
5
0
16 May 2024
PHUDGE: Phi-3 as Scalable Judge
Mahesh Deshwal
Apoorva Chawla
ALM
27
0
0
12 May 2024
Efficient LLM Comparative Assessment: a Product of Experts Framework for Pairwise Comparisons
Adian Liusie
Vatsal Raina
Yassir Fathullah
Mark Gales
43
10
0
09 May 2024
Large Language Models are Inconsistent and Biased Evaluators
Rickard Stureborg
Dimitris Alikaniotis
Yoshi Suhara
ALM
47
54
0
02 May 2024
Prometheus 2: An Open Source Language Model Specialized in Evaluating Other Language Models
Seungone Kim
Juyoung Suk
Shayne Longpre
Bill Yuchen Lin
Jamin Shin
Sean Welleck
Graham Neubig
Moontae Lee
Kyungjae Lee
Minjoon Seo
MoMe
ALM
ELM
56
174
0
02 May 2024
RST-LoRA: A Discourse-Aware Low-Rank Adaptation for Long Document Abstractive Summarization
Dongqi Pu
Vera Demberg
53
5
0
01 May 2024
Text Quality-Based Pruning for Efficient Training of Language Models
Vasu Sharma
Karthik Padthe
Newsha Ardalani
Kushal Tirumala
Russell Howes
...
Po-Yao Huang
Shang-Wen Li
Armen Aghajanyan
Gargi Ghosh
Luke Zettlemoyer
54
6
0
26 Apr 2024
CEval: A Benchmark for Evaluating Counterfactual Text Generation
Van Bach Nguyen
Jörg Schlötterer
Christin Seifert
39
6
0
26 Apr 2024
CASPR: Automated Evaluation Metric for Contrastive Summarization
Nirupan Ananthamurugan
Dat Duong
Philip George
Ankita Gupta
Sandeep Tata
Beliz Gunel
32
0
0
23 Apr 2024
IryoNLP at MEDIQA-CORR 2024: Tackling the Medical Error Detection & Correction Task On the Shoulders of Medical Agents
Jean-Philippe Corbeil
32
3
0
23 Apr 2024
Retrieval Augmented Generation for Domain-specific Question Answering
Sanat Sharma
David Seunghyun Yoon
Franck Dernoncourt
Dewang Sultania
Karishma Bagga
Mengjiao Zhang
Trung Bui
Varun Kotte
RALM
46
9
0
23 Apr 2024
Hybrid LLM: Cost-Efficient and Quality-Aware Query Routing
Dujian Ding
Ankur Mallick
Chi Wang
Robert Sim
Subhabrata Mukherjee
Victor Rühle
L. Lakshmanan
Ahmed Hassan Awadallah
101
81
0
22 Apr 2024
LLMChain: Blockchain-based Reputation System for Sharing and Evaluating Large Language Models
Mouhamed Amine Bouchiha
Quentin Telnoff
Souhail Bakkali
R. Champagnat
Mourad Rabah
Mickael Coustaty
Y. Ghamri-Doudane
LRM
42
3
0
20 Apr 2024
Parameter Efficient Diverse Paraphrase Generation Using Sequence-Level Knowledge Distillation
Lasal Jayawardena
Prasan Yapa
BDL
43
1
0
19 Apr 2024
FedEval-LLM: Federated Evaluation of Large Language Models on Downstream Tasks with Collective Wisdom
Yuanqin He
Yan Kang
Lixin Fan
Qiang Yang
35
3
0
18 Apr 2024
Can We Catch the Elephant? A Survey of the Evolvement of Hallucination Evaluation on Natural Language Generation
Siya Qi
Yulan He
Zheng Yuan
LRM
HILM
54
1
0
18 Apr 2024
ParaFusion: A Large-Scale LLM-Driven English Paraphrase Dataset Infused with High-Quality Lexical and Syntactic Diversity
Lasal Jayawardena
Prasan Yapa
16
5
0
18 Apr 2024
Unifying Bias and Unfairness in Information Retrieval: A Survey of Challenges and Opportunities with Large Language Models
Sunhao Dai
Chen Xu
Shicheng Xu
Liang Pang
Zhenhua Dong
Jun Xu
50
67
0
17 Apr 2024
SuRe: Summarizing Retrievals using Answer Candidates for Open-domain QA of LLMs
Jaehyung Kim
Jaehyun Nam
Sangwoo Mo
Jongjin Park
Sang-Woo Lee
Minjoon Seo
Jung-Woo Ha
Jinwoo Shin
AIFin
RALM
ELM
45
35
0
17 Apr 2024
Guiding Large Language Models to Post-Edit Machine Translation with Error Annotations
Dayeon Ki
Marine Carpuat
43
17
0
11 Apr 2024
ResearchAgent: Iterative Research Idea Generation over Scientific Literature with Large Language Models
Jinheon Baek
S. Jauhar
Silviu Cucerzan
Sung Ju Hwang
AI4CE
LLMAG
LM&Ro
42
39
0
11 Apr 2024
Less is More for Improving Automatic Evaluation of Factual Consistency
Tong Wang
Ninad Kulkarni
Yanjun Qi
ALM
49
2
0
09 Apr 2024
Comparing Two Model Designs for Clinical Note Generation; Is an LLM a Useful Evaluator of Consistency?
Nathan Brake
Thomas Schaaf
32
3
0
09 Apr 2024
Understanding Cross-Lingual Alignment -- A Survey
Katharina Hämmerl
Jindřich Libovický
Alexander Fraser
43
10
0
09 Apr 2024
Evaluating LLMs at Detecting Errors in LLM Responses
Ryo Kamoi
Sarkar Snigdha Sarathi Das
Renze Lou
Jihyun Janice Ahn
Yilun Zhao
...
Salika Dave
Shaobo Qin
Arman Cohan
Wenpeng Yin
Rui Zhang
44
21
0
04 Apr 2024
Concept -- An Evaluation Protocol on Conversational Recommender Systems with System-centric and User-centric Factors
Chen Huang
Peixin Qin
Yang Deng
Wenqiang Lei
Jiancheng Lv
Tat-Seng Chua
49
6
0
04 Apr 2024
METAL: Towards Multilingual Meta-Evaluation
Rishav Hada
Varun Gumma
Mohamed Ahmed
Kalika Bali
Sunayana Sitaram
ELM
48
2
0
02 Apr 2024
Large Language Models are Capable of Offering Cognitive Reappraisal, if Guided
Hongli Zhan
Allen Zheng
Yoon Kyung Lee
Jina Suh
Junyi Jessy Li
Desmond C. Ong
AI4MH
54
8
0
01 Apr 2024
LLM-RadJudge: Achieving Radiologist-Level Evaluation for X-Ray Report Generation
Zilong Wang
Xufang Luo
Xinyang Jiang
Dongsheng Li
Lili Qiu
LM&MA
32
8
0
01 Apr 2024
PSYDIAL: Personality-based Synthetic Dialogue Generation using Large Language Models
Ji-Eun Han
Jun-Seok Koh
Hyeon-Tae Seo
Du-Seong Chang
Kyung-Ah Sohn
34
7
0
01 Apr 2024
The Larger the Better? Improved LLM Code-Generation via Budget Reallocation
Michael Hassid
Tal Remez
Jonas Gehring
Roy Schwartz
Yossi Adi
36
20
0
31 Mar 2024
CoUDA: Coherence Evaluation via Unified Data Augmentation
Dawei Zhu
Wenhao Wu
Yifan Song
Fangwei Zhu
Ziqiang Cao
Sujian Li
30
0
0
31 Mar 2024
Measuring Taiwanese Mandarin Language Understanding
Po-Heng Chen
Sijia Cheng
Wei-Lin Chen
Yen-Ting Lin
Yun-Nung Chen
ELM
54
2
0
29 Mar 2024
MATEval: A Multi-Agent Discussion Framework for Advancing Open-Ended Text Evaluation
Yu Li
Shenyu Zhang
Rui Wu
Xiutian Huang
Yongrui Chen
Wenhao Xu
Guilin Qi
Dehai Min
LLMAG
16
9
0
28 Mar 2024
SciNews: From Scholarly Complexities to Public Narratives -- A Dataset for Scientific News Report Generation
Dongqi Pu
Yifan Wang
Jia E. Loy
Vera Demberg
31
6
0
26 Mar 2024
REFeREE: A REference-FREE Model-Based Metric for Text Simplification
Yichen Huang
Ekaterina Kochmar
60
1
0
26 Mar 2024
Large Language Models Are State-of-the-Art Evaluator for Grammatical Error Correction
Masamune Kobayashi
Masato Mita
Mamoru Komachi
ELM
47
3
0
26 Mar 2024
Aligning Large Language Models for Enhancing Psychiatric Interviews Through Symptom Delineation and Summarization: Pilot Study
Jae-hee So
Joonhwan Chang
Eunji Kim
Junho Na
JiYeon Choi
Jy-yong Sohn
Byung-Hoon Kim
Sang Hui Chu
LM&MA
AI4MH
29
2
0
26 Mar 2024
Enhanced Facet Generation with LLM Editing
Joosung Lee
Jinhong Kim
29
2
0
25 Mar 2024
FEEL: A Framework for Evaluating Emotional Support Capability with Large Language Models
Huaiwen Zhang
Yu Chen
Ming Wang
Shi Feng
40
2
0
23 Mar 2024
Multi-Review Fusion-in-Context
Aviv Slobodkin
Ori Shapira
Ran Levy
Ido Dagan
170
1
0
22 Mar 2024
Toward Sustainable GenAI using Generation Directives for Carbon-Friendly Large Language Model Inference
Baolin Li
Yankai Jiang
V. Gadepally
Devesh Tiwari
34
15
0
19 Mar 2024
WoLF: Wide-scope Large Language Model Framework for CXR Understanding
Seil Kang
Donghyun Kim
Junhyeok Kim
Hyo Kyung Lee
Seong Jae Hwang
48
2
0
19 Mar 2024
RankPrompt: Step-by-Step Comparisons Make Language Models Better Reasoners
Chi Hu
Yuan Ge
Xiangnan Ma
Hang Cao
Qiang Li
Yonghua Yang
Tong Xiao
Jingbo Zhu
ReLM
ELM
LRM
ALM
45
9
0
19 Mar 2024
Reference-based Metrics Disprove Themselves in Question Generation
Bang Nguyen
Mengxia Yu
Yun Huang
Meng Jiang
HILM
36
2
0
18 Mar 2024
From Pixels to Insights: A Survey on Automatic Chart Understanding in the Era of Large Foundation Models
Kung-Hsiang Huang
Hou Pong Chan
Yi R. Fung
Haoyi Qiu
Mingyang Zhou
Shafiq Joty
Shih-Fu Chang
Chenhui Xu
AI4TS
74
18
0
18 Mar 2024
GPT-4 as Evaluator: Evaluating Large Language Models on Pest Management in Agriculture
Shanglong Yang
Zhipeng Yuan
Shunbao Li
Ruoling Peng
Kang Liu
Po Yang
ELM
LM&MA
56
6
0
18 Mar 2024
ChartInstruct: Instruction Tuning for Chart Comprehension and Reasoning
Ahmed Masry
Mehrad Shahmohammadi
Md. Rizwan Parvez
Enamul Hoque
Shafiq Joty
50
31
0
14 Mar 2024