Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2303.16634
Cited By
G-Eval: NLG Evaluation using GPT-4 with Better Human Alignment
29 March 2023
Yang Liu
Dan Iter
Yichong Xu
Shuohang Wang
Ruochen Xu
Chenguang Zhu
ELM
ALM
LM&MA
Re-assign community
ArXiv
PDF
HTML
Papers citing
"G-Eval: NLG Evaluation using GPT-4 with Better Human Alignment"
50 / 757 papers shown
Title
Multimodal Inconsistency Reasoning (MMIR): A New Benchmark for Multimodal Reasoning Models
Qianqi Yan
Yue Fan
Hongquan Li
Shan Jiang
Yang Zhao
Xinze Guan
Ching-Chen Kuo
Qing Guo
VLM
LRM
82
2
0
22 Feb 2025
Think Together and Work Better: Combining Humans' and LLMs' Think-Aloud Outcomes for Effective Text Evaluation
SeongYeub Chu
JongWoo Kim
MunYong Yi
60
3
0
21 Feb 2025
An LLM-Based Approach for Insight Generation in Data Analysis
Alberto Sánchez Pérez
Alaa Boukhary
Paolo Papotti
Luis Castejón Lozano
Adam Elwood
44
0
0
20 Feb 2025
CMQCIC-Bench: A Chinese Benchmark for Evaluating Large Language Models in Medical Quality Control Indicator Calculation
Guangya Yu
Yanhao Li
Zongying Jiang
Yuxiong Jin
Li Dai
...
Weiyan Zhang
Yongqi Fan
Qi Ye
Jingping Liu
Tong Ruan
LM&MA
ELM
77
0
0
17 Feb 2025
An Empirical Analysis of Uncertainty in Large Language Model Evaluations
Qiujie Xie
Qingqiu Li
Zhuohao Yu
Yuejie Zhang
Yue Zhang
Linyi Yang
ELM
65
1
0
15 Feb 2025
Image Embedding Sampling Method for Diverse Captioning
Sania Waheed
Na Min An
62
0
0
14 Feb 2025
Cost-Saving LLM Cascades with Early Abstention
Michael J. Zellinger
Rex Liu
Matt Thomson
111
0
0
13 Feb 2025
Improve LLM-based Automatic Essay Scoring with Linguistic Features
Zhaoyi Joey Hou
Alejandro Ciuba
Xiang Lorraine Li
57
1
0
13 Feb 2025
HuDEx: Integrating Hallucination Detection and Explainability for Enhancing the Reliability of LLM responses
Sujeong Lee
Hayoung Lee
Seongsoo Heo
Wonik Choi
HILM
93
0
0
12 Feb 2025
Bridging LLM-Generated Code and Requirements: Reverse Generation technique and SBC Metric for Developer Insights
Ahilan Ayyachamy Nadar Ponnusamy
66
0
0
11 Feb 2025
Aligning Black-box Language Models with Human Judgments
Gerrit J. J. van den Burg
Gen Suzuki
Wei Liu
Murat Sensoy
ALM
82
0
0
07 Feb 2025
CondAmbigQA: A Benchmark and Dataset for Conditional Ambiguous Question Answering
Zongxi Li
Y. Li
Haoran Xie
S. J. Qin
70
0
0
03 Feb 2025
LLM-Powered Benchmark Factory: Reliable, Generic, and Efficient
Peiwen Yuan
Shaoxiong Feng
Yiwei Li
Xiaobei Wang
Y. Zhang
Jiayi Shi
Chuyi Tan
Boyuan Pan
Yao Hu
Kan Li
73
2
0
02 Feb 2025
LLM-based Affective Text Generation Quality Based on Different Quantization Values
Yarik Menchaca Resendiz
Roman Klinger
MQ
101
1
0
31 Jan 2025
FactCG: Enhancing Fact Checkers with Graph-Based Multi-Hop Data
Deren Lei
Yaxi Li
Siyao Li
Mengya Hu
Rui Xu
Ken Archer
Mingyu Wang
Emily Ching
Alex Deng
SyDa
HILM
LRM
73
1
0
28 Jan 2025
SynDL: A Large-Scale Synthetic Test Collection for Passage Retrieval
Hossein A. Rahmani
Xi Wang
Emine Yilmaz
Nick Craswell
Bhaskar Mitra
Paul Thomas
87
4
0
28 Jan 2025
Learning to Summarize from LLM-generated Feedback
Hwanjun Song
Taewon Yun
Yuho Lee
Jihwan Oh
Gihun Lee
Jason (Jinglun) Cai
Hang Su
73
4
0
28 Jan 2025
Training Dialogue Systems by AI Feedback for Improving Overall Dialogue Impression
Kai Yoshida
M. Mizukami
Seiya Kawano
Canasai Kruengkrai
Hiroaki Sugiyama
Koichiro Yoshino
ALM
OffRL
84
1
0
28 Jan 2025
Breaking the Stigma! Unobtrusively Probe Symptoms in Depression Disorder Diagnosis Dialogue
Jieming Cao
Chen Huang
Y. Zhang
Ruibo Deng
Jincheng Zhang
Wenqiang Lei
35
0
0
28 Jan 2025
MDEval: Evaluating and Enhancing Markdown Awareness in Large Language Models
Zhongpu Chen
Y. Liu
Long Shi
Zhi-Jie Wang
Xingyan Chen
Yu Zhao
Fuji Ren
46
1
0
28 Jan 2025
Federated Retrieval Augmented Generation for Multi-Product Question Answering
Parshin Shojaee
Sai Sree Harsha
Dan Luo
Akash Maharaj
Tong Yu
Yunyao Li
46
3
0
28 Jan 2025
Beyond correlation: The Impact of Human Uncertainty in Measuring the Effectiveness of Automatic Evaluation and LLM-as-a-Judge
Aparna Elangovan
Jongwoo Ko
Lei Xu
Mahsa Elyasi
Ling Liu
S. Bodapati
Dan Roth
52
6
0
28 Jan 2025
Analyzing and Evaluating Correlation Measures in NLG Meta-Evaluation
Mingqi Gao
Xinyu Hu
Li Lin
Xiaojun Wan
28
1
0
28 Jan 2025
Learning to Explore and Select for Coverage-Conditioned Retrieval-Augmented Generation
Takyoung Kim
Kyungjae Lee
Y. Jang
Ji Yong Cho
Gangwoo Kim
Minseok Cho
Moontae Lee
159
0
0
28 Jan 2025
Generating Plausible Distractors for Multiple-Choice Questions via Student Choice Prediction
Yooseop Lee
Suin Kim
Yohan Jo
AI4Ed
63
2
0
21 Jan 2025
Aligning with Human Judgement: The Role of Pairwise Preference in Large Language Model Evaluators
Yinhong Liu
Han Zhou
Zhijiang Guo
Ehsan Shareghi
Ivan Vulić
Anna Korhonen
Nigel Collier
ALM
141
70
0
20 Jan 2025
PASS: Presentation Automation for Slide Generation and Speech
Tushar Aggarwal
Aarohi Bhand
76
1
0
17 Jan 2025
SD-Eval: A Benchmark Dataset for Spoken Dialogue Understanding Beyond Words
Junyi Ao
Yuancheng Wang
Xiaohai Tian
Dekun Chen
Jingyang Zhang
Lu Lu
Yansen Wang
Haizhou Li
Zhikai Wu
AuLLM
90
19
0
17 Jan 2025
Measuring the Robustness of Reference-Free Dialogue Evaluation Systems
Justin Vasselli
Adam Nohejl
Taro Watanabe
AAML
54
0
0
12 Jan 2025
CodEv: An Automated Grading Framework Leveraging Large Language Models for Consistent and Constructive Feedback
En-Qi Tseng
Pei-Cing Huang
Chan Hsu
Peng-Yi Wu
Chan-Tung Ku
Yihuang Kang
41
1
0
10 Jan 2025
Cascaded Self-Evaluation Augmented Training for Lightweight Multimodal LLMs
Zheqi Lv
Wenkai Wang
Jiawei Wang
Shengyu Zhang
Fei Wu
LRM
ReLM
51
0
0
10 Jan 2025
Reasoning-Enhanced Self-Training for Long-Form Personalized Text Generation
Alireza Salemi
Cheng-rong Li
Mingyang Zhang
Qiaozhu Mei
Weize Kong
Tao Chen
Zhuowan Li
Michael Bendersky
Hamed Zamani
LRM
RALM
ReLM
52
6
0
07 Jan 2025
Multi-LLM Collaborative Caption Generation in Scientific Documents
Jaeyoung Kim
J. B. Lee
Hong-Jun Choi
Ting-Yao Hsu
Chieh-Yang Huang
...
Ryan Rossi
Tong Yu
C. Lee Giles
Ting-Hao 'Kenneth' Huang
S. Choi
23
2
0
05 Jan 2025
CaseSumm: A Large-Scale Dataset for Long-Context Summarization from U.S. Supreme Court Opinions
Mourad Heddaya
Kyle MacMillan
Anup Malani
Hongyuan Mei
Chenhao Tan
AILaw
ELM
34
0
0
03 Jan 2025
A 2-step Framework for Automated Literary Translation Evaluation: Its Promises and Pitfalls
Sheikh Shafayat
Dongkeun Yoon
Woori Jang
Jiwoo Choi
Alice Oh
Seohyon Jung
94
1
0
03 Jan 2025
PRD: Peer Rank and Discussion Improve Large Language Model based Evaluations
Ruosen Li
Teerth Patel
Xinya Du
LLMAG
ALM
70
96
0
03 Jan 2025
LLM-Rubric: A Multidimensional, Calibrated Approach to Automated Evaluation of Natural Language Texts
Helia Hashemi
J. Eisner
Corby Rosset
Benjamin Van Durme
Chris Kedzie
68
2
0
03 Jan 2025
Evaluate Summarization in Fine-Granularity: Auto Evaluation with LLM
Dong Yuan
Eti Rastogi
Fen Zhao
Sagar Goyal
Gautam Naik
Sree Prasanna Rajagopal
44
0
0
31 Dec 2024
Context-Aware Deep Learning for Multi Modal Depression Detection
Genevieve Lam
Huang Dongyan
Weisi Lin
38
80
0
26 Dec 2024
MapExplorer: New Content Generation from Low-Dimensional Visualizations
Xingjian Zhang
Ziyang Xiong
Shixuan Liu
Yutong Xie
Tolga Ergen
Dongsub Shim
Hua Xu
Honglak Lee
Qiaozhu Me
41
0
0
24 Dec 2024
Revisiting In-Context Learning with Long Context Language Models
Jinheon Baek
Sun Jae Lee
Prakhar Gupta
Geunseob
Oh
Siddharth Dalmia
235
1
0
22 Dec 2024
Hansel: Output Length Controlling Framework for Large Language Models
Seoha Song
Junhyun Lee
Hyeonmok Ko
75
0
0
18 Dec 2024
G-VEval: A Versatile Metric for Evaluating Image and Video Captions Using GPT-4o
Tony Cheng Tong
Sirui He
Z. Shao
Dit-Yan Yeung
75
3
0
18 Dec 2024
Towards Automatic Evaluation for Image Transcreation
Simran Khanuja
Vivek Iyer
Claire He
Graham Neubig
ViT
92
1
0
18 Dec 2024
MOPO: Multi-Objective Prompt Optimization for Affective Text Generation
Yarik Menchaca Resendiz
Roman Klinger
74
1
0
17 Dec 2024
EventSum: A Large-Scale Event-Centric Summarization Dataset for Chinese Multi-News Documents
Mengna Zhu
Kaisheng Zeng
Mao Wang
Kaiming Xiao
Lei Hou
Hongbin Huang
Juanzi Li
239
1
0
16 Dec 2024
QUENCH: Measuring the gap between Indic and Non-Indic Contextual General Reasoning in LLMs
Mohammad Aflah Khan
Neemesh Yadav
Sarah Masud
Md. Shad Akhtar
74
0
0
16 Dec 2024
Leveraging Large Language Models for Active Merchant Non-player Characters
Byungjun Kim
Minju Kim
Dayeon Seo
Bugeun Kim
87
0
0
15 Dec 2024
QAPyramid: Fine-grained Evaluation of Content Selection for Text Summarization
Shiyue Zhang
David Wan
Arie Cattan
Ayal Klein
Ido Dagan
Joey Tianyi Zhou
86
0
0
10 Dec 2024
Copyright-Protected Language Generation via Adaptive Model Fusion
Javier Abad
Konstantin Donhauser
Francesco Pinto
Fanny Yang
82
1
0
09 Dec 2024
Previous
1
2
3
4
5
6
...
14
15
16
Next