2303.16634
Cited By
G-Eval: NLG Evaluation using GPT-4 with Better Human Alignment
29 March 2023
Yang Liu
Dan Iter
Yichong Xu
Shuohang Wang
Ruochen Xu
Chenguang Zhu
ELM
ALM
LM&MA
Papers citing
"G-Eval: NLG Evaluation using GPT-4 with Better Human Alignment"
50 / 765 papers shown
Automatic Interactive Evaluation for Large Language Models with State Aware Patient Simulator
Yusheng Liao
Yutong Meng
Yuhao Wang
Hongcheng Liu
Yanfeng Wang
Yu Wang
LM&MA
ELM
43
8
0
13 Mar 2024
CoIN: A Benchmark of Continual Instruction tuNing for Multimodel Large Language Model
Cheng Chen
Sitong Su
Xu Luo
Hengtao Shen
Lianli Gao
Jingkuan Song
CLL
42
13
0
13 Mar 2024
Debatrix: Multi-dimensional Debate Judge with Iterative Chronological Analysis Based on LLM
Jingcong Liang
Rong Ye
Meng Han
Ruofei Lai
Xinyu Zhang
Xuanjing Huang
Zhongyu Wei
45
6
0
12 Mar 2024
Towards a clinically accessible radiology foundation model: open-access and lightweight, with automated evaluation
Juan Manuel Zambrano Chaves
Shih-Cheng Huang
Yanbo Xu
Hanwen Xu
Naoto Usuyama
...
Akshay S. Chaudhari
Serena Yeung-Levy
Curtis P. Langlotz
Sheng Wang
Hoifung Poon
VLM
LM&MA
73
10
0
12 Mar 2024
Thread Detection and Response Generation using Transformers with Prompt Optimisation
Kevin Joshua T
Arnav Agarwal
Shriya Sanjay
Yash Sarda
John Sahaya Rani Alex
Saurav Gupta
Sushant Kumar
Vishwanath Kamath
13
2
0
09 Mar 2024
On the Benefits of Fine-Grained Loss Truncation: A Case Study on Factuality in Summarization
Lorenzo Jaime Yu Flores
Arman Cohan
HILM
46
2
0
09 Mar 2024
LLM4Decompile: Decompiling Binary Code with Large Language Models
Hanzhuo Tan
Qi Luo
Jing Li
Yuqun Zhang
SyDa
ELM
65
20
0
08 Mar 2024
Negating Negatives: Alignment without Human Positive Samples via Distributional Dispreference Optimization
Shitong Duan
Xiaoyuan Yi
Peng Zhang
Tun Lu
Xing Xie
Ning Gu
40
4
0
06 Mar 2024
Evidence-Focused Fact Summarization for Knowledge-Augmented Zero-Shot Question Answering
Sungho Ko
Hyunjin Cho
Hyungjoo Chae
Jinyoung Yeo
Dongha Lee
RALM
HILM
24
7
0
05 Mar 2024
ChatCite: LLM Agent with Human Workflow Guidance for Comparative Literature Summary
Yutong Li
Lu Chen
Aiwei Liu
Kai Yu
Lijie Wen
34
19
0
05 Mar 2024
DACO: Towards Application-Driven and Comprehensive Data Analysis via Code Generation
Xueqing Wu
Rui Zheng
Jingzhen Sha
Te-Lin Wu
Hanyu Zhou
Mohan Tang
Kai-Wei Chang
Nanyun Peng
Haoran Huang
55
2
0
04 Mar 2024
FENICE: Factuality Evaluation of summarization based on Natural language Inference and Claim Extraction
Alessandro Sciré
Karim Ghonim
Roberto Navigli
HILM
34
8
0
04 Mar 2024
Ever-Evolving Memory by Blending and Refining the Past
Seo Hyun Kim
Keummin Ka
Yohan Jo
Seung-won Hwang
Dongha Lee
Jinyoung Yeo
KELM
39
1
0
03 Mar 2024
Attribute Structuring Improves LLM-Based Evaluation of Clinical Text Summaries
Zelalem Gero
Chandan Singh
Yiqing Xie
Sheng Zhang
Tristan Naumann
Jianfeng Gao
Hoifung Poon
ELM
ALM
39
4
0
01 Mar 2024
LocalRQA: From Generating Data to Locally Training, Testing, and Deploying Retrieval-Augmented QA Systems
Xiao Yu
Yunan Lu
Zhou Yu
RALM
42
6
0
01 Mar 2024
DiaHalu: A Dialogue-level Hallucination Evaluation Benchmark for Large Language Models
Kedi Chen
Qin Chen
Jie Zhou
Yishen He
Liang He
HILM
43
1
0
01 Mar 2024
Improving Socratic Question Generation using Data Augmentation and Preference Optimization
Nischal Ashok Kumar
Andrew Lan
40
8
0
01 Mar 2024
Standardizing the Measurement of Text Diversity: A Tool and a Comparative Analysis of Scores
Chantal Shaib
Joe Barrow
Jiuding Sun
Alexa F. Siu
Byron C. Wallace
A. Nenkova
76
33
0
01 Mar 2024
Small But Funny: A Feedback-Driven Approach to Humor Distillation
Sahithya Ravi
Patrick Huber
Akshat Shrivastava
Aditya Sagar
Ahmed Aly
Vered Shwartz
Arash Einolghozati
44
5
0
28 Feb 2024
A Sentiment Consolidation Framework for Meta-Review Generation
Miao Li
Jey Han Lau
Eduard Hovy
22
4
0
28 Feb 2024
AmbigNLG: Addressing Task Ambiguity in Instruction for NLG
Ayana Niwa
Hayate Iso
36
4
0
27 Feb 2024
Re-Ex: Revising after Explanation Reduces the Factual Errors in LLM Responses
Juyeon Kim
Jeongeun Lee
Yoonho Chang
Chanyeol Choi
Junseong Kim
Jy-yong Sohn
KELM
LRM
58
2
0
27 Feb 2024
CodeChameleon: Personalized Encryption Framework for Jailbreaking Large Language Models
Huijie Lv
Xiao Wang
Yuan Zhang
Caishuang Huang
Shihan Dou
Junjie Ye
Tao Gui
Qi Zhang
Xuanjing Huang
AAML
44
29
0
26 Feb 2024
Navigating Complexity: Orchestrated Problem Solving with Multi-Agent LLMs
Sumedh Rasal
E. Hauer
32
0
0
26 Feb 2024
Retrieval Augmented Generation Systems: Automatic Dataset Creation, Evaluation and Boolean Agent Setup
Tristan Kenneweg
Philip Kenneweg
Barbara Hammer
3DV
48
4
0
26 Feb 2024
Chain-of-Discussion: A Multi-Model Framework for Complex Evidence-Based Question Answering
Mingxu Tao
Dongyan Zhao
Yansong Feng
LLMAG
49
3
0
26 Feb 2024
EHRNoteQA: An LLM Benchmark for Real-World Clinical Practice Using Discharge Summaries
Sunjun Kweon
Jiyoun Kim
Heeyoung Kwak
Dongchul Cha
Hangyul Yoon
Kwanghyun Kim
Jeewon Yang
Seunghyun Won
Edward Choi
LM&MA
38
4
0
25 Feb 2024
An Empirical Study of Challenges in Machine Learning Asset Management
Zhimin Zhao
Yihao Chen
A. A. Bangash
Bram Adams
Ahmed E. Hassan
42
6
0
25 Feb 2024
Likelihood-based Mitigation of Evaluation Bias in Large Language Models
Masanari Ohi
Masahiro Kaneko
Ryuto Koike
Mengsay Loem
Naoaki Okazaki
45
4
0
25 Feb 2024
HD-Eval: Aligning Large Language Model Evaluators Through Hierarchical Criteria Decomposition
Yuxuan Liu
Tianchi Yang
Shaohan Huang
Zihan Zhang
Haizhen Huang
Furu Wei
Weiwei Deng
Feng Sun
Qi Zhang
39
13
0
24 Feb 2024
Leveraging Domain Knowledge for Efficient Reward Modelling in RLHF: A Case-Study in E-Commerce Opinion Summarization
Swaroop Nath
Tejpalsingh Siledar
Sankara Sri Raghava Ravindra Muddu
Rupasai Rangaraju
H. Khadilkar
...
Suman Banerjee
Amey Patil
Sudhanshu Singh
M. Chelliah
Nikesh Garera
50
0
0
23 Feb 2024
Evaluating the Performance of ChatGPT for Spam Email Detection
Shijing Si
Yuwei Wu
Jiawen Gu
Yugui Zhang
Jedrek Wosik
Qinliang Su
59
8
0
23 Feb 2024
Leveraging Large Language Models for Concept Graph Recovery and Question Answering in NLP Education
Rui Yang
Boming Yang
Sixun Ouyang
Tianwei She
Aosong Feng
Yuang Jiang
Freddy Lecue
Jinghui Lu
Irene Z Li
AI4Ed
39
5
0
22 Feb 2024
Is LLM-as-a-Judge Robust? Investigating Universal Adversarial Attacks on Zero-shot LLM Assessment
Vyas Raina
Adian Liusie
Mark Gales
AAML
ELM
37
54
0
21 Feb 2024
Beyond Probabilities: Unveiling the Misalignment in Evaluating Large Language Models
Chenyang Lyu
Minghao Wu
Alham Fikri Aji
ELM
51
13
0
21 Feb 2024
CriticBench: Evaluating Large Language Models as Critic
Tian Lan
Wenwei Zhang
Chen Xu
Heyan Huang
Dahua Lin
Kai-xiang Chen
Xian-Ling Mao
ELM
AI4MH
LRM
52
3
0
21 Feb 2024
Investigating Multilingual Instruction-Tuning: Do Polyglot Models Demand for Multilingual Instructions?
Alexander Arno Weber
Klaudia Thellmann
Jan Ebert
Nicolas Flores-Herr
Jens Lehmann
Michael Fromm
Mehdi Ali
43
4
0
21 Feb 2024
TreeEval: Benchmark-Free Evaluation of Large Language Models through Tree Planning
Xiang Li
Yunshi Lan
Chao Yang
ELM
46
8
0
20 Feb 2024
Exploring the Impact of Table-to-Text Methods on Augmenting LLM-based Question Answering with Domain Hybrid Data
Dehai Min
Nan Hu
Rihui Jin
Nuo Lin
Jiaoyan Chen
...
Yu Li
Guilin Qi
Yun Li
Nijun Li
Qianren Wang
LMTD
33
14
0
20 Feb 2024
Identifying Factual Inconsistencies in Summaries: Grounding Model Inference via Task Taxonomy
Liyan Xu
Zhenlin Su
Mo Yu
Jin Xu
Jinho D. Choi
Jie Zhou
Fei Liu
HILM
45
2
0
20 Feb 2024
Are LLM-based Evaluators Confusing NLG Quality Criteria?
Xinyu Hu
Mingqi Gao
Sen Hu
Yang Zhang
Yicheng Chen
Teng Xu
Xiaojun Wan
AAML
ELM
51
22
0
19 Feb 2024
Speech Translation with Speech Foundation Models and Large Language Models: What is There and What is Missing?
Marco Gaido
Sara Papi
Matteo Negri
L. Bentivogli
51
13
0
19 Feb 2024
One Prompt To Rule Them All: LLMs for Opinion Summary Evaluation
Tejpalsingh Siledar
Swaroop Nath
Sankara Sri Raghava Ravindra Muddu
Rupasai Rangaraju
Swaprava Nath
...
Suman Banerjee
Amey Patil
Sudhanshu Singh
M. Chelliah
Nikesh Garera
ALM
LRM
41
7
0
18 Feb 2024
A Multi-Aspect Framework for Counter Narrative Evaluation using Large Language Models
Jaylen Jones
Lingbo Mo
Eric Fosler-Lussier
Huan Sun
59
3
0
18 Feb 2024
FactPICO: Factuality Evaluation for Plain Language Summarization of Medical Evidence
Sebastian Antony Joseph
Lily Chen
Jan Trienes
Hannah Louisa Göke
Monika Coers
Wei Xu
Byron C. Wallace
Junyi Jessy Li
LM&MA
HILM
26
10
0
18 Feb 2024
Reasoning before Comparison: LLM-Enhanced Semantic Similarity Metrics for Domain Specialized Text Analysis
Shaochen Xu
Zihao Wu
Huaqin Zhao
Peng Shu
Zheng Liu
Wenxiong Liao
Sheng Li
Andrea Sikora
Tianming Liu
Xiang Li
29
17
0
17 Feb 2024
Can LLMs Speak For Diverse People? Tuning LLMs via Debate to Generate Controllable Controversial Statements
Ming Li
Jiuhai Chen
Lichang Chen
Dinesh Manocha
76
18
0
16 Feb 2024
Where is the answer? Investigating Positional Bias in Language Model Knowledge Extraction
Kuniaki Saito
Kihyuk Sohn
Chen-Yu Lee
Yoshitaka Ushiku
66
2
0
16 Feb 2024
Selective Reflection-Tuning: Student-Selected Data Recycling for LLM Instruction-Tuning
Ming Li
Lichang Chen
Jiuhai Chen
Shwai He
Jiuxiang Gu
Dinesh Manocha
31
53
0
15 Feb 2024
T-RAG: Lessons from the LLM Trenches
M. Fatehkia
J. Lucas
Sanjay Chawla
LLMAG
40
21
0
12 Feb 2024