2303.16634
G-Eval: NLG Evaluation using GPT-4 with Better Human Alignment
29 March 2023
Yang Liu
Dan Iter
Yichong Xu
Shuohang Wang
Ruochen Xu
Chenguang Zhu
ELM
ALM
LM&MA
Github (344★)
Papers citing
"G-Eval: NLG Evaluation using GPT-4 with Better Human Alignment"
50 / 264 papers shown
Title
Towards Completeness-Oriented Tool Retrieval for Large Language Models
Changle Qu
Sunhao Dai
Xiaochi Wei
Hengyi Cai
Shuaiqiang Wang
D. Yin
Jun Xu
Jirong Wen
KELM
97
8
0
25 May 2024
AndroidWorld: A Dynamic Benchmarking Environment for Autonomous Agents
Christopher Rawles
Sarah Clinckemaillie
Yifan Chang
Jonathan Waltz
Gabrielle Lau
...
Daniel Toyama
Robert Berry
Divya Tyamagundlu
Timothy Lillicrap
Oriana Riva
LLMAG
177
74
0
23 May 2024
Adversarial DPO: Harnessing Harmful Data for Reducing Toxicity with Minimal Impact on Coherence and Evasiveness in Dialogue Agents
San Kim
Gary Geunbae Lee
AAML
131
3
0
21 May 2024
Presentations are not always linear! GNN meets LLM for Document-to-Presentation Transformation with Attribution
Himanshu Maheshwari
Sambaran Bandyopadhyay
Aparna Garimella
Anandhavelu Natarajan
41
5
0
21 May 2024
A Survey of Automatic Hallucination Evaluation on Natural Language Generation
Siya Qi
Yulan He
Zheng Yuan
LRM
HILM
107
1
0
18 Apr 2024
ResearchAgent: Iterative Research Idea Generation over Scientific Literature with Large Language Models
Jinheon Baek
S. Jauhar
Silviu Cucerzan
Sung Ju Hwang
AI4CE
LLMAG
LM&Ro
126
56
0
11 Apr 2024
Concept -- An Evaluation Protocol on Conversational Recommender Systems with System-centric and User-centric Factors
Chen Huang
Peixin Qin
Yang Deng
Wenqiang Lei
Jiancheng Lv
Tat-Seng Chua
122
8
0
04 Apr 2024
MATEval: A Multi-Agent Discussion Framework for Advancing Open-Ended Text Evaluation
Yu Li
Shenyu Zhang
Rui Wu
Xiutian Huang
Yongrui Chen
Wenhao Xu
Guilin Qi
Dehai Min
LLMAG
69
11
0
28 Mar 2024
Automatic Interactive Evaluation for Large Language Models with State Aware Patient Simulator
Yusheng Liao
Yutong Meng
Yuhao Wang
Hongcheng Liu
Yanfeng Wang
Yu Wang
LM&MA
ELM
69
9
0
13 Mar 2024
Improving Socratic Question Generation using Data Augmentation and Preference Optimization
Nischal Ashok Kumar
Andrew Lan
115
9
0
01 Mar 2024
Standardizing the Measurement of Text Diversity: A Tool and a Comparative Analysis of Scores
Chantal Shaib
Joe Barrow
Jiuding Sun
Alexa F. Siu
Byron C. Wallace
A. Nenkova
188
38
0
01 Mar 2024
Re-Ex: Revising after Explanation Reduces the Factual Errors in LLM Responses
Juyeon Kim
Jeongeun Lee
Yoonho Chang
Chanyeol Choi
Junseong Kim
Jy-yong Sohn
KELM
LRM
160
2
0
27 Feb 2024
Evaluating the Performance of ChatGPT for Spam Email Detection
Shijing Si
Yuwei Wu
Jiawen Gu
Yugui Zhang
Jedrek Wosik
Qinliang Su
136
9
0
23 Feb 2024
Beyond Probabilities: Unveiling the Misalignment in Evaluating Large Language Models
Chenyang Lyu
Minghao Wu
Alham Fikri Aji
ELM
66
14
0
21 Feb 2024
A Multi-Aspect Framework for Counter Narrative Evaluation using Large Language Models
Jaylen Jones
Lingbo Mo
Eric Fosler-Lussier
Huan Sun
104
4
0
18 Feb 2024
Reasoning before Comparison: LLM-Enhanced Semantic Similarity Metrics for Domain Specialized Text Analysis
Shaochen Xu
Zihao Wu
Huaqin Zhao
Peng Shu
Zheng Liu
Wenxiong Liao
Sheng Li
Andrea Sikora
Tianming Liu
Xiang Li
101
17
0
17 Feb 2024
Where is the answer? Investigating Positional Bias in Language Model Knowledge Extraction
Kuniaki Saito
Kihyuk Sohn
Chen-Yu Lee
Yoshitaka Ushiku
150
3
0
16 Feb 2024
LLM-based NLG Evaluation: Current Status and Challenges
Mingqi Gao
Xinyu Hu
Jie Ruan
Xiao Pu
Xiaojun Wan
ELM
LM&MA
224
41
0
02 Feb 2024
MT-Eval: A Multi-Turn Capabilities Evaluation Benchmark for Large Language Models
Wai-Chung Kwan
Xingshan Zeng
Yuxin Jiang
Yufei Wang
Liangyou Li
Lifeng Shang
Xin Jiang
Qun Liu
Kam-Fai Wong
LRM
ELM
51
22
0
30 Jan 2024
INACIA: Integrating Large Language Models in Brazilian Audit Courts: Opportunities and Challenges
J. Pereira
Andre Assumpcao
J. Trecenti
Luiz Airosa
Caio Lente
Jhonatan Cléto
Guilherme Dobins
Rodrigo Nogueira
Luis Mitchell
R. Lotufo
67
2
0
10 Jan 2024
Reinforcement Learning for Optimizing RAG for Domain Chatbots
Mandar Kulkarni
Praveen Tangarajan
Kyung Kim
Anusua Trivedi
OffRL
RALM
SILM
70
30
0
10 Jan 2024
DCR-Consistency: Divide-Conquer-Reasoning for Consistency Evaluation and Improvement of Large Language Models
Wendi Cui
Jiaxin Zhang
Zhuohang Li
Lopez Damien
Kamalika Das
Sricharan Kumar
Kumar Sricharan
69
2
0
04 Jan 2024
LLMEval: A Preliminary Study on How to Evaluate Large Language Models
Yue Zhang
Ming Zhang
Haipeng Yuan
Shichun Liu
Yongyao Shi
Tao Gui
Qi Zhang
Xuanjing Huang
ALM
ELM
69
15
0
12 Dec 2023
Honeybee: Locality-enhanced Projector for Multimodal LLM
Junbum Cha
Wooyoung Kang
Jonghwan Mun
Byungseok Roh
MLLM
106
133
0
11 Dec 2023
MM-Narrator: Narrating Long-form Videos with Multimodal In-Context Learning
Chaoyi Zhang
Kevin Qinghong Lin
Zhengyuan Yang
Jianfeng Wang
Linjie Li
Chung-Ching Lin
Zicheng Liu
Lijuan Wang
VGen
111
32
0
29 Nov 2023
End-to-End Breast Cancer Radiotherapy Planning via LMMs with Consistency Embedding
Kwanyoung Kim
Y. Oh
S. Park
H. Byun
Joongyo Lee
Jin Sung Kim
Yong Bae Kim
Jong Chul Ye
126
0
0
27 Nov 2023
AutoEval-Video: An Automatic Benchmark for Assessing Large Vision Language Models in Open-Ended Video Question Answering
Xiuyuan Chen
Yuan Lin
Yuchen Zhang
Weiran Huang
ELM
MLLM
100
27
0
25 Nov 2023
LIMIT: Less Is More for Instruction Tuning Across Evaluation Paradigms
Aditi Jha
Sam Havens
Jeremey Dohmann
Alex Trott
Jacob P. Portes
ALM
50
11
0
22 Nov 2023
On Evaluating the Integration of Reasoning and Action in LLM Agents with Database Question Answering
Linyong Nan
Ellen Zhang
Weijin Zou
Yilun Zhao
Wenfei Zhou
Arman Cohan
LLMAG
100
14
0
16 Nov 2023
Are NLP Models Good at Tracing Thoughts: An Overview of Narrative Understanding
Lixing Zhu
Runcong Zhao
Lin Gui
Yulan He
82
5
0
28 Oct 2023
Tuna: Instruction Tuning using Feedback from Large Language Models
Haoran Li
Yiran Liu
Xingxing Zhang
Wei Lu
Furu Wei
ALM
83
3
0
20 Oct 2023
Exploring Automatic Evaluation Methods based on a Decoder-based LLM for Text Generation
Tomohito Kasahara
Daisuke Kawahara
86
3
0
17 Oct 2023
Compositional preference models for aligning LMs
Dongyoung Go
Tomasz Korbak
Germán Kruszewski
Jos Rozen
Marc Dymetman
97
20
0
17 Oct 2023
FiLM: Fill-in Language Models for Any-Order Generation
Tianxiao Shen
Hao-Chun Peng
Ruoqi Shen
Yao Fu
Zaïd Harchaoui
Yejin Choi
95
10
0
15 Oct 2023
Prompting Large Language Models with Chain-of-Thought for Few-Shot Knowledge Base Question Generation
Yuanyuan Liang
Jianing Wang
Hanlun Zhu
Lei Wang
Weining Qian
Yunshi Lan
LRM
ReLM
88
39
0
12 Oct 2023
Automatic and Human-AI Interactive Text Generation
Yao Dou
Philippe Laban
Claire Gardent
Wei Xu
82
4
0
05 Oct 2023
Driving with LLMs: Fusing Object-Level Vector Modality for Explainable Autonomous Driving
Long Chen
Oleg Sinavski
Jan Hünermann
Alice Karnsund
Andrew James Willmott
Danny Birch
Daniel Maund
Jamie Shotton
MLLM
127
211
0
03 Oct 2023
Calibrating LLM-Based Evaluator
Yuxuan Liu
Tianchi Yang
Shaohan Huang
Zihan Zhang
Haizhen Huang
Furu Wei
Weiwei Deng
Feng Sun
Qi Zhang
122
33
0
23 Sep 2023
Foundation Metrics for Evaluating Effectiveness of Healthcare Conversations Powered by Generative AI
Mahyar Abbasian
Elahe Khatibi
Iman Azimi
David Oniani
Zahra Shakeri Hossein Abad
...
Bryant Lin
Olivier Gevaert
Li-Jia Li
Ramesh C. Jain
Amir M. Rahmani
LM&MA
ELM
AI4MH
145
78
0
21 Sep 2023
GPTFUZZER: Red Teaming Large Language Models with Auto-Generated Jailbreak Prompts
Jiahao Yu
Xingwei Lin
Zheng Yu
Xinyu Xing
SILM
232
353
0
19 Sep 2023
Investigating Answerability of LLMs for Long-Form Question Answering
Meghana Moorthy Bhat
Rui Meng
Ye Liu
Yingbo Zhou
Semih Yavuz
75
11
0
15 Sep 2023
Large Language Models on Wikipedia-Style Survey Generation: an Evaluation in NLP Concepts
Fan Gao
Hang Jiang
Rui Yang
Qingcheng Zeng
Jinghui Lu
Moritz Blum
Dairui Liu
Tianwei She
Yuang Jiang
Irene Li
ELM
ALM
LM&MA
93
9
0
21 Aug 2023
Learning Evaluation Models from Large Language Models for Sequence Generation
Chenglong Wang
Hang Zhou
Kai-Chun Chang
Tongran Liu
Chunliang Zhang
Quan Du
Tong Xiao
Yue Zhang
Jingbo Zhu
ELM
163
4
0
08 Aug 2023
Wider and Deeper LLM Networks are Fairer LLM Evaluators
Xinghua Zhang
Yu Bowen
Haiyang Yu
Yangyu Lv
Tingwen Liu
Fei Huang
Hongbo Xu
Yongbin Li
ALM
146
90
0
03 Aug 2023
Assessing the Quality of Multiple-Choice Questions Using GPT-4 and Rule-Based Methods
Steven Moore
H. A. Nguyen
Tianying Chen
John C. Stamper
ELM
66
35
0
16 Jul 2023
CMMLU: Measuring massive multitask language understanding in Chinese
Haonan Li
Yixuan Zhang
Fajri Koto
Yifei Yang
Hai Zhao
Yeyun Gong
Nan Duan
Tim Baldwin
ALM
ELM
140
274
0
15 Jun 2023
Large Language Models as Tax Attorneys: A Case Study in Legal Capabilities Emergence
John J. Nay
David Karamardian
Sarah Lawsky
Wenting Tao
Meghana Moorthy Bhat
Raghav Jain
Aaron Travis Lee
Jonathan H. Choi
Jungo Kasai
ELM
AILaw
113
60
0
12 Jun 2023
Benchmarking Foundation Models with Language-Model-as-an-Examiner
Yushi Bai
Jiahao Ying
Yixin Cao
Xin Lv
Yuze He
...
Yijia Xiao
Haozhe Lyu
Jiayin Zhang
Juanzi Li
Lei Hou
ALM
ELM
117
149
0
07 Jun 2023
On Optimal Caching and Model Multiplexing for Large Model Inference
Banghua Zhu
Ying Sheng
Lianmin Zheng
Clark W. Barrett
Michael I. Jordan
Jiantao Jiao
99
21
0
03 Jun 2023
UniChart: A Universal Vision-language Pretrained Model for Chart Comprehension and Reasoning
Ahmed Masry
P. Kavehzadeh
Do Xuan Long
Enamul Hoque
Shafiq Joty
LRM
95
113
0
24 May 2023