2303.16634
G-Eval: NLG Evaluation using GPT-4 with Better Human Alignment
29 March 2023
Yang Liu
Dan Iter
Yichong Xu
Shuohang Wang
Ruochen Xu
Chenguang Zhu
ELM
ALM
LM&MA
Papers citing
"G-Eval: NLG Evaluation using GPT-4 with Better Human Alignment"
50 / 763 papers shown
AlignBench: Benchmarking Chinese Alignment of Large Language Models
Xiao Liu
Xuanyu Lei
Sheng-Ping Wang
Yue Huang
Zhuoer Feng
...
Hongning Wang
Jing Zhang
Minlie Huang
Yuxiao Dong
Jie Tang
ELM
LM&MA
ALM
125
43
0
30 Nov 2023
CritiqueLLM: Towards an Informative Critique Generation Model for Evaluation of Large Language Model Generation
Pei Ke
Bosi Wen
Andrew Feng
Xiao-Yang Liu
Xuanyu Lei
...
Aohan Zeng
Yuxiao Dong
Hongning Wang
Jie Tang
Minlie Huang
ELM
ALM
50
25
0
30 Nov 2023
MM-Narrator: Narrating Long-form Videos with Multimodal In-Context Learning
Chaoyi Zhang
K. Lin
Zhengyuan Yang
Jianfeng Wang
Linjie Li
Chung-Ching Lin
Zicheng Liu
Lijuan Wang
VGen
31
28
0
29 Nov 2023
Unveiling the Implicit Toxicity in Large Language Models
Jiaxin Wen
Pei Ke
Hao Sun
Zhexin Zhang
Chengfei Li
Jinfeng Bai
Minlie Huang
42
25
0
29 Nov 2023
Universal Self-Consistency for Large Language Model Generation
Xinyun Chen
Renat Aksitov
Uri Alon
Jie Jessie Ren
Kefan Xiao
Pengcheng Yin
Sushant Prakash
Charles Sutton
Xuezhi Wang
Denny Zhou
LRM
26
66
0
29 Nov 2023
StyleCap: Automatic Speaking-Style Captioning from Speech Based on Speech and Language Self-supervised Learning Models
Kazuki Yamauchi
Yusuke Ijima
Yuki Saito
41
8
0
28 Nov 2023
End-to-End Breast Cancer Radiotherapy Planning via LMMs with Consistency Embedding
Kwanyoung Kim
Y. Oh
S. Park
H. Byun
Joongyo Lee
Jin Sung Kim
Yong Bae Kim
Jong Chul Ye
31
0
0
27 Nov 2023
AutoEval-Video: An Automatic Benchmark for Assessing Large Vision Language Models in Open-Ended Video Question Answering
Xiuyuan Chen
Yuan Lin
Yuchen Zhang
Weiran Huang
ELM
MLLM
31
26
0
25 Nov 2023
Minimizing Factual Inconsistency and Hallucination in Large Language Models
Muneeswaran Irulandi
Shreya Saxena
Siva Prasad
M. V. Sai Prakash
Advaith Shankar
V. Varun
Vishal Vaddina
Saisubramaniam Gopalakrishnan
HILM
37
5
0
23 Nov 2023
Enhancing Summarization Performance through Transformer-Based Prompt Engineering in Automated Medical Reporting
Daphne van Zandvoort
Laura Wiersema
Tom Huibers
S. Dulmen
S. Brinkkemper
LM&MA
MedIm
28
7
0
22 Nov 2023
LIMIT: Less Is More for Instruction Tuning Across Evaluation Paradigms
Aditi Jha
Sam Havens
Jeremey Dohmann
Alex Trott
Jacob P. Portes
ALM
24
11
0
22 Nov 2023
Generating Valid and Natural Adversarial Examples with Large Language Models
Zimu Wang
Wei Wang
Qi Chen
Qiufeng Wang
Anh Nguyen
AAML
23
4
0
20 Nov 2023
Exploring Prompting Large Language Models as Explainable Metrics
Ghazaleh Mahmoudi
LRM
19
4
0
20 Nov 2023
FollowEval: A Multi-Dimensional Benchmark for Assessing the Instruction-Following Capability of Large Language Models
Yimin Jing
Renren Jin
Jiahao Hu
Huishi Qiu
Xiaohua Wang
Peng Wang
Deyi Xiong
LRM
ELM
30
1
0
16 Nov 2023
Human Still Wins over LLM: An Empirical Study of Active Learning on Domain-Specific Annotation Tasks
Yuxuan Lu
Bingsheng Yao
Shao Zhang
Yun Wang
Peng Zhang
Tun Lu
Toby Jia-Jun Li
Dakuo Wang
ALM
42
19
0
16 Nov 2023
How Far Can We Extract Diverse Perspectives from Large Language Models?
Shirley Anugrah Hayati
Minhwa Lee
Dheeraj Rajagopal
Dongyeop Kang
40
10
0
16 Nov 2023
LLMs as Narcissistic Evaluators: When Ego Inflates Evaluation Scores
Yiqi Liu
N. Moosavi
Chenghua Lin
ELM
35
46
0
16 Nov 2023
GEO: Generative Engine Optimization
Pranjal Aggarwal
Vishvak Murahari
Tanmay Rajpurohit
Ashwin Kalyan
Karthik Narasimhan
Ameet Deshpande
46
2
0
16 Nov 2023
On Evaluating the Integration of Reasoning and Action in LLM Agents with Database Question Answering
Linyong Nan
Ellen Zhang
Weijin Zou
Yilun Zhao
Wenfei Zhou
Arman Cohan
LLMAG
46
13
0
16 Nov 2023
DocLens: Multi-aspect Fine-grained Evaluation for Medical Text Generation
Yiqing Xie
Sheng Zhang
Hao Cheng
Pengfei Liu
Zelalem Gero
Cliff Wong
Tristan Naumann
Hoifung Poon
Carolyn Rose
MedIm
28
4
0
16 Nov 2023
Prompt-based Pseudo-labeling Strategy for Sample-Efficient Semi-Supervised Extractive Summarization
Gaurav Sahu
Olga Vechtomova
I. Laradji
39
1
0
16 Nov 2023
AMRFact: Enhancing Summarization Factuality Evaluation with AMR-Driven Negative Samples Generation
Haoyi Qiu
Kung-Hsiang Huang
Jingnong Qu
Nanyun Peng
HILM
30
6
0
16 Nov 2023
Fusion-Eval: Integrating Assistant Evaluators with LLMs
Lei Shu
Nevan Wichers
Liangchen Luo
Yun Zhu
Yinxiao Liu
Jindong Chen
Lei Meng
ELM
15
3
0
15 Nov 2023
PsyEval: A Suite of Mental Health Related Tasks for Evaluating Large Language Models
Haoan Jin
Siyuan Chen
Dilawaier Dilixiati
Yewei Jiang
Mengyue Wu
Ke Zhu
ELM
AI4MH
LM&MA
51
4
0
15 Nov 2023
Benchmarking Generation and Evaluation Capabilities of Large Language Models for Instruction Controllable Summarization
Yixin Liu
Alexander R. Fabbri
Jiawen Chen
Yilun Zhao
Simeng Han
Chenyu You
Pengfei Liu
Dragomir R. Radev
Chien-Sheng Wu
Arman Cohan
ELM
46
59
0
15 Nov 2023
How Well Do Large Language Models Truly Ground?
Hyunji Lee
Se June Joo
Chaeeun Kim
Joel Jang
Doyoung Kim
Kyoung-Woon On
Minjoon Seo
HILM
41
6
0
15 Nov 2023
Exploring the Potential of Large Language Models in Computational Argumentation
Guizhen Chen
Liying Cheng
Anh Tuan Luu
Lidong Bing
LLMAG
LRM
29
23
0
15 Nov 2023
X-Eval: Generalizable Multi-aspect Text Evaluation via Augmented Instruction Tuning with Auxiliary Evaluation Aspects
Minqian Liu
Ying Shen
Zhiyang Xu
Yixin Cao
Eunah Cho
Vaibhav Kumar
Reza Ghanadan
Lifu Huang
ELM
LM&MA
ALM
52
25
0
15 Nov 2023
How You Prompt Matters! Even Task-Oriented Constraints in Instructions Affect LLM-Generated Text Detection
Ryuto Koike
Masahiro Kaneko
Naoaki Okazaki
DeLMO
27
6
0
14 Nov 2023
Controlled Text Generation for Black-box Language Models via Score-based Progressive Editor
Sangwon Yu
Changmin Lee
Hojin Lee
Sungroh Yoon
29
0
0
13 Nov 2023
Coffee: Boost Your Code LLMs by Fixing Bugs with Feedback
Seungjun Moon
Hyungjoo Chae
Yongho Song
Taeyoon Kwon
Dongjin Kang
Kai Tzu-iunn Ong
Seung-won Hwang
Jinyoung Yeo
KELM
23
11
0
13 Nov 2023
Explanation-aware Soft Ensemble Empowers Large Language Model In-context Learning
Yue Yu
Jiaming Shen
Tianqi Liu
Zhen Qin
Jing Nathan Yan
Jialu Liu
Chao Zhang
Michael Bendersky
54
6
0
13 Nov 2023
Large Language Models are Zero Shot Hypothesis Proposers
Biqing Qi
Kaiyan Zhang
Haoxiang Li
Kai Tian
Sihang Zeng
Zhang-Ren Chen
Bowen Zhou
32
28
0
10 Nov 2023
Which is better? Exploring Prompting Strategy For LLM-based Metrics
Joonghoon Kim
Saeran Park
Kiyoon Jeong
Sangmin Lee
S. Han
Jiyoon Lee
Pilsung Kang
20
16
0
07 Nov 2023
AI-TA: Towards an Intelligent Question-Answer Teaching Assistant using Open-Source LLMs
Yann Hicke
Anmol Agarwal
Qianou Ma
Paul Denny
AI4Ed
42
24
0
05 Nov 2023
FloodBrain: Flood Disaster Reporting by Web-based Retrieval Augmented Generation with an LLM
Grace Colverd
Paul Darm
Leonard Silverberg
Noah Kasmanoff
37
16
0
05 Nov 2023
Grounded Intuition of GPT-Vision's Abilities with Scientific Images
Alyssa Hwang
Andrew Head
Chris Callison-Burch
48
3
0
03 Nov 2023
Large Language Models Illuminate a Progressive Pathway to Artificial Healthcare Assistant: A Review
Mingze Yuan
Peng Bao
Jiajia Yuan
Yunhao Shen
Zi Chen
...
Jie Zhao
Yang Chen
Li Zhang
Lin Shen
Bin Dong
ELM
LM&MA
49
13
0
03 Nov 2023
GPT-4V(ision) as a Generalist Evaluator for Vision-Language Tasks
Xinlu Zhang
Yujie Lu
Weizhi Wang
An Yan
Jun Yan
Lianke Qin
Heng Wang
Xifeng Yan
William Y. Wang
Linda R. Petzold
LM&MA
MLLM
ELM
30
75
0
02 Nov 2023
Little Giants: Exploring the Potential of Small LLMs as Evaluation Metrics in Summarization in the Eval4NLP 2023 Shared Task
Neema Kotonya
Saran Krishnasamy
Joel R. Tetreault
Alejandro Jaimes
24
9
0
01 Nov 2023
BioInstruct: Instruction Tuning of Large Language Models for Biomedical Natural Language Processing
Hieu Tran
Zhichao Yang
Zonghai Yao
Hong-ye Yu
ALM
LM&MA
42
23
0
30 Oct 2023
The Eval4NLP 2023 Shared Task on Prompting Large Language Models as Explainable Metrics
Christoph Leiter
Juri Opitz
Daniel Deutsch
Yang Gao
Rotem Dror
Steffen Eger
ALM
LRM
ELM
40
31
0
30 Oct 2023
EHRTutor: Enhancing Patient Understanding of Discharge Instructions
Zihao Zhang
Zonghai Yao
Huixue Zhou
Feiyun Ouyang
Hong-ye Yu
LM&MA
AI4Ed
40
4
0
30 Oct 2023
Are NLP Models Good at Tracing Thoughts: An Overview of Narrative Understanding
Lixing Zhu
Runcong Zhao
Lin Gui
Yulan He
52
4
0
28 Oct 2023
OpinSummEval: Revisiting Automated Evaluation for Opinion Summarization
Yuchen Shen
Xiaojun Wan
38
9
0
27 Oct 2023
Salespeople vs SalesBot: Exploring the Role of Educational Value in Conversational Recommender Systems
Lidiya Murakhovs'ka
Philippe Laban
Tian Xie
Caiming Xiong
Chien-Sheng Wu
33
6
0
26 Oct 2023
Is ChatGPT a Good Multi-Party Conversation Solver?
Chao-Hong Tan
Jia-Chen Gu
Zhen-Hua Ling
27
9
0
25 Oct 2023
CycleAlign: Iterative Distillation from Black-box LLM to White-box Models for Better Human Alignment
Jixiang Hong
Quan Tu
C. Chen
Xing Gao
Ji Zhang
Rui Yan
ALM
34
11
0
25 Oct 2023
Background Summarization of Event Timelines
Adithya Pratapa
Kevin Small
Markus Dreyer
63
2
0
24 Oct 2023
BLESS: Benchmarking Large Language Models on Sentence Simplification
Tannon Kew
Alison Chi
Laura Vásquez-Rodríguez
Sweta Agrawal
Dennis Aumiller
Fernando Alva-Manchego
Teven Le Scao
48
23
0
24 Oct 2023