2303.16634
Cited By
G-Eval: NLG Evaluation using GPT-4 with Better Human Alignment
29 March 2023
Yang Liu
Dan Iter
Yichong Xu
Shuohang Wang
Ruochen Xu
Chenguang Zhu
ELM
ALM
LM&MA
Papers citing
"G-Eval: NLG Evaluation using GPT-4 with Better Human Alignment"
50 / 763 papers shown
Title
Branch-Solve-Merge Improves Large Language Model Evaluation and Generation
Swarnadeep Saha
Omer Levy
Asli Celikyilmaz
Mohit Bansal
Jason Weston
Xian Li
MoMe
43
71
0
23 Oct 2023
QUDEVAL: The Evaluation of Questions Under Discussion Discourse Parsing
Yating Wu
Ritika Mangla
Greg Durrett
Junyi Jessy Li
47
12
0
23 Oct 2023
Chainpoll: A high efficacy method for LLM hallucination detection
Robert Friel
Atindriyo Sanyal
LRM
HILM
34
26
0
22 Oct 2023
Revisiting Instruction Fine-tuned Model Evaluation to Guide Industrial Applications
Manuel Faysse
Gautier Viaud
Céline Hudelot
Pierre Colombo
32
9
0
21 Oct 2023
Enhancing Abstractiveness of Summarization Models through Calibrated Distillation
Hwanjun Song
Igor Shalyminov
Hang Su
Siffi Singh
Kaisheng Yao
Saab Mansour
30
6
0
20 Oct 2023
Tuna: Instruction Tuning using Feedback from Large Language Models
Haoran Li
Yiran Liu
Xingxing Zhang
Wei Lu
Furu Wei
ALM
38
3
0
20 Oct 2023
Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection
Akari Asai
Zeqiu Wu
Yizhong Wang
Avirup Sil
Hannaneh Hajishirzi
RALM
176
647
0
17 Oct 2023
Medical Text Simplification: Optimizing for Readability with Unlikelihood Training and Reranked Beam Search Decoding
Lorenzo Jaime Yu Flores
Heyuan Huang
Kejian Shi
Sophie Chheang
Arman Cohan
MedIm
32
6
0
17 Oct 2023
Exploring Automatic Evaluation Methods based on a Decoder-based LLM for Text Generation
Tomohito Kasahara
Daisuke Kawahara
33
2
0
17 Oct 2023
Compositional preference models for aligning LMs
Dongyoung Go
Tomasz Korbak
Germán Kruszewski
Jos Rozen
Marc Dymetman
29
16
0
17 Oct 2023
Improving Large Language Model Fine-tuning for Solving Math Problems
Yixin Liu
Avi Singh
C. D. Freeman
John D. Co-Reyes
Peter J. Liu
LRM
ReLM
43
45
0
16 Oct 2023
FiLM: Fill-in Language Models for Any-Order Generation
Tianxiao Shen
Hao-Chun Peng
Ruoqi Shen
Yao Fu
Zaïd Harchaoui
Yejin Choi
41
8
0
15 Oct 2023
How Good is ChatGPT in Giving Advice on Your Visualization Design?
Nam Wook Kim
Grace Myers
Benjamin Bach
28
20
0
14 Oct 2023
VizAbility: Enhancing Chart Accessibility with LLM-based Conversational Interaction
Joshua Gorniak
Yoon Kim
Donglai Wei
Nam Wook Kim
32
8
0
14 Oct 2023
xDial-Eval: A Multilingual Open-Domain Dialogue Evaluation Benchmark
Chen Zhang
L. F. D’Haro
Chengguang Tang
Ke Shi
Guohua Tang
Haizhou Li
ELM
46
9
0
13 Oct 2023
Prometheus: Inducing Fine-grained Evaluation Capability in Language Models
Seungone Kim
Jamin Shin
Yejin Cho
Joel Jang
Shayne Longpre
...
Sangdoo Yun
Seongjin Shin
Sungdong Kim
James Thorne
Minjoon Seo
ALM
LM&MA
ELM
37
214
0
12 Oct 2023
Prompting Large Language Models with Chain-of-Thought for Few-Shot Knowledge Base Question Generation
Yuanyuan Liang
Jianing Wang
Hanlun Zhu
Lei Wang
Weining Qian
Yunshi Lan
LRM
ReLM
24
37
0
12 Oct 2023
Towards Better Evaluation of Instruction-Following: A Case-Study in Summarization
Ondrej Skopek
Rahul Aralikatte
Sian Gooding
Victor Carbune
ELM
47
18
0
12 Oct 2023
EIPE-text: Evaluation-Guided Iterative Plan Extraction for Long-Form Narrative Text Generation
Wang You
Wenshan Wu
Yaobo Liang
Shaoguang Mao
Chenfei Wu
...
Yuzhe Cai
Yiduo Guo
Yan Xia
Furu Wei
Nan Duan
29
8
0
12 Oct 2023
Fine-grained Conversational Decoding via Isotropic and Proximal Search
Yuxuan Yao
Han Wu
Qiling Xu
Linqi Song
25
1
0
12 Oct 2023
OpenLEAF: Open-Domain Interleaved Image-Text Generation and Evaluation
Jie An
Zhengyuan Yang
Linjie Li
Jianfeng Wang
K. Lin
Zicheng Liu
Lijuan Wang
Jiebo Luo
25
11
0
11 Oct 2023
Evaluating Large Language Models at Evaluating Instruction Following
Zhiyuan Zeng
Jiatong Yu
Tianyu Gao
Yu Meng
Tanya Goyal
Danqi Chen
ELM
ALM
46
170
0
11 Oct 2023
Risk Aware Benchmarking of Large Language Models
Apoorva Nitsure
Youssef Mroueh
Mattia Rigotti
Kristjan Greenewald
Brian M. Belgodere
Mikhail Yurochkin
Jirí Navrátil
Igor Melnyk
Jerret Ross
30
1
0
11 Oct 2023
Understanding the Effects of RLHF on LLM Generalisation and Diversity
Robert Kirk
Ishita Mediratta
Christoforos Nalmpantis
Jelena Luketina
Eric Hambro
Edward Grefenstette
Roberta Raileanu
AI4CE
ALM
115
125
0
10 Oct 2023
LLM for SoC Security: A Paradigm Shift
Dipayan Saha
Shams Tarek
Katayoon Yahyaei
S. Saha
Jingbo Zhou
M. Tehranipoor
Farimah Farahmandi
69
48
0
09 Oct 2023
A Closer Look into Automatic Evaluation Using Large Language Models
Cheng-Han Chiang
Hunghuei Lee
ELM
ALM
LM&MA
43
13
0
09 Oct 2023
Towards Verifiable Generation: A Benchmark for Knowledge-aware Language Model Attribution
Xinze Li
Yixin Cao
Liangming Pan
Yubo Ma
Aixin Sun
HILM
24
21
0
09 Oct 2023
Generative Judge for Evaluating Alignment
Junlong Li
Shichao Sun
Weizhe Yuan
Run-Ze Fan
Hai Zhao
Pengfei Liu
ELM
ALM
35
80
0
09 Oct 2023
Fast and Robust Early-Exiting Framework for Autoregressive Language Models with Synchronized Parallel Decoding
Sangmin Bae
Jongwoo Ko
Hwanjun Song
SeYoung Yun
32
55
0
09 Oct 2023
Factuality Challenges in the Era of Large Language Models
Isabelle Augenstein
Timothy Baldwin
Meeyoung Cha
Tanmoy Chakraborty
Giovanni Luca Ciampaglia
...
Rubén Míguez
Preslav Nakov
Dietram A. Scheufele
Shivam Sharma
Giovanni Zagni
HILM
42
41
0
08 Oct 2023
Critique Ability of Large Language Models
Liangchen Luo
Zi Lin
Yinxiao Liu
Lei Shu
Yun Zhu
Jingbo Shang
Lei Meng
AI4MH
LRM
ELM
24
14
0
07 Oct 2023
Amortizing intractable inference in large language models
Marvin Schmitt
Moksh Jain
Daniel Habermann
Younesse Kaddar
Ullrich Köthe
Stefan T. Radev
Nikolay Malkin
AIFin
BDL
32
49
0
06 Oct 2023
Chain of Natural Language Inference for Reducing Large Language Model Ungrounded Hallucinations
Deren Lei
Yaxi Li
Mengya Hu
Mingyu Wang
Vincent Yun
Emily Ching
Eslam Kamal
HILM
LRM
24
40
0
06 Oct 2023
Automatic and Human-AI Interactive Text Generation
Yao Dou
Philippe Laban
Claire Gardent
Wei-ping Xu
34
4
0
05 Oct 2023
Fine-tune Language Models to Approximate Unbiased In-context Learning
Timothy Chu
Zhao Song
Chiwun Yang
29
15
0
05 Oct 2023
Improving Automatic VQA Evaluation Using Large Language Models
Oscar Manas
Benno Krojer
Aishwarya Agrawal
32
21
0
04 Oct 2023
Driving with LLMs: Fusing Object-Level Vector Modality for Explainable Autonomous Driving
Long Chen
Oleg Sinavski
Jan Hünermann
Alice Karnsund
Andrew James Willmott
Danny Birch
Daniel Maund
Jamie Shotton
MLLM
20
186
0
03 Oct 2023
Application of frozen large-scale models to multimodal task-oriented dialogue
Tatsuki Kawamoto
Takuma Suzuki
Ko Miyama
Takumi Meguro
Tomohiro Takagi
27
0
0
02 Oct 2023
BooookScore: A systematic exploration of book-length summarization in the era of LLMs
Yapei Chang
Kyle Lo
Tanya Goyal
Mohit Iyyer
ALM
26
108
0
01 Oct 2023
Graph Neural Architecture Search with GPT-4
Haishuai Wang
Yang Gao
Xin-Min Zheng
Peng Zhang
Hongyang Chen
Jiajun Bu
Philip S. Yu
AI4CE
36
28
0
30 Sep 2023
SocREval: Large Language Models with the Socratic Method for Reference-Free Reasoning Evaluation
Hangfeng He
Hongming Zhang
Dan Roth
LRM
ELM
ReLM
30
14
0
29 Sep 2023
Benchmarking Cognitive Biases in Large Language Models as Evaluators
Ryan Koo
Minhwa Lee
Vipul Raheja
Jong Inn Park
Zae Myung Kim
Dongyeop Kang
ALM
43
76
0
29 Sep 2023
Large Language Model Alignment: A Survey
Tianhao Shen
Renren Jin
Yufei Huang
Chuang Liu
Weilong Dong
Zishan Guo
Xinwei Wu
Yan Liu
Deyi Xiong
LM&MA
24
177
0
26 Sep 2023
Question-Answering Approach to Evaluating Legal Summaries
Huihui Xu
Kevin D. Ashley
AILaw
ELM
29
3
0
26 Sep 2023
Art or Artifice? Large Language Models and the False Promise of Creativity
Tuhin Chakrabarty
Philippe Laban
Divyansh Agarwal
Smaranda Muresan
Chien-Sheng Wu
32
118
0
25 Sep 2023
ALLURE: Auditing and Improving LLM-based Evaluation of Text using Iterative In-Context-Learning
Hosein Hasanbeig
Hiteshi Sharma
Leo Betthauser
Felipe Vieira Frujeri
Ida Momennejad
40
15
0
24 Sep 2023
EvalLM: Interactive Evaluation of Large Language Model Prompts on User-Defined Criteria
Tae Soo Kim
Yoonjoo Lee
Jamin Shin
Young-Ho Kim
Juho Kim
34
69
0
24 Sep 2023
Calibrating LLM-Based Evaluator
Yuxuan Liu
Tianchi Yang
Shaohan Huang
Zihan Zhang
Haizhen Huang
Furu Wei
Weiwei Deng
Feng Sun
Qi Zhang
49
31
0
23 Sep 2023
Construction contract risk identification based on knowledge-augmented language model
Saika Wong
Chunmo Zheng
Xing Su
Yinqiu Tang
20
14
0
22 Sep 2023
Automatic Answerability Evaluation for Question Generation
Zifan Wang
Kotaro Funakoshi
Manabu Okumura
34
2
0
22 Sep 2023