2303.16634
Cited By
G-Eval: NLG Evaluation using GPT-4 with Better Human Alignment
29 March 2023
Yang Liu
Dan Iter
Yichong Xu
Shuohang Wang
Ruochen Xu
Chenguang Zhu
ELM
ALM
LM&MA
Papers citing
"G-Eval: NLG Evaluation using GPT-4 with Better Human Alignment"
50 / 763 papers shown
Title
Mitigating Hallucination in Large Multi-Modal Models via Robust Instruction Tuning
Fuxiao Liu
Kevin Qinghong Lin
Linjie Li
Jianfeng Wang
Yaser Yacoob
Lijuan Wang
VLM
MLLM
40
246
0
26 Jun 2023
Towards Explainable Evaluation Metrics for Machine Translation
Christoph Leiter
Piyawat Lertvittayakumjorn
M. Fomicheva
Wei-Ye Zhao
Yang Gao
Steffen Eger
ELM
38
13
0
22 Jun 2023
Cross-lingual Cross-temporal Summarization: Dataset, Models, Evaluation
Ran Zhang
Jihed Ouni
Steffen Eger
32
6
0
22 Jun 2023
Open-Domain Text Evaluation via Contrastive Distribution Methods
Sidi Lu
Hongyi Liu
Asli Celikyilmaz
Tianlu Wang
Nanyun Peng
31
0
0
20 Jun 2023
Democratizing LLMs for Low-Resource Languages by Leveraging their English Dominant Abilities with Linguistically-Diverse Prompts
Xuan-Phi Nguyen
Sharifah Mahani Aljunied
Chenyu You
Lidong Bing
23
32
0
20 Jun 2023
BayLing: Bridging Cross-lingual Alignment and Instruction Following through Interactive Translation for Large Language Models
Shaolei Zhang
Qingkai Fang
Zhuocheng Zhang
Zhengrui Ma
Yan Zhou
...
Mengyu Bu
Shangtong Gui
Yunji Chen
Xilin Chen
Yang Feng
ALM
74
40
0
19 Jun 2023
CMMLU: Measuring massive multitask language understanding in Chinese
Haonan Li
Yixuan Zhang
Fajri Koto
Yifei Yang
Hai Zhao
Yeyun Gong
Nan Duan
Tim Baldwin
ALM
ELM
47
243
0
15 Jun 2023
Large Language Models as Tax Attorneys: A Case Study in Legal Capabilities Emergence
John J. Nay
David Karamardian
Sarah Lawsky
Wenting Tao
Meghana Moorthy Bhat
Raghav Jain
Aaron Travis Lee
Jonathan H. Choi
Jungo Kasai
ELM
AILaw
24
57
0
12 Jun 2023
Benchmarking Foundation Models with Language-Model-as-an-Examiner
Yushi Bai
Jiahao Ying
Yixin Cao
Xin Lv
Yuze He
...
Yijia Xiao
Haozhe Lyu
Jiayin Zhang
Juanzi Li
Lei Hou
ALM
ELM
50
136
0
07 Jun 2023
On Optimal Caching and Model Multiplexing for Large Model Inference
Banghua Zhu
Ying Sheng
Lianmin Zheng
Clark W. Barrett
Michael I. Jordan
Jiantao Jiao
33
18
0
03 Jun 2023
Hybrid Long Document Summarization using C2F-FAR and ChatGPT: A Practical Study
Guang Lu
Sylvia B. Larcher
Tu-Anh Tran
29
9
0
01 Jun 2023
LLaVA-Med: Training a Large Language-and-Vision Assistant for Biomedicine in One Day
Chunyuan Li
Cliff Wong
Sheng Zhang
Naoto Usuyama
Haotian Liu
Jianwei Yang
Tristan Naumann
Hoifung Poon
Jianfeng Gao
LM&MA
MedIm
65
709
0
01 Jun 2023
Deliberate then Generate: Enhanced Prompting Framework for Text Generation
Bei Li
Rui Wang
Junliang Guo
Kaitao Song
Xuejiao Tan
Hany Hassan
Arul Menezes
Tong Xiao
Jiang Bian
JingBo Zhu
24
14
0
31 May 2023
BigTranslate: Augmenting Large Language Models with Multilingual Translation Capability over 100 Languages
Wen Yang
Chong Li
Jiajun Zhang
Chengqing Zong
LRM
25
47
0
29 May 2023
AlignScore: Evaluating Factual Consistency with a Unified Alignment Function
Yuheng Zha
Yichi Yang
Ruichen Li
Zhiting Hu
HILM
26
182
0
26 May 2023
SAIL: Search-Augmented Instruction Learning
Hongyin Luo
Yung-Sung Chuang
Yuan Gong
Tianhua Zhang
Yoon Kim
Xixin Wu
D. Fox
Helen Meng
James R. Glass
ALM
LRM
RALM
39
23
0
24 May 2023
The Art of SOCRATIC QUESTIONING: Recursive Thinking with Large Language Models
Jingyuan Qi
Zhiyang Xu
Ying Shen
Minqian Liu
Dingnan Jin
Qifan Wang
Lifu Huang
ReLM
LRM
KELM
19
11
0
24 May 2023
RefGPT: Dialogue Generation of GPT, by GPT, and for GPT
Dongjie Yang
Ruifeng Yuan
Yuanbin Fan
Yifei Yang
Zili Wang
Shushen Wang
Hai Zhao
HILM
35
8
0
24 May 2023
SummIt: Iterative Text Summarization via ChatGPT
Haopeng Zhang
Xiao Liu
Jiawei Zhang
43
65
0
24 May 2023
David helps Goliath: Inference-Time Collaboration Between Small Specialized and Large General Diffusion LMs
Xiaochuang Han
Sachin Kumar
Yulia Tsvetkov
Marjan Ghazvininejad
DiffM
34
3
0
24 May 2023
UniChart: A Universal Vision-language Pretrained Model for Chart Comprehension and Reasoning
Ahmed Masry
P. Kavehzadeh
Do Xuan Long
Enamul Hoque
Chenyu You
LRM
27
100
0
24 May 2023
Scientific Opinion Summarization: Paper Meta-review Generation Dataset, Methods, and Evaluation
Qi Zeng
Mankeerat Sidhu
Ansel Blume
Hou Pong Chan
Lu Wang
Heng Ji
40
10
0
24 May 2023
LLMs as Factual Reasoners: Insights from Existing Benchmarks and Beyond
Philippe Laban
Wojciech Kryściński
Divyansh Agarwal
Alexander R. Fabbri
Caiming Xiong
Chenyu You
Chien-Sheng Wu
ALM
HILM
35
33
0
23 May 2023
WikiChat: Stopping the Hallucination of Large Language Model Chatbots by Few-Shot Grounding on Wikipedia
Sina J. Semnani
Violet Z. Yao
He Zhang
M. Lam
KELM
AI4MH
30
72
0
23 May 2023
INSTRUCTSCORE: Explainable Text Generation Evaluation with Finegrained Feedback
Wenda Xu
Danqing Wang
Liangming Pan
Zhenqiao Song
Markus Freitag
Wei Wang
Lei Li
ALM
ELM
38
18
0
23 May 2023
FActScore: Fine-grained Atomic Evaluation of Factual Precision in Long Form Text Generation
Sewon Min
Kalpesh Krishna
Xinxi Lyu
M. Lewis
Wen-tau Yih
Pang Wei Koh
Mohit Iyyer
Luke Zettlemoyer
Hannaneh Hajishirzi
HILM
ALM
86
611
0
23 May 2023
On Learning to Summarize with Large Language Models as References
Yixin Liu
Kejian Shi
Katherine S He
Longtian Ye
Alexander R. Fabbri
Pengfei Liu
Dragomir R. Radev
Arman Cohan
ELM
43
71
0
23 May 2023
LLM-Eval: Unified Multi-Dimensional Automatic Evaluation for Open-Domain Conversations with Large Language Models
Yen-Ting Lin
Yun-Nung (Vivian) Chen
30
91
0
23 May 2023
Prompt-Based Monte-Carlo Tree Search for Goal-Oriented Dialogue Policy Planning
Xiao Yu
Maximillian Chen
Zhou Yu
LLMAG
LM&Ro
32
35
0
23 May 2023
Evaluating Factual Consistency of Texts with Semantic Role Labeling
Jing Fan
Dennis Aumiller
Michael Gertz
HILM
39
4
0
22 May 2023
AlpacaFarm: A Simulation Framework for Methods that Learn from Human Feedback
Yann Dubois
Xuechen Li
Rohan Taori
Tianyi Zhang
Ishaan Gulrajani
Jimmy Ba
Carlos Guestrin
Percy Liang
Tatsunori B. Hashimoto
ALM
45
549
0
22 May 2023
SEAHORSE: A Multilingual, Multifaceted Dataset for Summarization Evaluation
Elizabeth Clark
Shruti Rijhwani
Sebastian Gehrmann
Joshua Maynez
Roee Aharoni
Vitaly Nikolaev
Thibault Sellam
Aditya Siddhant
Dipanjan Das
Ankur P. Parikh
32
38
0
22 May 2023
Large Language Models are Not Yet Human-Level Evaluators for Abstractive Summarization
Chenhui Shen
Liying Cheng
Xuan-Phi Nguyen
Yang You
Lidong Bing
ELM
ALM
47
64
0
22 May 2023
CRITIC: Large Language Models Can Self-Correct with Tool-Interactive Critiquing
Zhibin Gou
Zhihong Shao
Yeyun Gong
Yelong Shen
Yujiu Yang
Nan Duan
Weizhu Chen
KELM
LRM
36
360
0
19 May 2023
TrueTeacher: Learning Factual Consistency Evaluation with Large Language Models
Zorik Gekhman
Jonathan Herzig
Roee Aharoni
Chen Elkind
Idan Szpektor
HILM
ELM
31
72
0
18 May 2023
LLMScore: Unveiling the Power of Large Language Models in Text-to-Image Synthesis Evaluation
Yujie Lu
Xianjun Yang
Xiujun Li
Xinze Wang
William Yang Wang
EGVM
52
73
0
18 May 2023
PersonaLLM: Investigating the Ability of Large Language Models to Express Personality Traits
Hang Jiang
Xiajie Zhang
Xubo Cao
Cynthia Breazeal
Deb Roy
Jad Kabbara
54
76
0
04 May 2023
A Paradigm Shift: The Future of Machine Translation Lies with Large Language Models
Chenyang Lyu
Zefeng Du
Jitao Xu
Yitao Duan
Minghao Wu
Teresa Lynn
Alham Fikri Aji
Derek F. Wong
Siyou Liu
Longyue Wang
66
25
0
02 May 2023
Beyond Classification: Financial Reasoning in State-of-the-Art Language Models
Guijin Son
Han-Na Jung
Moonjeong Hahm
Keonju Na
Sol Jin
AIFin
LRM
58
18
0
30 Apr 2023
Harnessing the Power of LLMs in Practice: A Survey on ChatGPT and Beyond
Jingfeng Yang
Hongye Jin
Ruixiang Tang
Xiaotian Han
Qizhang Feng
Haoming Jiang
Bing Yin
Xia Hu
LM&MA
139
629
0
26 Apr 2023
Safety Assessment of Chinese Large Language Models
Hao Sun
Zhexin Zhang
Jiawen Deng
Jiale Cheng
Minlie Huang
ALM
ELM
38
75
0
20 Apr 2023
ChemCrow: Augmenting large-language models with chemistry tools
Andres M Bran
Sam Cox
Oliver Schilter
Carlo Baldassari
Andrew D. White
P. Schwaller
LLMAG
39
363
0
11 Apr 2023
Extractive Summarization via ChatGPT for Faithful Summary Generation
Haopeng Zhang
Xiao Liu
Jiawei Zhang
38
76
0
09 Apr 2023
Human-like Summarization Evaluation with ChatGPT
Mingqi Gao
Jie Ruan
Renliang Sun
Xunjian Yin
Shiping Yang
Xiaojun Wan
ALM
AI4MH
29
125
0
05 Apr 2023
LLMMaps -- A Visual Metaphor for Stratified Evaluation of Large Language Models
Patrik Puchert
Poonam Poonam
Christian van Onzenoodt
Timo Ropinski
20
8
0
02 Apr 2023
KPEval: Towards Fine-Grained Semantic-Based Keyphrase Evaluation
Di Wu
Da Yin
Kai-Wei Chang
39
1
0
27 Mar 2023
Error Analysis Prompting Enables Human-Like Translation Evaluation in Large Language Models
Qingyu Lu
Baopu Qiu
Liang Ding
Liping Xie
Tom Kocmi
Dacheng Tao
LRM
ALM
ELM
31
108
0
24 Mar 2023
DeltaScore: Fine-Grained Story Evaluation with Perturbations
Zhuohan Xie
Miao Li
Trevor Cohn
Jey Han Lau
32
7
0
15 Mar 2023
LIDA: A Tool for Automatic Generation of Grammar-Agnostic Visualizations and Infographics using Large Language Models
Victor C. Dibia
VLM
32
79
0
06 Mar 2023
Zero-Shot Cross-Lingual Summarization via Large Language Models
Jiaan Wang
Yunlong Liang
Fandong Meng
Beiqi Zou
Zhixu Li
Jianfeng Qu
Jie Zhou
ELM
29
28
0
28 Feb 2023