2303.04048
Is ChatGPT a Good NLG Evaluator? A Preliminary Study
7 March 2023
Jiaan Wang
Yunlong Liang
Fandong Meng
Zengkui Sun
Haoxiang Shi
Zhixu Li
Jinan Xu
Jianfeng Qu
Jie Zhou
LM&MA
ELM
ALM
AI4MH
Papers citing
"Is ChatGPT a Good NLG Evaluator? A Preliminary Study"
50 / 307 papers shown
Title
Revisiting Compositional Generalization Capability of Large Language Models Considering Instruction Following Ability
Yusuke Sakai
Hidetaka Kamigaito
Taro Watanabe
LRM
29
0
0
18 Jun 2025
Towards Understanding Bias in Synthetic Data for Evaluation
Hossein A. Rahmani
Varsha Ramineni
Nick Craswell
Bhaskar Mitra
Emine Yilmaz
111
0
0
12 Jun 2025
ProRefine: Inference-time Prompt Refinement with Textual Feedback
Deepak Pandita
Tharindu Cyril Weerasooriya
A. Shah
Christopher Homan
Wei Wei
LLMAG
ReLM
LRM
145
0
0
05 Jun 2025
Knockout LLM Assessment: Using Large Language Models for Evaluations through Iterative Pairwise Comparisons
Isik Baran Sandan
Tu Anh Dinh
Jan Niehues
ELM
96
0
0
04 Jun 2025
DRE: An Effective Dual-Refined Method for Integrating Small and Large Language Models in Open-Domain Dialogue Evaluation
Kun Zhao
Bohao Yang
Chen Tang
Siyuan Dai
Haoteng Tang
Chenghua Lin
Liang Zhan
25
0
0
04 Jun 2025
I^2G: Generating Instructional Illustrations via Text-Conditioned Diffusion
Jing Bi
Pinxin Liu
Ali Vosoughi
Jiarui Wu
Jinxi He
Chenliang Xu
DiffM
48
0
0
22 May 2025
R-TOFU: Unlearning in Large Reasoning Models
Sangyeon Yoon
Wonje Jeung
Albert No
MU
LRM
224
1
0
21 May 2025
Generalised Probabilistic Modelling and Improved Uncertainty Estimation in Comparative LLM-as-a-judge
Yassir Fathullah
Mark Gales
ELM
79
0
0
21 May 2025
SlangDIT: Benchmarking LLMs in Interpretative Slang Translation
Yunlong Liang
Fandong Meng
Jiaan Wang
Jie Zhou
91
0
0
20 May 2025
SEPS: A Separability Measure for Robust Unlearning in LLMs
Wonje Jeung
Sangyeon Yoon
Albert No
MU
VLM
228
1
0
20 May 2025
YESciEval: Robust LLM-as-a-Judge for Scientific Question Answering
Jennifer D'Souza
Hamed Babaei Giglou
Quentin Münch
ELM
109
0
0
20 May 2025
An Empirical Study of Many-to-Many Summarization with Large Language Models
Jiaan Wang
Fandong Meng
Zengkui Sun
Yunlong Liang
Yuxuan Cao
Jiarong Xu
Haoxiang Shi
Jie Zhou
47
0
0
19 May 2025
What are they talking about? Benchmarking Large Language Models for Knowledge-Grounded Discussion Summarization
Weixiao Zhou
Junnan Zhu
Gengyao Li
Xianfu Cheng
Xinnian Liang
Feifei Zhai
Zhiyu Li
ALM
63
0
0
18 May 2025
How Reliable is Multilingual LLM-as-a-Judge?
Xiyan Fu
Wei Liu
ELM
59
0
0
18 May 2025
Community Search in Time-dependent Road-social Attributed Networks
Li Ni
Hengkai Xu
Lin Mu
Yiwen Zhang
Wenjian Luo
109
0
0
18 May 2025
Counterspeech the ultimate shield! Multi-Conditioned Counterspeech Generation through Attributed Prefix Learning
Aswini Kumar Padhi
Anil Bandhakavi
Tanmoy Chakraborty
219
0
0
17 May 2025
AutoMedEval: Harnessing Language Models for Automatic Medical Capability Evaluation
Xiechi Zhang
Zetian Ouyang
Linlin Wang
Gerard de Melo
Zhu Cao
Xiaoling Wang
Ya Zhang
Yanfeng Wang
Liang He
LM&MA
ELM
124
0
0
17 May 2025
Benchmarking Critical Questions Generation: A Challenging Reasoning Task for Large Language Models
Blanca Calvo Figueras
Rodrigo Agerri
ALM
ELM
LRM
185
2
0
16 May 2025
SAS-Bench: A Fine-Grained Benchmark for Evaluating Short Answer Scoring with Large Language Models
Peichao Lai
Kai Zhang
Yi Lin
Lingling Zhang
Feiyang Ye
...
Zifei Shan
Zeang Sheng
Yansen Wang
Wentao Zhang
Bin Cui
ELM
LRM
169
0
0
12 May 2025
PLHF: Prompt Optimization with Few-Shot Human Feedback
Chun-Pai Yang
Kan Zheng
Shou-De Lin
58
0
0
11 May 2025
Beyond One-Size-Fits-All: Inversion Learning for Highly Effective NLG Evaluation Prompts
Hanhua Hong
Chenghao Xiao
Yang Wang
Y. Liu
Wenge Rong
Chenghua Lin
85
0
0
29 Apr 2025
An Empirical Study of Evaluating Long-form Question Answering
Ning Xian
Yixing Fan
Ruqing Zhang
Maarten de Rijke
Jiafeng Guo
ELM
56
0
0
25 Apr 2025
Unveiling the Lack of LVLM Robustness to Fundamental Visual Variations: Why and Path Forward
Zhiyuan Fan
Yumeng Wang
Sandeep Polisetty
Yi R. Fung
135
0
0
23 Apr 2025
Planning with Diffusion Models for Target-Oriented Dialogue Systems
Hanwen Du
Bo Peng
Xia Ning
74
0
0
23 Apr 2025
Process Reward Models That Think
Muhammad Khalifa
Rishabh Agarwal
Lajanugen Logeswaran
Jaekyeom Kim
Hao Peng
Moontae Lee
Honglak Lee
Lu Wang
OffRL
ALM
LRM
143
9
0
23 Apr 2025
An LLM-as-a-judge Approach for Scalable Gender-Neutral Translation Evaluation
Andrea Piergentili
Beatrice Savoldi
Matteo Negri
L. Bentivogli
ELM
71
1
0
16 Apr 2025
Deep Reasoning Translation via Reinforcement Learning
Jiaan Wang
Fandong Meng
Jie Zhou
OffRL
LRM
121
1
0
14 Apr 2025
Recursive Training Loops in LLMs: How training data properties modulate distribution shift in generated data?
Grgur Kovač
Jérémy Perez
Rémy Portelas
Peter Ford Dominey
Pierre-Yves Oudeyer
95
0
0
04 Apr 2025
TN-Eval: Rubric and Evaluation Protocols for Measuring the Quality of Behavioral Therapy Notes
Raj Sanjay Shah
Lei Xu
Qianchu Liu
Jon Burnsky
Drew Bertagnolli
Chaitanya P. Shivade
LM&MA
144
0
0
26 Mar 2025
SLIDE: Sliding Localized Information for Document Extraction
Divyansh Singh
Manuel Nunez Martinez
Bonnie J. Dorr
Sonja Schmer-Galunder
57
0
0
23 Mar 2025
Improving Preference Extraction In LLMs By Identifying Latent Knowledge Through Classifying Probes
Sharan Maiya
Yinhong Liu
Ramit Debnath
Anna Korhonen
77
0
0
22 Mar 2025
A Survey on Transformer Context Extension: Approaches and Evaluation
Yijun Liu
Jinzheng Yu
Yang Xu
Zhongyang Li
Qingfu Zhu
LLMAG
128
3
0
17 Mar 2025
Improving LLM-based Document-level Machine Translation with Multi-Knowledge Fusion
Bin Liu
Xinglin Lyu
Junhui Li
Daimeng Wei
Hao Fei
Shimin Tao
Hao Yang
87
0
0
15 Mar 2025
FinTMMBench: Benchmarking Temporal-Aware Multi-Modal RAG in Finance
Fengbin Zhu
Junfeng Li
Liangming Pan
Wenjie Wang
Fuli Feng
Chao Wang
Huanbo Luan
Tat-Seng Chua
AIFin
98
1
0
07 Mar 2025
How Do Hackathons Foster Creativity? Towards AI Collaborative Evaluation of Creativity at Scale
Jeanette Falk
Yiyi Chen
Janet Rafner
Mike Zhang
Johannes Bjerva
Alexander Nolte
127
1
0
06 Mar 2025
In-depth Analysis of Graph-based RAG in a Unified Framework
Yingli Zhou
Yaodong Su
Youran Sun
Shu Wang
Taotao Wang
...
Yongwei Zhang
Sicong Liang
Xilin Liu
Yuchi Ma
Yixiang Fang
102
4
0
06 Mar 2025
Building Safe GenAI Applications: An End-to-End Overview of Red Teaming for Large Language Models
Alberto Purpura
Sahil Wadhwa
Jesse Zymet
Akshay Gupta
Andy Luo
Melissa Kazemi Rad
Swapnil Shinde
Mohammad Sorower
AAML
469
0
0
03 Mar 2025
Towards Efficient Educational Chatbots: Benchmarking RAG Frameworks
Umar Ali Khan
Ekram Khan
Fiza Khan
A. A. Moinuddin
90
0
0
02 Mar 2025
BadJudge: Backdoor Vulnerabilities of LLM-as-a-Judge
Terry Tong
Fei Wang
Zhe Zhao
Mengzhao Chen
AAML
ELM
94
3
0
01 Mar 2025
Towards Enhanced Immersion and Agency for LLM-based Interactive Drama
Hongqiu Wu
Weiqi Wu
Tianyang Xu
Jiameng Zhang
Hai Zhao
AI4CE
112
0
0
25 Feb 2025
Towards an automated workflow in materials science for combining multi-modal simulative and experimental information using data mining and large language models
Balduin Katzer
Steffen Klinder
Katrin Schulz
AI4CE
97
0
0
24 Feb 2025
Pastiche Novel Generation Creating: Fan Fiction You Love in Your Favorite Author's Style
Xueran Han
Yuhan Liu
Mingzhe Li
Wen Liu
Sen Hu
Rui Yan
Zhiqiang Xu
Preslav Nakov
104
0
0
24 Feb 2025
Correlating and Predicting Human Evaluations of Language Models from Natural Language Processing Benchmarks
Rylan Schaeffer
Punit Singh Koura
Binh Tang
R. Subramanian
Aaditya K. Singh
...
Vedanuj Goswami
Sergey Edunov
Dieuwke Hupkes
Sanmi Koyejo
Sharan Narang
ALM
146
1
0
24 Feb 2025
Think Together and Work Better: Combining Humans' and LLMs' Think-Aloud Outcomes for Effective Text Evaluation
SeongYeub Chu
JongWoo Kim
MunYong Yi
142
4
0
21 Feb 2025
From Local to Global: A Graph RAG Approach to Query-Focused Summarization
Darren Edge
Ha Trinh
Newman Cheng
Joshua Bradley
Alex Chao
Apurva Mody
Steven Truitt
Dasha Metropolitansky
Robert Osazuwa Ness
Jonathan Larson
RALM
272
447
0
20 Feb 2025
MLaKE: Multilingual Knowledge Editing Benchmark for Large Language Models
Zihao Wei
Jingcheng Deng
Liang Pang
Hanxing Ding
Huawei Shen
Xueqi Cheng
KELM
141
7
0
20 Feb 2025
HPSS: Heuristic Prompting Strategy Search for LLM Evaluators
Bosi Wen
Pei Ke
Yufei Sun
C. Wang
Xiaotao Gu
Jinfeng Zhou
Jie Tang
Hongning Wang
Minlie Huang
34
0
0
18 Feb 2025
Conditioning LLMs to Generate Code-Switched Text
Maite Heredia
Gorka Labaka
Jeremy Barnes
A. Soroa
29
1
0
18 Feb 2025
SafeEraser: Enhancing Safety in Multimodal Large Language Models through Multimodal Machine Unlearning
Junkai Chen
Zhijie Deng
Kening Zheng
Yibo Yan
Shuliang Liu
PeiJun Wu
Peijie Jiang
Qingbin Liu
Xuming Hu
MU
112
8
0
18 Feb 2025
Towards Reasoning Ability of Small Language Models
Gaurav Srivastava
Shuxiang Cao
Xuan Wang
ReLM
LRM
149
11
0
17 Feb 2025