Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2210.07197
Cited By
Towards a Unified Multi-Dimensional Evaluator for Text Generation
13 October 2022
Ming Zhong
Yang Liu
Da Yin
Yuning Mao
Yizhu Jiao
Peng Liu
Chenguang Zhu
Heng Ji
Jiawei Han
ELM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Towards a Unified Multi-Dimensional Evaluator for Text Generation"
50 / 59 papers shown
Title
REFINE-AF: A Task-Agnostic Framework to Align Language Models via Self-Generated Instructions using Reinforcement Learning from Automated Feedback
Aniruddha Roy
Pretam Ray
Abhilash Nandy
Somak Aditya
Pawan Goyal
ALM
31
0
0
10 May 2025
SEval-Ex: A Statement-Level Framework for Explainable Summarization Evaluation
Tanguy Herserant
Vincent Guigue
ELM
40
0
0
04 May 2025
JaccDiv: A Metric and Benchmark for Quantifying Diversity of Generated Marketing Text in the Music Industry
Anum Afzal
Alexandre Mercier
Florian Matthes
60
0
0
29 Apr 2025
Can LLMs Be Trusted for Evaluating RAG Systems? A Survey of Methods and Datasets
Lorenz Brehme
Thomas Ströhle
Ruth Breu
59
0
0
28 Apr 2025
VIST-GPT: Ushering in the Era of Visual Storytelling with LLMs?
Mohamed Gado
Towhid Taliee
Muhammad Memon
D. Ignatov
Radu Timofte
70
0
0
27 Apr 2025
Evaluating Evaluation Metrics -- The Mirage of Hallucination Detection
Atharva Kulkarni
Yuan-kang Zhang
Joel Ruben Antony Moniz
Xiou Ge
Bo-Hsiang Tseng
Dhivya Piraviperumal
S.
Hong-ye Yu
HILM
78
0
0
25 Apr 2025
SpeechDialogueFactory: Generating High-Quality Speech Dialogue Data to Accelerate Your Speech-LLM Development
Minghan Wang
Ye Bai
Y. Wang
Thuy-Trang Vu
Ehsan Shareghi
Gholamreza Haffari
52
0
0
31 Mar 2025
A Scalable Framework for Evaluating Health Language Models
Neil Mallinar
A. Heydari
Xin Liu
Anthony Z. Faranesh
Brent Winslow
...
Mark Malhotra
Shwetak N. Patel
Javier L. Prieto
Daniel J. McDuff
Ahmed A. Metwally
LM&MA
56
2
0
30 Mar 2025
Think Together and Work Better: Combining Humans' and LLMs' Think-Aloud Outcomes for Effective Text Evaluation
SeongYeub Chu
JongWoo Kim
MunYong Yi
57
3
0
21 Feb 2025
MIH-TCCT: Mitigating Inconsistent Hallucinations in LLMs via Event-Driven Text-Code Cyclic Training
Xinxin You
Xien Liu
Qixin Sun
Huan Zhang
Kaiyin Zhou
Shaohui Liu
Guoping Hu
Shijin Wang
Si Liu
Ji Wu
85
0
0
13 Feb 2025
Learning to Summarize from LLM-generated Feedback
Hwanjun Song
Taewon Yun
Yuho Lee
Jihwan Oh
Gihun Lee
Jason (Jinglun) Cai
Hang Su
73
2
0
28 Jan 2025
Aligning with Human Judgement: The Role of Pairwise Preference in Large Language Model Evaluators
Yinhong Liu
Han Zhou
Zhijiang Guo
Ehsan Shareghi
Ivan Vulić
Anna Korhonen
Nigel Collier
ALM
128
68
0
20 Jan 2025
Foundation Models at Work: Fine-Tuning for Fairness in Algorithmic Hiring
Buse Sibel Korkmaz
Rahul Nair
Elizabeth M. Daly
Evangelos Anagnostopoulos
Christos Varytimidis
Antonio del Rio Chanona
40
0
0
13 Jan 2025
Measuring the Robustness of Reference-Free Dialogue Evaluation Systems
Justin Vasselli
Adam Nohejl
Taro Watanabe
AAML
49
0
0
12 Jan 2025
PRD: Peer Rank and Discussion Improve Large Language Model based Evaluations
Ruosen Li
Teerth Patel
Xinya Du
LLMAG
ALM
54
96
0
03 Jan 2025
ChipAlign: Instruction Alignment in Large Language Models for Chip Design via Geodesic Interpolation
Chenhui Deng
Yunsheng Bai
Haoxing Ren
31
1
0
31 Dec 2024
Unleashing the Power of Data Tsunami: A Comprehensive Survey on Data Assessment and Selection for Instruction Tuning of Language Models
Yulei Qin
Yuncheng Yang
Pengcheng Guo
Gang Li
Hang Shao
Yuchen Shi
Zihan Xu
Yun Gu
Ke Li
Xing Sun
ALM
90
12
0
31 Dec 2024
Bridging the Gap between Expert and Language Models: Concept-guided Chess Commentary Generation and Evaluation
Jaechang Kim
Jinmin Goh
Inseok Hwang
Jaewoong Cho
Jungseul Ok
ELM
28
1
0
28 Oct 2024
4-LEGS: 4D Language Embedded Gaussian Splatting
Gal Fiebelman
Tamir Cohen
Ayellet Morgenstern
Peter Hedman
Hadar Averbuch-Elor
3DGS
43
1
0
14 Oct 2024
MetaMetrics: Calibrating Metrics For Generation Tasks Using Human Preferences
Genta Indra Winata
David Anugraha
Lucky Susanto
Garry Kuwanto
Derry Wijaya
37
7
0
03 Oct 2024
How to Make the Most of LLMs' Grammatical Knowledge for Acceptability Judgments
Yusuke Ide
Yuto Nishida
Miyu Oba
Miyu Oba
Justin Vasselli
Hidetaka Kamigaito
Taro Watanabe
39
2
0
19 Aug 2024
Leveraging Entailment Judgements in Cross-Lingual Summarisation
Huajian Zhang
Laura Perez-Beltrachini
HILM
36
0
0
01 Aug 2024
ECoh: Turn-level Coherence Evaluation for Multilingual Dialogues
John Mendonça
Isabel Trancoso
A. Lavie
34
3
0
16 Jul 2024
FineSurE: Fine-grained Summarization Evaluation using LLMs
Hwanjun Song
Hang Su
Igor Shalyminov
Jason (Jinglun) Cai
Saab Mansour
HILM
28
30
0
01 Jul 2024
Factual Dialogue Summarization via Learning from Large Language Models
Rongxin Zhu
Jey Han Lau
Jianzhong Qi
HILM
52
1
0
20 Jun 2024
Holistic Evaluation for Interleaved Text-and-Image Generation
Minqian Liu
Zhiyang Xu
Zihao Lin
Trevor Ashby
Joy Rimchala
Jiaxin Zhang
Lifu Huang
EGVM
38
7
0
20 Jun 2024
Take the essence and discard the dross: A Rethinking on Data Selection for Fine-Tuning Large Language Models
Ziche Liu
Rui Ke
Feng Jiang
Feng Jiang
Haizhou Li
63
1
0
20 Jun 2024
Satyrn: A Platform for Analytics Augmented Generation
Marko Sterbentz
Cameron Barrie
Shubham Shahi
Abhratanu Dutta
Donna Hooshmand
Harper Pack
Kristian J. Hammond
31
0
0
17 Jun 2024
ReadCtrl: Personalizing text generation with readability-controlled instruction learning
Hieu Tran
Zonghai Yao
Lingxi Li
Hong-ye Yu
52
2
0
13 Jun 2024
Benchmark Data Contamination of Large Language Models: A Survey
Cheng Xu
Shuhao Guan
Derek Greene
Mohand-Tahar Kechadi
ELM
ALM
38
38
0
06 Jun 2024
SLIDE: A Framework Integrating Small and Large Language Models for Open-Domain Dialogues Evaluation
Kun Zhao
Bohao Yang
Chen Tang
Chenghua Lin
Liang Zhan
41
5
0
24 May 2024
Efficient LLM Comparative Assessment: a Product of Experts Framework for Pairwise Comparisons
Adian Liusie
Vatsal Raina
Yassir Fathullah
Mark J. F. Gales
43
9
0
09 May 2024
From Matching to Generation: A Survey on Generative Information Retrieval
Xiaoxi Li
Jiajie Jin
Yujia Zhou
Yuyao Zhang
Peitian Zhang
Yutao Zhu
Zhicheng Dou
3DV
69
46
0
23 Apr 2024
Semi-Supervised Dialogue Abstractive Summarization via High-Quality Pseudolabel Selection
Jianfeng He
Hang Su
Jason (Jinglun) Cai
Igor Shalyminov
Hwanjun Song
Saab Mansour
24
4
0
06 Mar 2024
How Much Annotation is Needed to Compare Summarization Models?
Chantal Shaib
Joe Barrow
Alexa F. Siu
Byron C. Wallace
A. Nenkova
45
2
0
28 Feb 2024
To Burst or Not to Burst: Generating and Quantifying Improbable Text
Kuleen Sasse
Samuel Barham
Efsun Sarioglu Kayi
Edward W. Staley
DeLMO
24
1
0
27 Jan 2024
Investigating Data Contamination for Pre-training Language Models
Minhao Jiang
Ken Ziyu Liu
Ming Zhong
Rylan Schaeffer
Siru Ouyang
Jiawei Han
Sanmi Koyejo
33
63
0
11 Jan 2024
Walking a Tightrope -- Evaluating Large Language Models in High-Risk Domains
Chia-Chien Hung
Wiem Ben-Rim
Lindsay Frost
Lars Bruckner
Carolin (Haas) Lawrence
AILaw
ALM
ELM
25
9
0
25 Nov 2023
X-Eval: Generalizable Multi-aspect Text Evaluation via Augmented Instruction Tuning with Auxiliary Evaluation Aspects
Minqian Liu
Ying Shen
Zhiyang Xu
Yixin Cao
Eunah Cho
Vaibhav Kumar
Reza Ghanadan
Lifu Huang
ELM
LM&MA
ALM
44
25
0
15 Nov 2023
Chainpoll: A high efficacy method for LLM hallucination detection
Robert Friel
Atindriyo Sanyal
LRM
HILM
18
24
0
22 Oct 2023
MedAlign: A Clinician-Generated Dataset for Instruction Following with Electronic Medical Records
Scott L. Fleming
Alejandro Lozano
W. Haberkorn
Jenelle A. Jindal
E. Reis
...
Jonathan H. Chen
Keith Morse
Emma Brunskill
Jason Alan Fries
N. Shah
LM&MA
28
53
0
27 Aug 2023
DecompEval: Evaluating Generated Texts as Unsupervised Decomposed Question Answering
Pei Ke
Fei Huang
Fei Mi
Yasheng Wang
Qun Liu
Xiaoyan Zhu
Minlie Huang
ReLM
ELM
34
10
0
13 Jul 2023
Information Association for Language Model Updating by Mitigating LM-Logical Discrepancy
Pengfei Yu
Heng Ji
KELM
31
9
0
29 May 2023
A Critical Evaluation of Evaluations for Long-form Question Answering
Fangyuan Xu
Yixiao Song
Mohit Iyyer
Eunsol Choi
ELM
37
96
0
29 May 2023
UMSE: Unified Multi-scenario Summarization Evaluation
Shen Gao
Zhitao Yao
Chongyang Tao
Xiuying Chen
Pengjie Ren
Z. Ren
Zhumin Chen
30
5
0
26 May 2023
InheritSumm: A General, Versatile and Compact Summarizer by Distilling from GPT
Yichong Xu
Ruochen Xu
Dan Iter
Yang Liu
Shuohang Wang
Chenguang Zhu
Michael Zeng
19
10
0
22 May 2023
Evaluating the Performance of Large Language Models on GAOKAO Benchmark
Xiaotian Zhang
Chun-yan Li
Yi Zong
Zhengyu Ying
Liang He
Xipeng Qiu
ALM
ELM
11
96
0
21 May 2023
Active Retrieval Augmented Generation
Zhengbao Jiang
Frank F. Xu
Luyu Gao
Zhiqing Sun
Qian Liu
Jane Dwivedi-Yu
Yiming Yang
Jamie Callan
Graham Neubig
RALM
9
252
0
11 May 2023
G-Eval: NLG Evaluation using GPT-4 with Better Human Alignment
Yang Liu
Dan Iter
Yichong Xu
Shuohang Wang
Ruochen Xu
Chenguang Zhu
ELM
ALM
LM&MA
36
1,071
0
29 Mar 2023
Towards Interpretable and Efficient Automatic Reference-Based Summarization Evaluation
Yixin Liu
Alexander R. Fabbri
Yilun Zhao
Pengfei Liu
Shafiq R. Joty
Chien-Sheng Wu
Caiming Xiong
Dragomir R. Radev
13
27
0
07 Mar 2023
1
2
Next