On the Limitations of Reference-Free Evaluations of Generated Text

22 October 2022

Papers citing "On the Limitations of Reference-Free Evaluations of Generated Text"

13 / 13 papers shown

Title
MINERVA: Evaluating Complex Video Reasoning Arsha Nagrani Sachit Menon Ahmet Iscen Shyamal Buch Ramin Mehran ... Yukun Zhu Carl Vondrick Mikhail Sirotenko Cordelia Schmid Tobias Weyand 58 0 0 01 May 2025
What's the Difference? Supporting Users in Identifying the Effects of Prompt and Model Changes Through Token Patterns Michael A. Hedderich Anyi Wang Raoyuan Zhao Florian Eichin Barbara Plank 30 0 0 22 Apr 2025
HalluCounter: Reference-free LLM Hallucination Detection in the Wild! Ashok Urlana Gopichand Kanumolu Charaka Vinayak Kumar B. Garlapati Rahul Mishra HILM 61 0 0 06 Mar 2025
RevisEval: Improving LLM-as-a-Judge via Response-Adapted References Qiyuan Zhang Yufei Wang Tiezheng YU Yuxin Jiang Chuhan Wu ... Xin Jiang Lifeng Shang Ruiming Tang Fuyuan Lyu Chen Ma 31 4 0 07 Oct 2024
Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing Zhangchen Xu Fengqing Jiang Luyao Niu Yuntian Deng Radha Poovendran Yejin Choi Bill Yuchen Lin SyDa 34 120 0 12 Jun 2024
Exploring the Limits of Fine-grained LLM-based Physics Inference via Premise Removal Interventions Jordan Meadows Tamsin James André Freitas ReLM LRM AI4CE 33 1 0 29 Apr 2024
Interpretation modeling: Social grounding of sentences by reasoning over their implicit moral judgments Liesbeth Allein Maria Mihaela Trucscva Marie-Francine Moens 33 1 0 27 Nov 2023
Walking a Tightrope -- Evaluating Large Language Models in High-Risk Domains Chia-Chien Hung Wiem Ben-Rim Lindsay Frost Lars Bruckner Carolin (Haas) Lawrence AILaw ALM ELM 25 9 0 25 Nov 2023
SEMQA: Semi-Extractive Multi-Source Question Answering Tal Schuster Á. Lelkes Haitian Sun Jai Gupta Jonathan Berant W. Cohen Donald Metzler 33 13 0 08 Nov 2023
CLEME: Debiasing Multi-reference Evaluation for Grammatical Error Correction Jingheng Ye Hai-Tao Zheng Qingyu Zhou Yongqian Li Shirong Ma Haitao Zheng Ying Shen 30 5 0 18 May 2023
Revisiting the Gold Standard: Grounding Summarization Evaluation with Robust Human Evaluation Yixin Liu Alexander R. Fabbri Pengfei Liu Yilun Zhao Linyong Nan ... Simeng Han Shafiq R. Joty Chien-Sheng Wu Caiming Xiong Dragomir R. Radev ALM 24 132 0 15 Dec 2022
Finding a Balanced Degree of Automation for Summary Evaluation Shiyue Zhang Joey Tianyi Zhou 49 43 0 23 Sep 2021
SummaRuNNer: A Recurrent Neural Network based Sequence Model for Extractive Summarization of Documents Ramesh Nallapati Feifei Zhai Bowen Zhou 207 1,255 0 14 Nov 2016