Improving Statistical Significance in Human Evaluation of Automatic Metrics via Soft Pairwise Accuracy

15 September 2024

Papers citing "Improving Statistical Significance in Human Evaluation of Automatic Metrics via Soft Pairwise Accuracy"

6 / 6 papers shown

Same evaluation, more tokens: On the effect of input length for machine translation evaluation using Large Language Models
Tobias Domhan, Dawei Zhu
33 0 0 | 03 May 2025

Remedy: Learning Machine Translation Evaluation from Human Preferences with Reward Modeling
Shaomu Tan, Christof Monz
42 0 0 | 18 Apr 2025

Zero-shot Benchmarking: A Framework for Flexible and Scalable Automatic Evaluation of Language Models
José P. Pombal, Nuno M. Guerreiro, Ricardo Rei, André F. T. Martins
ALM | 75 0 0 | 01 Apr 2025

Adding Chocolate to Mint: Mitigating Metric Interference in Machine Translation
José P. Pombal, Nuno M. Guerreiro, Ricardo Rei, André F. T. Martins
61 0 0 | 11 Mar 2025

MetaMetrics-MT: Tuning Meta-Metrics for Machine Translation via Human Preference Calibration
David Anugraha, Garry Kuwanto, Lucky Susanto, Derry Wijaya, Genta Indra Winata
OSLM | 40 2 0 | 01 Nov 2024

Adapters for Altering LLM Vocabularies: What Languages Benefit the Most?
HyoJung Han, Akiko Eriguchi, Haoran Xu, Hieu T. Hoang, Marine Carpuat, Huda Khayrallah
VLM | 37 2 0 | 12 Oct 2024