ILDAE: Instance-Level Difficulty Analysis of Evaluation Data

7 March 2022

Papers citing "ILDAE: Instance-Level Difficulty Analysis of Evaluation Data"

Improving Model Evaluation using SMART Filtering of Benchmark Datasets Vipul Gupta Candace Ross David Pantoja R. Passonneau Megan Ung Adina Williams 100 1 0 26 Oct 2024
Better than Random: Reliable NLG Human Evaluation with Constrained Active Sampling Jie Ruan Xiao Pu Mingqi Gao Xiaojun Wan Yuesheng Zhu 33 3 0 12 Jun 2024
Donkii: Can Annotation Error Detection Methods Find Errors in Instruction-Tuning Datasets? Leon Weber-Genzel Robert Litschko Ekaterina Artemova Barbara Plank 24 2 0 04 Sep 2023
Estimating Semantic Similarity between In-Domain and Out-of-Domain Samples Rhitabrat Pokharel Ameeta Agrawal OODD 27 2 0 01 Jun 2023
ActiveAED: A Human in the Loop Improves Annotation Error Detection Leon Weber Barbara Plank 35 10 0 31 May 2023
Post-Abstention: Towards Reliably Re-Attempting the Abstained Instances in QA Neeraj Varshney Chitta Baral 39 13 0 02 May 2023
Can Open-Domain QA Reader Utilize External Knowledge Efficiently like Humans? Neeraj Varshney Man Luo Chitta Baral RALM 21 11 0 23 Nov 2022
Discover, Explanation, Improvement: An Automatic Slice Detection Framework for Natural Language Processing Wenyue Hua Lifeng Jin Linfeng Song Haitao Mi Yongfeng Zhang Dong Yu 32 1 0 08 Nov 2022
"John is 50 years old, can his son be 65?" Evaluating NLP Models' Understanding of Feasibility Himanshu Gupta Neeraj Varshney Swaroop Mishra Kuntal Kumar Pal Saurabh Arjun Sawant Kevin Scaria Siddharth Goyal Chitta Baral ELM 22 14 0 14 Oct 2022
Assessing Out-of-Domain Language Model Performance from Few Examples Prasann Singhal Jarad Forristal Xi Ye Greg Durrett LRM 25 5 0 13 Oct 2022
Vote'n'Rank: Revision of Benchmarking with Social Choice Theory Mark Rofin Vladislav Mikhailov Mikhail Florinskiy A. Kravchenko E. Tutubalina Tatiana Shavrina Daniel Karabekyan Ekaterina Artemova 24 8 0 11 Oct 2022
Model Cascading: Towards Jointly Improving Efficiency and Accuracy of NLP Systems Neeraj Varshney Chitta Baral 30 27 0 11 Oct 2022
What Can Secondary Predictions Tell Us? An Exploration on Question-Answering with SQuAD-v2.0 Michael Kamfonas Gabriel Alon 8 0 0 29 Jun 2022
Let the Model Decide its Curriculum for Multitask Learning Neeraj Varshney Swaroop Mishra Chitta Baral 25 8 0 19 May 2022
Towards Improving Selective Prediction Ability of NLP Systems Neeraj Varshney Swaroop Mishra Chitta Baral 8 23 0 21 Aug 2020
An Investigation of Why Overparameterization Exacerbates Spurious Correlations Shiori Sagawa Aditi Raghunathan Pang Wei Koh Percy Liang 152 372 0 09 May 2020
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding Alex Wang Amanpreet Singh Julian Michael Felix Hill Omer Levy Samuel R. Bowman ELM 299 6,996 0 20 Apr 2018