Our Evaluation Metric Needs an Update to Encourage Generalization

14 July 2020

Papers citing "Our Evaluation Metric Needs an Update to Encourage Generalization"


LINGO : Visually Debiasing Natural Language Instructions to Support Task Diversity. Anjana Arunkumar, Shubham Sharma, Rakhi Agrawal, Sriramakrishnan Chandrasekaran, Chris Bryan. 34 0 0. 12 Apr 2023.
Don't Blame the Annotator: Bias Already Starts in the Annotation Instructions. Mihir Parmar, Swaroop Mishra, Mor Geva, Chitta Baral. 36 55 0. 01 May 2022.
NumGLUE: A Suite of Fundamental yet Challenging Mathematical Reasoning Tasks. Swaroop Mishra, Arindam Mitra, Neeraj Varshney, Bhavdeep Singh Sachdeva, Peter Clark, Chitta Baral, Ashwin Kalyan. AIMat ReLM ELM LRM. 30 102 0. 12 Apr 2022.
Generalized but not Robust? Comparing the Effects of Data Modification Methods on Out-of-Domain Generalization and Adversarial Robustness. Tejas Gokhale, Swaroop Mishra, Man Luo, Bhavdeep Singh Sachdeva, Chitta Baral. 52 29 0. 15 Mar 2022.
Choose Your QA Model Wisely: A Systematic Study of Generative and Extractive Readers for Question Answering. Man Luo, Kazuma Hashimoto, Semih Yavuz, Zhiwei Liu, Chitta Baral, Yingbo Zhou. 29 21 0. 14 Mar 2022.
Investigating Selective Prediction Approaches Across Several Tasks in IID, OOD, and Adversarial Settings. Neeraj Varshney, Swaroop Mishra, Chitta Baral. 27 55 0. 01 Mar 2022.
How Robust are Model Rankings: A Leaderboard Customization Approach for Equitable Evaluation. Swaroop Mishra, Anjana Arunkumar. 34 24 0. 10 Jun 2021.
Hypothesis Only Baselines in Natural Language Inference. Adam Poliak, Jason Naradowsky, Aparajita Haldar, Rachel Rudinger, Benjamin Van Durme. 190 576 0. 02 May 2018.