ResearchTrend.AI
arXiv: 2203.03073
ILDAE: Instance-Level Difficulty Analysis of Evaluation Data


7 March 2022
Neeraj Varshney
Swaroop Mishra
Chitta Baral

Papers citing "ILDAE: Instance-Level Difficulty Analysis of Evaluation Data"

17 / 17 papers shown
Improving Model Evaluation using SMART Filtering of Benchmark Datasets
Vipul Gupta
Candace Ross
David Pantoja
R. Passonneau
Megan Ung
Adina Williams
100
1
0
26 Oct 2024
Better than Random: Reliable NLG Human Evaluation with Constrained Active Sampling
Jie Ruan
Xiao Pu
Mingqi Gao
Xiaojun Wan
Yuesheng Zhu
33
3
0
12 Jun 2024
Donkii: Can Annotation Error Detection Methods Find Errors in Instruction-Tuning Datasets?
Leon Weber-Genzel
Robert Litschko
Ekaterina Artemova
Barbara Plank
24
2
0
04 Sep 2023
Estimating Semantic Similarity between In-Domain and Out-of-Domain Samples
Rhitabrat Pokharel
Ameeta Agrawal
OODD
27
2
0
01 Jun 2023
ActiveAED: A Human in the Loop Improves Annotation Error Detection
Leon Weber
Barbara Plank
35
10
0
31 May 2023
Post-Abstention: Towards Reliably Re-Attempting the Abstained Instances in QA
Neeraj Varshney
Chitta Baral
39
13
0
02 May 2023
Can Open-Domain QA Reader Utilize External Knowledge Efficiently like Humans?
Neeraj Varshney
Man Luo
Chitta Baral
RALM
21
11
0
23 Nov 2022
Discover, Explanation, Improvement: An Automatic Slice Detection Framework for Natural Language Processing
Wenyue Hua
Lifeng Jin
Linfeng Song
Haitao Mi
Yongfeng Zhang
Dong Yu
32
1
0
08 Nov 2022
"John is 50 years old, can his son be 65?" Evaluating NLP Models' Understanding of Feasibility
Himanshu Gupta
Neeraj Varshney
Swaroop Mishra
Kuntal Kumar Pal
Saurabh Arjun Sawant
Kevin Scaria
Siddharth Goyal
Chitta Baral
ELM
22
14
0
14 Oct 2022
Assessing Out-of-Domain Language Model Performance from Few Examples
Prasann Singhal
Jarad Forristal
Xi Ye
Greg Durrett
LRM
25
5
0
13 Oct 2022
Vote'n'Rank: Revision of Benchmarking with Social Choice Theory
Mark Rofin
Vladislav Mikhailov
Mikhail Florinskiy
A. Kravchenko
E. Tutubalina
Tatiana Shavrina
Daniel Karabekyan
Ekaterina Artemova
24
8
0
11 Oct 2022
Model Cascading: Towards Jointly Improving Efficiency and Accuracy of NLP Systems
Neeraj Varshney
Chitta Baral
30
27
0
11 Oct 2022
What Can Secondary Predictions Tell Us? An Exploration on Question-Answering with SQuAD-v2.0
Michael Kamfonas
Gabriel Alon
8
0
0
29 Jun 2022
Let the Model Decide its Curriculum for Multitask Learning
Neeraj Varshney
Swaroop Mishra
Chitta Baral
25
8
0
19 May 2022
Towards Improving Selective Prediction Ability of NLP Systems
Neeraj Varshney
Swaroop Mishra
Chitta Baral
8
23
0
21 Aug 2020
An Investigation of Why Overparameterization Exacerbates Spurious Correlations
Shiori Sagawa
Aditi Raghunathan
Pang Wei Koh
Percy Liang
152
372
0
09 May 2020
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Alex Wang
Amanpreet Singh
Julian Michael
Felix Hill
Omer Levy
Samuel R. Bowman
ELM
299
6,996
0
20 Apr 2018