ResearchTrend.AI

Dataset Cartography: Mapping and Diagnosing Datasets with Training Dynamics

22 September 2020
Swabha Swayamdipta
Roy Schwartz
Nicholas Lourie
Yizhong Wang
Hannaneh Hajishirzi
Noah A. Smith
Yejin Choi
arXiv: 2009.10795

Papers citing "Dataset Cartography: Mapping and Diagnosing Datasets with Training Dynamics"

50 / 115 papers shown
Always Tell Me The Odds: Fine-grained Conditional Probability Estimation
Liaoyaqi Wang
Zhengping Jiang
Anqi Liu
Benjamin Van Durme
61
0
0
02 May 2025
ToReMi: Topic-Aware Data Reweighting for Dynamic Pre-Training Data Selection
Xiaoxuan Zhu
Zhouhong Gu
Baiqian Wu
Suhang Zheng
Tao Wang
Tianyu Li
Hongwei Feng
Yanghua Xiao
42
0
0
01 Apr 2025
Model Hubs and Beyond: Analyzing Model Popularity, Performance, and Documentation
Pritam Kadasi
Sriman Reddy
Srivathsa Vamsi Chaturvedula
Rudranshu Sen
Agnish Saha
Soumavo Sikdar
Sayani Sarkar
Suhani Mittal
Rohit Jindal
Mayank Singh
53
0
0
19 Mar 2025
Task-Informed Anti-Curriculum by Masking Improves Downstream Performance on Text
Andrei Jarca
Florinel-Alin Croitoru
Radu Tudor Ionescu
53
0
0
18 Feb 2025
Beyond the Singular: The Essential Role of Multiple Generations in Effective Benchmark Evaluation and Analysis
Wenbo Zhang
Hengrui Cai
Wenyu Chen
82
0
0
17 Feb 2025
Diversity-Oriented Data Augmentation with Large Language Models
Zaitian Wang
Jinghan Zhang
Xinhao Zhang
Kunpeng Liu
Pengfei Wang
Yuanchun Zhou
80
1
0
17 Feb 2025
Assessing the Impact of the Quality of Textual Data on Feature Representation and Machine Learning Models
Tabinda Sarwar
Antonio Jose Jimeno Yepes
Lawrence Cavedon
69
0
0
12 Feb 2025
Self-Rationalization in the Wild: A Large Scale Out-of-Distribution Evaluation on NLI-related tasks
Jing Yang
Max Glockner
Anderson de Rezende Rocha
Iryna Gurevych
LRM
73
1
0
07 Feb 2025
Evaluating Concurrent Robustness of Language Models Across Diverse Challenge Sets
Vatsal Gupta
Pranshu Pandya
Tushar Kataria
Vivek Gupta
Dan Roth
AAML
57
1
0
03 Jan 2025
VariFace: Fair and Diverse Synthetic Dataset Generation for Face Recognition
Michael Yeung
Toya Teramoto
Songtao Wu
Tatsuo Fujiwara
Kenji Suzuki
Tamaki Kojima
83
0
0
09 Dec 2024
CorrSynth -- A Correlated Sampling Method for Diverse Dataset Generation from LLMs
Suhas S Kowshik
Abhishek Divekar
Vijit Malik
SyDa
37
0
0
13 Nov 2024
Improving Model Evaluation using SMART Filtering of Benchmark Datasets
Vipul Gupta
Candace Ross
David Pantoja
R. Passonneau
Megan Ung
Adina Williams
76
1
0
26 Oct 2024
Aggregation Artifacts in Subjective Tasks Collapse Large Language Models' Posteriors
Georgios Chochlakis
Alexandros Potamianos
Kristina Lerman
Shrikanth Narayanan
32
0
0
17 Oct 2024
Data Quality Control in Federated Instruction-tuning of Large Language Models
Yaxin Du
Rui Ye
Fengting Yuchi
W. Zhao
Jingjing Qu
Yunhong Wang
Siheng Chen
ALM
FedML
51
0
0
15 Oct 2024
Continual Learning: Less Forgetting, More OOD Generalization via Adaptive Contrastive Replay
Hossein Rezaei
Mohammad Sabokrou
CLL
26
0
0
09 Oct 2024
SFTMix: Elevating Language Model Instruction Tuning with Mixup Recipe
Yuxin Xiao
Shujian Zhang
Wenxuan Zhou
Marzyeh Ghassemi
Sanqiang Zhao
109
0
0
07 Oct 2024
Structural-Entropy-Based Sample Selection for Efficient and Effective Learning
Tianchi Xie
Jiangning Zhu
Guozu Ma
Minzhi Lin
Wei Chen
Weikai Yang
Shixia Liu
28
0
0
03 Oct 2024
Targeted synthetic data generation for tabular data via hardness characterization
Tommaso Ferracci
Leonie Goldmann
Anton Hinel
Francesco Sanna Passino
135
0
0
01 Oct 2024
Training Gradient Boosted Decision Trees on Tabular Data Containing Label Noise for Classification Tasks
Anita Eisenburger
Daniel Otten
Anselm Hudde
F. Hopfgartner
NoLa
44
1
0
13 Sep 2024
Knowledge-Infused Legal Wisdom: Navigating LLM Consultation through the Lens of Diagnostics and Positive-Unlabeled Reinforcement Learning
Yang Wu
Chenghao Wang
Ece Gumusel
Xiaozhong Liu
ELM
AILaw
48
4
0
05 Jun 2024
Perplexed by Perplexity: Perplexity-Based Data Pruning With Small Reference Models
Zachary Ankner
Cody Blakeney
Kartik K. Sreenivasan
Max Marion
Matthew L. Leavitt
Mansheej Paul
43
24
0
30 May 2024
Critical Learning Periods: Leveraging Early Training Dynamics for Efficient Data Pruning
E. Chimoto
Jay Gala
Orevaoghene Ahia
Julia Kreutzer
Bruce A. Bassett
Sara Hooker
VLM
42
4
0
29 May 2024
Exploring the Evolution of Hidden Activations with Live-Update Visualization
Xianglin Yang
Jin Song Dong
35
0
0
24 May 2024
TrACT: A Training Dynamics Aware Contrastive Learning Framework for Long-tail Trajectory Prediction
Junrui Zhang
Mozhgan Pourkeshavarz
Amir Rasouli
44
3
0
18 Apr 2024
Multi-News+: Cost-efficient Dataset Cleansing via LLM-based Data Annotation
Juhwan Choi
Jungmin Yun
Kyohoon Jin
Youngbin Kim
32
4
0
15 Apr 2024
MSciNLI: A Diverse Benchmark for Scientific Natural Language Inference
Mobashir Sadat
Cornelia Caragea
40
4
0
11 Apr 2024
Towards Principled Task Grouping for Multi-Task Learning
Chenguang Wang
Xuanhao Pan
Tianshu Yu
31
0
0
23 Feb 2024
Smaller Language Models are capable of selecting Instruction-Tuning Training Data for Larger Language Models
Dheeraj Mekala
Alex Nguyen
Jingbo Shang
ALM
27
18
0
16 Feb 2024
Importance-Aware Data Augmentation for Document-Level Neural Machine Translation
Ming-Ru Wu
Yufei Wang
George F. Foster
Lizhen Qu
Gholamreza Haffari
40
6
0
27 Jan 2024
Annotation Sensitivity: Training Data Collection Methods Affect Model Performance
Christoph Kern
Stephanie Eckman
Jacob Beck
Rob Chew
Bolei Ma
Frauke Kreuter
24
9
0
23 Nov 2023
NameGuess: Column Name Expansion for Tabular Data
Jiani Zhang
Zhengyuan Shen
Balasubramaniam Srinivasan
Shen Wang
Huzefa Rangwala
George Karypis
13
4
0
19 Oct 2023
D2 Pruning: Message Passing for Balancing Diversity and Difficulty in Data Pruning
A. Maharana
Prateek Yadav
Mohit Bansal
27
28
0
11 Oct 2023
FTFT: Efficient and Robust Fine-Tuning by Transferring Training Dynamics
Yupei Du
Albert Gatt
Dong Nguyen
31
1
0
10 Oct 2023
Anchor Points: Benchmarking Models with Much Fewer Examples
Rajan Vivek
Kawin Ethayarajh
Diyi Yang
Douwe Kiela
ALM
29
22
0
14 Sep 2023
Reverse Stable Diffusion: What prompt was used to generate this image?
Florinel-Alin Croitoru
Vlad Hondru
Radu Tudor Ionescu
M. Shah
VLM
DiffM
39
6
0
02 Aug 2023
Which Spurious Correlations Impact Reasoning in NLI Models? A Visual Interactive Diagnosis through Data-Constrained Counterfactuals
Robin Shing Moon Chan
Afra Amini
Mennatallah El-Assady
LRM
AAML
32
2
0
21 Jun 2023
NLU on Data Diets: Dynamic Data Subset Selection for NLP Classification Tasks
Jean-Michel Attendu
Jean-Philippe Corbeil
33
15
0
05 Jun 2023
On the Limitations of Simulating Active Learning
Katerina Margatina
Nikolaos Aletras
31
11
0
21 May 2023
Can Public Large Language Models Help Private Cross-device Federated Learning?
Wei Ping
Yibo Jacky Zhang
Yuan Cao
Bo-wen Li
H. B. McMahan
Sewoong Oh
Zheng Xu
Manzil Zaheer
FedML
29
37
0
20 May 2023
Measuring and Mitigating Local Instability in Deep Neural Networks
Arghya Datta
Subhrangshu Nandi
Jingcheng Xu
Greg Ver Steeg
He Xie
Anoop Kumar
Aram Galstyan
20
3
0
18 May 2023
What's the Meaning of Superhuman Performance in Today's NLU?
Simone Tedeschi
Johan Bos
T. Declerck
Jan Hajic
Daniel Hershcovich
...
Simon Krek
Steven Schockaert
Rico Sennrich
Ekaterina Shutova
Roberto Navigli
ELM
LM&MA
VLM
ReLM
LRM
36
26
0
15 May 2023
Does Informativeness Matter? Active Learning for Educational Dialogue Act Classification
Wei Tan
Jionghao Lin
David Lang
Guanliang Chen
D. Gašević
Lan Du
Wray L. Buntine
11
6
0
12 Apr 2023
Sociocultural knowledge is needed for selection of shots in hate speech detection tasks
Antonis Maronikolakis
Abdullatif Köksal
Hinrich Schütze
43
0
0
04 Apr 2023
A Bag-of-Prototypes Representation for Dataset-Level Applications
Wei-Chih Tu
Weijian Deng
Tom Gedeon
Liang Zheng
38
9
0
23 Mar 2023
Simfluence: Modeling the Influence of Individual Training Examples by Simulating Training Runs
Kelvin Guu
Albert Webson
Ellie Pavlick
Lucas Dixon
Ian Tenney
Tolga Bolukbasi
TDI
70
33
0
14 Mar 2023
SemEval-2023 Task 10: Explainable Detection of Online Sexism
Hannah Rose Kirk
Wenjie Yin
Bertie Vidgen
Paul Röttger
18
117
0
07 Mar 2023
Balanced Audiovisual Dataset for Imbalance Analysis
Wenke Xia
Xu Zhao
Xincheng Pang
Changqing Zhang
Di Hu
37
1
0
14 Feb 2023
Investigating Multi-source Active Learning for Natural Language Inference
Ard Snijders
Douwe Kiela
Katerina Margatina
24
7
0
14 Feb 2023
Identifying Semantically Difficult Samples to Improve Text Classification
Shashank Mujumdar
S. Mehta
Hima Patel
Suman Mitra
15
0
0
13 Feb 2023
Unsupervised Deep One-Class Classification with Adaptive Threshold based on Training Dynamics
Minkyung Kim
Junsik Kim
Jongmin Yu
Jun Kyun Choi
19
2
0
13 Feb 2023