Measuring Data

9 December 2022
Margaret Mitchell
A. Luccioni
Nathan Lambert
Marissa Gerchick
Angelina McMillan-Major
Ezinwanne Ozoani
Nazneen Rajani
Tristan Thrush
Yacine Jernite
Douwe Kiela

Papers citing "Measuring Data"

35 / 35 papers shown
TEDI: Trustworthy and Ethical Dataset Indicators to Analyze and Compare Dataset Documentation
Wiebke Hutiri
Mircea Cimpoi
M. Scheuerman
Victoria Matthews
Alice Xiang
136
0
0
23 May 2025
SEAL : Interactive Tool for Systematic Error Analysis and Labeling
Nazneen Rajani
Weixin Liang
Lingjiao Chen
Margaret Mitchell
James Zou
77
16
0
11 Oct 2022
The Vendi Score: A Diversity Evaluation Metric for Machine Learning
Dan Friedman
Adji Bousso Dieng
EGVM
135
124
0
05 Oct 2022
Bugs in the Data: How ImageNet Misrepresents Biodiversity
A. Luccioni
David Rolnick
72
45
0
24 Aug 2022
BERTIN: Efficient Pre-Training of a Spanish Language Model using Perplexity Sampling
Javier de la Rosa
E. G. Ponferrada
Paulo Villegas
Pablo González de Prado Salas
Manu Romero
María Grandury
65
95
0
14 Jul 2022
Whose Language Counts as High Quality? Measuring Language Ideologies in Text Data Selection
Suchin Gururangan
Dallas Card
Sarah K. Drier
E. K. Gade
Leroy Z. Wang
Zeyu Wang
Luke Zettlemoyer
Noah A. Smith
216
80
0
25 Jan 2022
Data Collection and Quality Challenges in Deep Learning: A Data-Centric AI Perspective
Steven Euijong Whang
Yuji Roh
Hwanjun Song
Jae-Gil Lee
68
338
0
13 Dec 2021
LAION-400M: Open Dataset of CLIP-Filtered 400 Million Image-Text Pairs
Christoph Schuhmann
Richard Vencu
Romain Beaumont
R. Kaczmarczyk
Clayton Mullis
Aarush Katta
Theo Coombes
J. Jitsev
Aran Komatsuzaki
VLM
MLLM
CLIP
223
1,428
0
03 Nov 2021
A Data-Centric Approach for Training Deep Neural Networks with Less Data
Mohammad Motamedi
Nikolay Sakharnykh
T. Kaldewey
73
68
0
07 Oct 2021
Deep Reinforcement Learning at the Edge of the Statistical Precipice
Rishabh Agarwal
Max Schwarzer
Pablo Samuel Castro
Aaron Courville
Marc G. Bellemare
OffRL
110
671
0
30 Aug 2021
Do Datasets Have Politics? Disciplinary Values in Computer Vision Dataset Development
M. Scheuerman
Emily L. Denton
A. Hanna
72
209
0
09 Aug 2021
Deduplicating Training Data Makes Language Models Better
Katherine Lee
Daphne Ippolito
A. Nystrom
Chiyuan Zhang
Douglas Eck
Chris Callison-Burch
Nicholas Carlini
SyDa
360
628
0
14 Jul 2021
Language Model Evaluation Beyond Perplexity
Clara Meister
Ryan Cotterell
120
76
0
31 May 2021
SummVis: Interactive Visual Analysis of Models, Data, and Evaluation for Text Summarization
Jesse Vig
Wojciech Kryściński
Karan Goel
Nazneen Rajani
49
22
0
15 Apr 2021
Measuring Model Biases in the Absence of Ground Truth
Osman Aka
Ken Burke
Alex Bauerle
Christina Greer
Margaret Mitchell
42
34
0
05 Mar 2021
Data and its (dis)contents: A survey of dataset development and use in machine learning research
Amandalynne Paullada
Inioluwa Deborah Raji
Emily M. Bender
Emily L. Denton
A. Hanna
114
524
0
09 Dec 2020
Diversity, Density, and Homogeneity: Quantitative Characteristic Metrics for Text Collections
Yi-An Lai
Xuan Zhu
Yi Zhang
Mona T. Diab
47
22
0
19 Mar 2020
Estimating Training Data Influence by Tracing Gradient Descent
G. Pruthi
Frederick Liu
Mukund Sundararajan
Satyen Kale
TDI
81
408
0
19 Feb 2020
Decision-Making with Auto-Encoding Variational Bayes
Romain Lopez
Pierre Boyeau
Nir Yosef
Michael I. Jordan
Jeffrey Regier
BDL
393
10,591
0
17 Feb 2020
Diversity and Inclusion Metrics in Subset Selection
Margaret Mitchell
Dylan K. Baker
Nyalleng Moorosi
Emily L. Denton
Ben Hutchinson
A. Hanna
Timnit Gebru
Jamie Morgenstern
FaML
172
86
0
09 Feb 2020
Lessons from Archives: Strategies for Collecting Sociocultural Data in Machine Learning
Eun Seo Jo
Timnit Gebru
69
313
0
22 Dec 2019
Measurement and Fairness
Abigail Z. Jacobs
Hanna M. Wallach
78
388
0
11 Dec 2019
Newsroom: A Dataset of 1.3 Million Summaries with Diverse Extractive Strategies
Max Grusky
Mor Naaman
Yoav Artzi
86
555
0
30 Apr 2018
Datasheets for Datasets
Timnit Gebru
Jamie Morgenstern
Briana Vecchione
Jennifer Wortman Vaughan
Hanna M. Wallach
Hal Daumé
Kate Crawford
261
2,181
0
23 Mar 2018
Demystifying MMD GANs
Mikolaj Binkowski
Danica J. Sutherland
Michael Arbel
Arthur Gretton
EGVM
141
1,491
0
04 Jan 2018
Snorkel: Rapid Training Data Creation with Weak Supervision
Alexander Ratner
Stephen H. Bach
Henry R. Ehrenberg
Jason Alan Fries
Sen Wu
Christopher Ré
73
1,027
0
28 Nov 2017
Why We Need New Evaluation Metrics for NLG
Jekaterina Novikova
Ondrej Dusek
Amanda Cercas Curry
Verena Rieser
82
461
0
21 Jul 2017
On the State of the Art of Evaluation in Neural Language Models
Gábor Melis
Chris Dyer
Phil Blunsom
65
535
0
18 Jul 2017
Outlier Detection for Text Data : An Extended Version
R. Kannan
Hyenkyun Woo
Charu C. Aggarwal
Haesun Park
55
50
0
05 Jan 2017
Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings
Tolga Bolukbasi
Kai-Wei Chang
James Zou
Venkatesh Saligrama
Adam Kalai
CVBM
FaML
110
3,135
0
21 Jul 2016
Improved Techniques for Training GANs
Tim Salimans
Ian Goodfellow
Wojciech Zaremba
Vicki Cheung
Alec Radford
Xi Chen
GAN
480
9,048
0
10 Jun 2016
A Diversity-Promoting Objective Function for Neural Conversation Models
Jiwei Li
Michel Galley
Chris Brockett
Jianfeng Gao
W. Dolan
143
2,389
0
11 Oct 2015
A Survey of Current Datasets for Vision and Language Research
Francis Ferraro
N. Mostafazadeh
Ting-Hao 'Kenneth' Huang
Lucy Vanderwende
Jacob Devlin
Michel Galley
Margaret Mitchell
VLM
47
75
0
23 Jun 2015
Going Deeper with Convolutions
Christian Szegedy
Wei Liu
Yangqing Jia
P. Sermanet
Scott E. Reed
Dragomir Anguelov
D. Erhan
Vincent Vanhoucke
Andrew Rabinovich
460
43,649
0
17 Sep 2014
Efficient Estimation of Word Representations in Vector Space
Tomas Mikolov
Kai Chen
G. Corrado
J. Dean
3DV
674
31,489
0
16 Jan 2013