Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2503.07879
Cited By
Datasets, Documents, and Repetitions: The Practicalities of Unequal Data Quality
10 March 2025
Alex Fang
Hadi Pouransari
Matt Jordan
Alexander Toshev
Vaishaal Shankar
Ludwig Schmidt
Tom Gunter
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Datasets, Documents, and Repetitions: The Practicalities of Unequal Data Quality"
10 / 10 papers shown
Title
Scaling Laws for Data Filtering -- Data Curation cannot be Compute Agnostic
Sachin Goyal
Pratyush Maini
Zachary Chase Lipton
Aditi Raghunathan
J. Zico Kolter
82
45
0
10 Apr 2024
GPT-4 Technical Report
OpenAI OpenAI
OpenAI Josh Achiam
Steven Adler
Sandhini Agarwal
Lama Ahmad
...
Shengjia Zhao
Tianhao Zheng
Juntang Zhuang
William Zhuk
Barret Zoph
LLMAG
MLLM
1.4K
14,313
0
15 Mar 2023
Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models
Aarohi Srivastava
Abhinav Rastogi
Abhishek Rao
Abu Awal Md Shoeb
Abubakar Abid
...
Zhuoye Zhao
Zijian Wang
Zijie J. Wang
Zirui Wang
Ziyi Wu
ELM
180
1,748
0
09 Jun 2022
Training Compute-Optimal Large Language Models
Jordan Hoffmann
Sebastian Borgeaud
A. Mensch
Elena Buchatskaya
Trevor Cai
...
Karen Simonyan
Erich Elsen
Jack W. Rae
Oriol Vinyals
Laurent Sifre
AI4TS
197
1,946
0
29 Mar 2022
Quantifying Memorization Across Neural Language Models
Nicholas Carlini
Daphne Ippolito
Matthew Jagielski
Katherine Lee
Florian Tramèr
Chiyuan Zhang
PILM
118
620
0
15 Feb 2022
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
Colin Raffel
Noam M. Shazeer
Adam Roberts
Katherine Lee
Sharan Narang
Michael Matena
Yanqi Zhou
Wei Li
Peter J. Liu
AIMat
419
20,127
0
23 Oct 2019
BoolQ: Exploring the Surprising Difficulty of Natural Yes/No Questions
Christopher Clark
Kenton Lee
Ming-Wei Chang
Tom Kwiatkowski
Michael Collins
Kristina Toutanova
217
1,517
0
24 May 2019
CommonsenseQA: A Question Answering Challenge Targeting Commonsense Knowledge
Alon Talmor
Jonathan Herzig
Nicholas Lourie
Jonathan Berant
RALM
140
1,727
0
02 Nov 2018
CoQA: A Conversational Question Answering Challenge
Siva Reddy
Danqi Chen
Christopher D. Manning
RALM
HAI
98
1,202
0
21 Aug 2018
SQuAD: 100,000+ Questions for Machine Comprehension of Text
Pranav Rajpurkar
Jian Zhang
Konstantin Lopyrev
Percy Liang
RALM
280
8,127
0
16 Jun 2016
1