Maximize Your Data's Potential: Enhancing LLM Accuracy with Two-Phase Pretraining
arXiv:2412.15285 · 18 December 2024
Steven Feng, Shrimai Prabhumoye, John Kamalu, Jane Polak Scowcroft, M. Patwary, Mohammad Shoeybi, Bryan Catanzaro
Papers citing "Maximize Your Data's Potential: Enhancing LLM Accuracy with Two-Phase Pretraining" (4 papers)
Curriculum-Guided Layer Scaling for Language Model Pretraining
Karanpartap Singh, Neil Band, Ehsan Adeli
ALM, LRM · 13 Jun 2025
Chameleon: A Flexible Data-mixing Framework for Language Model Pretraining and Finetuning
Wanyun Xie, F. Tonin, Volkan Cevher
30 May 2025
ChemPile: A 250GB Diverse and Curated Dataset for Chemical Foundation Models
Adrian Mirza, Nawaf Alampara, Martiño Ríos-García, Mohamed Abdelalim, Jack Butler, ..., Mark Worrall, Adamo Young, Philippe Schwaller, Michael Pieler, Kevin Maik Jablonka
18 May 2025
MIND: Math Informed syNthetic Dialogues for Pretraining LLMs
Syeda Nahida Akter, Shrimai Prabhumoye, John Kamalu, S. Satheesh, Eric Nyberg, M. Patwary, Mohammad Shoeybi, Bryan Catanzaro
LRM, SyDa, ReLM · 15 Oct 2024