
Zyda: A 1.3T Dataset for Open Language Modeling
Yury Tokpanov
Beren Millidge
Paolo Glorioso
Jonathan Pilault
Adam Ibrahim
James Whittington
Quentin Anthony
Papers citing "Zyda: A 1.3T Dataset for Open Language Modeling"
12 / 12 papers shown
Title |
---|
![]() Dolma: an Open Corpus of Three Trillion Tokens for Language Model
Pretraining Research Luca Soldaini Rodney Michael Kinney Akshita Bhagia Dustin Schwenk David Atkinson ...Hanna Hajishirzi Iz Beltagy Dirk Groeneveld Jesse Dodge Kyle Lo |