Data Caricatures: On the Representation of African American Language in Pretraining Corpora
Papers citing "Data Caricatures: On the Representation of African American Language in Pretraining Corpora"
Title |
---|
Dolma: an Open Corpus of Three Trillion Tokens for Language Model Pretraining Research. Luca Soldaini, Rodney Michael Kinney, Akshita Bhagia, Dustin Schwenk, David Atkinson, ..., Hanna Hajishirzi, Iz Beltagy, Dirk Groeneveld, Jesse Dodge, Kyle Lo |