2303.17183
Cited By
The Nordic Pile: A 1.2TB Nordic Dataset for Language Modeling
30 March 2023
Joey Öhman
S. Verlinden
Ariel Ekgren
Amaru Cuba Gyllensten
T. Isbister
Evangelia Gogoulou
F. Carlsson
Magnus Sahlgren
Papers citing "The Nordic Pile: A 1.2TB Nordic Dataset for Language Modeling"
5 / 5 papers shown
Title
Enhancing Portuguese Variety Identification with Cross-Domain Approaches
Hugo Sousa
Rúben Almeida
P. Silvano
Inês Cantante
Ricardo Campos
A. Jorge
44
0
0
21 Feb 2025
Continual Learning Under Language Shift
Evangelia Gogoulou
Timothée Lesort
Magnus Boman
Joakim Nivre
KELM
CLL
40
4
0
02 Nov 2023
GPT-SW3: An Autoregressive Language Model for the Nordic Languages
Ariel Ekgren
Amaru Cuba Gyllensten
Felix Stollenwerk
Joey Öhman
T. Isbister
Evangelia Gogoulou
F. Carlsson
Alice Heiman
Judit Casademont
Magnus Sahlgren
29
13
0
22 May 2023
Training and Evaluation of a Multilingual Tokenizer for GPT-SW3
Felix Stollenwerk
31
7
0
28 Apr 2023
The Pile: An 800GB Dataset of Diverse Text for Language Modeling
Leo Gao
Stella Biderman
Sid Black
Laurence Golding
Travis Hoppe
...
Horace He
Anish Thite
Noa Nabeshima
Shawn Presser
Connor Leahy
AIMat
282
2,000
0
31 Dec 2020