The Nordic Pile: A 1.2TB Nordic Dataset for Language Modeling


30 March 2023
Joey Öhman
S. Verlinden
Ariel Ekgren
Amaru Cuba Gyllensten
T. Isbister
Evangelia Gogoulou
F. Carlsson
Magnus Sahlgren

Papers citing "The Nordic Pile: A 1.2TB Nordic Dataset for Language Modeling"

5 / 5 papers shown
Enhancing Portuguese Variety Identification with Cross-Domain Approaches
Hugo Sousa
Rúben Almeida
P. Silvano
Inês Cantante
Ricardo Campos
A. Jorge
44
0
0
21 Feb 2025
Continual Learning Under Language Shift
Evangelia Gogoulou
Timothée Lesort
Magnus Boman
Joakim Nivre
KELM
CLL
40
4
0
02 Nov 2023
GPT-SW3: An Autoregressive Language Model for the Nordic Languages
Ariel Ekgren
Amaru Cuba Gyllensten
Felix Stollenwerk
Joey Öhman
T. Isbister
Evangelia Gogoulou
F. Carlsson
Alice Heiman
Judit Casademont
Magnus Sahlgren
29
13
0
22 May 2023
Training and Evaluation of a Multilingual Tokenizer for GPT-SW3
Felix Stollenwerk
31
7
0
28 Apr 2023
The Pile: An 800GB Dataset of Diverse Text for Language Modeling
Leo Gao
Stella Biderman
Sid Black
Laurence Golding
Travis Hoppe
...
Horace He
Anish Thite
Noa Nabeshima
Shawn Presser
Connor Leahy
AIMat
282
2,000
0
31 Dec 2020