arXiv:2005.03521

The Danish Gigaword Project

7 May 2020
Leon Derczynski
Manuel R. Ciosici
R. Baglini
Morten H. Christiansen
Jacob Aarup Dalsgaard
Riccardo Fusaroli
P. Henrichsen
Rasmus Hvingelby
Andreas Søeborg Kirkedal
Alex Speed Kjeldsen
Claus Ladefoged
F. Nielsen
M. Petersen
J. H. Rystrøm
Daniel Varab
Abstract

Danish language technology has been hindered by a lack of broad-coverage corpora at the scale modern NLP prefers. This paper describes the Danish Gigaword Corpus, the result of a focused effort to provide a diverse and freely available one-billion-word corpus of Danish text. The Danish Gigaword Corpus covers a wide array of time periods, domains, speakers' socio-economic status, and Danish dialects.
