2201.05601
Cited By
A Warm Start and a Clean Crawled Corpus -- A Recipe for Good Language Models
14 January 2022
Vésteinn Snæbjarnarson
Haukur Barri Símonarson
Pétur Orri Ragnarsson
Svanhvít Lilja Ingólfsdóttir
H. Jónsson
Vilhjálmur Þorsteinsson
H. Einarsson
Papers citing
"A Warm Start and a Clean Crawled Corpus -- A Recipe for Good Language Models"
Title
Aligning Language Models for Icelandic Legal Text Summarization
Þórir Hrafn Harðarson
Hrafn Loftsson
Stefán Ólafsson
AILaw
AI4TS
ELM
82
0
0
25 Apr 2025
Language Models on a Diet: Cost-Efficient Development of Encoders for Closely-Related Languages via Additional Pretraining
Nikola Ljubesic
Vít Suchomel
Peter Rupnik
Taja Kuzman
Rik van Noord
CLL
29
5
0
08 Apr 2024
Do Language Models Care About Text Quality? Evaluating Web-Crawled Corpora Across 11 Languages
Rik van Noord
Taja Kuzman
Peter Rupnik
Nikola Ljubesic
Miquel Espla-Gomis
Gema Ramírez-Sánchez
Antonio Toral
ALM
34
2
0
13 Mar 2024
Byte-Level Grammatical Error Correction Using Synthetic and Curated Corpora
Svanhvít Lilja Ingólfsdóttir
Pétur Orri Ragnarsson
H. Jónsson
Haukur Barri Símonarson
Vilhjálmur Þorsteinsson
Vésteinn Snæbjarnarson
SyDa
35
9
0
29 May 2023
Transfer to a Low-Resource Language via Close Relatives: The Case Study on Faroese
Vésteinn Snæbjarnarson
A. Simonsen
Goran Glavaš
Ivan Vulić
37
19
0
18 Apr 2023
Cross-Lingual QA as a Stepping Stone for Monolingual Open QA in Icelandic
Vésteinn Snæbjarnarson
H. Einarsson
30
5
0
05 Jul 2022
Semi-self-supervised Automated ICD Coding
Hlynur Davíð Hlynsson
S. Ellertsson
J. Daðason
E. Sigurdsson
H. Loftsson
11
2
0
20 May 2022