ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2201.05601
  4. Cited By
A Warm Start and a Clean Crawled Corpus -- A Recipe for Good Language
  Models

A Warm Start and a Clean Crawled Corpus -- A Recipe for Good Language Models

14 January 2022
Vésteinn Snæbjarnarson
Haukur Barri Símonarson
Pétur Orri Ragnarsson
Svanhvít Lilja Ingólfsdóttir
H. Jónsson
Vilhjálmur Þorsteinsson
H. Einarsson
ArXivPDFHTML

Papers citing "A Warm Start and a Clean Crawled Corpus -- A Recipe for Good Language Models"

7 / 7 papers shown
Title
Aligning Language Models for Icelandic Legal Text Summarization
Aligning Language Models for Icelandic Legal Text Summarization
Þórir Hrafn Harðarson
Hrafn Loftsson
Stefán Ólafsson
AILaw
AI4TS
ELM
82
0
0
25 Apr 2025
Language Models on a Diet: Cost-Efficient Development of Encoders for
  Closely-Related Languages via Additional Pretraining
Language Models on a Diet: Cost-Efficient Development of Encoders for Closely-Related Languages via Additional Pretraining
Nikola Ljubesic
Vít Suchomel
Peter Rupnik
Taja Kuzman
Rik van Noord
CLL
29
5
0
08 Apr 2024
Do Language Models Care About Text Quality? Evaluating Web-Crawled
  Corpora Across 11 Languages
Do Language Models Care About Text Quality? Evaluating Web-Crawled Corpora Across 11 Languages
Rik van Noord
Taja Kuzman
Peter Rupnik
Nikola Ljubesic
Miquel Espla-Gomis
Gema Ramírez-Sánchez
Antonio Toral
ALM
34
2
0
13 Mar 2024
Byte-Level Grammatical Error Correction Using Synthetic and Curated
  Corpora
Byte-Level Grammatical Error Correction Using Synthetic and Curated Corpora
Svanhvít Lilja Ingólfsdóttir
Pétur Orri Ragnarsson
H. Jónsson
Haukur Barri Símonarson
Vilhjálmur Þorsteinsson
Vésteinn Snæbjarnarson
SyDa
35
9
0
29 May 2023
Transfer to a Low-Resource Language via Close Relatives: The Case Study
  on Faroese
Transfer to a Low-Resource Language via Close Relatives: The Case Study on Faroese
Vésteinn Snaebjarnarson
A. Simonsen
Goran Glavavs
Ivan Vulić
37
19
0
18 Apr 2023
Cross-Lingual QA as a Stepping Stone for Monolingual Open QA in
  Icelandic
Cross-Lingual QA as a Stepping Stone for Monolingual Open QA in Icelandic
Vésteinn Snæbjarnarson
H. Einarsson
30
5
0
05 Jul 2022
Semi-self-supervised Automated ICD Coding
Semi-self-supervised Automated ICD Coding
Hlynur Davíð Hlynsson
S. Ellertsson
J. Daðason
E. Sigurdsson
H. Loftsson
11
2
0
20 May 2022
1