2311.05741
Cited By
Efficiently Adapting Pretrained Language Models To New Languages
9 November 2023
Zoltan Csaki
Pian Pawakapan
Urmish Thakker
Qiantong Xu
CLL
Papers citing "Efficiently Adapting Pretrained Language Models To New Languages"
17 / 17 papers shown
Title
Krikri: Advancing Open Large Language Models for Greek
Dimitris Roussis
Leon Voukoutis
Georgios Paraskevopoulos
Sokratis Sofianopoulos
Prokopis Prokopidis
Vassilis Papavasileiou
Athanasios Katsamanis
Stelios Piperidis
Vassilis Katsouros
ALM
25
0
0
19 May 2025
Kuwain 1.5B: An Arabic SLM via Language Injection
Khalil Hennara
Sara Chrouf
Mohamed Motaism Hamed
Zeina Aldallal
Omar Hadid
Safwan AlModhayan
37
1
0
21 Apr 2025
Improving Multilingual Capabilities with Cultural and Local Knowledge in Large Language Models While Enhancing Native Performance
Ram Mohan Rao Kadiyala
Siddartha Pullakhandam
Siddhant Gupta
Drishti Sharma
Jebish Purbey
Kanwal Mehreen
Muhammad Arham
Hamza Farooq
38
0
0
13 Apr 2025
TituLLMs: A Family of Bangla LLMs with Comprehensive Benchmarking
Shahriar Kabir Nahin
R. N. Nandi
Sagor Sarker
Quazi Sarwar Muhtaseem
Md. Kowsher
Apu Chandraw Shill
Md Ibrahim
Mehadi Hasan Menon
Tareq Al Muntasir
Firoj Alam
68
0
0
24 Feb 2025
Extending LLMs to New Languages: A Case Study of Llama and Persian Adaptation
Samin Mahdizadeh Sani
Pouya Sadeghi
Thuy-Trang Vu
Yadollah Yaghoobzadeh
Gholamreza Haffari
78
2
0
17 Dec 2024
Language Adaptation on a Tight Academic Compute Budget: Tokenizer Swapping Works and Pure bfloat16 Is Enough
Konstantin Dobler
Gerard de Melo
58
1
0
28 Aug 2024
Bilingual Adaptation of Monolingual Foundation Models
Gurpreet Gosal
Yishi Xu
Gokul Ramakrishnan
Rituraj Joshi
Avraham Sheinin
...
Rahul Pal
Parvez Mullah
Soundar Doraiswamy
Mohamed El Karim Chami
Preslav Nakov
CLL
34
3
0
13 Jul 2024
Building pre-train LLM Dataset for the INDIC Languages: a case study on Hindi
Shantipriya Parida
Shakshi Panwar
Kusum Lata
Sanskruti Mishra
Sambit Sekhar
49
2
0
13 Jul 2024
Adapting LLMs to Hebrew: Unveiling DictaLM 2.0 with Enhanced Vocabulary and Instruction Capabilities
Shaltiel Shmidman
Avi Shmidman
Amir DN Cohen
Moshe Koppel
43
2
0
09 Jul 2024
Exploring Design Choices for Building Language-Specific LLMs
Atula Tejaswi
Nilesh Gupta
Eunsol Choi
29
10
0
20 Jun 2024
SambaLingo: Teaching Large Language Models New Languages
Zoltan Csaki
Bo Li
Jonathan Li
Qiantong Xu
Pian Pawakapan
Leon Zhang
Yun Du
Hengyu Zhao
Changran Hu
Urmish Thakker
40
6
0
08 Apr 2024
NusaBERT: Teaching IndoBERT to be Multilingual and Multicultural
Wilson Wongso
David Samuel Setiawan
Steven Limcorn
Ananto Joyoadikusumo
36
1
0
04 Mar 2024
Typhoon: Thai Large Language Models
Kunat Pipatanakul
Phatrasek Jirabovonvisut
Potsawee Manakul
Sittipong Sripaisarnmongkol
Ruangsak Patomwong
Pathomporn Chokchainant
Kasima Tharnpipitchai
48
16
0
21 Dec 2023
BERT, mBERT, or BiBERT? A Study on Contextualized Embeddings for Neural Machine Translation
Haoran Xu
Benjamin Van Durme
Kenton W. Murray
50
57
0
09 Sep 2021
Larger-Scale Transformers for Multilingual Masked Language Modeling
Naman Goyal
Jingfei Du
Myle Ott
Giridhar Anantharaman
Alexis Conneau
92
98
0
02 May 2021
The Pile: An 800GB Dataset of Diverse Text for Language Modeling
Leo Gao
Stella Biderman
Sid Black
Laurence Golding
Travis Hoppe
...
Horace He
Anish Thite
Noa Nabeshima
Shawn Presser
Connor Leahy
AIMat
282
2,000
0
31 Dec 2020
How Good is Your Tokenizer? On the Monolingual Performance of Multilingual Language Models
Phillip Rust
Jonas Pfeiffer
Ivan Vulić
Sebastian Ruder
Iryna Gurevych
80
235
0
31 Dec 2020