2203.09435
Expanding Pretrained Models to Thousands More Languages via Lexicon-based Adaptation
17 March 2022
Xinyi Wang
Sebastian Ruder
Graham Neubig
Papers citing
"Expanding Pretrained Models to Thousands More Languages via Lexicon-based Adaptation"
42 / 42 papers shown
Title
Bridging the Language Gaps in Large Language Models with Inference-Time Cross-Lingual Intervention
Weixuan Wang
Minghao Wu
Barry Haddow
Alexandra Birch
LRM
24
2
0
16 Oct 2024
How Transliterations Improve Crosslingual Alignment
Yihong Liu
Mingyang Wang
Amir Hossein Kargaran
Ayyoob Imani
Orgest Xhelili
Haotian Ye
Chunlan Ma
François Yvon
Hinrich Schütze
34
2
0
25 Sep 2024
ChatZero: Zero-shot Cross-Lingual Dialogue Generation via Pseudo-Target Language
Yongkang Liu
Feng Shi
Daling Wang
Yifei Zhang
Hinrich Schütze
15
1
0
16 Aug 2024
LLaMAX: Scaling Linguistic Horizons of LLM by Enhancing Translation Capabilities Beyond 100 Languages
Yinquan Lu
Wenhao Zhu
Lei Li
Yu Qiao
Fei Yuan
42
24
0
08 Jul 2024
Exploring Design Choices for Building Language-Specific LLMs
Atula Tejaswi
Nilesh Gupta
Eunsol Choi
27
10
0
20 Jun 2024
Incorporating Lexical and Syntactic Knowledge for Unsupervised Cross-Lingual Transfer
Jianyu Zheng
Fengfei Fan
Jianquan Li
18
2
0
25 Apr 2024
ANGOFA: Leveraging OFA Embedding Initialization and Synthetic Data for Angolan Language Model
Osvaldo Luamba Quinjica
David Ifeoluwa Adelani
27
0
0
03 Apr 2024
NusaBERT: Teaching IndoBERT to be Multilingual and Multicultural
Wilson Wongso
David Samuel Setiawan
Steven Limcorn
Ananto Joyoadikusumo
32
1
0
04 Mar 2024
Transferring BERT Capabilities from High-Resource to Low-Resource Languages Using Vocabulary Matching
Piotr Rybak
27
1
0
22 Feb 2024
LexC-Gen: Generating Data for Extremely Low-Resource Languages with Large Language Models and Bilingual Lexicons
Zheng-Xin Yong
Cristina Menghini
Stephen H. Bach
36
3
0
21 Feb 2024
LEIA: Facilitating Cross-lingual Knowledge Transfer in Language Models with Entity-based Data Augmentation
Ikuya Yamada
Ryokan Ri
KELM
20
0
0
18 Feb 2024
Self-Augmented In-Context Learning for Unsupervised Word Translation
Yaoyiran Li
Anna Korhonen
Ivan Vulić
22
5
0
15 Feb 2024
Zero-shot Sentiment Analysis in Low-Resource Languages Using a Multilingual Sentiment Lexicon
Fajri Koto
Tilman Beck
Zeerak Talat
Iryna Gurevych
Timothy Baldwin
52
7
0
03 Feb 2024
A Morphologically-Aware Dictionary-based Data Augmentation Technique for Machine Translation of Under-Represented Languages
Md Mahfuz Ibn Alam
Sina Ahmadi
Antonios Anastasopoulos
52
0
0
02 Feb 2024
MaLA-500: Massive Language Adaptation of Large Language Models
Peiqin Lin
Shaoxiong Ji
Jörg Tiedemann
André F. T. Martins
Hinrich Schütze
ELM
23
15
0
24 Jan 2024
OFA: A Framework of Initializing Unseen Subword Embeddings for Efficient Large-scale Multilingual Continued Pretraining
Yihong Liu
Peiqin Lin
Mingyang Wang
Hinrich Schütze
24
21
0
15 Nov 2023
On Bilingual Lexicon Induction with Large Language Models
Yaoyiran Li
Anna Korhonen
Ivan Vulić
26
3
0
21 Oct 2023
A Benchmark for Learning to Translate a New Language from One Grammar Book
Garrett Tanzer
Mirac Suzgun
Chenguang Xi
Dan Jurafsky
Luke Melas-Kyriazi
24
51
0
28 Sep 2023
SIB-200: A Simple, Inclusive, and Big Evaluation Dataset for Topic Classification in 200+ Languages and Dialects
David Ifeoluwa Adelani
Hannah Liu
Xiaoyu Shen
Nikita Vassilyev
Jesujoba Oluwadara Alabi
Yanke Mao
Haonan Gao
Annie En-Shiun Lee
ELM
33
59
0
14 Sep 2023
Translation and Fusion Improves Zero-shot Cross-lingual Information Extraction
Yang Chen
Vedaant Shah
Alan Ritter
21
4
0
23 May 2023
Mitigating Data Imbalance and Representation Degeneration in Multilingual Machine Translation
Wen Lai
Alexandra Chronopoulou
Alexander M. Fraser
32
4
0
22 May 2023
Glot500: Scaling Multilingual Corpora and Language Models to 500 Languages
Ayyoob Imani
Peiqin Lin
Amir Hossein Kargaran
Silvia Severini
Masoud Jalili Sabet
...
Chunlan Ma
Helmut Schmid
André F. T. Martins
François Yvon
Hinrich Schütze
ALM
LRM
31
95
0
20 May 2023
UniMax: Fairer and more Effective Language Sampling for Large-Scale Multilingual Pretraining
Hyung Won Chung
Noah Constant
Xavier Garcia
Adam Roberts
Yi Tay
Sharan Narang
Orhan Firat
21
49
0
18 Apr 2023
Transfer to a Low-Resource Language via Close Relatives: The Case Study on Faroese
Vésteinn Snaebjarnarson
A. Simonsen
Goran Glavaš
Ivan Vulić
35
19
0
18 Apr 2023
Bilex Rx: Lexical Data Augmentation for Massively Multilingual Machine Translation
Alex Jones
Isaac Caswell
Ishan Saxena
Orhan Firat
21
8
0
27 Mar 2023
Language Embeddings Sometimes Contain Typological Generalizations
Robert Östling
Murathan Kurfali
NAI
24
9
0
19 Jan 2023
Mini-Model Adaptation: Efficiently Extending Pretrained Models to New Languages via Aligned Shallow Training
Kelly Marchisio
Patrick Lewis
Yihong Chen
Mikel Artetxe
22
16
0
20 Dec 2022
Learnings from Technological Interventions in a Low Resource Language: Enhancing Information Access in Gondi
Devansh Mehta
Harshita Diddee
Ananya Saxena
Anurag Shukla
Sebastin Santy
...
B. M. L. Srivastava
Alok Sharma
Vishnu Prasad
U. Venkanna
Kalika Bali
19
1
0
29 Nov 2022
GreenPLM: Cross-Lingual Transfer of Monolingual Pre-Trained Language Models at Almost No Cost
Qingcheng Zeng
Lucas Garay
Peilin Zhou
Dading Chong
Yining Hua
Jiageng Wu
Yi-Cheng Pan
Han Zhou
Rob Voigt
Jie Yang
VLM
19
22
0
13 Nov 2022
Improving Bilingual Lexicon Induction with Cross-Encoder Reranking
Yaoyiran Li
Fangyu Liu
Ivan Vulić
Anna Korhonen
34
10
0
30 Oct 2022
Too Brittle To Touch: Comparing the Stability of Quantization and Distillation Towards Developing Lightweight Low-Resource MT Models
Harshita Diddee
Sandipan Dandapat
Monojit Choudhury
T. Ganu
Kalika Bali
27
5
0
27 Oct 2022
Data-Efficient Strategies for Expanding Hate Speech Detection into Under-Resourced Languages
Paul Röttger
Debora Nozza
Federico Bianchi
Dirk Hovy
26
10
0
20 Oct 2022
SilverAlign: MT-Based Silver Data Algorithm For Evaluating Word Alignment
Abdullatif Köksal
Silvia Severini
Hinrich Schütze
27
0
0
12 Oct 2022
The first neural machine translation system for the Erzya language
David Dale
68
7
0
19 Sep 2022
Language Modelling with Pixels
Phillip Rust
Jonas F. Lotz
Emanuele Bugliarello
Elizabeth Salesky
Miryam de Lhoneux
Desmond Elliott
VLM
32
46
0
14 Jul 2022
NusaX: Multilingual Parallel Sentiment Dataset for 10 Indonesian Local Languages
Genta Indra Winata
Alham Fikri Aji
Samuel Cahyawijaya
Rahmad Mahendra
Fajri Koto
...
Pascale Fung
Timothy Baldwin
Jey Han Lau
Rico Sennrich
Sebastian Ruder
29
77
0
31 May 2022
Phylogeny-Inspired Adaptation of Multilingual Models to New Languages
Fahim Faisal
Antonios Anastasopoulos
AI4CE
LRM
34
26
0
19 May 2022
Probing Cross-Lingual Lexical Knowledge from Multilingual Sentence Encoders
Ivan Vulić
Goran Glavaš
Fangyu Liu
Nigel Collier
E. Ponti
Anna Korhonen
17
8
0
30 Apr 2022
Systematic Inequalities in Language Technology Performance across the World's Languages
Damián E. Blasi
Antonios Anastasopoulos
Graham Neubig
122
131
0
13 Oct 2021
Named Entity Recognition and Classification on Historical Documents: A Survey
Maud Ehrmann
Ahmed Hamdi
Elvys Linhares Pontes
Matteo Romanello
A. Doucet
57
108
0
23 Sep 2021
When Being Unseen from mBERT is just the Beginning: Handling New Languages With Multilingual Language Models
Benjamin Muller
Antonis Anastasopoulos
Benoît Sagot
Djamé Seddah
LRM
126
165
0
24 Oct 2020
Stanza: A Python Natural Language Processing Toolkit for Many Human Languages
Peng Qi
Yuhao Zhang
Yuhui Zhang
Jason Bolton
Christopher D. Manning
AI4TS
201
1,653
0
16 Mar 2020