Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2010.12858
Cited By
When Being Unseen from mBERT is just the Beginning: Handling New Languages With Multilingual Language Models
24 October 2020
Benjamin Muller
Antonis Anastasopoulos
Benoît Sagot
Djamé Seddah
LRM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"When Being Unseen from mBERT is just the Beginning: Handling New Languages With Multilingual Language Models"
39 / 39 papers shown
Title
HYPEROFA: Expanding LLM Vocabulary to New Languages via Hypernetwork-Based Embedding Initialization
Enes Özeren
Yihong Liu
Hinrich Schütze
31
0
0
21 Apr 2025
NaijaNLP: A Survey of Nigerian Low-Resource Languages
Isa Inuwa-Dutse
44
0
0
27 Feb 2025
Do Multilingual LLMs Think In English?
Lisa Schut
Y. Gal
Sebastian Farquhar
44
3
0
24 Feb 2025
Cross-Lingual Transfer of Debiasing and Detoxification in Multilingual LLMs: An Extensive Investigation
Vera Neplenbroek
Arianna Bisazza
Raquel Fernández
103
0
0
17 Feb 2025
Prompting with Phonemes: Enhancing LLMs' Multilinguality for Non-Latin Script Languages
Hoang Nguyen
Khyati Mahajan
Vikas Yadav
Philip S. Yu
Masoud Hashemi
Rishabh Maheshwary
Rishabh Maheshwary
47
0
0
04 Nov 2024
SSP: Self-Supervised Prompting for Cross-Lingual Transfer to Low-Resource Languages using Large Language Models
Vipul Rathore
Aniruddha Deb
Ankish Chandresh
Parag Singla
Mausam
LRM
44
0
0
27 Jun 2024
Multilingual Large Language Models and Curse of Multilinguality
Daniil Gurgurov
Tanja Bäumel
Tatiana Anikina
78
4
0
15 Jun 2024
Unknown Script: Impact of Script on Cross-Lingual Transfer
Wondimagegnhue Tufa
Ilia Markov
Piek Vossen
39
0
0
29 Apr 2024
Comparing LLM prompting with Cross-lingual transfer performance on Indigenous and Low-resource Brazilian Languages
David Ifeoluwa Adelani
A. S. Dougruoz
André Coneglian
Atul Kr. Ojha
31
2
0
28 Apr 2024
Modular Adaptation of Multilingual Encoders to Written Swiss German Dialect
Jannis Vamvas
Noëmi Aepli
Rico Sennrich
34
0
0
25 Jan 2024
Multilingual large language models leak human stereotypes across language boundaries
Yang Trista Cao
Anna Sotnikova
Jieyu Zhao
Linda X. Zou
Rachel Rudinger
Hal Daumé
PILM
28
10
0
12 Dec 2023
An Efficient Multilingual Language Model Compression through Vocabulary Trimming
Asahi Ushio
Yi Zhou
Jose Camacho-Collados
41
7
0
24 May 2023
Having Beer after Prayer? Measuring Cultural Bias in Large Language Models
Tarek Naous
Michael Joseph Ryan
Alan Ritter
Wei-ping Xu
31
85
0
23 May 2023
Do All Languages Cost the Same? Tokenization in the Era of Commercial Language Models
Orevaoghene Ahia
Sachin Kumar
Hila Gonen
Jungo Kasai
David R. Mortensen
Noah A. Smith
Yulia Tsvetkov
40
80
0
23 May 2023
PrOnto: Language Model Evaluations for 859 Languages
Luke Gessler
21
1
0
22 May 2023
Investigating Lexical Sharing in Multilingual Machine Translation for Indian Languages
Sonal Sannigrahi
Rachel Bawden
29
0
0
04 May 2023
GMNLP at SemEval-2023 Task 12: Sentiment Analysis with Phylogeny-Based Adapters
Md Mahfuz Ibn Alam
Ruoyu Xie
Fahim Faisal
Antonios Anastasopoulos
30
3
0
25 Apr 2023
BLOOM+1: Adding Language Support to BLOOM for Zero-Shot Prompting
Zheng-Xin Yong
Hailey Schoelkopf
Niklas Muennighoff
Alham Fikri Aji
David Ifeoluwa Adelani
...
Genta Indra Winata
Stella Biderman
Edward Raff
Dragomir R. Radev
Vassilina Nikoulina
CLL
VLM
AI4CE
LRM
29
81
0
19 Dec 2022
Extending the Subwording Model of Multilingual Pretrained Models for New Languages
K. Imamura
Eiichiro Sumita
VLM
27
3
0
29 Nov 2022
Graphemic Normalization of the Perso-Arabic Script
R. Doctor
Alexander Gutkin
Cibu Johny
Brian Roark
R. Sproat
38
4
0
21 Oct 2022
Some Languages are More Equal than Others: Probing Deeper into the Linguistic Disparity in the NLP World
Surangika Ranathunga
Nisansa de Silva
37
34
0
16 Oct 2022
The first neural machine translation system for the Erzya language
David Dale
75
7
0
19 Sep 2022
Language Modelling with Pixels
Phillip Rust
Jonas F. Lotz
Emanuele Bugliarello
Elizabeth Salesky
Miryam de Lhoneux
Desmond Elliott
VLM
35
46
0
14 Jul 2022
Ancestor-to-Creole Transfer is Not a Walk in the Park
Heather Lent
Emanuele Bugliarello
Anders Søgaard
6
8
0
09 Jun 2022
Cross-lingual Lifelong Learning
Meryem M'hamdi
Xiang Ren
Jonathan May
CLL
35
8
0
23 May 2022
Lifting the Curse of Multilinguality by Pre-training Modular Transformers
Jonas Pfeiffer
Naman Goyal
Xi Victoria Lin
Xian Li
James Cross
Sebastian Riedel
Mikel Artetxe
LRM
40
139
0
12 May 2022
Revisiting the Effects of Leakage on Dependency Parsing
Nathaniel Krasner
Miriam Wanner
Antonios Anastasopoulos
15
0
0
24 Mar 2022
Match the Script, Adapt if Multilingual: Analyzing the Effect of Multilingual Pretraining on Cross-lingual Transferability
Yoshinari Fujinuma
Jordan L. Boyd-Graber
Katharina Kann
AAML
54
23
0
21 Mar 2022
Expanding Pretrained Models to Thousands More Languages via Lexicon-based Adaptation
Xinyi Wang
Sebastian Ruder
Graham Neubig
33
60
0
17 Mar 2022
Oolong: Investigating What Makes Transfer Learning Hard with Controlled Studies
Zhengxuan Wu
Alex Tamkin
Isabel Papadimitriou
21
10
0
24 Feb 2022
NaijaSenti: A Nigerian Twitter Sentiment Corpus for Multilingual Sentiment Analysis
Shamsuddeen Hassan Muhammad
David Ifeoluwa Adelani
Sebastian Ruder
I. Ahmad
Idris Abdulmumin
...
Chris C. Emezue
Saheed Abdul
Anuoluwapo Aremu
Alipio Jeorge
P. Brazdil
37
95
0
20 Jan 2022
Dataset Geography: Mapping Language Data to Language Users
Fahim Faisal
Yinkai Wang
Antonios Anastasopoulos
59
23
0
07 Dec 2021
Can Character-based Language Models Improve Downstream Task Performance in Low-Resource and Noisy Language Scenarios?
Arij Riabi
Benoît Sagot
Djamé Seddah
31
15
0
26 Oct 2021
Subword Mapping and Anchoring across Languages
Giorgos Vernikos
Andrei Popescu-Belis
67
12
0
09 Sep 2021
Survey of Low-Resource Machine Translation
Barry Haddow
Rachel Bawden
Antonio Valerio Miceli Barone
Jindvrich Helcl
Alexandra Birch
AIMat
31
147
0
01 Sep 2021
A Primer on Pretrained Multilingual Language Models
Sumanth Doddapaneni
Gowtham Ramesh
Mitesh M. Khapra
Anoop Kunchukuttan
Pratyush Kumar
LRM
43
73
0
01 Jul 2021
Specializing Multilingual Language Models: An Empirical Study
Ethan C. Chau
Noah A. Smith
27
27
0
16 Jun 2021
Towards More Equitable Question Answering Systems: How Much More Data Do You Need?
Arnab Debnath
Navid Rajabi
F. Alam
Antonios Anastasopoulos
22
11
0
28 May 2021
Stanza: A Python Natural Language Processing Toolkit for Many Human Languages
Peng Qi
Yuhao Zhang
Yuhui Zhang
Jason Bolton
Christopher D. Manning
AI4TS
204
1,654
0
16 Mar 2020
1