Lifting the Curse of Multilinguality by Pre-training Modular Transformers

12 May 2022
Jonas Pfeiffer, Naman Goyal, Xi Lin, Xian Li, James Cross, Sebastian Riedel, Mikel Artetxe
Tags: LRM

Papers citing "Lifting the Curse of Multilinguality by Pre-training Modular Transformers"

50 of 58 citing papers shown. Each entry lists the title, authors, and date, followed by the page's topic tags (where present) and its three per-paper counters, which this export leaves unlabeled.
Efficiently Democratizing Medical LLMs for 50 Languages via a Mixture of Language Family Experts
  Guorui Zheng, Xidong Wang, Juhao Liang, Nuo Chen, Yuping Zheng, Benyou Wang
  14 Oct 2024 · Tags: MoE · 107 / 5 / 0

Layer Swapping for Zero-Shot Cross-Lingual Transfer in Large Language Models
  Lucas Bandarkar, Benjamin Muller, Pritish Yuvraj, Rui Hou, Nayan Singhal, Hongjiang Lv, Bing-Quan Liu
  02 Oct 2024 · Tags: KELM, LRM, MoMe · 72 / 4 / 0

LangSAMP: Language-Script Aware Multilingual Pretraining
  Yihong Liu, Haotian Ye, Chunlan Ma, Mingyang Wang, Hinrich Schütze
  26 Sep 2024 · Tags: VLM · 156 / 0 / 0

Bridging the Language Gap: Enhancing Multilingual Prompt-Based Code Generation in LLMs via Zero-Shot Cross-Lingual Transfer
  Mingda Li, Abhijit Mishra, Utkarsh Mujumdar
  19 Aug 2024 · 73 / 0 / 0

Multilingual Unsupervised Neural Machine Translation with Denoising Adapters
  Ahmet Üstün, Alexandre Berard, Laurent Besacier, Matthias Gallé
  20 Oct 2021 · 45 / 45 / 0

Multilingual Domain Adaptation for NMT: Decoupling Language and Domain Information with Adapters
  Asa Cooper Stickland, Alexandre Berard, Vassilina Nikoulina
  18 Oct 2021 · Tags: AI4CE · 38 / 29 / 0

Composable Sparse Fine-Tuning for Cross-Lingual Transfer
  Alan Ansell, Edoardo Ponti, Anna Korhonen, Ivan Vulić
  14 Oct 2021 · Tags: CLL, MoE · 114 / 141 / 0

Efficient Test Time Adapter Ensembling for Low-resource Language Varieties
  Xinyi Wang, Yulia Tsvetkov, Sebastian Ruder, Graham Neubig
  10 Sep 2021 · 46 / 35 / 0

Subword Mapping and Anchoring across Languages
  Giorgos Vernikos, Andrei Popescu-Belis
  09 Sep 2021 · 86 / 12 / 0

DEMix Layers: Disentangling Domains for Modular Language Modeling
  Suchin Gururangan, Michael Lewis, Ari Holtzman, Noah A. Smith, Luke Zettlemoyer
  11 Aug 2021 · Tags: KELM, MoE · 89 / 134 / 0

Charformer: Fast Character Transformers via Gradient-based Subword Tokenization
  Yi Tay, Vinh Q. Tran, Sebastian Ruder, Jai Gupta, Hyung Won Chung, Dara Bahri, Zhen Qin, Simon Baumgartner, Cong Yu, Donald Metzler
  23 Jun 2021 · 94 / 158 / 0

Compacter: Efficient Low-Rank Hypercomplex Adapter Layers
  Rabeeh Karimi Mahabadi, James Henderson, Sebastian Ruder
  08 Jun 2021 · Tags: MoE · 97 / 483 / 0

Parameter-efficient Multi-task Fine-tuning for Transformers via Shared Hypernetworks
  Rabeeh Karimi Mahabadi, Sebastian Ruder, Mostafa Dehghani, James Henderson
  08 Jun 2021 · Tags: MoE · 66 / 307 / 0

Lightweight Adapter Tuning for Multilingual Speech Translation
  Hang Le, J. Pino, Changhan Wang, Jiatao Gu, D. Schwab, Laurent Besacier
  02 Jun 2021 · 91 / 90 / 0

ByT5: Towards a token-free future with pre-trained byte-to-byte models
  Linting Xue, Aditya Barua, Noah Constant, Rami Al-Rfou, Sharan Narang, Mihir Kale, Adam Roberts, Colin Raffel
  28 May 2021 · 83 / 502 / 0

AmericasNLI: Evaluating Zero-shot Natural Language Understanding of Pretrained Multilingual Models in Truly Low-resource Languages
  Abteen Ebrahimi, Manuel Mager, Arturo Oncevay, Vishrav Chaudhary, Luis Chiruzzo, ..., Graham Neubig, Alexis Palmer, Rolando A. Coto Solano, Ngoc Thang Vu, Katharina Kann
  18 Apr 2021 · 140 / 74 / 0

What to Pre-Train on? Efficient Intermediate Task Selection
  Clifton A. Poth, Jonas Pfeiffer, Andreas Rücklé, Iryna Gurevych
  16 Apr 2021 · 59 / 99 / 0

CANINE: Pre-training an Efficient Tokenization-Free Encoder for Language Representation
  J. Clark, Dan Garrette, Iulia Turc, John Wieting
  11 Mar 2021 · 85 / 218 / 0

Towards Continual Learning for Multilingual Machine Translation via Vocabulary Substitution
  Xavier Garcia, Noah Constant, Ankur P. Parikh, Orhan Firat
  11 Mar 2021 · 119 / 45 / 0

Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity
  W. Fedus, Barret Zoph, Noam M. Shazeer
  11 Jan 2021 · Tags: MoE · 83 / 2,168 / 0

How Good is Your Tokenizer? On the Monolingual Performance of Multilingual Language Models
  Phillip Rust, Jonas Pfeiffer, Ivan Vulić, Sebastian Ruder, Iryna Gurevych
  31 Dec 2020 · 117 / 250 / 0

UNKs Everywhere: Adapting Multilingual Language Models to New Scripts
  Jonas Pfeiffer, Ivan Vulić, Iryna Gurevych, Sebastian Ruder
  31 Dec 2020 · 60 / 131 / 0

Orthogonal Language and Task Adapters in Zero-Shot Cross-Lingual Transfer
  M. Vidoni, Ivan Vulić, Goran Glavaš
  11 Dec 2020 · 71 / 27 / 0

When Being Unseen from mBERT is just the Beginning: Handling New Languages With Multilingual Language Models
  Benjamin Muller, Antonis Anastasopoulos, Benoît Sagot, Djamé Seddah
  24 Oct 2020 · Tags: LRM · 160 / 168 / 0

Rethinking embedding coupling in pre-trained language models
  Hyung Won Chung, Thibault Févry, Henry Tsai, Melvin Johnson, Sebastian Ruder
  24 Oct 2020 · 144 / 142 / 0

Improving Multilingual Models with Language-Clustered Vocabularies
  Hyung Won Chung, Dan Garrette, Kiat Chuan Tan, Jason Riesa
  24 Oct 2020 · Tags: VLM · 100 / 65 / 0

AdapterDrop: On the Efficiency of Adapters in Transformers
  Andreas Rücklé, Gregor Geigle, Max Glockner, Tilman Beck, Jonas Pfeiffer, Nils Reimers, Iryna Gurevych
  22 Oct 2020 · 112 / 261 / 0

MultiCQA: Zero-Shot Transfer of Self-Supervised Text Matching Models on a Massive Scale
  Andreas Rücklé, Jonas Pfeiffer, Iryna Gurevych
  02 Oct 2020 · 59 / 37 / 0

Parsing with Multilingual BERT, a Small Corpus, and a Small Treebank
  Ethan C. Chau, Lucy H. Lin, Noah A. Smith
  29 Sep 2020 · 51 / 15 / 0

Reusing a Pretrained Language Model on Languages with Limited Corpora for Unsupervised NMT
  Alexandra Chronopoulou, Dario Stojanovski, Alexander Fraser
  16 Sep 2020 · 48 / 33 / 0

Common Sense or World Knowledge? Investigating Adapter-Based Knowledge Injection into Pretrained Transformers
  Anne Lauscher, Olga Majewska, Leonardo F. R. Ribeiro, Iryna Gurevych, Nikolai Rozanov, Goran Glavaš
  24 May 2020 · Tags: KELM · 54 / 81 / 0

Are All Languages Created Equal in Multilingual BERT?
  Shijie Wu, Mark Dredze
  18 May 2020 · 63 / 322 / 0

XCOPA: A Multilingual Dataset for Causal Commonsense Reasoning
  Edoardo Ponti, Goran Glavaš, Olga Majewska, Qianchu Liu, Ivan Vulić, Anna Korhonen
  01 May 2020 · Tags: LRM · 61 / 320 / 0

AdapterFusion: Non-Destructive Task Composition for Transfer Learning
  Jonas Pfeiffer, Aishwarya Kamath, Andreas Rücklé, Kyunghyun Cho, Iryna Gurevych
  01 May 2020 · Tags: CLL, MoMe · 127 / 845 / 0

MAD-X: An Adapter-Based Framework for Multi-Task Cross-Lingual Transfer
  Jonas Pfeiffer, Ivan Vulić, Iryna Gurevych, Sebastian Ruder
  30 Apr 2020 · 96 / 625 / 0

UDapter: Language Adaptation for Truly Universal Dependency Parsing
  Ahmet Üstün, Arianna Bisazza, G. Bouma, Gertjan van Noord
  29 Apr 2020 · 51 / 115 / 0

Extending Multilingual BERT to Low-Resource Languages
  Zihan Wang, Karthikeyan K, Stephen D. Mayhew, Dan Roth
  28 Apr 2020 · Tags: VLM · 59 / 129 / 0

XTREME: A Massively Multilingual Multi-task Benchmark for Evaluating Cross-lingual Generalization
  Junjie Hu, Sebastian Ruder, Aditya Siddhant, Graham Neubig, Orhan Firat, Melvin Johnson
  24 Mar 2020 · Tags: ELM · 161 / 970 / 0

From English To Foreign Languages: Transferring Pre-trained Language Models
  Ke M. Tran
  18 Feb 2020 · 42 / 51 / 0

K-Adapter: Infusing Knowledge into Pre-Trained Models with Adapters
  Ruize Wang, Duyu Tang, Nan Duan, Zhongyu Wei, Xuanjing Huang, Jianshu Ji, Guihong Cao, Daxin Jiang, Ming Zhou
  05 Feb 2020 · Tags: KELM · 87 / 553 / 0

Cross-Lingual Ability of Multilingual BERT: An Empirical Study
  Karthikeyan K, Zihan Wang, Stephen D. Mayhew, Dan Roth
  17 Dec 2019 · Tags: LRM · 62 / 338 / 0

Unsupervised Cross-lingual Representation Learning at Scale
  Alexis Conneau, Kartikay Khandelwal, Naman Goyal, Vishrav Chaudhary, Guillaume Wenzek, Francisco Guzmán, Edouard Grave, Myle Ott, Luke Zettlemoyer, Veselin Stoyanov
  05 Nov 2019 · 193 / 6,538 / 0

On the Cross-lingual Transferability of Monolingual Representations
  Mikel Artetxe, Sebastian Ruder, Dani Yogatama
  25 Oct 2019 · 169 / 793 / 0

Simple, Scalable Adaptation for Neural Machine Translation
  Ankur Bapna, N. Arivazhagan, Orhan Firat
  18 Sep 2019 · Tags: AI4CE · 95 / 416 / 0

Slice-based Learning: A Programming Model for Residual Learning in Critical Data Slices
  V. Chen, Sen Wu, Zhenzhen Weng, Alexander Ratner, Christopher Ré
  13 Sep 2019 · 50 / 56 / 0

How multilingual is Multilingual BERT?
  Telmo Pires, Eva Schlinger, Dan Garrette
  04 Jun 2019 · Tags: LRM, VLM · 143 / 1,401 / 0

Beto, Bentz, Becas: The Surprising Cross-Lingual Effectiveness of BERT
  Shijie Wu, Mark Dredze
  19 Apr 2019 · Tags: VLM, SSeg · 91 / 677 / 0

fairseq: A Fast, Extensible Toolkit for Sequence Modeling
  Myle Ott, Sergey Edunov, Alexei Baevski, Angela Fan, Sam Gross, Nathan Ng, David Grangier, Michael Auli
  01 Apr 2019 · Tags: VLM, FaML · 95 / 3,147 / 0

Parameter-Efficient Transfer Learning for NLP
  N. Houlsby, A. Giurgiu, Stanislaw Jastrzebski, Bruna Morrone, Quentin de Laroussilhe, Andrea Gesmundo, Mona Attariyan, Sylvain Gelly
  02 Feb 2019 · 208 / 4,439 / 0

How to (Properly) Evaluate Cross-Lingual Word Embeddings: On Strong Baselines, Comparative Analyses, and Some Misconceptions
  Goran Glavaš, Robert Litschko, Sebastian Ruder, Ivan Vulić
  01 Feb 2019 · Tags: ELM · 62 / 183 / 0