mmT5: Modular Multilingual Pre-Training Solves Source Language Hallucinations

23 May 2023
Jonas Pfeiffer, Francesco Piccinno, Massimo Nicosia, Xinyi Wang, Machel Reid, Sebastian Ruder
VLM, LRM

Papers citing "mmT5: Modular Multilingual Pre-Training Solves Source Language Hallucinations"

50 of 54 citing papers shown.
Mechanistic Understanding and Mitigation of Language Confusion in English-Centric Large Language Models
Ercong Nie, Helmut Schmid, Hinrich Schütze
62 · 0 · 0 · 22 May 2025

QUILL: Quotation Generation Enhancement of Large Language Models
Jin Xiao, Bowei Zhang, Qianyu He, Jiaqing Liang, Feng Wei, Jinglei Chen, Zujie Liang, Deqing Yang, Yanghua Xiao
HILM, LRM
203 · 0 · 0 · 21 Feb 2025

Large Language Models are Easily Confused: A Quantitative Metric, Security Implications and Typological Analysis
Yiyi Chen, Qiongxiu Li, Russa Biswas, Johannes Bjerva
90 · 4 · 0 · 17 Oct 2024

Understanding and Mitigating Language Confusion in LLMs
Kelly Marchisio, Wei-Yin Ko, Alexandre Berard, Théo Dehaze, Sebastian Ruder
111 · 30 · 0 · 28 Jun 2024

Overcoming Catastrophic Forgetting in Zero-Shot Cross-Lingual Generation
Tu Vu, Aditya Barua, Brian Lester, Daniel Cer, Mohit Iyyer, Noah Constant
CLL
55 · 66 · 0 · 25 May 2022

Lifting the Curse of Multilinguality by Pre-training Modular Transformers
Jonas Pfeiffer, Naman Goyal, Xi Lin, Xian Li, James Cross, Sebastian Riedel, Mikel Artetxe
LRM
83 · 143 · 0 · 12 May 2022

MASSIVE: A 1M-Example Multilingual Natural Language Understanding Dataset with 51 Typologically-Diverse Languages
Jack G. M. FitzGerald, C. Hench, Charith Peris, Scott Mackie, Kay Rottmann, ..., Laurie Crist, Misha Britan, Wouter Leeuwis, Gokhan Tur, Premkumar Natarajan
57 · 133 · 0 · 18 Apr 2022

PaLM: Scaling Language Modeling with Pathways
Aakanksha Chowdhery, Sharan Narang, Jacob Devlin, Maarten Bosma, Gaurav Mishra, ..., Kathy Meier-Hellstern, Douglas Eck, J. Dean, Slav Petrov, Noah Fiedel
PILM, LRM
471 · 6,231 · 0 · 05 Apr 2022

MAGMA -- Multimodal Augmentation of Generative Models through Adapter-based Finetuning
C. Eichenberg, Sid Black, Samuel Weinbach, Letitia Parcalabescu, Anette Frank
MLLM, VLM
54 · 100 · 0 · 09 Dec 2021

Multilingual Unsupervised Neural Machine Translation with Denoising Adapters
Ahmet Üstün, Alexandre Berard, Laurent Besacier, Matthias Gallé
55 · 45 · 0 · 20 Oct 2021

Multilingual Domain Adaptation for NMT: Decoupling Language and Domain Information with Adapters
Asa Cooper Stickland, Alexandre Berard, Vassilina Nikoulina
AI4CE
46 · 29 · 0 · 18 Oct 2021

Tricks for Training Sparse Translation Models
Dheeru Dua, Shruti Bhosale, Vedanuj Goswami, James Cross, M. Lewis, Angela Fan
MoE
174 · 19 · 0 · 15 Oct 2021

Composable Sparse Fine-Tuning for Cross-Lingual Transfer
Alan Ansell, Edoardo Ponti, Anna Korhonen, Ivan Vulić
CLL, MoE
121 · 141 · 0 · 14 Oct 2021

Towards a Unified View of Parameter-Efficient Transfer Learning
Junxian He, Chunting Zhou, Xuezhe Ma, Taylor Berg-Kirkpatrick, Graham Neubig
AAML
129 · 935 · 0 · 08 Oct 2021

Beyond Distillation: Task-level Mixture-of-Experts for Efficient Inference
Sneha Kudugunta, Yanping Huang, Ankur Bapna, M. Krikun, Dmitry Lepikhin, Minh-Thang Luong, Orhan Firat
MoE
245 · 109 · 0 · 24 Sep 2021

Efficient Test Time Adapter Ensembling for Low-resource Language Varieties
Xinyi Wang, Yulia Tsvetkov, Sebastian Ruder, Graham Neubig
48 · 35 · 0 · 10 Sep 2021

DEMix Layers: Disentangling Domains for Modular Language Modeling
Suchin Gururangan, Michael Lewis, Ari Holtzman, Noah A. Smith, Luke Zettlemoyer
KELM, MoE
91 · 134 · 0 · 11 Aug 2021

XL-Sum: Large-Scale Multilingual Abstractive Summarization for 44 Languages
Tahmid Hasan, Abhik Bhattacharjee, Md. Saiful Islam, Kazi Samin Mubasshir, Yuan-Fang Li, Yong-Bin Kang, M. Rahman, Rifat Shahriyar
79 · 370 · 0 · 25 Jun 2021

Compacter: Efficient Low-Rank Hypercomplex Adapter Layers
Rabeeh Karimi Mahabadi, James Henderson, Sebastian Ruder
MoE
100 · 485 · 0 · 08 Jun 2021

Parameter-efficient Multi-task Fine-tuning for Transformers via Shared Hypernetworks
Rabeeh Karimi Mahabadi, Sebastian Ruder, Mostafa Dehghani, James Henderson
MoE
66 · 308 · 0 · 08 Jun 2021

Lightweight Adapter Tuning for Multilingual Speech Translation
Hang Le, J. Pino, Changhan Wang, Jiatao Gu, D. Schwab, Laurent Besacier
93 · 90 · 0 · 02 Jun 2021

What to Pre-Train on? Efficient Intermediate Task Selection
Clifton A. Poth, Jonas Pfeiffer, Andreas Rücklé, Iryna Gurevych
67 · 100 · 0 · 16 Apr 2021

XTREME-R: Towards More Challenging and Nuanced Multilingual Evaluation
Sebastian Ruder, Noah Constant, Jan A. Botha, Aditya Siddhant, Orhan Firat, ..., Pengfei Liu, Junjie Hu, Dan Garrette, Graham Neubig, Melvin Johnson
ELM, AAML, LRM
68 · 187 · 0 · 15 Apr 2021

Towards Continual Learning for Multilingual Machine Translation via Vocabulary Substitution
Xavier Garcia, Noah Constant, Ankur P. Parikh, Orhan Firat
128 · 46 · 0 · 11 Mar 2021

Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity
W. Fedus, Barret Zoph, Noam M. Shazeer
MoE
85 · 2,181 · 0 · 11 Jan 2021

Orthogonal Language and Task Adapters in Zero-Shot Cross-Lingual Transfer
M. Vidoni, Ivan Vulić, Goran Glavaš
79 · 27 · 0 · 11 Dec 2020

Language ID in the Wild: Unexpected Challenges on the Path to a Thousand-Language Web Text Corpus
Isaac Caswell, Theresa Breiner, D. Esch, Ankur Bapna
68 · 89 · 0 · 27 Oct 2020

mT5: A massively multilingual pre-trained text-to-text transformer
Linting Xue, Noah Constant, Adam Roberts, Mihir Kale, Rami Al-Rfou, Aditya Siddhant, Aditya Barua, Colin Raffel
132 · 2,547 · 0 · 22 Oct 2020

MultiCQA: Zero-Shot Transfer of Self-Supervised Text Matching Models on a Massive Scale
Andreas Rücklé, Jonas Pfeiffer, Iryna Gurevych
61 · 37 · 0 · 02 Oct 2020

Parsing with Multilingual BERT, a Small Corpus, and a Small Treebank
Ethan C. Chau, Lucy H. Lin, Noah A. Smith
51 · 15 · 0 · 29 Sep 2020

Reusing a Pretrained Language Model on Languages with Limited Corpora for Unsupervised NMT
Alexandra Chronopoulou, Dario Stojanovski, Alexander Fraser
50 · 33 · 0 · 16 Sep 2020

Language Models are Few-Shot Learners
Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, ..., Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, Dario Amodei
BDL
749 · 41,932 · 0 · 28 May 2020

Common Sense or World Knowledge? Investigating Adapter-Based Knowledge Injection into Pretrained Transformers
Anne Lauscher, Olga Majewska, Leonardo F. R. Ribeiro, Iryna Gurevych, Nikolai Rozanov, Goran Glavaš
KELM
54 · 81 · 0 · 24 May 2020

Are All Languages Created Equal in Multilingual BERT?
Shijie Wu, Mark Dredze
68 · 323 · 0 · 18 May 2020

XCOPA: A Multilingual Dataset for Causal Commonsense Reasoning
Edoardo Ponti, Goran Glavaš, Olga Majewska, Qianchu Liu, Ivan Vulić, Anna Korhonen
LRM
61 · 321 · 0 · 01 May 2020

AdapterFusion: Non-Destructive Task Composition for Transfer Learning
Jonas Pfeiffer, Aishwarya Kamath, Andreas Rücklé, Kyunghyun Cho, Iryna Gurevych
CLL, MoMe
129 · 849 · 0 · 01 May 2020

MAD-X: An Adapter-Based Framework for Multi-Task Cross-Lingual Transfer
Jonas Pfeiffer, Ivan Vulić, Iryna Gurevych, Sebastian Ruder
99 · 626 · 0 · 30 Apr 2020

UDapter: Language Adaptation for Truly Universal Dependency Parsing
Ahmet Üstün, Arianna Bisazza, G. Bouma, Gertjan van Noord
51 · 115 · 0 · 29 Apr 2020

The State and Fate of Linguistic Diversity and Inclusion in the NLP World
Pratik M. Joshi, Sebastin Santy, A. Budhiraja, Kalika Bali, Monojit Choudhury
LMTD
107 · 847 · 0 · 20 Apr 2020

XTREME: A Massively Multilingual Multi-task Benchmark for Evaluating Cross-lingual Generalization
Junjie Hu, Sebastian Ruder, Aditya Siddhant, Graham Neubig, Orhan Firat, Melvin Johnson
ELM
180 · 973 · 0 · 24 Mar 2020

TyDi QA: A Benchmark for Information-Seeking Question Answering in Typologically Diverse Languages
J. Clark, Eunsol Choi, Michael Collins, Dan Garrette, Tom Kwiatkowski, Vitaly Nikolaev, J. Palomaki
141 · 609 · 0 · 10 Mar 2020

K-Adapter: Infusing Knowledge into Pre-Trained Models with Adapters
Ruize Wang, Duyu Tang, Nan Duan, Zhongyu Wei, Xuanjing Huang, Jianshu Ji, Guihong Cao, Daxin Jiang, Ming Zhou
KELM
89 · 553 · 0 · 05 Feb 2020

Unsupervised Cross-lingual Representation Learning at Scale
Alexis Conneau, Kartikay Khandelwal, Naman Goyal, Vishrav Chaudhary, Guillaume Wenzek, Francisco Guzmán, Edouard Grave, Myle Ott, Luke Zettlemoyer, Veselin Stoyanov
212 · 6,555 · 0 · 05 Nov 2019

On the Cross-lingual Transferability of Monolingual Representations
Mikel Artetxe, Sebastian Ruder, Dani Yogatama
196 · 797 · 0 · 25 Oct 2019

Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
Colin Raffel, Noam M. Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu
AIMat
419 · 20,127 · 0 · 23 Oct 2019

Simple, Scalable Adaptation for Neural Machine Translation
Ankur Bapna, N. Arivazhagan, Orhan Firat
AI4CE
100 · 417 · 0 · 18 Sep 2019

Slice-based Learning: A Programming Model for Residual Learning in Critical Data Slices
V. Chen, Sen Wu, Zhenzhen Weng, Alexander Ratner, Christopher Ré
52 · 56 · 0 · 13 Sep 2019

Parameter-Efficient Transfer Learning for NLP
N. Houlsby, A. Giurgiu, Stanislaw Jastrzebski, Bruna Morrone, Quentin de Laroussilhe, Andrea Gesmundo, Mona Attariyan, Sylvain Gelly
210 · 4,451 · 0 · 02 Feb 2019

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova
VLM, SSL, SSeg
1.7K · 94,770 · 0 · 11 Oct 2018

XNLI: Evaluating Cross-lingual Sentence Representations
Alexis Conneau, Guillaume Lample, Ruty Rinott, Adina Williams, Samuel R. Bowman, Holger Schwenk, Veselin Stoyanov
ELM
59 · 1,381 · 0 · 13 Sep 2018