2410.01335
Cited By
v1
v2 (latest)
Layer Swapping for Zero-Shot Cross-Lingual Transfer in Large Language Models
2 October 2024
Lucas Bandarkar
Benjamin Muller
Pritish Yuvraj
Rui Hou
Nayan Singhal
Hongjiang Lv
Bing-Quan Liu
KELM
LRM
MoMe
Papers citing
"Layer Swapping for Zero-Shot Cross-Lingual Transfer in Large Language Models"
50 / 54 papers shown
Title
Scaling Test-time Compute for Low-resource Languages: Multilingual Reasoning in LLMs
Khanh-Tung Tran
Barry O'Sullivan
Hoang D. Nguyen
LRM
80
2
0
02 Apr 2025
Efficient Model Development through Fine-tuning Transfer
Pin-Jie Lin
Rishab Balasubramanian
Fengyuan Liu
Nikhil Kandpal
Tu Vu
170
2
0
25 Mar 2025
InkubaLM: A small language model for low-resource African languages
A. Tonja
Bonaventure F. P. Dossou
Jessica Ojo
Jenalea Rajab
Fadel Thior
...
Anuoluwapo Aremu
Pelonomi Moiloa
Jade Z. Abbott
Vukosi Marivate
Benjamin Rosman
80
11
0
30 Aug 2024
RLHF Can Speak Many Languages: Unlocking Multilingual Preference Optimization for LLMs
John Dang
Arash Ahmadian
Kelly Marchisio
Julia Kreutzer
Ahmet Üstün
Sara Hooker
95
27
0
02 Jul 2024
LLMs Beyond English: Scaling the Multilingual Capability of LLMs with Cross-Lingual Feedback
Wen Lai
Mohsen Mesgar
Alexander Fraser
LRM
ALM
110
25
0
03 Jun 2024
LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding
Mostafa Elhoushi
Akshat Shrivastava
Diana Liskovich
Basil Hosmer
Bram Wasti
...
Saurabh Agarwal
Ahmed Roman
Ahmed Aly
Beidi Chen
Carole-Jean Wu
LRM
93
109
0
25 Apr 2024
On the Multilingual Ability of Decoder-based Pre-trained Language Models: Finding and Controlling Language-Specific Neurons
Takeshi Kojima
Itsuki Okimura
Yusuke Iwasawa
Hitomi Yanaka
Yutaka Matsuo
MILM
LRM
64
39
0
03 Apr 2024
Language-Specific Neurons: The Key to Multilingual Capabilities in Large Language Models
Tianyi Tang
Wenyang Luo
Haoyang Huang
Dongdong Zhang
Xiaolei Wang
Xin Zhao
Furu Wei
Ji-Rong Wen
87
60
0
26 Feb 2024
Unveiling Linguistic Regions in Large Language Models
Zhihao Zhang
Jun Zhao
Qi Zhang
Tao Gui
Xuanjing Huang
80
14
0
22 Feb 2024
The Hidden Space of Transformer Language Adapters
Jesujoba Oluwadara Alabi
Marius Mosbach
Matan Eyal
Dietrich Klakow
Mor Geva
75
10
1
20 Feb 2024
Do Llamas Work in English? On the Latent Language of Multilingual Transformers
Chris Wendler
V. Veselovsky
Giovanni Monea
Robert West
112
130
0
16 Feb 2024
Aya Dataset: An Open-Access Collection for Multilingual Instruction Tuning
Shivalika Singh
Freddie Vargus
Daniel D'souza
Börje F. Karlsson
Abinaya Mahendiran
...
Max Bartolo
Julia Kreutzer
Ahmet Üstün
Marzieh Fadaee
Sara Hooker
194
125
0
09 Feb 2024
Scaling Sparse Fine-Tuning to Large Language Models
Alan Ansell
Ivan Vulić
Hannah Sterz
Anna Korhonen
Edoardo Ponti
45
17
0
29 Jan 2024
xCoT: Cross-lingual Instruction Tuning for Cross-lingual Chain-of-Thought Reasoning
Linzheng Chai
Jian Yang
Tao Sun
Hongcheng Guo
Jiaheng Liu
...
Xiannian Liang
Jiaqi Bai
Tongliang Li
Qiyao Peng
Zhoujun Li
LRM
76
54
0
13 Jan 2024
MAPO: Advancing Multilingual Reasoning through Multilingual Alignment-as-Preference Optimization
Shuaijie She
Wei Zou
Shujian Huang
Wenhao Zhu
Xiang Liu
Xiang Geng
Jiajun Chen
LRM
103
41
0
12 Jan 2024
PLUG: Leveraging Pivot Language in Cross-Lingual Instruction Tuning
Zhihan Zhang
Dong-Ho Lee
Yuwei Fang
Wenhao Yu
Mengzhao Jia
Meng Jiang
Francesco Barbieri
ALM
87
30
0
15 Nov 2023
Examining Modularity in Multilingual LMs via Language-Specialized Subnetworks
Rochelle Choenni
Ekaterina Shutova
Daniel H Garrette
63
8
0
14 Nov 2023
DeMuX: Data-efficient Multilingual Learning
Simran Khanuja
Srinivas Gowriraj
Lucio Dery
Graham Neubig
VLM
71
1
0
10 Nov 2023
Language Models are Super Mario: Absorbing Abilities from Homologous Models as a Free Lunch
Le Yu
Bowen Yu
Haiyang Yu
Fei Huang
Yongbin Li
MoMe
107
333
0
06 Nov 2023
Cross-lingual Prompting: Improving Zero-shot Chain-of-Thought Reasoning across Languages
Libo Qin
Qiguang Chen
Fuxuan Wei
Shijue Huang
Wanxiang Che
LRM
93
93
0
23 Oct 2023
The Belebele Benchmark: a Parallel Reading Comprehension Dataset in 122 Language Variants
Lucas Bandarkar
Davis Liang
Benjamin Muller
Mikel Artetxe
Satya Narayan Shukla
Don Husa
Naman Goyal
Abhinandan Krishnan
Luke Zettlemoyer
Madian Khabsa
110
155
0
31 Aug 2023
Composing Parameter-Efficient Modules with Arithmetic Operations
Jinghan Zhang
Shiqi Chen
Junteng Liu
Junxian He
KELM
MoMe
83
124
0
26 Jun 2023
Exploring the Relationship between Alignment and Cross-lingual Transfer in Multilingual Transformers
Félix Gaschi
Patricio Cerda
Parisa Rastin
Y. Toussaint
64
13
0
05 Jun 2023
TIES-Merging: Resolving Interference When Merging Models
Prateek Yadav
Derek Tam
Leshem Choshen
Colin Raffel
Joey Tianyi Zhou
MoMe
120
315
0
02 Jun 2023
Towards a Common Understanding of Contributing Factors for Cross-Lingual Transfer in Multilingual Language Models: A Review
Fred Philippy
Siwen Guo
Shohreh Haddadan
LRM
53
37
0
26 May 2023
How do languages influence each other? Studying cross-lingual data sharing during LM fine-tuning
Rochelle Choenni
Dan Garrette
Ekaterina Shutova
85
16
0
22 May 2023
Task Arithmetic in the Tangent Space: Improved Editing of Pre-Trained Models
Guillermo Ortiz-Jiménez
Alessandro Favero
P. Frossard
MoMe
114
123
0
22 May 2023
XLM-V: Overcoming the Vocabulary Bottleneck in Multilingual Masked Language Models
Davis Liang
Hila Gonen
Yuning Mao
Rui Hou
Naman Goyal
Marjan Ghazvininejad
Luke Zettlemoyer
Madian Khabsa
66
80
0
25 Jan 2023
Model Ratatouille: Recycling Diverse Models for Out-of-Distribution Generalization
Alexandre Ramé
Kartik Ahuja
Jianyu Zhang
Matthieu Cord
Léon Bottou
David Lopez-Paz
MoMe
OODD
99
85
0
20 Dec 2022
Editing Models with Task Arithmetic
Gabriel Ilharco
Marco Tulio Ribeiro
Mitchell Wortsman
Suchin Gururangan
Ludwig Schmidt
Hannaneh Hajishirzi
Ali Farhadi
KELM
MoMe
MU
197
518
0
08 Dec 2022
Beyond English-Centric Bitexts for Better Multilingual Language Representation Learning
Barun Patra
Saksham Singhal
Shaohan Huang
Zewen Chi
Li Dong
Furu Wei
Vishrav Chaudhary
Xia Song
104
24
0
26 Oct 2022
Language Models are Multilingual Chain-of-Thought Reasoners
Freda Shi
Mirac Suzgun
Markus Freitag
Xuezhi Wang
Suraj Srivats
...
Yi Tay
Sebastian Ruder
Denny Zhou
Dipanjan Das
Jason W. Wei
ReLM
LRM
244
369
0
06 Oct 2022
ContraCLM: Contrastive Learning For Causal Language Model
Nihal Jain
Dejiao Zhang
Wasi Uddin Ahmad
Zijian Wang
Feng Nan
...
Ramesh Nallapati
Baishakhi Ray
Parminder Bhatia
Xiaofei Ma
Bing Xiang
69
16
0
03 Oct 2022
Git Re-Basin: Merging Models modulo Permutation Symmetries
Samuel K. Ainsworth
J. Hayase
S. Srinivasa
MoMe
298
343
0
11 Sep 2022
No Language Left Behind: Scaling Human-Centered Machine Translation
NLLB Team
Marta R. Costa-jussá
James Cross
Onur Çelebi
Maha Elbayad
...
Alexandre Mourachko
C. Ropers
Safiyyah Saleem
Holger Schwenk
Jeff Wang
MoE
230
1,266
0
11 Jul 2022
The Geometry of Multilingual Language Model Representations
Tyler A. Chang
Zhuowen Tu
Benjamin Bergen
90
68
0
22 May 2022
Lifting the Curse of Multilinguality by Pre-training Modular Transformers
Jonas Pfeiffer
Naman Goyal
Xi Lin
Xian Li
James Cross
Sebastian Riedel
Mikel Artetxe
LRM
93
146
0
12 May 2022
Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time
Mitchell Wortsman
Gabriel Ilharco
S. Gadre
Rebecca Roelofs
Raphael Gontijo-Lopes
...
Hongseok Namkoong
Ali Farhadi
Y. Carmon
Simon Kornblith
Ludwig Schmidt
MoMe
161
1,009
1
10 Mar 2022
Training Neural Networks with Fixed Sparse Masks
Yi-Lin Sung
Varun Nair
Colin Raffel
FedML
93
208
0
18 Nov 2021
Merging Models with Fisher-Weighted Averaging
Michael Matena
Colin Raffel
FedML
MoMe
87
402
0
18 Nov 2021
Training Verifiers to Solve Math Word Problems
K. Cobbe
V. Kosaraju
Mohammad Bavarian
Mark Chen
Heewoo Jun
...
Jerry Tworek
Jacob Hilton
Reiichiro Nakano
Christopher Hesse
John Schulman
ReLM
OffRL
LRM
342
4,569
0
27 Oct 2021
Composable Sparse Fine-Tuning for Cross-Lingual Transfer
Alan Ansell
Edoardo Ponti
Anna Korhonen
Ivan Vulić
CLL
MoE
132
143
0
14 Oct 2021
Allocating Large Vocabulary Capacity for Cross-lingual Language Model Pre-training
Bo Zheng
Li Dong
Shaohan Huang
Saksham Singhal
Wanxiang Che
Ting Liu
Xia Song
Furu Wei
VLM
55
22
0
15 Sep 2021
Raise a Child in Large Language Model: Towards Effective and Generalizable Fine-tuning
Runxin Xu
Fuli Luo
Zhiyuan Zhang
Chuanqi Tan
Baobao Chang
Songfang Huang
Fei Huang
LRM
176
190
0
13 Sep 2021
Program Synthesis with Large Language Models
Jacob Austin
Augustus Odena
Maxwell Nye
Maarten Bosma
Henryk Michalewski
...
Ellen Jiang
Carrie J. Cai
Michael Terry
Quoc V. Le
Charles Sutton
ELM
AIMat
ReCod
ALM
216
2,004
0
16 Aug 2021
First Align, then Predict: Understanding the Cross-Lingual Ability of Multilingual BERT
Benjamin Muller
Yanai Elazar
Benoît Sagot
Djamé Seddah
LRM
53
75
0
26 Jan 2021
Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity
W. Fedus
Barret Zoph
Noam M. Shazeer
MoE
88
2,220
0
11 Jan 2021
ERNIE-M: Enhanced Multilingual Representation by Aligning Cross-lingual Semantics with Monolingual Corpora
Xuan Ouyang
Shuohuan Wang
Chao Pang
Yu Sun
Hao Tian
Hua Wu
Haifeng Wang
121
101
0
31 Dec 2020
Parameter-Efficient Transfer Learning with Diff Pruning
Demi Guo
Alexander M. Rush
Yoon Kim
82
406
0
14 Dec 2020
Measuring Massive Multitask Language Understanding
Dan Hendrycks
Collin Burns
Steven Basart
Andy Zou
Mantas Mazeika
Basel Alomair
Jacob Steinhardt
ELM
RALM
184
4,553
0
07 Sep 2020