DEMix Layers: Disentangling Domains for Modular Language Modeling
arXiv: 2108.05036 · 11 August 2021
Suchin Gururangan, Mike Lewis, Ari Holtzman, Noah A. Smith, Luke Zettlemoyer
Tags: KELM, MoE
Papers citing "DEMix Layers: Disentangling Domains for Modular Language Modeling" (32 papers shown)
NoEsis: Differentially Private Knowledge Transfer in Modular LLM Adaptation (25 Apr 2025)
Rob Romijnders, Stefanos Laskaridis, Ali Shahin Shamsabadi, Hamed Haddadi

More Experts Than Galaxies: Conditionally-overlapping Experts With Biologically-Inspired Fixed Routing (10 Oct 2024) [MoE]
Sagi Shaier, Francisco Pereira, K. Wense, Lawrence E Hunter, Matt Jones

Automatic Curriculum Expert Iteration for Reliable LLM Reasoning (10 Oct 2024) [LRM]
Zirui Zhao, Hanze Dong, Amrita Saha, Caiming Xiong, Doyen Sahoo
Layerwise Recurrent Router for Mixture-of-Experts (13 Aug 2024) [MoE]
Zihan Qiu, Zeyu Huang, Shuang Cheng, Yizhi Zhou, Zili Wang, Ivan Titov, Jie Fu

Solving Token Gradient Conflict in Mixture-of-Experts for Large Vision-Language Model (28 Jun 2024) [MoE]
Longrong Yang, Dong Shen, Chaoxiang Cai, Fan Yang, Size Li, Di Zhang, Xi Li

Unlocking Continual Learning Abilities in Language Models (25 Jun 2024) [KELM, CLL]
Wenyu Du, Shuang Cheng, Tongxu Luo, Zihan Qiu, Zeyu Huang, Ka Chun Cheung, Reynold Cheng, Jie Fu
Investigating Continual Pretraining in Large Language Models: Insights and Implications (27 Feb 2024) [CLL, KELM, LRM]
Çağatay Yıldız, Nishaanth Kanna Ravichandran, Prishruit Punia, Matthias Bethge, Beyza Ermis

Continual Pre-Training of Large Language Models: How to (re)warm your model? (08 Aug 2023) [KELM]
Kshitij Gupta, Benjamin Thérien, Adam Ibrahim, Mats L. Richter, Quentin G. Anthony, Eugene Belilovsky, Irina Rish, Timothée Lesort

Class-Incremental Learning based on Label Generation (22 Jun 2023) [CLL]
Yijia Shao, Yiduo Guo, Dongyan Zhao, Bin Liu
CombLM: Adapting Black-Box Language Models through Small Fine-Tuned Models (23 May 2023)
Aitor Ormazabal, Mikel Artetxe, Eneko Agirre

Logion: Machine Learning for Greek Philology (01 May 2023)
Charlie Cowen-Breen, Creston Brooks, J. Haubold, B. Graziosi

Scaling Expert Language Models with Unsupervised Domain Discovery (24 Mar 2023) [MoE]
Suchin Gururangan, Margaret Li, Mike Lewis, Weijia Shi, Tim Althoff, Noah A. Smith, Luke Zettlemoyer

WM-MoE: Weather-aware Multi-scale Mixture-of-Experts for Blind Adverse Weather Removal (24 Mar 2023)
Yulin Luo, Rui Zhao, Xi Wei, Jinwei Chen, Yijie Lu, Shenghao Xie, Tianyu Wang, Ruiqin Xiong, Ming Lu, Shanghang Zhang
Modular Deep Learning (22 Feb 2023) [MoMe, OOD]
Jonas Pfeiffer, Sebastian Ruder, Ivan Vulić, E. Ponti

AdapterSoup: Weight Averaging to Improve Generalization of Pretrained Language Models (14 Feb 2023) [MoMe]
Alexandra Chronopoulou, Matthew E. Peters, Alexander Fraser, Jesse Dodge

RepMode: Learning to Re-parameterize Diverse Experts for Subcellular Structure Prediction (20 Dec 2022) [MoE]
Donghao Zhou, Chunbin Gu, Junde Xu, Furui Liu, Qiong Wang, Guangyong Chen, Pheng-Ann Heng

Z-ICL: Zero-Shot In-Context Learning with Pseudo-Demonstrations (19 Dec 2022) [VLM]
Xinxi Lyu, Sewon Min, Iz Beltagy, Luke Zettlemoyer, Hannaneh Hajishirzi
MultiCoder: Multi-Programming-Lingual Pre-Training for Low-Resource Code Completion (19 Dec 2022)
Zi Gong, Yinpeng Guo, Pingyi Zhou, Cuiyun Gao, Yasheng Wang, Zenglin Xu

Continual Learning of Natural Language Processing Tasks: A Survey (23 Nov 2022) [KELM, CLL, VLM]
Zixuan Ke, Bin Liu

Disentangling Reasoning Capabilities from Language Models with Compositional Reasoning Transformers (20 Oct 2022) [LRM, CoGe]
Wanjun Zhong, Tingting Ma, Jiahai Wang, Jian Yin, T. Zhao, Chin-Yew Lin, Nan Duan

Few-Shot Anaphora Resolution in Scientific Protocols via Mixtures of In-Context Experts (07 Oct 2022)
Nghia T. Le, Fan Bai, Alan Ritter
An Empirical Investigation of Commonsense Self-Supervision with Knowledge Graphs (21 May 2022) [SSL]
Jiarui Zhang, Filip Ilievski, Kaixin Ma, Jonathan M Francis, A. Oltramari

Unified Modeling of Multi-Domain Multi-Device ASR Systems (13 May 2022)
Soumyajit Mitra, Swayambhu Nath Ray, Bharat Padi, Arunasish Sen, Raghavendra Bilgi, Harish Arsikere, Shalini Ghosh, A. Srinivasamurthy, Sri Garimella

Lifting the Curse of Multilinguality by Pre-training Modular Transformers (12 May 2022) [LRM]
Jonas Pfeiffer, Naman Goyal, Xi Lin, Xian Li, James Cross, Sebastian Riedel, Mikel Artetxe

KALA: Knowledge-Augmented Language Model Adaptation (22 Apr 2022) [VLM, KELM]
Minki Kang, Jinheon Baek, Sung Ju Hwang

ELLE: Efficient Lifelong Pre-training for Emerging Data (12 Mar 2022)
Yujia Qin, Jiajie Zhang, Yankai Lin, Zhiyuan Liu, Peng Li, Maosong Sun, Jie Zhou
ST-MoE: Designing Stable and Transferable Sparse Expert Models (17 Feb 2022) [MoE]
Barret Zoph, Irwan Bello, Sameer Kumar, Nan Du, Yanping Huang, J. Dean, Noam M. Shazeer, W. Fedus

Efficient Hierarchical Domain Adaptation for Pretrained Language Models (16 Dec 2021)
Alexandra Chronopoulou, Matthew E. Peters, Jesse Dodge

Adapting to the Long Tail: A Meta-Analysis of Transfer Learning Research for Language Understanding Tasks (02 Nov 2021)
Aakanksha Naik, J. Lehman, Carolyn Rose

Lifelong Pretraining: Continually Adapting Language Models to Emerging Corpora (16 Oct 2021) [KELM, CLL]
Xisen Jin, Dejiao Zhang, Henghui Zhu, Wei Xiao, Shang-Wen Li, Xiaokai Wei, Andrew O. Arnold, Xiang Ren

The Pile: An 800GB Dataset of Diverse Text for Language Modeling (31 Dec 2020) [AIMat]
Leo Gao, Stella Biderman, Sid Black, Laurence Golding, Travis Hoppe, ..., Horace He, Anish Thite, Noa Nabeshima, Shawn Presser, Connor Leahy
Scaling Laws for Neural Language Models (23 Jan 2020)
Jared Kaplan, Sam McCandlish, T. Henighan, Tom B. Brown, B. Chess, R. Child, Scott Gray, Alec Radford, Jeff Wu, Dario Amodei