2303.14177
Cited By
Scaling Expert Language Models with Unsupervised Domain Discovery
24 March 2023
Suchin Gururangan
Margaret Li
M. Lewis
Weijia Shi
Tim Althoff
Noah A. Smith
Luke Zettlemoyer
MoE
Papers citing
"Scaling Expert Language Models with Unsupervised Domain Discovery"
41 / 41 papers shown
Title
ORXE: Orchestrating Experts for Dynamically Configurable Efficiency
Qingyuan Wang
Guoxin Wang
B. Cardiff
Deepu John
38
0
0
07 May 2025
MergeME: Model Merging Techniques for Homogeneous and Heterogeneous MoEs
Yuhang Zhou
Giannis Karamanolakis
Victor Soto
Anna Rumshisky
Mayank Kulkarni
Furong Huang
Wei Ai
Jianhua Lu
MoMe
106
0
0
03 Feb 2025
Copyright-Protected Language Generation via Adaptive Model Fusion
Javier Abad
Konstantin Donhauser
Francesco Pinto
Fanny Yang
74
1
0
09 Dec 2024
Scalable Data Ablation Approximations for Language Models through Modular Training and Merging
Clara Na
Ian H. Magnusson
A. Jha
Tom Sherborne
Emma Strubell
Jesse Dodge
Pradeep Dasigi
MoMe
36
4
0
21 Oct 2024
Model Swarms: Collaborative Search to Adapt LLM Experts via Swarm Intelligence
Shangbin Feng
Zifeng Wang
Yike Wang
Sayna Ebrahimi
Hamid Palangi
...
Nathalie Rauschmayr
Yejin Choi
Yulia Tsvetkov
Chen-Yu Lee
Tomas Pfister
MoMe
35
3
0
15 Oct 2024
No Need to Talk: Asynchronous Mixture of Language Models
Anastasiia Filippova
Angelos Katharopoulos
David Grangier
Ronan Collobert
MoE
36
0
0
04 Oct 2024
Contextual Document Embeddings
John X. Morris
Alexander M. Rush
19
7
0
03 Oct 2024
SynTQA: Synergistic Table-based Question Answering via Mixture of Text-to-SQL and E2E TQA
Siyue Zhang
Anh Tuan Luu
Chen Zhao
LMTD
29
4
0
25 Sep 2024
Nexus: Specialization meets Adaptability for Efficiently Training Mixture of Experts
Nikolas Gritsch
Qizhen Zhang
Acyr F. Locatelli
Sara Hooker
A. Ustun
MoE
50
1
0
28 Aug 2024
MoE-LPR: Multilingual Extension of Large Language Models through Mixture-of-Experts with Language Priors Routing
Hao Zhou
Zhijun Wang
Shujian Huang
Xin Huang
Xue Han
Junlan Feng
Chao Deng
Weihua Luo
Jiajun Chen
CLL
MoE
49
5
0
21 Aug 2024
Layerwise Recurrent Router for Mixture-of-Experts
Zihan Qiu
Zeyu Huang
Shuang Cheng
Yizhi Zhou
Zili Wang
Ivan Titov
Jie Fu
MoE
78
2
0
13 Aug 2024
Active Testing of Large Language Model via Multi-Stage Sampling
Yuheng Huang
Jiayang Song
Qiang Hu
Felix Juefei-Xu
Lei Ma
27
2
0
07 Aug 2024
Strong Copyright Protection for Language Models via Adaptive Model Fusion
Javier Abad
Konstantin Donhauser
Francesco Pinto
Fanny Yang
45
4
0
29 Jul 2024
PLeaS -- Merging Models with Permutations and Least Squares
Anshul Nasery
J. Hayase
Pang Wei Koh
Sewoong Oh
MoMe
45
3
0
02 Jul 2024
M2QA: Multi-domain Multilingual Question Answering
Leon Arne Engländer
Hannah Sterz
Clifton A. Poth
Jonas Pfeiffer
Ilia Kuznetsov
Iryna Gurevych
VLM
35
1
0
01 Jul 2024
LEMoE: Advanced Mixture of Experts Adaptor for Lifelong Model Editing of Large Language Models
Renzhi Wang
Piji Li
KELM
CLL
44
7
0
28 Jun 2024
Towards Modular LLMs by Building and Reusing a Library of LoRAs
O. Ostapenko
Zhan Su
E. Ponti
Laurent Charlin
Nicolas Le Roux
Matheus Pereira
Lucas Page-Caccia
Alessandro Sordoni
MoMe
37
31
0
18 May 2024
SambaNova SN40L: Scaling the AI Memory Wall with Dataflow and Composition of Experts
R. Prabhakar
R. Sivaramakrishnan
Darshan Gandhi
Yun Du
Mingran Wang
...
Urmish Thakker
Dawei Huang
Sumti Jairath
Kevin J. Brown
K. Olukotun
MoE
39
12
0
13 May 2024
Prometheus 2: An Open Source Language Model Specialized in Evaluating Other Language Models
Seungone Kim
Juyoung Suk
Shayne Longpre
Bill Yuchen Lin
Jamin Shin
Sean Welleck
Graham Neubig
Moontae Lee
Kyungjae Lee
Minjoon Seo
MoMe
ALM
ELM
49
167
0
02 May 2024
MoDE: CLIP Data Experts via Clustering
Jiawei Ma
Po-Yao Huang
Saining Xie
Shang-Wen Li
Luke Zettlemoyer
Shih-Fu Chang
Wen-tau Yih
Hu Xu
MoE
CLIP
VLM
28
10
0
24 Apr 2024
Pack of LLMs: Model Fusion at Test-Time via Perplexity Optimization
Costas Mavromatis
Petros Karypis
George Karypis
MoMe
27
24
0
17 Apr 2024
Tiny Models are the Computational Saver for Large Models
Qingyuan Wang
B. Cardiff
Antoine Frappé
Benoît Larras
Deepu John
31
2
0
26 Mar 2024
Data Mixing Laws: Optimizing Data Mixtures by Predicting Language Modeling Performance
Jiasheng Ye
Peiju Liu
Tianxiang Sun
Yunhua Zhou
Jun Zhan
Xipeng Qiu
42
62
0
25 Mar 2024
DiPaCo: Distributed Path Composition
Arthur Douillard
Qixuan Feng
Andrei A. Rusu
A. Kuncoro
Yani Donchev
Rachita Chhaparia
Ionel Gog
Marc'Aurelio Ranzato
Jiajun Shen
Arthur Szlam
MoE
40
2
0
15 Mar 2024
Branch-Train-MiX: Mixing Expert LLMs into a Mixture-of-Experts LLM
Sainbayar Sukhbaatar
O. Yu. Golovneva
Vasu Sharma
Hu Xu
Xi Victoria Lin
...
Jacob Kahn
Shang-Wen Li
Wen-tau Yih
Jason Weston
Xian Li
MoMe
OffRL
MoE
38
60
0
12 Mar 2024
What the Weight?! A Unified Framework for Zero-Shot Knowledge Composition
Carolin Holtermann
Markus Frohmann
Navid Rekabsaz
Anne Lauscher
MoMe
22
5
0
23 Jan 2024
Breaking the Curse of Multilinguality with Cross-lingual Expert Language Models
Terra Blevins
Tomasz Limisiewicz
Suchin Gururangan
Margaret Li
Hila Gonen
Noah A. Smith
Luke Zettlemoyer
44
22
0
19 Jan 2024
AboutMe: Using Self-Descriptions in Webpages to Document the Effects of English Pretraining Data Filters
L. Lucy
Suchin Gururangan
Luca Soldaini
Emma Strubell
David Bamman
Lauren Klein
Jesse Dodge
26
14
0
12 Jan 2024
Hazards from Increasingly Accessible Fine-Tuning of Downloadable Foundation Models
Alan Chan
Ben Bucknall
Herbie Bradley
David M. Krueger
11
6
0
22 Dec 2023
Efficient Online Data Mixing For Language Model Pre-Training
Alon Albalak
Liangming Pan
Colin Raffel
W. Wang
28
32
0
05 Dec 2023
Token-Level Adaptation of LoRA Adapters for Downstream Task Generalization
Joshua Belofsky
MoMe
19
13
0
17 Nov 2023
DiLoCo: Distributed Low-Communication Training of Language Models
Arthur Douillard
Qixuan Feng
Andrei A. Rusu
Rachita Chhaparia
Yani Donchev
A. Kuncoro
Marc'Aurelio Ranzato
Arthur Szlam
Jiajun Shen
56
31
0
14 Nov 2023
Guiding Language Model Math Reasoning with Planning Tokens
Xinyi Wang
Lucas Page-Caccia
O. Ostapenko
Xingdi Yuan
William Yang Wang
Alessandro Sordoni
LRM
33
2
0
09 Oct 2023
SILO Language Models: Isolating Legal Risk In a Nonparametric Datastore
Sewon Min
Suchin Gururangan
Eric Wallace
Hannaneh Hajishirzi
Noah A. Smith
Luke Zettlemoyer
AILaw
22
63
0
08 Aug 2023
Getting MoRE out of Mixture of Language Model Reasoning Experts
Chenglei Si
Weijia Shi
Chen Zhao
Luke Zettlemoyer
Jordan L. Boyd-Graber
LRM
24
24
0
24 May 2023
CombLM: Adapting Black-Box Language Models through Small Fine-Tuned Models
Aitor Ormazabal
Mikel Artetxe
Eneko Agirre
33
19
0
23 May 2023
Knowledge Card: Filling LLMs' Knowledge Gaps with Plug-in Specialized Language Models
Shangbin Feng
Weijia Shi
Yuyang Bai
Vidhisha Balachandran
Tianxing He
Yulia Tsvetkov
KELM
45
28
0
17 May 2023
Mixture-of-Experts with Expert Choice Routing
Yanqi Zhou
Tao Lei
Hanxiao Liu
Nan Du
Yanping Huang
Vincent Zhao
Andrew M. Dai
Zhifeng Chen
Quoc V. Le
James Laudon
MoE
160
327
0
18 Feb 2022
Beyond Distillation: Task-level Mixture-of-Experts for Efficient Inference
Sneha Kudugunta
Yanping Huang
Ankur Bapna
M. Krikun
Dmitry Lepikhin
Minh-Thang Luong
Orhan Firat
MoE
119
106
0
24 Sep 2021
Fantastically Ordered Prompts and Where to Find Them: Overcoming Few-Shot Prompt Order Sensitivity
Yao Lu
Max Bartolo
Alastair Moore
Sebastian Riedel
Pontus Stenetorp
AILaw
LRM
279
1,120
0
18 Apr 2021
The Pile: An 800GB Dataset of Diverse Text for Language Modeling
Leo Gao
Stella Biderman
Sid Black
Laurence Golding
Travis Hoppe
...
Horace He
Anish Thite
Noa Nabeshima
Shawn Presser
Connor Leahy
AIMat
253
1,986
0
31 Dec 2020