FlexMoRE: A Flexible Mixture of Rank-heterogeneous Experts for Efficient Federatedly-trained Large Language Models

Annemette Brok Pirchert
Jacob Nielsen
Mogens Henrik From
Lukas Galke Poech
Peter Schneider-Kamp
Main: 7 pages
4 figures
Bibliography: 2 pages
11 tables
Appendix: 9 pages
Abstract

Recent advances in mixture-of-experts architectures have shown that individual expert models can be trained federatedly, i.e., in isolation from other experts, using a common base model to facilitate coordination. However, we hypothesize that full-sized experts may not be necessary for all domains and that low-rank adapters may instead be sufficient. Here, we introduce FlexMoRE, a Flexible Mixture of Rank-heterogeneous Experts, in which experts may be either full-sized or adapters of a suitable rank. We systematically investigate the trade-off between expert rank and downstream task performance by evaluating 66 experts with ranks $2^0$ to $2^{14}$, resulting in experiments covering 150 mixtures (96 with 2 experts, 54 with 7 experts) that are evaluated across 120 tasks. For our experiments, we build on FlexOlmo and turn its pre-trained experts into low-rank versions. Our regression analysis from expert rank to downstream task performance reveals that the best-performing rank is substantially higher for reasoning-heavy benchmarks than for knowledge-heavy benchmarks. These findings on rank sensitivity have direct implications for memory efficiency: using optimal ranks, FlexMoRE yields improved downstream task performance (average score 47.18) compared to the baseline FlexOlmo-style mixture of full-sized experts (average score 45.46) at less than one third of the parameters (10.75B for FlexMoRE vs. 33.27B for FlexOlmo). All code will be made available.
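To illustrate the kind of rank-heterogeneous expert the abstract describes, the sketch below shows a LoRA-style expert whose rank is configurable, so that a mixture can combine experts of different ranks over a shared frozen base projection. This is a minimal illustration under our own assumptions, not the authors' implementation; all class names, dimensions, and the choice of ranks are hypothetical.

```python
import torch
import torch.nn as nn

class LowRankExpert(nn.Module):
    """Hypothetical LoRA-style expert: a frozen shared base projection
    plus a trainable low-rank update of configurable rank."""

    def __init__(self, base_linear: nn.Linear, rank: int):
        super().__init__()
        self.base = base_linear
        for p in self.base.parameters():
            p.requires_grad = False  # the shared base model stays frozen
        d_out, d_in = base_linear.out_features, base_linear.in_features
        # Low-rank factors: A maps d_in -> rank, B maps rank -> d_out.
        self.A = nn.Linear(d_in, rank, bias=False)
        self.B = nn.Linear(rank, d_out, bias=False)
        nn.init.zeros_(self.B.weight)  # expert starts as a zero update

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.B(self.A(x))

# Example: experts of different ranks (here 2^0, 2^4, 2^8, 2^14)
# sharing one base projection, as in a rank-heterogeneous mixture.
base = nn.Linear(4096, 4096)
experts = nn.ModuleList(LowRankExpert(base, rank=2**k) for k in (0, 4, 8, 14))
```

In such a setup, the per-expert memory cost scales with the chosen rank rather than with the full hidden dimension, which is the trade-off the abstract's parameter comparison (10.75B vs. 33.27B) refers to.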
