
GLUE: Gradient-free Learning to Unify Experts

Jong-Ik Park
Shreyas Chaudhari
Srinivasa Pranav
Carlee Joe-Wong
José M. F. Moura
4 pages (main) + 1-page bibliography, 1 figure, 1 table
Abstract

In many deployed systems (multilingual ASR, cross-hospital imaging, region-specific perception), multiple pretrained specialist models coexist. Yet new target domains often require domain expansion: a generalized model that performs well beyond any single specialist's domain. Given a new target domain, existing methods blend the experts' parameters to obtain a single strong initialization for a target model. However, heuristic blending -- choosing mixing coefficients from data sizes or proxy metrics -- often yields lower target-domain test accuracy, and learning these coefficients by minimizing the target domain's loss typically requires computationally expensive full backpropagation through the network. We propose GLUE, Gradient-free Learning to Unify Experts, which initializes the target model as a convex combination of fixed experts and learns the mixture coefficients of this combination via gradient-free two-point SPSA (simultaneous perturbation stochastic approximation) updates, requiring only two forward passes per step. Across experiments on three datasets and three network architectures, GLUE produces parameter priors that, after fine-tuning, outperform baselines: GLUE improves test accuracy by up to 8.5% over data-size weighting and by up to 9.1% over proxy-metric selection, and it either outperforms backpropagation-based full-gradient mixing or matches its performance within 1.4%.
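To make the two-forward-pass update concrete, below is a minimal sketch of SPSA-based learning of mixture coefficients, assuming PyTorch, expert parameters flattened into same-shaped vectors, and a `loss_fn` that evaluates the target-domain loss for a given parameter vector. The function and variable names (`mix`, `spsa_step`, `z`, `experts`) are illustrative assumptions, not the authors' released code.

```python
import torch

def mix(z, experts):
    """Convex combination of fixed expert parameter vectors via softmax weights."""
    alpha = torch.softmax(z, dim=0)  # mixture coefficients on the probability simplex
    return sum(a * theta for a, theta in zip(alpha, experts))

def spsa_step(z, experts, loss_fn, lr=0.1, c=0.01):
    """One two-point SPSA update of the mixing logits z (two forward passes, no backprop)."""
    # Rademacher (+/-1) perturbation of the logits
    delta = torch.randint(0, 2, z.shape).to(z.dtype) * 2 - 1
    loss_plus = loss_fn(mix(z + c * delta, experts))    # forward pass 1
    loss_minus = loss_fn(mix(z - c * delta, experts))   # forward pass 2
    # Simultaneous-perturbation gradient estimate (delta is its own inverse element-wise)
    grad_est = (loss_plus - loss_minus) / (2 * c) * delta
    return z - lr * grad_est
```

In use, one would initialize `z = torch.zeros(len(experts))`, iterate `spsa_step` over target-domain mini-batches, and then take `mix(z, experts)` as the initialization for standard fine-tuning; step sizes `lr` and `c` here are placeholder values, not the paper's settings.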
