Statistical Perspective of Top-K Sparse Softmax Gating Mixture of Experts. International Conference on Learning Representations (ICLR), 2023.
Brainformers: Trading Simplicity for Efficiency. International Conference on Machine Learning (ICML), 2023.
GradMDM: Adversarial Attack on Dynamic Networks. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023.
Memorization Capacity of Neural Networks with Conditional Computation. International Conference on Learning Representations (ICLR), 2023.
Spatial Mixture-of-Experts. Neural Information Processing Systems (NeurIPS), 2022.
Switchable Representation Learning Framework with Self-compatibility. Computer Vision and Pattern Recognition (CVPR), 2022.
Hub-Pathway: Transfer Learning from a Hub of Pre-trained Models. Neural Information Processing Systems (NeurIPS), 2022.
APG: Adaptive Parameter Generation Network for Click-Through Rate Prediction. Neural Information Processing Systems (NeurIPS), 2022.
Mixture-of-Experts with Expert Choice Routing. Neural Information Processing Systems (NeurIPS), 2022.
Efficient Large Scale Language Modeling with Mixtures of Experts. Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021.
Zoo-Tuning: Adaptive Transfer from a Zoo of Models. International Conference on Machine Learning (ICML), 2021.
Scaling Vision with Sparse Mixture of Experts. Neural Information Processing Systems (NeurIPS), 2021.
Dynamic Multi-Branch Layers for On-Device Neural Machine Translation. IEEE/ACM Transactions on Audio, Speech, and Language Processing (TASLP), 2021.
Dynamic Neural Networks: A Survey. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2021.
Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity. Journal of Machine Learning Research (JMLR), 2021.
Entities as Experts: Sparse Memory Access with Entity Supervision. Conference on Empirical Methods in Natural Language Processing (EMNLP), 2020.
Efficient Content-Based Sparse Attention with Routing Transformers. Transactions of the Association for Computational Linguistics (TACL), 2020.
Large Memory Layers with Product Keys. Neural Information Processing Systems (NeurIPS), 2019.
Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer. International Conference on Learning Representations (ICLR), 2017.