Title | Authors | Tag | Counts | Date
AGATHA: Automatic Graph-mining And Transformer based Hypothesis generation Approach | Justin Sybrandt, Ilya Tyagin, M. Shtutman, Ilya Safro | | 54 37 0 | 13 Feb 2020
Towards Crowdsourced Training of Large Neural Networks using Decentralized Mixture-of-Experts | Max Ryabinin, Anton I. Gusev | FedML | 96 52 0 | 10 Feb 2020
On the distance between two neural networks and the stability of learning | Jeremy Bernstein, Arash Vahdat, Yisong Yue, Xuan Li | ODL | 284 59 0 | 09 Feb 2020
Multilingual is not enough: BERT for Finnish | Antti Virtanen, Jenna Kanerva, Rami Ilo, Jouni Luoma, Juhani Luotolahti, T. Salakoski, Filip Ginter, S. Pyysalo | | 95 280 0 | 15 Dec 2019
Local AdaAlter: Communication-Efficient Stochastic Gradient Descent with Adaptive Learning Rates | Cong Xie, Oluwasanmi Koyejo, Indranil Gupta, Yanghua Peng | | 73 42 0 | 20 Nov 2019
On the Cross-lingual Transferability of Monolingual Representations | Mikel Artetxe, Sebastian Ruder, Dani Yogatama | | 289 801 0 | 25 Oct 2019
Training Kinetics in 15 Minutes: Large-scale Distributed Training on Videos | Ji Lin, Chuang Gan, Song Han | | 78 10 0 | 01 Oct 2019
Extremely Small BERT Models from Mixed-Vocabulary Training | Sanqiang Zhao, Raghav Gupta, Yang Song, Denny Zhou | VLM | 76 53 0 | 25 Sep 2019
Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism | Mohammad Shoeybi, M. Patwary, Raul Puri, P. LeGresley, Jared Casper, Bryan Catanzaro | MoE | 406 1,926 0 | 17 Sep 2019
Distributed Equivalent Substitution Training for Large-Scale Recommender Systems | Haidong Rong, Yangzihao Wang, Feihu Zhou, Junjie Zhai, Haiyang Wu, ..., Fan Li, Han Zhang, Yuekui Yang, Zhenyu Guo, Di Wang | OffRL | 51 11 0 | 10 Sep 2019
Taming Momentum in a Distributed Asynchronous Environment | Ido Hakimi, Saar Barkai, Moshe Gabel, Assaf Schuster | | 93 23 0 | 26 Jul 2019