Scattered Mixture-of-Experts Implementation
13 March 2024 · arXiv: 2403.08245
Shawn Tan, Songlin Yang, Yikang Shen, Aaron Courville
Topics: MoE
Links: ArXiv (abs) · PDF · HTML

Papers citing "Scattered Mixture-of-Experts Implementation" (9 papers shown)
Mixture of Parrots: Experts improve memorization more than reasoning
  Samy Jelassi, Clara Mohri, David Brandfonbrener, Alex Gu, Nikhil Vyas, Nikhil Anand, David Alvarez-Melis, Yuanzhi Li, Sham Kakade, Eran Malach
  Topics: MoE · Citations: 5 · 24 Oct 2024

EPS-MoE: Expert Pipeline Scheduler for Cost-Efficient MoE Inference
  Yulei Qian, Fengcun Li, Xiangyang Ji, Xiaoyu Zhao, Jianchao Tan, Kai Zhang, Xunliang Cai
  Topics: MoE · Citations: 3 · 16 Oct 2024

Mixtral of Experts
  Albert Q. Jiang, Alexandre Sablayrolles, Antoine Roux, A. Mensch, Blanche Savary, ..., Théophile Gervet, Thibaut Lavril, Thomas Wang, Timothée Lacroix, William El Sayed
  Topics: MoE, LLMAG · Citations: 1,085 · 08 Jan 2024

MegaBlocks: Efficient Sparse Training with Mixture-of-Experts
  Trevor Gale, Deepak Narayanan, C. Young, Matei A. Zaharia
  Topics: MoE · Citations: 108 · 29 Nov 2022

Who Says Elephants Can't Run: Bringing Large Scale MoE Models into Cloud Scale Production
  Young Jin Kim, Rawn Henry, Raffy Fahim, Hany Awadalla
  Topics: MoE · Citations: 23 · 18 Nov 2022

Mixture of Attention Heads: Selecting Attention Heads Per Token
  Xiaofeng Zhang, Songlin Yang, Zeyu Huang, Jie Zhou, Wenge Rong, Zhang Xiong
  Topics: MoE · Citations: 47 · 11 Oct 2022

Efficient Large-Scale Language Model Training on GPU Clusters Using Megatron-LM
  Deepak Narayanan, Mohammad Shoeybi, Jared Casper, P. LeGresley, M. Patwary, ..., Prethvi Kashinkunti, J. Bernauer, Bryan Catanzaro, Amar Phanishayee, Matei A. Zaharia
  Topics: MoE · Citations: 689 · 09 Apr 2021

PyTorch: An Imperative Style, High-Performance Deep Learning Library
  Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, ..., Sasank Chilamkurthy, Benoit Steiner, Lu Fang, Junjie Bai, Soumith Chintala
  Topics: ODL · Citations: 42,449 · 03 Dec 2019

Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
  Mohammad Shoeybi, M. Patwary, Raul Puri, P. LeGresley, Jared Casper, Bryan Catanzaro
  Topics: MoE · Citations: 1,904 · 17 Sep 2019