Parameters vs FLOPs: Scaling Laws for Optimal Sparsity for Mixture-of-Experts Language Models
arXiv:2501.12370
28 January 2025
Samira Abnar, Harshay Shah, Dan Busbridge, Alaaeldin Mohamed Elnouby Ali, J. Susskind, Vimal Thilak
Topics: MoE, LRM

Papers citing "Parameters vs FLOPs: Scaling Laws for Optimal Sparsity for Mixture-of-Experts Language Models"

4 / 4 papers shown
1. The power of fine-grained experts: Granularity boosts expressivity in Mixture of Experts
   Enric Boix Adserà, Philippe Rigollet
   Topics: MoE
   11 May 2025

2. The Sparse Frontier: Sparse Attention Trade-offs in Transformer LLMs
   Piotr Nawrot, Robert Li, Renjie Huang, Sebastian Ruder, Kelly Marchisio, Edoardo Ponti
   24 Apr 2025

3. Towards Foundational Models for Dynamical System Reconstruction: Hierarchical Meta-Learning via Mixture of Experts
   Roussel Desmond Nzoyem, David A.W. Barton, Tom Deakin
   07 Feb 2025

4. Soup-of-Experts: Pretraining Specialist Models via Parameters Averaging
   Pierre Ablin, Angelos Katharopoulos, Skyler Seto, David Grangier
   Topics: MoMe
   03 Feb 2025