
Dynamic Mixture of Experts: An Auto-Tuning Approach for Efficient Transformer Models
Papers citing "Dynamic Mixture of Experts: An Auto-Tuning Approach for Efficient Transformer Models"
- **Mixtral of Experts**. Albert Q. Jiang, Alexandre Sablayrolles, Antoine Roux, A. Mensch, Blanche Savary, ..., Théophile Gervet, Thibaut Lavril, Thomas Wang, Timothée Lacroix, William El Sayed
- **Llama 2: Open Foundation and Fine-Tuned Chat Models**. Hugo Touvron, Louis Martin, Kevin R. Stone, Peter Albert, Amjad Almahairi, ..., Sharan Narang, Aurelien Rodriguez, Robert Stojnic, Sergey Edunov, Thomas Scialom
- **MoEfication: Transformer Feed-forward Layers are Mixtures of Experts**. Zhengyan Zhang, Yankai Lin, Zhiyuan Liu, Peng Li, Maosong Sun, Jie Zhou