ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2112.06905
  4. Cited By
GLaM: Efficient Scaling of Language Models with Mixture-of-Experts

GLaM: Efficient Scaling of Language Models with Mixture-of-Experts

13 December 2021
Nan Du
Yanping Huang
Andrew M. Dai
Simon Tong
Dmitry Lepikhin
Yuanzhong Xu
M. Krikun
Yanqi Zhou
Adams Wei Yu
Orhan Firat
Barret Zoph
Liam Fedus
Maarten Bosma
Zongwei Zhou
Tao Wang
Yu Emma Wang
Kellie Webster
Marie Pellat
Kevin Robinson
Kathy Meier-Hellstern
Toju Duke
Lucas Dixon
Kun Zhang
Quoc V. Le
Yonghui Wu
Z. Chen
Claire Cui
    ALM
    MoE
ArXivPDFHTML

Papers citing "GLaM: Efficient Scaling of Language Models with Mixture-of-Experts"

50 / 178 papers shown
Title
No Need to Talk: Asynchronous Mixture of Language Models
No Need to Talk: Asynchronous Mixture of Language Models
Anastasiia Filippova
Angelos Katharopoulos
David Grangier
Ronan Collobert
MoE
44
0
0
04 Oct 2024
Revisiting Prefix-tuning: Statistical Benefits of Reparameterization among Prompts
Revisiting Prefix-tuning: Statistical Benefits of Reparameterization among Prompts
Minh Le
Chau Nguyen
Huy Nguyen
Quyen Tran
Trung Le
Nhat Ho
44
5
0
03 Oct 2024
EC-DIT: Scaling Diffusion Transformers with Adaptive Expert-Choice Routing
EC-DIT: Scaling Diffusion Transformers with Adaptive Expert-Choice Routing
Haotian Sun
Tao Lei
Bowen Zhang
Yanghao Li
Haoshuo Huang
Ruoming Pang
Bo Dai
Nan Du
DiffM
MoE
86
5
0
02 Oct 2024
A Gated Residual Kolmogorov-Arnold Networks for Mixtures of Experts
A Gated Residual Kolmogorov-Arnold Networks for Mixtures of Experts
Hugo Inzirillo
Remi Genet
MoE
38
4
0
23 Sep 2024
What is the Role of Small Models in the LLM Era: A Survey
What is the Role of Small Models in the LLM Era: A Survey
Lihu Chen
Gaël Varoquaux
ALM
63
23
0
10 Sep 2024
AdaNAT: Exploring Adaptive Policy for Token-Based Image Generation
AdaNAT: Exploring Adaptive Policy for Token-Based Image Generation
Zanlin Ni
Yulin Wang
Renping Zhou
Rui Lu
Jiayi Guo
Jinyi Hu
Zhiyuan Liu
Yuan Yao
Gao Huang
37
7
0
31 Aug 2024
NVC-1B: A Large Neural Video Coding Model
NVC-1B: A Large Neural Video Coding Model
Xihua Sheng
Chuanbo Tang
Li Li
Dong Liu
Feng Wu
3DV
VLM
50
3
0
28 Jul 2024
MetaLLM: A High-performant and Cost-efficient Dynamic Framework for Wrapping LLMs
MetaLLM: A High-performant and Cost-efficient Dynamic Framework for Wrapping LLMs
Quang H. Nguyen
Duy C. Hoang
Juliette Decugis
Saurav Manchanda
Nitesh V. Chawla
Khoa D. Doan
Khoa D. Doan
45
8
0
15 Jul 2024
OLMES: A Standard for Language Model Evaluations
OLMES: A Standard for Language Model Evaluations
Yuling Gu
Oyvind Tafjord
Bailey Kuehl
Dany Haddad
Jesse Dodge
Hannaneh Hajishirzi
ELM
40
14
0
12 Jun 2024
Node-wise Filtering in Graph Neural Networks: A Mixture of Experts
  Approach
Node-wise Filtering in Graph Neural Networks: A Mixture of Experts Approach
Haoyu Han
Juanhui Li
Wei Huang
Xianfeng Tang
Hanqing Lu
Chen Luo
Hui Liu
Jiliang Tang
42
6
0
05 Jun 2024
Ensembling Diffusion Models via Adaptive Feature Aggregation
Ensembling Diffusion Models via Adaptive Feature Aggregation
Cong Wang
Kuan Tian
Yonghang Guan
Jun Zhang
Zhiwei Jiang
Fei Shen
Xiao Han
44
5
0
27 May 2024
Uni-MoE: Scaling Unified Multimodal LLMs with Mixture of Experts
Uni-MoE: Scaling Unified Multimodal LLMs with Mixture of Experts
Yunxin Li
Shenyuan Jiang
Baotian Hu
Longyue Wang
Wanqi Zhong
Wenhan Luo
Lin Ma
Min-Ling Zhang
MoE
46
30
0
18 May 2024
Multi-Head Mixture-of-Experts
Multi-Head Mixture-of-Experts
Xun Wu
Shaohan Huang
Wenhui Wang
Furu Wei
MoE
39
12
0
23 Apr 2024
From Matching to Generation: A Survey on Generative Information Retrieval
From Matching to Generation: A Survey on Generative Information Retrieval
Xiaoxi Li
Jiajie Jin
Yujia Zhou
Yuyao Zhang
Peitian Zhang
Yutao Zhu
Zhicheng Dou
3DV
84
47
0
23 Apr 2024
Navigating the Landscape of Large Language Models: A Comprehensive
  Review and Analysis of Paradigms and Fine-Tuning Strategies
Navigating the Landscape of Large Language Models: A Comprehensive Review and Analysis of Paradigms and Fine-Tuning Strategies
Benjue Weng
LM&MA
46
8
0
13 Apr 2024
JetMoE: Reaching Llama2 Performance with 0.1M Dollars
JetMoE: Reaching Llama2 Performance with 0.1M Dollars
Yikang Shen
Zhen Guo
Tianle Cai
Zengyi Qin
MoE
ALM
46
28
0
11 Apr 2024
Data Mixing Laws: Optimizing Data Mixtures by Predicting Language Modeling Performance
Data Mixing Laws: Optimizing Data Mixtures by Predicting Language Modeling Performance
Jiasheng Ye
Peiju Liu
Tianxiang Sun
Yunhua Zhou
Jun Zhan
Xipeng Qiu
57
64
0
25 Mar 2024
A Comprehensive Evaluation of Quantization Strategies for Large Language
  Models
A Comprehensive Evaluation of Quantization Strategies for Large Language Models
Renren Jin
Jiangcun Du
Wuwei Huang
Wei Liu
Jian Luan
Bin Wang
Deyi Xiong
MQ
32
31
0
26 Feb 2024
Balanced Data Sampling for Language Model Training with Clustering
Balanced Data Sampling for Language Model Training with Clustering
Yunfan Shao
Linyang Li
Zhaoye Fei
Hang Yan
Dahua Lin
Xipeng Qiu
37
9
0
22 Feb 2024
Automating Dataset Updates Towards Reliable and Timely Evaluation of
  Large Language Models
Automating Dataset Updates Towards Reliable and Timely Evaluation of Large Language Models
Jiahao Ying
Yixin Cao
Yushi Bai
Qianru Sun
Bo Wang
Wei Tang
Zhaojun Ding
Yizhe Yang
Xuanjing Huang
Shuicheng Yan
KELM
26
6
0
19 Feb 2024
Fiddler: CPU-GPU Orchestration for Fast Inference of Mixture-of-Experts Models
Fiddler: CPU-GPU Orchestration for Fast Inference of Mixture-of-Experts Models
Keisuke Kamahori
Tian Tang
Yile Gu
Kan Zhu
Baris Kasikci
71
20
0
10 Feb 2024
Large Language Models: A Survey
Large Language Models: A Survey
Shervin Minaee
Tomáš Mikolov
Narjes Nikzad
M. Asgari-Chenaghlu
R. Socher
Xavier Amatriain
Jianfeng Gao
ALM
LM&MA
ELM
134
375
0
09 Feb 2024
ReLU$^2$ Wins: Discovering Efficient Activation Functions for Sparse
  LLMs
ReLU2^22 Wins: Discovering Efficient Activation Functions for Sparse LLMs
Zhengyan Zhang
Yixin Song
Guanghui Yu
Xu Han
Yankai Lin
Chaojun Xiao
Chenyang Song
Zhiyuan Liu
Zeyu Mi
Maosong Sun
22
31
0
06 Feb 2024
VIALM: A Survey and Benchmark of Visually Impaired Assistance with Large
  Models
VIALM: A Survey and Benchmark of Visually Impaired Assistance with Large Models
Yi Zhao
Yilin Zhang
Rong Xiang
Jing Li
Hillming Li
43
16
0
29 Jan 2024
Developing ChatGPT for Biology and Medicine: A Complete Review of
  Biomedical Question Answering
Developing ChatGPT for Biology and Medicine: A Complete Review of Biomedical Question Answering
Qing Li
Lei Li
Yu Li
LM&MA
AI4MH
41
6
0
15 Jan 2024
VideoPoet: A Large Language Model for Zero-Shot Video Generation
VideoPoet: A Large Language Model for Zero-Shot Video Generation
Dan Kondratyuk
Lijun Yu
Xiuye Gu
José Lezama
Jonathan Huang
...
Irfan Essa
Huisheng Wang
David A. Ross
Bryan Seybold
Lu Jiang
VGen
20
241
0
21 Dec 2023
CORECODE: A Common Sense Annotated Dialogue Dataset with Benchmark Tasks
  for Chinese Large Language Models
CORECODE: A Common Sense Annotated Dialogue Dataset with Benchmark Tasks for Chinese Large Language Models
Dan Shi
Chaobin You
Jian-Tao Huang
Taihao Li
Deyi Xiong
LRM
30
0
0
20 Dec 2023
Jack of All Tasks, Master of Many: Designing General-purpose
  Coarse-to-Fine Vision-Language Model
Jack of All Tasks, Master of Many: Designing General-purpose Coarse-to-Fine Vision-Language Model
Shraman Pramanick
Guangxing Han
Rui Hou
Sayan Nag
Ser-Nam Lim
Nicolas Ballas
Qifan Wang
Rama Chellappa
Amjad Almahairi
VLM
MLLM
48
29
0
19 Dec 2023
TinyGSM: achieving >80% on GSM8k with small language models
TinyGSM: achieving >80% on GSM8k with small language models
Bingbin Liu
Sébastien Bubeck
Ronen Eldan
Janardhan Kulkarni
Yuanzhi Li
Anh Nguyen
Rachel A. Ward
Yi Zhang
ALM
27
47
0
14 Dec 2023
Enhancing Molecular Property Prediction via Mixture of Collaborative
  Experts
Enhancing Molecular Property Prediction via Mixture of Collaborative Experts
Xu Yao
Shuang Liang
Songqiao Han
Hailiang Huang
29
4
0
06 Dec 2023
Language-driven All-in-one Adverse Weather Removal
Language-driven All-in-one Adverse Weather Removal
Hao Yang
Liyuan Pan
Yan Yang
Wei Liang
VLM
KELM
29
18
0
03 Dec 2023
Oasis: Data Curation and Assessment System for Pretraining of Large
  Language Models
Oasis: Data Curation and Assessment System for Pretraining of Large Language Models
Tong Zhou
Yubo Chen
Pengfei Cao
Kang Liu
Jun Zhao
Shengping Liu
29
3
0
21 Nov 2023
Mixture of Weak & Strong Experts on Graphs
Mixture of Weak & Strong Experts on Graphs
Hanqing Zeng
Hanjia Lyu
Diyi Hu
Yinglong Xia
Jiebo Luo
31
3
0
09 Nov 2023
LoRA Fine-tuning Efficiently Undoes Safety Training in Llama 2-Chat 70B
LoRA Fine-tuning Efficiently Undoes Safety Training in Llama 2-Chat 70B
Simon Lermen
Charlie Rogers-Smith
Jeffrey Ladish
ALM
31
83
0
31 Oct 2023
Image Clustering Conditioned on Text Criteria
Image Clustering Conditioned on Text Criteria
Sehyun Kwon
Jaeseung Park
Minkyu Kim
Jaewoong Cho
Ernest K. Ryu
Kangwook Lee
VLM
42
11
0
27 Oct 2023
A General Theory for Softmax Gating Multinomial Logistic Mixture of
  Experts
A General Theory for Softmax Gating Multinomial Logistic Mixture of Experts
Huy Nguyen
Pedram Akbarian
TrungTin Nguyen
Nhat Ho
32
11
0
22 Oct 2023
Farzi Data: Autoregressive Data Distillation
Farzi Data: Autoregressive Data Distillation
Noveen Sachdeva
Zexue He
Wang-Cheng Kang
Jianmo Ni
D. Cheng
Julian McAuley
DD
23
3
0
15 Oct 2023
Diversifying the Mixture-of-Experts Representation for Language Models
  with Orthogonal Optimizer
Diversifying the Mixture-of-Experts Representation for Language Models with Orthogonal Optimizer
Boan Liu
Liang Ding
Li Shen
Keqin Peng
Yu Cao
Dazhao Cheng
Dacheng Tao
MoE
36
7
0
15 Oct 2023
Adaptivity and Modularity for Efficient Generalization Over Task
  Complexity
Adaptivity and Modularity for Efficient Generalization Over Task Complexity
Samira Abnar
Omid Saremi
Laurent Dinh
Shantel Wilson
Miguel Angel Bautista
...
Vimal Thilak
Etai Littwin
Jiatao Gu
Josh Susskind
Samy Bengio
41
5
0
13 Oct 2023
InstructRetro: Instruction Tuning post Retrieval-Augmented Pretraining
InstructRetro: Instruction Tuning post Retrieval-Augmented Pretraining
Wei Ping
Ming-Yu Liu
Lawrence C. McAfee
Peng Xu
Bo Li
M. Shoeybi
Bryan Catanzaro
RALM
16
46
0
11 Oct 2023
LLMCarbon: Modeling the end-to-end Carbon Footprint of Large Language
  Models
LLMCarbon: Modeling the end-to-end Carbon Footprint of Large Language Models
Ahmad Faiz
S. Kaneda
Ruhan Wang
Rita Osi
Parteek Sharma
Fan Chen
Lei Jiang
31
56
0
25 Sep 2023
Efficiency is Not Enough: A Critical Perspective of Environmentally Sustainable AI
Efficiency is Not Enough: A Critical Perspective of Environmentally Sustainable AI
Dustin Wright
Christian Igel
Gabrielle Samuel
Raghavendra Selvan
35
15
0
05 Sep 2023
Robust Mixture-of-Expert Training for Convolutional Neural Networks
Robust Mixture-of-Expert Training for Convolutional Neural Networks
Yihua Zhang
Ruisi Cai
Tianlong Chen
Guanhua Zhang
Huan Zhang
Pin-Yu Chen
Shiyu Chang
Zhangyang Wang
Sijia Liu
MoE
AAML
OOD
36
16
0
19 Aug 2023
Large Language Models and Foundation Models in Smart Agriculture:
  Basics, Opportunities, and Challenges
Large Language Models and Foundation Models in Smart Agriculture: Basics, Opportunities, and Challenges
Jiajia Li
Mingle Xu
Lirong Xiang
Dong Chen
Weichao Zhuang
Xunyuan Yin
Zhao Li
39
3
0
13 Aug 2023
UniAP: Unifying Inter- and Intra-Layer Automatic Parallelism by Mixed Integer Quadratic Programming
UniAP: Unifying Inter- and Intra-Layer Automatic Parallelism by Mixed Integer Quadratic Programming
Hao Lin
Ke Wu
Jie Li
Jun Yu Li
Wu-Jun Li
39
2
0
31 Jul 2023
Generator-Retriever-Generator Approach for Open-Domain Question
  Answering
Generator-Retriever-Generator Approach for Open-Domain Question Answering
Abdelrahman Abdallah
Adam Jatowt
RALM
36
10
0
21 Jul 2023
Llama 2: Open Foundation and Fine-Tuned Chat Models
Llama 2: Open Foundation and Fine-Tuned Chat Models
Hugo Touvron
Louis Martin
Kevin R. Stone
Peter Albert
Amjad Almahairi
...
Sharan Narang
Aurelien Rodriguez
Robert Stojnic
Sergey Edunov
Thomas Scialom
AI4MH
ALM
126
11,099
0
18 Jul 2023
PaLI-X: On Scaling up a Multilingual Vision and Language Model
PaLI-X: On Scaling up a Multilingual Vision and Language Model
Xi Chen
Josip Djolonga
Piotr Padlewski
Basil Mustafa
Soravit Changpinyo
...
Mojtaba Seyedhosseini
A. Angelova
Xiaohua Zhai
N. Houlsby
Radu Soricut
VLM
71
189
0
29 May 2023
Mixture-of-Experts Meets Instruction Tuning:A Winning Combination for
  Large Language Models
Mixture-of-Experts Meets Instruction Tuning:A Winning Combination for Large Language Models
Sheng Shen
Le Hou
Yan-Quan Zhou
Nan Du
Shayne Longpre
...
Vincent Zhao
Hongkun Yu
Kurt Keutzer
Trevor Darrell
Denny Zhou
ALM
MoE
38
54
0
24 May 2023
Emergent inabilities? Inverse scaling over the course of pretraining
Emergent inabilities? Inverse scaling over the course of pretraining
J. Michaelov
Benjamin Bergen
LRM
ReLM
22
3
0
24 May 2023
Previous
1234
Next