Fire-Flyer AI-HPC: A Cost-Effective Software-Hardware Co-Design for Deep Learning
arXiv: 2408.14158 (v2, latest) · 26 August 2024
Wei An, Xiao Bi, Guanting Chen, Shanhuang Chen, Chengqi Deng, Honghui Ding, Kai Dong, Qiushi Du, Wenjun Gao, Kang Guan, Jianzhong Guo, Yongqiang Guo, Zhe Fu, Ying He, Panpan Huang, Jiashi Li, Wenfeng Liang, Xiaodong Liu, Xin Liu, Yiyuan Liu, Yuxuan Liu, Shanghao Lu, Xuan Lu, Xiaotao Nie, Tian Pei, Junjie Qiu, Hui Qu, Zhaochun Ren, Zhangli Sha, Xuecheng Su, Xiaowen Sun, Yixuan Tan, Minghui Tang, Shiyu Wang, Yaohui Wang, Yongji Wang, Ziwei Xie, Yiliang Xiong, Yanhong Xu, Shengfeng Ye, Shuiping Yu, Yukun Zha, Liyue Zhang, Haowei Zhang, Mingchuan Zhang, Wentao Zhang, Yichao Zhang, Chenggang Zhao, Yao Zhao, Shangyan Zhou, Shunfeng Zhou, Yuheng Zou
Links: ArXiv (abs) · PDF · HTML
Papers citing "Fire-Flyer AI-HPC: A Cost-Effective Software-Hardware Co-Design for Deep Learning" (13 / 13 papers shown)
| Title | Authors | Tags | Stats | Date |
| --- | --- | --- | --- | --- |
| AI and Memory Wall | A. Gholami, Z. Yao, Sehoon Kim, Coleman Hooper, Michael W. Mahoney, Kurt Keutzer | — | 72 / 158 / 0 | 21 Mar 2024 |
| DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models | Zhihong Shao, Peiyi Wang, Qihao Zhu, Runxin Xu, Jun-Mei Song, ..., Haowei Zhang, Mingchuan Zhang, Yiming Li, Yu-Huan Wu, Daya Guo | ReLM, LRM | 146 / 1,274 / 0 | 05 Feb 2024 |
| Zero Bubble Pipeline Parallelism | Penghui Qi, Xinyi Wan, Guangxing Huang, Min Lin | — | 50 / 24 / 0 | 30 Nov 2023 |
| GPT-4 Technical Report | OpenAI, Josh Achiam, Steven Adler, Sandhini Agarwal, Lama Ahmad, ..., Shengjia Zhao, Tianhao Zheng, Juntang Zhuang, William Zhuk, Barret Zoph | LLMAG, MLLM | 1.5K / 14,699 / 0 | 15 Mar 2023 |
| Tutel: Adaptive Mixture-of-Experts at Scale | Changho Hwang, Wei Cui, Yifan Xiong, Ziyue Yang, Ze Liu, ..., Joe Chau, Peng Cheng, Fan Yang, Mao Yang, Y. Xiong | MoE | 177 / 121 / 0 | 07 Jun 2022 |
| PaLM: Scaling Language Modeling with Pathways | Aakanksha Chowdhery, Sharan Narang, Jacob Devlin, Maarten Bosma, Gaurav Mishra, ..., Kathy Meier-Hellstern, Douglas Eck, J. Dean, Slav Petrov, Noah Fiedel | PILM, LRM | 524 / 6,293 / 0 | 05 Apr 2022 |
| DeepSpeed-MoE: Advancing Mixture-of-Experts Inference and Training to Power Next-Generation AI Scale | Samyam Rajbhandari, Conglong Li, Z. Yao, Minjia Zhang, Reza Yazdani Aminabadi, A. A. Awan, Jeff Rasley, Yuxiong He | — | 110 / 302 / 0 | 14 Jan 2022 |
| Masked Autoencoders Are Scalable Vision Learners | Kaiming He, Xinlei Chen, Saining Xie, Yanghao Li, Piotr Dollár, Ross B. Girshick | ViT, TPM | 477 / 7,819 / 0 | 11 Nov 2021 |
| Software-Hardware Co-design for Fast and Scalable Training of Deep Learning Recommendation Models | Dheevatsa Mudigere, Y. Hao, Jianyu Huang, Zhihao Jia, Andrew Tulloch, ..., Ajit Mathews, Lin Qiao, M. Smelyanskiy, Bill Jia, Vijay Rao | — | 81 / 153 / 0 | 12 Apr 2021 |
| Efficient Large-Scale Language Model Training on GPU Clusters Using Megatron-LM | Deepak Narayanan, Mohammad Shoeybi, Jared Casper, P. LeGresley, M. Patwary, ..., Prethvi Kashinkunti, J. Bernauer, Bryan Catanzaro, Amar Phanishayee, Matei A. Zaharia | MoE | 122 / 699 / 0 | 09 Apr 2021 |
| Language Models are Few-Shot Learners | Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, ..., Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, Dario Amodei | BDL | 877 / 42,379 / 0 | 28 May 2020 |
| Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism | Mohammad Shoeybi, M. Patwary, Raul Puri, P. LeGresley, Jared Casper, Bryan Catanzaro | MoE | 334 / 1,917 / 0 | 17 Sep 2019 |
| BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding | Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova | VLM, SSL, SSeg | 1.8K / 95,175 / 0 | 11 Oct 2018 |