LightSeq2: Accelerated Training for Transformer-based Models on GPUs

arXiv:2110.05722, 12 October 2021
Xiaohui Wang, Yang Wei, Ying Xiong, Guyue Huang, Xian Qian, Yufei Ding, Mingxuan Wang, Lei Li
Topics: VLM

Papers citing "LightSeq2: Accelerated Training for Transformer-based Models on GPUs"

21 papers shown

Achieving Peak Performance for Large Language Models: A Systematic Review
Z. R. K. Rostam, Sándor Szénási, Gábor Kertész. 07 Sep 2024.

Hardware Acceleration of LLMs: A comprehensive survey and comparison
Nikoletta Koilia, C. Kachris. 05 Sep 2024.

PALM: A Efficient Performance Simulator for Tiled Accelerators with Large-scale Model Training
Jiahao Fang, Huizheng Wang, Qize Yang, Dehao Kong, Xu Dai, Jinyi Deng, Yang Hu, Shouyi Yin. 06 Jun 2024.

Approximated Likelihood Ratio: A Forward-Only and Parallel Framework for Boosting Neural Network Training
Zeliang Zhang, Jinyang Jiang, Zhuo Liu, Susan Liang, Yijie Peng, Chenliang Xu. 18 Mar 2024.

A Survey on Hardware Accelerators for Large Language Models
C. Kachris. 18 Jan 2024.

Chameleon: a Heterogeneous and Disaggregated Accelerator System for Retrieval-Augmented Language Models
Wenqi Jiang, Marco Zeller, R. Waleffe, Torsten Hoefler, Gustavo Alonso. 15 Oct 2023.

Benchmarking and In-depth Performance Study of Large Language Models on Habana Gaudi Processors
Chengming Zhang, Baixi Sun, Xiaodong Yu, Zhen Xie, Weijian Zheng, K. Iskra, Pete Beckman, Dingwen Tao. 29 Sep 2023.

NNQS-Transformer: an Efficient and Scalable Neural Network Quantum States Approach for Ab initio Quantum Chemistry
Yangjun Wu, Chu Guo, Yi Fan, P. Zhou, Honghui Shang. 29 Jun 2023. Topics: GNN.

Directed Acyclic Transformer Pre-training for High-quality Non-autoregressive Text Generation
Fei Huang, Pei Ke, Minlie Huang. 24 Apr 2023. Topics: AI4CE.

Life Regression based Patch Slimming for Vision Transformers
Jiawei Chen, Lin Chen, Jianguo Yang, Tianqi Shi, Lechao Cheng, Zunlei Feng, Min-Gyoo Song. 11 Apr 2023. Topics: ViT.

On Efficient Training of Large-Scale Deep Learning Models: A Literature Review
Li Shen, Yan Sun, Zhiyuan Yu, Liang Ding, Xinmei Tian, Dacheng Tao. 07 Apr 2023. Topics: VLM.

The VolcTrans System for WMT22 Multilingual Machine Translation Task
Xian Qian, Kai Hu, Jiaqiang Wang, Yifeng Liu, Xingyuan Pan, Jun Cao, Mingxuan Wang. 20 Oct 2022.

PARAGEN: A Parallel Generation Toolkit
Jiangtao Feng, Yi Zhou, Jun Zhang, Xian Qian, Liwei Wu, Zhexi Zhang, Yanming Liu, Mingxuan Wang, Lei Li, Hao Zhou. 07 Oct 2022. Topics: VLM.

ByteTransformer: A High-Performance Transformer Boosted for Variable-Length Inputs
Yujia Zhai, Chengquan Jiang, Leyuan Wang, Xiaoying Jia, Shang Zhang, Zizhong Chen, Xin Liu, Yibo Zhu. 06 Oct 2022.

TgDLF2.0: Theory-guided deep-learning for electrical load forecasting via Transformer and transfer learning
Jiaxin Gao, Wenbo Hu, Dongxiao Zhang, Yuntian Chen. 05 Oct 2022. Topics: AI4TS, AI4CE.

Boosting Distributed Training Performance of the Unpadded BERT Model
Jinle Zeng, Min Li, Zhihua Wu, Jiaqi Liu, Yuang Liu, Dianhai Yu, Yanjun Ma. 17 Aug 2022.

HelixFold: An Efficient Implementation of AlphaFold2 using PaddlePaddle
Guoxia Wang, Xiaomin Fang, Zhihua Wu, Yiqun Liu, Yang Xue, Yingfei Xiang, Dianhai Yu, Fan Wang, Yanjun Ma. 12 Jul 2022.

FastFold: Reducing AlphaFold Training Time from 11 Days to 67 Hours
Shenggan Cheng, Xuanlei Zhao, Guangyang Lu, Bin-Rui Li, Zhongming Yu, Tian Zheng, R. Wu, Xiwen Zhang, Jian Peng, Yang You. 02 Mar 2022. Topics: AI4CE.

Benchmark Assessment for DeepSpeed Optimization Library
G. Liang, I. Alsmadi. 12 Feb 2022.

Learning Light-Weight Translation Models from Deep Transformer
Bei Li, Ziyang Wang, Hui Liu, Quan Du, Tong Xiao, Chunliang Zhang, Jingbo Zhu. 27 Dec 2020. Topics: VLM.

GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Alex Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, Samuel R. Bowman. 20 Apr 2018. Topics: ELM.