Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2410.16682
Cited By
Methods of improving LLM training stability
22 October 2024
Oleg Rybakov
Mike Chrzanowski
Peter Dykas
Jinze Xue
Ben Lanir
Re-assign community
ArXiv (abs)
PDF
HTML
Papers citing
"Methods of improving LLM training stability"
10 / 10 papers shown
Title
HybridNorm: Towards Stable and Efficient Transformer Training via Hybrid Normalization
Zhijian Zhuo
Yutao Zeng
Ya Wang
Sijun Zhang
Jian Yang
Xiaoqing Li
Xun Zhou
Jinwen Ma
79
0
0
06 Mar 2025
Small-scale proxies for large-scale Transformer training instabilities
Mitchell Wortsman
Peter J. Liu
Lechao Xiao
Katie Everett
A. Alemi
...
Jascha Narain Sohl-Dickstein
Kelvin Xu
Jaehoon Lee
Justin Gilmer
Simon Kornblith
87
99
0
25 Sep 2023
A Theory on Adam Instability in Large-Scale Machine Learning
Igor Molybog
Peter Albert
Moya Chen
Zach DeVito
David Esiobu
...
Puxin Xu
Yuchen Zhang
Melanie Kambadur
Stephen Roller
Susan Zhang
AI4CE
68
35
0
19 Apr 2023
Primer: Searching for Efficient Transformers for Language Modeling
David R. So
Wojciech Mañke
Hanxiao Liu
Zihang Dai
Noam M. Shazeer
Quoc V. Le
VLM
253
156
0
17 Sep 2021
RoFormer: Enhanced Transformer with Rotary Position Embedding
Jianlin Su
Yu Lu
Shengfeng Pan
Ahmed Murtadha
Bo Wen
Yunfeng Liu
288
2,521
0
20 Apr 2021
Going deeper with Image Transformers
Hugo Touvron
Matthieu Cord
Alexandre Sablayrolles
Gabriel Synnaeve
Hervé Jégou
ViT
160
1,021
0
31 Mar 2021
Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
Mohammad Shoeybi
M. Patwary
Raul Puri
P. LeGresley
Jared Casper
Bryan Catanzaro
MoE
334
1,917
0
17 Sep 2019
SentencePiece: A simple and language independent subword tokenizer and detokenizer for Neural Text Processing
Taku Kudo
John Richardson
204
3,528
0
19 Aug 2018
Neural Combinatorial Optimization with Reinforcement Learning
Irwan Bello
Hieu H. Pham
Quoc V. Le
Mohammad Norouzi
Samy Bengio
158
1,493
0
29 Nov 2016
SGDR: Stochastic Gradient Descent with Warm Restarts
I. Loshchilov
Frank Hutter
ODL
350
8,174
0
13 Aug 2016
1