Memory Efficient Optimizers with 4-bit States
4 September 2023 · arXiv 2309.01507
Bingrui Li, Jianfei Chen, Jun Zhu
MQ

Papers citing "Memory Efficient Optimizers with 4-bit States"

29 papers
EMLoC: Emulator-based Memory-efficient Fine-tuning with LoRA Correction
Hsi-Che Lin, Yu-Chu Yu, Kai-Po Chang, Y. Wang
13 Jun 2025

Leveraging Coordinate Momentum in SignSGD and Muon: Memory-Optimized Zero-Order
Egor Petrov, Grigoriy Evseev, Aleksey Antonov, Andrey Veprikov, Pavel Plyusnin, Nikolay Bushkov, Stanislav Moiseev, Aleksandr Beznosikov
04 Jun 2025

MLorc: Momentum Low-rank Compression for Large Language Model Adaptation
Wei Shen, Zhang Yaxiang, Minhui Huang, Mengfan Xu, Jiawei Zhang, Cong Shen
AI4CE
02 Jun 2025

GradPower: Powering Gradients for Faster Language Model Pre-Training
Mingze Wang, Jinbo Wang, Jiaqi Zhang, Wei Wang, Peng Pei, Xunliang Cai, Weinan E, Lei Wu
30 May 2025

In Search of Adam's Secret Sauce
Antonio Orvieto, Robert Gower
27 May 2025

Beyond the model: Key differentiators in large language models and multi-agent services
Muskaan Goyal, Pranav Bhasin
LLMAG, ELM
05 May 2025

Pushing the Limits of Low-Bit Optimizers: A Focus on EMA Dynamics
Cong Xu, Wenbin Liang, Mo Yu, Anan Liu, Kai Zhang, Lizhuang Ma, Jiangming Wang, Jun Wang, Weinan Zhang, Wei Zhang
MQ
01 May 2025

Striving for Simplicity: Simple Yet Effective Prior-Aware Pseudo-Labeling for Semi-Supervised Ultrasound Image Segmentation
Yaxiong Chen, Yujie Wang, Zixuan Zheng, Jingliang Hu, Yilei Shi, Shengwu Xiong, Xiao Xiang Zhu, Lichao Mou
18 Mar 2025

Identifying Sensitive Weights via Post-quantization Integral
Yuezhou Hu, Weiyu Huang, Zichen Liang, Chong Chen, Jintao Zhang, Jun Zhu, Jianfei Chen
MQ
28 Feb 2025

LORENZA: Enhancing Generalization in Low-Rank Gradient LLM Training via Efficient Zeroth-Order Adaptive SAM
Yehonathan Refael, Iftach Arbel, Ofir Lindenbaum, Tom Tirer
26 Feb 2025

Irrational Complex Rotations Empower Low-bit Optimizers
Zhen Tian, Wayne Xin Zhao, Ji-Rong Wen
MQ
22 Jan 2025

AdaRankGrad: Adaptive Gradient-Rank and Moments for Memory-Efficient LLMs Training and Fine-Tuning
Yehonathan Refael, Jonathan Svirsky, Boris Shustin, Wasim Huleihel, Ofir Lindenbaum
31 Dec 2024

No More Adam: Learning Rate Scaling at Initialization is All You Need
Minghao Xu, Lichuan Xiang, Xu Cai, Hongkai Wen
16 Dec 2024

FRUGAL: Memory-Efficient Optimization by Reducing State Overhead for Scalable Training
Philip Zmushko, Aleksandr Beznosikov, Martin Takáč, Samuel Horváth
12 Nov 2024

$100K or 100 Days: Trade-offs when Pre-Training with Academic Resources
Apoorv Khandelwal, Tian Yun, Nihal V. Nayak, Jack Merullo, Stephen H. Bach, Chen Sun, Ellie Pavlick
VLM, AI4CE, OnRL
30 Oct 2024

COAT: Compressing Optimizer states and Activation for Memory-Efficient FP8 Training
Haocheng Xi, Han Cai, Ligeng Zhu, Yaojie Lu, Kurt Keutzer, Jianfei Chen, Song Han
MQ
25 Oct 2024

Zeroth-Order Fine-Tuning of LLMs in Random Subspaces
Ziming Yu, Pan Zhou, Sike Wang, Jia Li, Hua Huang
11 Oct 2024

Parameter-Efficient Fine-Tuning of Large Language Models using Semantic Knowledge Tuning
Nusrat Jahan Prottasha, Asif Mahmud, Md. Shohanur Islam Sobuj, Prakash Bhat, Md. Kowsher, Niloofar Yousefi, O. Garibay
11 Oct 2024

SageAttention: Accurate 8-Bit Attention for Plug-and-play Inference Acceleration
Jintao Zhang, Jia Wei, Pengle Zhang, Jun-Jie Zhu, Jun Zhu, Jianfei Chen
VLM, MQ
03 Oct 2024

Propulsion: Steering LLM with Tiny Fine-Tuning
Md. Kowsher, Nusrat Jahan Prottasha, Prakash Bhat
17 Sep 2024

Exploring Quantization for Efficient Pre-Training of Transformer Language Models
Kamran Chitsaz, Quentin Fournier, Gonçalo Mordido, Sarath Chandar
MQ
16 Jul 2024

Grass: Compute Efficient Low-Memory LLM Training with Structured Sparse Gradients
Aashiq Muhamed, Oscar Li, David Woodruff, Mona Diab, Virginia Smith
25 Jun 2024

Adam-mini: Use Fewer Learning Rates To Gain More
Yushun Zhang, Congliang Chen, Ziniu Li, Tian Ding, Chenwei Wu, Yinyu Ye, Zhi-Quan Luo, Ruoyu Sun
24 Jun 2024

H-Fac: Memory-Efficient Optimization with Factorized Hamiltonian Descent
Son Nguyen, Lizhang Chen, Bo Liu, Qiang Liu
14 Jun 2024

4-bit Shampoo for Memory-Efficient Network Training
Sike Wang, Jia Li, Pan Zhou, Hua Huang
MQ
28 May 2024

VeLoRA: Memory Efficient Training using Rank-1 Sub-Token Projections
Roy Miles, Pradyumna Reddy, Ismail Elezi, Jiankang Deng
VLM
28 May 2024

Adapprox: Adaptive Approximation in Adam Optimization via Randomized Low-Rank Matrices
Pengxiang Zhao, Ping Li, Yingjie Gu, Yi Zheng, Stephan Ludger Kölker, Zhefeng Wang, Xiaoming Yuan
22 Mar 2024

GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection
Jiawei Zhao, Zhenyu Zhang, Beidi Chen, Zhangyang Wang, A. Anandkumar, Yuandong Tian
06 Mar 2024

QFT: Quantized Full-parameter Tuning of LLMs with Affordable Resources
Zhikai Li, Xiaoxuan Liu, Banghua Zhu, Zhen Dong, Qingyi Gu, Kurt Keutzer
MQ
11 Oct 2023