Zeroth-order (ZO) optimization is an emerging deep neural network (DNN) training paradigm that offers computational simplicity and memory savings. However, this seemingly promising approach faces a significant and long-overlooked challenge: ZO requires generating a substantial number of Gaussian random numbers, which is difficult, and sometimes infeasible, on hardware platforms such as FPGAs and ASICs. In this paper, we identify this critical issue, which arises from the mismatch between algorithm designers and hardware designers. To address it, we propose PeZO, a perturbation-efficient ZO framework. Specifically, we design random number reuse strategies that significantly reduce the demand for random number generation, and we introduce a hardware-friendly adaptive scaling method that replaces the costly Gaussian distribution with a uniform distribution. Our experiments show that PeZO reduces the LUTs and FFs required for random number generation by 48.6\% and 12.7\%, respectively, and saves up to 86\% of power consumption, all without compromising training performance, making ZO optimization feasible for on-device training. To the best of our knowledge, we are the first to explore the potential of on-device ZO optimization, providing valuable insights for future research.
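For context, the sketch below shows a standard two-point ZO gradient estimator (the general family that perturbation-based ZO training builds on); it is a minimal illustration, not the paper's PeZO implementation, and the function names are our own. Note the fresh Gaussian perturbation vector drawn at every step, which is exactly the random-number generation burden the paper targets.

```python
import numpy as np

def zo_gradient_estimate(loss_fn, params, mu=1e-3, rng=None):
    """Two-point zeroth-order (SPSA-style) gradient estimate.

    Perturbs the parameters along a random direction z and uses a
    finite difference of the loss, so no backpropagation is needed.
    """
    rng = rng or np.random.default_rng()
    # A fresh Gaussian vector per step: the costly requirement on
    # FPGAs/ASICs that motivates PeZO's reuse and uniform-scaling ideas.
    z = rng.standard_normal(params.shape)
    loss_plus = loss_fn(params + mu * z)
    loss_minus = loss_fn(params - mu * z)
    return (loss_plus - loss_minus) / (2.0 * mu) * z

# Usage: one ZO-SGD step on a toy quadratic loss.
loss = lambda w: float(np.sum(w ** 2))
w = np.ones(4)
w -= 0.1 * zo_gradient_estimate(loss, w)
```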
@article{tan2025_2504.20314,
  title   = {Perturbation-efficient Zeroth-order Optimization for Hardware-friendly On-device Training},
  author  = {Qitao Tan and Sung-En Chang and Rui Xia and Huidong Ji and Chence Yang and Ci Zhang and Jun Liu and Zheng Zhan and Zhou Zou and Yanzhi Wang and Jin Lu and Geng Yuan},
  journal = {arXiv preprint arXiv:2504.20314},
  year    = {2025}
}