
Near-Optimal Fully First-Order Algorithms for Finding Stationary Points in Bilevel Optimization

Main: 48 Pages
3 Figures
Bibliography: 8 Pages
3 Tables
Abstract

Bilevel optimization has various applications such as hyper-parameter optimization and meta-learning. Designing theoretically efficient algorithms for bilevel optimization is more challenging than standard optimization because the lower-level problem defines the feasibility set implicitly via another optimization problem. One tractable case is when the lower-level problem permits strong convexity. Recent works show that second-order methods can provably converge to an $\epsilon$-first-order stationary point of the problem at a rate of $\tilde{\mathcal{O}}(\epsilon^{-2})$, yet these algorithms require a Hessian-vector product oracle. Kwon et al. (2023) resolved the problem by proposing a first-order method that can achieve the same goal at a slower rate of $\tilde{\mathcal{O}}(\epsilon^{-3})$. In this work, we provide an improved analysis demonstrating that the first-order method can also find an $\epsilon$-first-order stationary point within $\tilde{\mathcal{O}}(\epsilon^{-2})$ oracle complexity, which matches the upper bounds for second-order methods in the dependency on $\epsilon$. Our analysis further leads to simple first-order algorithms that can achieve similar near-optimal rates in finding second-order stationary points and in distributed bilevel problems.
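The abstract does not spell out the update rule, so the following is only a minimal sketch of what a Hessian-free bilevel step can look like. It assumes a penalty-style gradient estimate, $\nabla_x f(x, y) + \lambda(\nabla_x g(x, y) - \nabla_x g(x, z))$, where $y$ approximately minimizes $f + \lambda g$ and $z$ approximately minimizes $g$; this form, the penalty weight `lam`, the step sizes, and the toy objectives `f` and `g` are all illustrative assumptions, not the paper's stated algorithm or parameter choices.

```python
# Toy problem (hypothetical, chosen so the answer is known in closed form):
#   upper level: f(x, y) = 0.5 * (y - 1)^2 + 0.5 * x^2
#   lower level: g(x, y) = 0.5 * (y - x)^2, strongly convex in y, so y*(x) = x
# The hyper-objective is phi(x) = f(x, y*(x)), whose minimizer is x = 0.5.

def grad_f(x, y):
    """Return (df/dx, df/dy) for the toy upper-level objective."""
    return x, y - 1.0

def grad_g(x, y):
    """Return (dg/dx, dg/dy) for the toy lower-level objective."""
    return -(y - x), y - x

def first_order_bilevel(x0, lam=100.0, outer_steps=100, inner_steps=20, eta=0.2):
    """Gradient-only bilevel sketch: no Hessians or Hessian-vector products."""
    x, y, z = x0, 0.0, 0.0
    for _ in range(outer_steps):
        for _ in range(inner_steps):
            # y: gradient descent on the penalized objective f(x, .) + lam * g(x, .)
            _, fy = grad_f(x, y)
            _, gy = grad_g(x, y)
            y -= (fy + lam * gy) / (1.0 + lam)  # step 1/(1 + lam), toy smoothness
            # z: gradient descent on g(x, .) alone, tracking y*(x)
            _, gz = grad_g(x, z)
            z -= gz  # step 1, since g has unit curvature in this toy
        # Assumed penalty-based estimate of the hyper-gradient at x
        fx, _ = grad_f(x, y)
        gx_y, _ = grad_g(x, y)
        gx_z, _ = grad_g(x, z)
        x -= eta * (fx + lam * (gx_y - gx_z))
    return x

x_hat = first_order_bilevel(x0=3.0)
print(x_hat)
```

On this toy problem the iterate settles at $\lambda/(1+2\lambda)$, which approaches the true minimizer $0.5$ as `lam` grows; the gap illustrates the bias that a finite penalty weight introduces.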

@article{chen2025_2306.14853,
  title={Near-Optimal Nonconvex-Strongly-Convex Bilevel Optimization with Fully First-Order Oracles},
  author={Lesi Chen and Yaohua Ma and Jingzhao Zhang},
  journal={arXiv preprint arXiv:2306.14853},
  year={2025}
}