On Penalty Methods for Nonconvex Bilevel Optimization and First-Order Stochastic Approximation

4 September 2023
Jeongyeol Kwon
Dohyun Kwon
Steve Wright
Robert D. Nowak
arXiv:2309.01753
Abstract

In this work, we study first-order algorithms for solving Bilevel Optimization (BO) where the objective functions are smooth but possibly nonconvex in both levels and the variables are restricted to closed convex sets. As a first step, we study the landscape of BO through the lens of penalty methods, in which the upper- and lower-level objectives are combined in a weighted sum with penalty parameter $\sigma > 0$. In particular, we establish a strong connection between the penalty function and the hyper-objective by explicitly characterizing the conditions under which the values and derivatives of the two must be $O(\sigma)$-close. A by-product of our analysis is an explicit formula for the gradient of the hyper-objective when the lower-level problem has multiple solutions, under minimal conditions, which may be of independent interest. Next, viewing the penalty formulation as an $O(\sigma)$-approximation of the original BO, we propose first-order algorithms that find an $\epsilon$-stationary solution by optimizing the penalty formulation with $\sigma = O(\epsilon)$. When the perturbed lower-level problem uniformly satisfies the small-error proximal error-bound (EB) condition, we propose a first-order algorithm that converges to an $\epsilon$-stationary point of the penalty function, using in total $O(\epsilon^{-3})$ and $O(\epsilon^{-7})$ accesses to first-order gradient oracles when the oracles are deterministic and stochastic, respectively. Under an additional assumption on the stochastic oracles, we show that the algorithm can be implemented in a fully single-loop manner, i.e., with $O(1)$ samples per iteration, and achieves improved oracle complexities of $O(\epsilon^{-3})$ and $O(\epsilon^{-5})$, respectively.
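To make the penalty viewpoint concrete, below is a minimal sketch of a generic double-loop first-order penalty method; it is not the authors' algorithm, and its function names, step sizes, and iteration counts are illustrative assumptions. It optimizes one common normalization of the penalty objective, $f(x,y) + \sigma^{-1}\,(g(x,y) - \min_z g(x,z))$ (equivalent to the weighted sum $\sigma f + g$ up to rescaling by $\sigma$), approximating the lower-level value function $\min_z g(x,z)$ with inner gradient steps.

```python
import numpy as np

# Sketch of a penalty-based, double-loop first-order method for bilevel
# optimization. NOT the paper's algorithm; it only illustrates the generic
# penalty idea. Assumed (hypothetical) inputs:
#   grad_f(x, y) -> (df/dx, df/dy)   upper-level gradients
#   grad_g(x, y) -> (dg/dx, dg/dy)   lower-level gradients
# Penalty objective (one common normalization):
#   F_sigma(x, y) = f(x, y) + (g(x, y) - min_z g(x, z)) / sigma
def penalty_bilevel_gd(grad_f, grad_g, x0, y0, sigma=1e-2, lr=1e-3,
                       inner_steps=10, outer_steps=2000):
    x, y = x0.astype(float).copy(), y0.astype(float).copy()
    z = y.copy()  # running approximation of argmin_z g(x, z)
    for _ in range(outer_steps):
        # Inner loop: refresh z so that g(x, z) ~ min_z g(x, z).
        for _ in range(inner_steps):
            z -= lr * grad_g(x, z)[1]
        # Danskin-style gradient of F_sigma in x: the value function
        # min_z g(x, z) contributes grad_x g(x, z) at the approximate
        # inner solution z.
        gx = grad_f(x, y)[0] + (grad_g(x, y)[0] - grad_g(x, z)[0]) / sigma
        gy = grad_f(x, y)[1] + grad_g(x, y)[1] / sigma
        x -= lr * gx
        y -= lr * gy
    return x, y

# Toy check (illustrative): f(x, y) = ||x - y||^2, g(x, y) = ||y - x - 1||^2.
# The lower level forces y ~ x + 1, so the iterates should satisfy
# y - x ~ 1 at convergence.
if __name__ == "__main__":
    gf = lambda x, y: (2 * (x - y), 2 * (y - x))
    gg = lambda x, y: (-2 * (y - x - 1), 2 * (y - x - 1))
    x, y = penalty_bilevel_gd(gf, gg, np.zeros(1), np.zeros(1))
    print(x, y)
```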
