Multi-Level Composite Stochastic Optimization via Nested Variance Reduction

We consider multi-level composite optimization problems where each mapping in the composition is the expectation over a family of random smooth mappings or the sum of some finite number of smooth mappings. We present a normalized proximal approximate gradient (NPAG) method where the approximate gradients are obtained via nested stochastic variance reduction. In order to find an approximate stationary point where the expected norm of its gradient mapping is less than , the total sample complexity of our method is in the expectation case, and in the finite-sum case where is the total number of functions across all composition levels. In addition, the dependence of our total sample complexity on the number of composition levels is polynomial, rather than exponential as in previous work.
View on arXiv