On the Convergence of SARAH and Beyond
Bingcong Li, Meng Ma, G. Giannakis
arXiv: 1906.02351 · 5 June 2019

Papers citing "On the Convergence of SARAH and Beyond"

8 citing papers shown
Variance Reduction Methods Do Not Need to Compute Full Gradients: Improved Efficiency through Shuffling
Daniil Medyakov, Gleb Molodtsov, S. Chezhegov, Alexey Rebrikov, Aleksandr Beznosikov
21 Feb 2025 · 103 · 0 · 0

Stochastic Distributed Optimization under Average Second-order Similarity: Algorithms and Analysis
Dachao Lin, Yuze Han, Haishan Ye, Zhihua Zhang
15 Apr 2023 · 25 · 11 · 0

Gradient Descent-Type Methods: Background and Simple Unified Convergence Analysis
Quoc Tran-Dinh, Marten van Dijk
19 Dec 2022 · 34 · 0 · 0

Random-reshuffled SARAH does not need a full gradient computations
Aleksandr Beznosikov, Martin Takáč
26 Nov 2021 · 26 · 7 · 0

PAGE: A Simple and Optimal Probabilistic Gradient Estimator for Nonconvex Optimization
Zhize Li, Hongyan Bao, Xiangliang Zhang, Peter Richtárik
25 Aug 2020 · ODL · 31 · 126 · 0

Variance Reduction for Deep Q-Learning using Stochastic Recursive Gradient
Hao Jia, Xiao Zhang, Jun Xu, Wei Zeng, Hao Jiang, Xiao Yan, Ji-Rong Wen
25 Jul 2020 · 9 · 3 · 0

On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima
N. Keskar, Dheevatsa Mudigere, J. Nocedal, M. Smelyanskiy, P. T. P. Tang
15 Sep 2016 · ODL · 308 · 2,890 · 0

A Proximal Stochastic Gradient Method with Progressive Variance Reduction
Lin Xiao, Tong Zhang
19 Mar 2014 · ODL · 93 · 737 · 0