Global Convergence Rate of Deep Equilibrium Models with General Activations

Abstract
In a recent paper, Ling et al. investigated the over-parametrized Deep Equilibrium Model (DEQ) with ReLU activation and proved that gradient descent converges to a globally optimal solution at a linear rate for the quadratic loss function. In this paper, we show that this result still holds for DEQs with any general activation whose first and second derivatives are bounded. Since such an activation is generally non-linear, a general population Gram matrix is designed, and a new form of dual activation with Hermite polynomial expansion is developed.
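To make the setting concrete, the sketch below shows a DEQ forward pass: the hidden state is defined implicitly as a fixed point of a layer with a smooth, bounded-derivative activation (tanh here, one member of the class the paper covers). The weight scaling and fixed-point iteration are illustrative assumptions, not the paper's exact parametrization or solver.

```python
import numpy as np

# Minimal sketch of a DEQ forward pass: the equilibrium state z* solves
# z* = tanh(W z* + U x). tanh has bounded first and second derivatives,
# the activation class treated in the paper. The 0.3 scaling on W is an
# illustrative choice to keep the map contractive so naive fixed-point
# iteration converges; it is not the paper's initialization scheme.

rng = np.random.default_rng(0)
d = 8  # hidden width (illustrative)

W = 0.3 * rng.standard_normal((d, d)) / np.sqrt(d)
U = rng.standard_normal((d, d)) / np.sqrt(d)
x = rng.standard_normal(d)

def deq_forward(x, W, U, tol=1e-10, max_iter=1000):
    """Solve z = tanh(W z + U x) by naive fixed-point iteration."""
    z = np.zeros(len(x))
    for _ in range(max_iter):
        z_next = np.tanh(W @ z + U @ x)
        if np.linalg.norm(z_next - z) < tol:
            return z_next
        z = z_next
    return z

z_star = deq_forward(x, W, U)
# The returned state satisfies the equilibrium equation up to tolerance:
residual = np.linalg.norm(z_star - np.tanh(W @ z_star + U @ x))
print(residual)
```

In practice DEQs use faster root-finding methods (e.g. Anderson acceleration or Broyden's method) rather than plain iteration, but the implicit-layer structure is the same.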
@article{truong2025_2302.05797,
  title={Global Convergence Rate of Deep Equilibrium Models with General Activations},
  author={Lan V. Truong},
  journal={arXiv preprint arXiv:2302.05797},
  year={2025}
}