Differentially Quantized Gradient Methods

6 February 2020
Chung-Yi Lin, V. Kostina, B. Hassibi
Abstract

Consider the following distributed optimization scenario. A worker has access to training data that it uses to compute the gradients, while a server decides when to stop iterative computation based on its target accuracy or delay constraints. The server receives all its information about the problem instance from the worker via a rate-limited noiseless communication channel. We introduce the principle we call Differential Quantization (DQ), which prescribes compensating the past quantization errors to direct the descent trajectory of a quantized algorithm towards that of its unquantized counterpart. Assuming that the objective function is smooth and strongly convex, we prove that Differentially Quantized Gradient Descent (DQ-GD) attains a linear contraction factor of $\max\{\sigma_{\mathrm{GD}}, \rho_n 2^{-R}\}$, where $\sigma_{\mathrm{GD}}$ is the contraction factor of unquantized gradient descent (GD), $\rho_n \geq 1$ is the covering efficiency of the quantizer, and $R$ is the bitrate per problem dimension $n$. Thus at any $R \geq \log_2(\rho_n / \sigma_{\mathrm{GD}})$ bits, the contraction factor of DQ-GD is the same as that of unquantized GD, i.e., there is no loss due to quantization. We show that no algorithm within a certain class can converge faster than $\max\{\sigma_{\mathrm{GD}}, 2^{-R}\}$. Since quantizers exist with $\rho_n \to 1$ as $n \to \infty$ (Rogers, 1963), this means that DQ-GD is asymptotically optimal. The principle of differential quantization continues to apply to gradient methods with momentum, such as Nesterov's accelerated gradient descent and Polyak's heavy-ball method. For these algorithms as well, if the rate is above a certain threshold, there is no loss in contraction factor for the differentially quantized algorithm compared to its unquantized counterpart. Experimental results on least-squares problems validate our theoretical analysis.
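
The snippet below is a minimal sketch of the differential-quantization idea applied to gradient descent on a toy least-squares problem: the worker quantizes the gradient after adding back the accumulated quantization error, so the server's quantized trajectory tracks the unquantized one. The uniform quantizer, step size, clipping radius, and all variable names here are illustrative assumptions and not the paper's exact DQ-GD construction or quantizer.

```python
import numpy as np

# Toy least-squares instance: minimize ||A x - b||^2 / 2.
rng = np.random.default_rng(0)
n, m = 8, 50                               # problem dimension, number of samples
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)
x_star, *_ = np.linalg.lstsq(A, b, rcond=None)

L = np.linalg.eigvalsh(A.T @ A).max()      # smoothness constant of the objective
eta = 1.0 / L                              # step size (assumed choice)
R = 4                                      # bits per problem dimension (assumed)
radius = 2.0 * np.abs(A.T @ b).max()       # assumed bound on the quantizer input

def quantize(v, bits, radius):
    """Uniform scalar quantizer on [-radius, radius] per coordinate (illustrative)."""
    step = 2.0 * radius / (2 ** bits)
    return np.clip(np.round(v / step) * step, -radius, radius)

x = np.zeros(n)       # server-side iterate
err = np.zeros(n)     # accumulated quantization error kept at the worker

for t in range(200):
    grad = A.T @ (A @ x - b)               # worker computes the exact gradient
    # Differential quantization principle: quantize the gradient corrected by
    # the past quantization error, then carry the new error forward.
    q = quantize(grad + err, R, radius)
    err = grad + err - q
    x = x - eta * q                        # server applies the quantized step

print("distance to least-squares solution:", np.linalg.norm(x - x_star))
```

Consistent with the contraction-factor result quoted in the abstract, once the rate is large enough that the quantization term is dominated by the GD contraction, a run of this sketch converges at essentially the unquantized rate, up to a small residual set by the quantizer resolution.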

View on arXiv: 2002.02508