ResearchTrend.AI

arXiv:2402.11215 — Cited By
AdAdaGrad: Adaptive Batch Size Schemes for Adaptive Gradient Methods

17 February 2024
Tim Tsz-Kit Lau, Han Liu, Mladen Kolar
ODL

Papers citing "AdAdaGrad: Adaptive Batch Size Schemes for Adaptive Gradient Methods"

10 / 10 papers shown
Adaptive Batch Size Schedules for Distributed Training of Language Models with Data and Model Parallelism
Tim Tsz-Kit Lau, Weijian Li, Chenwei Xu, Han Liu, Mladen Kolar
30 Dec 2024
Communication-Efficient Adaptive Batch Size Strategies for Distributed Local Gradient Methods
Tim Tsz-Kit Lau, Weijian Li, Chenwei Xu, Han Liu, Mladen Kolar
20 Jun 2024
Enhancing Efficiency of Safe Reinforcement Learning via Sample Manipulation
Shangding Gu, Laixi Shi, Yuhao Ding, Alois Knoll, C. Spanos, Adam Wierman, Ming Jin
OffRL
31 May 2024
Noise Is Not the Main Factor Behind the Gap Between SGD and Adam on Transformers, but Sign Descent Might Be
Frederik Kunstner, Jacques Chen, J. Lavington, Mark W. Schmidt
27 Apr 2023
Adaptive Sampling Quasi-Newton Methods for Zeroth-Order Stochastic Optimization
Raghu Bollapragada, Stefan M. Wild
24 Sep 2021
A High Probability Analysis of Adaptive SGD with Momentum
Xiaoyun Li, Francesco Orabona
28 Jul 2020
A Simple Convergence Proof of Adam and Adagrad
Alexandre Défossez, Léon Bottou, Francis R. Bach, Nicolas Usunier
05 Mar 2020
Scaling Laws for Neural Language Models
Jared Kaplan, Sam McCandlish, T. Henighan, Tom B. Brown, B. Chess, R. Child, Scott Gray, Alec Radford, Jeff Wu, Dario Amodei
23 Jan 2020
Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
M. Shoeybi, M. Patwary, Raul Puri, P. LeGresley, Jared Casper, Bryan Catanzaro
MoE
17 Sep 2019
On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima
N. Keskar, Dheevatsa Mudigere, J. Nocedal, M. Smelyanskiy, P. T. P. Tang
ODL
15 Sep 2016