ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2307.13813
  4. Cited By
How to Scale Your EMA

How to Scale Your EMA

25 July 2023
Dan Busbridge
Jason Ramapuram
Pierre Ablin
Tatiana Likhomanenko
Eeshan Gunesh Dhekane
Xavier Suau
Russ Webb
ArXivPDFHTML

Papers citing "How to Scale Your EMA"

22 / 22 papers shown
Title
KeepKV: Eliminating Output Perturbation in KV Cache Compression for Efficient LLMs Inference
KeepKV: Eliminating Output Perturbation in KV Cache Compression for Efficient LLMs Inference
Yuxuan Tian
Zihan Wang
Yebo Peng
Aomufei Yuan
Z. Wang
Bairen Yi
Xin Liu
Yong Cui
Tong Yang
32
0
0
14 Apr 2025
Straight to Zero: Why Linearly Decaying the Learning Rate to Zero Works Best for LLMs
Straight to Zero: Why Linearly Decaying the Learning Rate to Zero Works Best for LLMs
Shane Bergsma
Nolan Dey
Gurpreet Gosal
Gavia Gray
Daria Soboleva
Joel Hestness
50
5
0
21 Feb 2025
Efficient Federated Learning against Heterogeneous and Non-stationary
  Client Unavailability
Efficient Federated Learning against Heterogeneous and Non-stationary Client Unavailability
Ming Xiang
Stratis Ioannidis
Edmund Yeh
Carlee Joe-Wong
Lili Su
FedML
29
5
0
26 Sep 2024
On Scaling Up 3D Gaussian Splatting Training
On Scaling Up 3D Gaussian Splatting Training
Hexu Zhao
Haoyang Weng
Daohan Lu
Ang Li
Jinyang Li
Aurojit Panda
Saining Xie
3DGS
29
12
0
26 Jun 2024
D2O: Dynamic Discriminative Operations for Efficient Long-Context Inference of Large Language Models
D2O: Dynamic Discriminative Operations for Efficient Long-Context Inference of Large Language Models
Zhongwei Wan
Xinjian Wu
Yu Zhang
Yi Xin
Chaofan Tao
...
Xin Wang
Siqi Luo
Jing Xiong
Mi Zhang
Mi Zhang
29
0
0
18 Jun 2024
How to set AdamW's weight decay as you scale model and dataset size
How to set AdamW's weight decay as you scale model and dataset size
Xi Wang
Laurence Aitchison
38
9
0
22 May 2024
BOSC: A Backdoor-based Framework for Open Set Synthetic Image
  Attribution
BOSC: A Backdoor-based Framework for Open Set Synthetic Image Attribution
Jun Wang
B. Tondi
Mauro Barni
DiffM
35
1
0
19 May 2024
Greedy-DiM: Greedy Algorithms for Unreasonably Effective Face Morphs
Greedy-DiM: Greedy Algorithms for Unreasonably Effective Face Morphs
Zander Blasingame
Chen Liu
30
6
0
09 Apr 2024
Bootstrap Your Own Variance
Bootstrap Your Own Variance
Polina Turishcheva
Jason Ramapuram
Sinead Williamson
Dan Busbridge
Eeshan Gunesh Dhekane
Russ Webb
UQCV
21
0
0
06 Dec 2023
Nash Learning from Human Feedback
Nash Learning from Human Feedback
Rémi Munos
Michal Valko
Daniele Calandriello
M. G. Azar
Mark Rowland
...
Nikola Momchev
Olivier Bachem
D. Mankowitz
Doina Precup
Bilal Piot
31
125
0
01 Dec 2023
Butterfly Effects of SGD Noise: Error Amplification in Behavior Cloning
  and Autoregression
Butterfly Effects of SGD Noise: Error Amplification in Behavior Cloning and Autoregression
Adam Block
Dylan J. Foster
Akshay Krishnamurthy
Max Simchowitz
Cyril Zhang
30
4
0
17 Oct 2023
Where Should I Spend My FLOPS? Efficiency Evaluations of Visual
  Pre-training Methods
Where Should I Spend My FLOPS? Efficiency Evaluations of Visual Pre-training Methods
Skanda Koppula
Yazhe Li
Evan Shelhamer
Andrew Jaegle
Nikhil Parthasarathy
Relja Arandjelović
João Carreira
Olivier J. Hénaff
33
9
0
30 Sep 2022
On the SDEs and Scaling Rules for Adaptive Gradient Algorithms
On the SDEs and Scaling Rules for Adaptive Gradient Algorithms
Sadhika Malladi
Kaifeng Lyu
A. Panigrahi
Sanjeev Arora
92
40
0
20 May 2022
Masked Autoencoders Are Scalable Vision Learners
Masked Autoencoders Are Scalable Vision Learners
Kaiming He
Xinlei Chen
Saining Xie
Yanghao Li
Piotr Dollár
Ross B. Girshick
ViT
TPM
305
7,434
0
11 Nov 2021
Emerging Properties in Self-Supervised Vision Transformers
Emerging Properties in Self-Supervised Vision Transformers
Mathilde Caron
Hugo Touvron
Ishan Misra
Hervé Jégou
Julien Mairal
Piotr Bojanowski
Armand Joulin
317
5,775
0
29 Apr 2021
Understanding self-supervised Learning Dynamics without Contrastive
  Pairs
Understanding self-supervised Learning Dynamics without Contrastive Pairs
Yuandong Tian
Xinlei Chen
Surya Ganguli
SSL
138
279
0
12 Feb 2021
High-Performance Large-Scale Image Recognition Without Normalization
High-Performance Large-Scale Image Recognition Without Normalization
Andrew Brock
Soham De
Samuel L. Smith
Karen Simonyan
VLM
223
512
0
11 Feb 2021
BYOL works even without batch statistics
BYOL works even without batch statistics
Pierre Harvey Richemond
Jean-Bastien Grill
Florent Altché
Corentin Tallec
Florian Strub
...
Samuel L. Smith
Soham De
Razvan Pascanu
Bilal Piot
Michal Valko
SSL
250
114
0
20 Oct 2020
Scaling Laws for Neural Language Models
Scaling Laws for Neural Language Models
Jared Kaplan
Sam McCandlish
T. Henighan
Tom B. Brown
B. Chess
R. Child
Scott Gray
Alec Radford
Jeff Wu
Dario Amodei
234
4,469
0
23 Jan 2020
Revisiting Self-Training for Neural Sequence Generation
Revisiting Self-Training for Neural Sequence Generation
Junxian He
Jiatao Gu
Jiajun Shen
MarcÁurelio Ranzato
SSL
LRM
244
269
0
30 Sep 2019
Mean teachers are better role models: Weight-averaged consistency
  targets improve semi-supervised deep learning results
Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results
Antti Tarvainen
Harri Valpola
OOD
MoMe
261
1,275
0
06 Mar 2017
ImageNet Large Scale Visual Recognition Challenge
ImageNet Large Scale Visual Recognition Challenge
Olga Russakovsky
Jia Deng
Hao Su
J. Krause
S. Satheesh
...
A. Karpathy
A. Khosla
Michael S. Bernstein
Alexander C. Berg
Li Fei-Fei
VLM
ObjD
296
39,194
0
01 Sep 2014
1