ResearchTrend.AI
Stop Wasting My Time! Saving Days of ImageNet and BERT Training with Latest Weight Averaging

29 September 2022
Jean Kaddour
    MoMe
    3DH

Papers citing "Stop Wasting My Time! Saving Days of ImageNet and BERT Training with Latest Weight Averaging"

10 / 10 papers shown
Mitigating the Backdoor Effect for Multi-Task Model Merging via Safety-Aware Subspace
Jinluan Yang
Anke Tang
Didi Zhu
Zhengyu Chen
Li Shen
Fei Wu
MoMe
AAML
107
4
0
17 Oct 2024
MOS: Model Synergy for Test-Time Adaptation on LiDAR-Based 3D Object Detection
Zhuoxiao Chen
Junjie Meng
Mahsa Baktashmotlagh
Yonggang Zhang
Zi Huang
Yadan Luo
118
1
0
21 Jun 2024
Training Compute-Optimal Large Language Models
Jordan Hoffmann
Sebastian Borgeaud
A. Mensch
Elena Buchatskaya
Trevor Cai
...
Karen Simonyan
Erich Elsen
Jack W. Rae
Oriol Vinyals
Laurent Sifre
AI4TS
114
1,894
0
29 Mar 2022
Long-Tailed Classification by Keeping the Good and Removing the Bad Momentum Causal Effect
Kaihua Tang
Jianqiang Huang
Hanwang Zhang
CML
88
443
0
28 Sep 2020
Are we done with ImageNet?
Lucas Beyer
Olivier J. Hénaff
Alexander Kolesnikov
Xiaohua Zhai
Aaron van den Oord
VLM
101
398
0
12 Jun 2020
The Break-Even Point on Optimization Trajectories of Deep Neural Networks
Stanislaw Jastrzebski
Maciej Szymczak
Stanislav Fort
Devansh Arpit
Jacek Tabor
Kyunghyun Cho
Krzysztof J. Geras
61
157
0
21 Feb 2020
Scaling Laws for Neural Language Models
Jared Kaplan
Sam McCandlish
T. Henighan
Tom B. Brown
B. Chess
R. Child
Scott Gray
Alec Radford
Jeff Wu
Dario Amodei
435
4,662
0
23 Jan 2020
Iterate averaging as regularization for stochastic gradient descent
Gergely Neu
Lorenzo Rosasco
MoMe
64
61
0
22 Feb 2018
SGDR: Stochastic Gradient Descent with Warm Restarts
I. Loshchilov
Frank Hutter
ODL
210
8,030
0
13 Aug 2016
Going Deeper with Convolutions
Christian Szegedy
Wei Liu
Yangqing Jia
P. Sermanet
Scott E. Reed
Dragomir Anguelov
D. Erhan
Vincent Vanhoucke
Andrew Rabinovich
274
43,511
0
17 Sep 2014