Temporal Context Consistency Above All: Enhancing Long-Term Anticipation by Learning and Enforcing Temporal Constraints
Alberto Maté, Mariella Dimiccoli
arXiv:2412.19424 · 27 December 2024 · AI4TS
Papers citing "Temporal Context Consistency Above All: Enhancing Long-Term Anticipation by Learning and Enforcing Temporal Constraints" (19 / 19 papers shown)
Title | Authors | Tags | Metrics | Date
DoG is SGD's Best Friend: A Parameter-Free Dynamic Step Size Schedule | Maor Ivgi, Oliver Hinder, Y. Carmon | ODL | 117 / 66 / 0 | 08 Feb 2023
Stochastic Polyak Stepsize with a Moving Target | Robert Mansel Gower, Aaron Defazio, Michael G. Rabbat | | 73 / 17 / 0 | 22 Jun 2021
Adam+: A Stochastic Method with Adaptive Variance Reduction | Mingrui Liu, Wei Zhang, Francesco Orabona, Tianbao Yang | | 48 / 28 / 0 | 24 Nov 2020
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale | Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, ..., Matthias Minderer, G. Heigold, Sylvain Gelly, Jakob Uszkoreit, N. Houlsby | ViT | 684 / 41,563 / 0 | 22 Oct 2020
Better Parameter-free Stochastic Optimization with ODE Updates for Coin-Betting | K. Chen, John Langford, Francesco Orabona | | 59 / 22 / 0 | 12 Jun 2020
A new regret analysis for Adam-type algorithms | Ahmet Alacaoglu, Yura Malitsky, P. Mertikopoulos, Volkan Cevher | ODL | 79 / 43 / 0 | 21 Mar 2020
Stochastic Polyak Step-size for SGD: An Adaptive Learning Rate for Fast Convergence | Nicolas Loizou, Sharan Vaswani, I. Laradji, Simon Lacoste-Julien | | 88 / 188 / 0 | 24 Feb 2020
On the Variance of the Adaptive Learning Rate and Beyond | Liyuan Liu, Haoming Jiang, Pengcheng He, Weizhu Chen, Xiaodong Liu, Jianfeng Gao, Jiawei Han | ODL | 300 / 1,909 / 0 | 08 Aug 2019
Momentum-Based Variance Reduction in Non-Convex SGD | Ashok Cutkosky, Francesco Orabona | ODL | 98 / 410 / 0 | 24 May 2019
Painless Stochastic Gradient: Interpolation, Line-Search, and Convergence Rates | Sharan Vaswani, Aaron Mishkin, I. Laradji, Mark Schmidt, Gauthier Gidel, Simon Lacoste-Julien | ODL | 111 / 210 / 0 | 24 May 2019
On the Convergence of Adaptive Gradient Methods for Nonconvex Optimization | Dongruo Zhou, Yiqi Tang, Yuan Cao, Ziyan Yang, Quanquan Gu | | 86 / 150 / 0 | 16 Aug 2018
Closing the Generalization Gap of Adaptive Gradient Methods in Training Deep Neural Networks | Jinghui Chen, Dongruo Zhou, Yiqi Tang, Ziyan Yang, Yuan Cao, Quanquan Gu | ODL | 103 / 193 / 0 | 18 Jun 2018
Large Batch Training of Convolutional Networks | Yang You, Igor Gitman, Boris Ginsburg | ODL | 157 / 852 / 0 | 13 Aug 2017
Densely Connected Convolutional Networks | Gao Huang, Zhuang Liu, Laurens van der Maaten, Kilian Q. Weinberger | PINN, 3DV | 862 / 36,910 / 0 | 25 Aug 2016
Wide Residual Networks | Sergey Zagoruyko, N. Komodakis | | 362 / 8,005 / 0 | 23 May 2016
Deep Residual Learning for Image Recognition | Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun | MedIm | 2.3K / 194,641 / 0 | 10 Dec 2015
Adam: A Method for Stochastic Optimization | Diederik P. Kingma, Jimmy Ba | ODL | 2.1K / 150,433 / 0 | 22 Dec 2014
Very Deep Convolutional Networks for Large-Scale Image Recognition | Karen Simonyan, Andrew Zisserman | FAtt, MDE | 1.7K / 100,575 / 0 | 04 Sep 2014
Less Regret via Online Conditioning | Matthew J. Streeter, H. B. McMahan | ODL | 101 / 66 / 0 | 25 Feb 2010