On the Convergence Rate of Training Recurrent Neural Networks

29 October 2018
Zeyuan Allen-Zhu
Yuanzhi Li
Zhao Song

Papers citing "On the Convergence Rate of Training Recurrent Neural Networks"

50 / 128 papers shown
Title
Scaling Law Phenomena Across Regression Paradigms: Multiple and Kernel Approaches
Yifang Chen
Xuyang Guo
Xiaoyu Li
Yingyu Liang
Zhenmei Shi
Zhao Song
73
3
0
03 Mar 2025
LoR-VP: Low-Rank Visual Prompting for Efficient Vision Model Adaptation
Can Jin
Ying Li
Mingyu Zhao
Shiyu Zhao
Zhenting Wang
Xiaoxiao He
Ligong Han
Tong Che
Dimitris N. Metaxas
VPVLM
VLM
124
1
0
02 Feb 2025
Generalization and Risk Bounds for Recurrent Neural Networks
Xuewei Cheng
Ke Huang
Shujie Ma
26
1
0
05 Nov 2024
SHAP values via sparse Fourier representation
Ali Gorji
Andisheh Amrollahi
A. Krause
FAtt
38
0
0
08 Oct 2024
Stochastic Gradient Descent for Two-layer Neural Networks
Dinghao Cao
Zheng-Chu Guo
Lei Shi
MLT
24
0
0
10 Jul 2024
Evaluating the design space of diffusion-based generative models
Yuqing Wang
Ye He
Molei Tao
DiffM
38
5
0
18 Jun 2024
Recurrent Natural Policy Gradient for POMDPs
Semih Cayci
A. Eryilmaz
32
0
0
28 May 2024
HiLo: Detailed and Robust 3D Clothed Human Reconstruction with High-and Low-Frequency Information of Parametric Models
Yifan Yang
Dong Liu
Shuhai Zhang
Zeshuai Deng
Zixiong Huang
Mingkui Tan
3DH
29
8
0
07 Apr 2024
CAM-Based Methods Can See through Walls
Magamed Taimeskhanov
R. Sicre
Damien Garreau
21
1
0
02 Apr 2024
Convergence of Gradient Descent for Recurrent Neural Networks: A Nonasymptotic Analysis
Semih Cayci
A. Eryilmaz
26
3
0
19 Feb 2024
LoRA Training in the NTK Regime has No Spurious Local Minima
Uijeong Jang
Jason D. Lee
Ernest K. Ryu
44
14
0
19 Feb 2024
Convergence Analysis for Learning Orthonormal Deep Linear Neural Networks
Zhen Qin
Xuwei Tan
Zhihui Zhu
34
0
0
24 Nov 2023
On the Convergence of Encoder-only Shallow Transformers
Yongtao Wu
Fanghui Liu
Grigorios G. Chrysos
V. Cevher
47
5
0
02 Nov 2023
An Automatic Learning Rate Schedule Algorithm for Achieving Faster Convergence and Steeper Descent
Zhao Song
Chiwun Yang
29
9
0
17 Oct 2023
How many Neurons do we need? A refined Analysis for Shallow Networks trained with Gradient Descent
Mike Nguyen
Nicole Mücke
MLT
27
5
0
14 Sep 2023
Is Solving Graph Neural Tangent Kernel Equivalent to Training Graph Neural Network?
Lianke Qin
Zhao Song
Baocheng Sun
23
7
0
14 Sep 2023
A Fast Optimization View: Reformulating Single Layer Attention in LLM Based on Tensor and SVM Trick, and Solving It in Matrix Multiplication Time
Yeqi Gao
Zhao Song
Weixin Wang
Junze Yin
22
25
0
14 Sep 2023
Six Lectures on Linearized Neural Networks
Theodor Misiakiewicz
Andrea Montanari
42
12
0
25 Aug 2023
How to Protect Copyright Data in Optimization of Large Language Models?
T. Chu
Zhao Song
Chiwun Yang
40
29
0
23 Aug 2023
Convergence of Two-Layer Regression with Nonlinear Units
Yichuan Deng
Zhao Song
Shenghao Xie
29
7
0
16 Aug 2023
Dynamic Analysis and an Eigen Initializer for Recurrent Neural Networks
Ran Dou
José C. Príncipe
33
2
0
28 Jul 2023
Equitable Time-Varying Pricing Tariff Design: A Joint Learning and Optimization Approach
Liudong Chen
Bolun Xu
18
0
0
26 Jul 2023
Efficient SGD Neural Network Training via Sublinear Activated Neuron Identification
Lianke Qin
Zhao Song
Yuanyuan Yang
25
9
0
13 Jul 2023
Efficient Uncertainty Quantification and Reduction for Over-Parameterized Neural Networks
Ziyi Huang
H. Lam
Haofeng Zhang
UQCV
26
4
0
09 Jun 2023
InfoPrompt: Information-Theoretic Soft Prompt Tuning for Natural Language Understanding
Junda Wu
Tong Yu
Rui Wang
Zhao Song
Ruiyi Zhang
Handong Zhao
Chaochao Lu
Shuai Li
Ricardo Henao
VLM
39
23
0
08 Jun 2023
Query Complexity of Active Learning for Function Family With Nearly Orthogonal Basis
Xiangyi Chen
Zhao Song
Baochen Sun
Junze Yin
Danyang Zhuo
42
3
0
06 Jun 2023
A Scalable Walsh-Hadamard Regularizer to Overcome the Low-degree Spectral Bias of Neural Networks
Ali Gorji
Andisheh Amrollahi
A. Krause
16
4
0
16 May 2023
Efficient Asynchronize Stochastic Gradient Algorithm with Structured Data
Zhao Song
Mingquan Ye
27
4
0
13 May 2023
On the Eigenvalue Decay Rates of a Class of Neural-Network Related Kernel Functions Defined on General Domains
Yicheng Li
Zixiong Yu
Y. Cotronis
Qian Lin
55
13
0
04 May 2023
An Iterative Algorithm for Rescaled Hyperbolic Functions Regression
Yeqi Gao
Zhao Song
Junze Yin
31
33
0
01 May 2023
Attention Scheme Inspired Softmax Regression
Yichuan Deng
Zhihang Li
Zhao Song
44
42
0
20 Apr 2023
An Over-parameterized Exponential Regression
Yeqi Gao
Sridhar Mahadevan
Zhao Song
16
36
0
29 Mar 2023
A Brief Survey on the Approximation Theory for Sequence Modelling
Hao Jiang
Qianxiao Li
Zhong Li
Shida Wang
AI4TS
30
12
0
27 Feb 2023
An Analysis of Attention via the Lens of Exchangeability and Latent Variable Models
Yufeng Zhang
Boyi Liu
Qi Cai
Lingxiao Wang
Zhaoran Wang
53
11
0
30 Dec 2022
Bypass Exponential Time Preprocessing: Fast Neural Network Training via Weight-Data Correlation Preprocessing
Josh Alman
Jiehao Liang
Zhao Song
Ruizhe Zhang
Danyang Zhuo
77
31
0
25 Nov 2022
Linear RNNs Provably Learn Linear Dynamic Systems
Lifu Wang
Tianyu Wang
Shengwei Yi
Bo Shen
Bo Hu
Xing Cao
17
0
0
19 Nov 2022
Learning Low Dimensional State Spaces with Overparameterized Recurrent Neural Nets
Edo Cohen-Karlik
Itamar Menuhin-Gruman
Raja Giryes
Nadav Cohen
Amir Globerson
27
4
0
25 Oct 2022
Global Convergence of SGD On Two Layer Neural Nets
Pulkit Gopalani
Anirbit Mukherjee
26
5
0
20 Oct 2022
On Scrambling Phenomena for Randomly Initialized Recurrent Networks
Vaggos Chatziafratis
Ioannis Panageas
Clayton Sanford
S. Stavroulakis
30
2
0
11 Oct 2022
A Sublinear Adversarial Training Algorithm
Yeqi Gao
Lianke Qin
Zhao Song
Yitan Wang
GAN
33
25
0
10 Aug 2022
Training Overparametrized Neural Networks in Sublinear Time
Yichuan Deng
Han Hu
Zhao Song
Omri Weinstein
Danyang Zhuo
BDL
30
28
0
09 Aug 2022
Federated Adversarial Learning: A Framework with Convergence Analysis
Xiaoxiao Li
Zhao Song
Jiaming Yang
FedML
27
19
0
07 Aug 2022
Bounding the Width of Neural Networks via Coupled Initialization -- A Worst Case Analysis
Alexander Munteanu
Simon Omlor
Zhao Song
David P. Woodruff
33
15
0
26 Jun 2022
Global Convergence of Over-parameterized Deep Equilibrium Models
Zenan Ling
Xingyu Xie
Qiuhao Wang
Zongpeng Zhang
Zhouchen Lin
32
12
0
27 May 2022
The Mechanism of Prediction Head in Non-contrastive Self-supervised Learning
Zixin Wen
Yuanzhi Li
SSL
27
34
0
12 May 2022
Real-time Forecasting of Time Series in Financial Markets Using Sequentially Trained Many-to-one LSTMs
Kelum Gajamannage
Yonggi Park
AI4TS
AIFin
11
4
0
10 May 2022
Spectrum of inner-product kernel matrices in the polynomial regime and multiple descent phenomenon in kernel ridge regression
Theodor Misiakiewicz
11
39
0
21 Apr 2022
Implicit Bias of MSE Gradient Optimization in Underparameterized Neural Networks
Benjamin Bowman
Guido Montúfar
28
11
0
12 Jan 2022
Training Multi-Layer Over-Parametrized Neural Network in Subquadratic Time
Zhao Song
Licheng Zhang
Ruizhe Zhang
32
64
0
14 Dec 2021
Fast Graph Neural Tangent Kernel via Kronecker Sketching
Shunhua Jiang
Yunze Man
Zhao Song
Zheng Yu
Danyang Zhuo
29
6
0
04 Dec 2021