Improving Layer-wise Adaptive Rate Methods using Trust Ratio Clipping

27 November 2020
Jeffrey Fong, Siwei Chen, Kaiqi Chen
ArXiv · PDF · HTML
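For context, the method named in the title bounds the per-layer trust ratio used by layer-wise adaptive optimizers such as LARS and LAMB. The sketch below is an illustrative NumPy implementation based only on the title and the standard LAMB update rule; the function name, default values, and the `trust_clip` bound are assumptions, not the authors' reference code.

```python
import numpy as np

def lamb_clipped_update(w, g, m, v, step, lr=1e-3, beta1=0.9, beta2=0.999,
                        eps=1e-6, weight_decay=0.01, trust_clip=1.0):
    """One layer-wise step of a LAMB-style optimizer with the trust ratio
    clipped to an upper bound (illustrative sketch, not the paper's code)."""
    # Adam-style first and second moment updates with bias correction
    m = beta1 * m + (1 - beta1) * g
    v = beta2 * v + (1 - beta2) * g * g
    m_hat = m / (1 - beta1 ** step)
    v_hat = v / (1 - beta2 ** step)

    # Raw update direction, with decoupled weight decay
    update = m_hat / (np.sqrt(v_hat) + eps) + weight_decay * w

    # Layer-wise trust ratio ||w|| / ||update||, guarded against zero norms
    w_norm = np.linalg.norm(w)
    u_norm = np.linalg.norm(update)
    trust_ratio = w_norm / u_norm if w_norm > 0 and u_norm > 0 else 1.0

    # Clip the trust ratio so a single layer cannot take an outsized step
    trust_ratio = min(trust_ratio, trust_clip)

    w = w - lr * trust_ratio * update
    return w, m, v
```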

Papers citing "Improving Layer-wise Adaptive Rate Methods using Trust Ratio Clipping"

2 / 2 papers shown
The Disharmony between BN and ReLU Causes Gradient Explosion, but is Offset by the Correlation between Activations
Inyoung Paik, Jaesik Choi
23 Apr 2023

Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
M. Shoeybi, M. Patwary, Raul Puri, P. LeGresley, Jared Casper, Bryan Catanzaro
MoE
17 Sep 2019