Estimating or Propagating Gradients Through Stochastic Neurons for Conditional Computation

Papers citing "Estimating or Propagating Gradients Through Stochastic Neurons for Conditional Computation"

50 / 1,519 papers shown
Accelerating Transformer Pre-training with 2:4 Sparsity
Yuezhou Hu
Kang Zhao
Weiyu Huang
Jianfei Chen
Jun Zhu
138
9
0
02 Apr 2024
