SGD with Large Step Sizes Learns Sparse Features

11 October 2022

Papers citing "SGD with Large Step Sizes Learns Sparse Features"

2 / 52 papers shown

Title
Distilling the Knowledge in a Neural Network
Geoffrey E. Hinton, Oriol Vinyals, J. Dean
FedML | 367 | 19,733 | 0 | 09 Mar 2015

Exploiting Linear Structure Within Convolutional Networks for Efficient Evaluation
Emily L. Denton, Wojciech Zaremba, Joan Bruna, Yann LeCun, Rob Fergus
FAtt | 179 | 1,693 | 0 | 02 Apr 2014