How much pre-training is enough to discover a good subnetwork?

31 July 2021

Anastasios Kyrillidis

Papers citing "How much pre-training is enough to discover a good subnetwork?"

50 / 64 papers shown

PERP: Rethinking the Prune-Retrain Paradigm in the Era of LLMs Max Zimmer Megi Andoni Christoph Spiegel Sebastian Pokutta VLM 143 10 0 23 Dec 2023
Aiming towards the minimizers: fast convergence of SGD for overparametrized problems Chaoyue Liu Dmitriy Drusvyatskiy M. Belkin Damek Davis Yi-An Ma ODL 77 18 0 05 Jun 2023
Strong Lottery Ticket Hypothesis with $\varepsilon$-perturbation Zheyang Xiong Fangshuo Liao Anastasios Kyrillidis 56 2 0 29 Oct 2022
Subquadratic Overparameterization for Shallow Neural Networks Chaehwan Song Ali Ramezani-Kebrya Thomas Pethick Armin Eftekhari Volkan Cevher 76 31 0 02 Nov 2021
Pruning and Quantization for Deep Neural Network Acceleration: A Survey Tailin Liang C. Glossner Lei Wang Shaobo Shi Xiaotong Zhang MQ 231 701 0 24 Jan 2021
On the Proof of Global Convergence of Gradient Descent for Deep ReLU Networks with Linear Widths Quynh N. Nguyen 122 49 0 24 Jan 2021
EarlyBERT: Efficient BERT Training via Early-bird Lottery Tickets Xiaohan Chen Yu Cheng Shuohang Wang Zhe Gan Zhangyang Wang Jingjing Liu 110 100 0 31 Dec 2020
Tight Bounds on the Smallest Eigenvalue of the Neural Tangent Kernel for Deep ReLU Networks Quynh N. Nguyen Marco Mondelli Guido Montúfar 78 83 0 21 Dec 2020
Provable Benefits of Overparameterization in Model Compression: From Double Descent to Pruning Neural Networks Xiangyu Chang Yingcong Li Samet Oymak Christos Thrampoulidis 68 51 0 16 Dec 2020
The Lottery Tickets Hypothesis for Supervised and Self-supervised Pre-training in Computer Vision Models Tianlong Chen Jonathan Frankle Shiyu Chang Sijia Liu Yang Zhang Michael Carbin Zhangyang Wang 68 123 0 12 Dec 2020
The Lottery Ticket Hypothesis for Object Recognition Sharath Girish Shishira R. Maiya Kamal Gupta Hao Chen L. Davis Abhinav Shrivastava 138 61 0 08 Dec 2020
Greedy Optimization Provably Wins the Lottery: Logarithmic Number of Winning Tickets is Enough Mao Ye Lemeng Wu Qiang Liu 61 17 0 29 Oct 2020
Deep Neural Network Training with Frank-Wolfe Sebastian Pokutta Christoph Spiegel Max Zimmer 68 27 0 14 Oct 2020
Gradient Flow in Sparse Neural Networks and How Lottery Tickets Win Utku Evci Yani Andrew Ioannou Cem Keskin Yann N. Dauphin 56 94 0 07 Oct 2020
Pruning Neural Networks at Initialization: Why are We Missing the Mark? Jonathan Frankle Gintare Karolina Dziugaite Daniel M. Roy Michael Carbin 67 240 0 18 Sep 2020
Logarithmic Pruning is All You Need Laurent Orseau Marcus Hutter Omar Rivasplata 87 89 0 22 Jun 2020
Optimal Lottery Tickets via SubsetSum: Logarithmic Over-Parameterization is Sufficient Ankit Pensia Shashank Rajput Alliot Nagle Harit Vishwakarma Dimitris Papailiopoulos 60 104 0 14 Jun 2020
A Mean-field Analysis of Deep ResNet and Beyond: Towards Provable Optimization Via Overparameterization From Depth Yiping Lu Chao Ma Yulong Lu Jianfeng Lu Lexing Ying MLT 153 79 0 11 Mar 2020
What is the State of Neural Network Pruning? Davis W. Blalock Jose Javier Gonzalez Ortiz Jonathan Frankle John Guttag 280 1,054 0 06 Mar 2020
Comparing Rewinding and Fine-tuning in Neural Network Pruning Alex Renda Jonathan Frankle Michael Carbin 304 388 0 05 Mar 2020
Good Subnetworks Provably Exist: Pruning via Greedy Forward Selection Mao Ye Chengyue Gong Lizhen Nie Denny Zhou Adam R. Klivans Qiang Liu 84 111 0 03 Mar 2020
Loss landscapes and optimization in over-parameterized non-linear systems and neural networks Chaoyue Liu Libin Zhu M. Belkin ODL 96 265 0 29 Feb 2020
On Layer Normalization in the Transformer Architecture Ruibin Xiong Yunchang Yang Di He Kai Zheng Shuxin Zheng Chen Xing Huishuai Zhang Yanyan Lan Liwei Wang Tie-Yan Liu AI4CE 153 998 0 12 Feb 2020
Proving the Lottery Ticket Hypothesis: Pruning is All You Need Eran Malach Gilad Yehudai Shai Shalev-Shwartz Ohad Shamir 112 276 0 03 Feb 2020
What's Hidden in a Randomly Weighted Neural Network? Vivek Ramanujan Mitchell Wortsman Aniruddha Kembhavi Ali Farhadi Mohammad Rastegari 66 361 0 29 Nov 2019
Rigging the Lottery: Making All Tickets Winners Utku Evci Trevor Gale Jacob Menick Pablo Samuel Castro Erich Elsen 199 607 0 25 Nov 2019
SiPPing Neural Networks: Sensitivity-informed Provable Pruning of Neural Networks Cenk Baykal Lucas Liebenwein Igor Gilitschenski Dan Feldman Daniela Rus 70 18 0 11 Oct 2019
Finite Depth and Width Corrections to the Neural Tangent Kernel Boris Hanin Mihai Nica MDE 79 152 0 13 Sep 2019
One ticket to win them all: generalizing lottery ticket initializations across datasets and optimizers Ari S. Morcos Haonan Yu Michela Paganini Yuandong Tian 79 229 0 06 Jun 2019
Deconstructing Lottery Tickets: Zeros, Signs, and the Supermask Hattie Zhou Janice Lan Rosanne Liu J. Yosinski UQCV 71 389 0 03 May 2019
The State of Sparsity in Deep Neural Networks Trevor Gale Erich Elsen Sara Hooker 165 763 0 25 Feb 2019
Mean-field theory of two-layers neural networks: dimension-free bounds and kernel limit Song Mei Theodor Misiakiewicz Andrea Montanari MLT 84 279 0 16 Feb 2019
Towards moderate overparameterization: global convergence guarantees for training shallow neural networks Samet Oymak Mahdi Soltanolkotabi 61 323 0 12 Feb 2019
Fine-Grained Analysis of Optimization and Generalization for Overparameterized Two-Layer Neural Networks Sanjeev Arora S. Du Wei Hu Zhiyuan Li Ruosong Wang MLT 223 974 0 24 Jan 2019
Training Neural Networks with Local Error Signals Arild Nøkland L. Eidnes 105 228 0 20 Jan 2019
Greedy Layerwise Learning Can Scale to ImageNet Eugene Belilovsky Michael Eickenberg Edouard Oyallon 130 181 0 29 Dec 2018
Learning and Generalization in Overparameterized Neural Networks, Going Beyond Two Layers Zeyuan Allen-Zhu Yuanzhi Li Yingyu Liang MLT 201 775 0 12 Nov 2018
Gradient Descent Finds Global Minima of Deep Neural Networks S. Du Jason D. Lee Haochuan Li Liwei Wang Masayoshi Tomizuka ODL 240 1,136 0 09 Nov 2018
Discrimination-aware Channel Pruning for Deep Neural Networks Zhuangwei Zhuang Mingkui Tan Bohan Zhuang Jing Liu Yong Guo Qingyao Wu Junzhou Huang Jin-Hui Zhu 134 601 0 28 Oct 2018
Rethinking the Value of Network Pruning Zhuang Liu Mingjie Sun Tinghui Zhou Gao Huang Trevor Darrell 42 1,477 0 11 Oct 2018
Learning Overparameterized Neural Networks via Stochastic Gradient Descent on Structured Data Yuanzhi Li Yingyu Liang MLT 222 653 0 03 Aug 2018
Learning ReLU Networks via Alternating Minimization Gauri Jagatap Chinmay Hegde 40 11 0 20 Jun 2018
Learning One-hidden-layer ReLU Networks via Gradient Descent Xiao Zhang Yaodong Yu Lingxiao Wang Quanquan Gu MLT 129 135 0 20 Jun 2018
Neural Tangent Kernel: Convergence and Generalization in Neural Networks Arthur Jacot Franck Gabriel Clément Hongler 277 3,225 0 20 Jun 2018
On Tighter Generalization Bound for Deep Neural Networks: CNNs, ResNets, and Beyond Xingguo Li Junwei Lu Zhaoran Wang Jarvis Haupt T. Zhao 57 80 0 13 Jun 2018
A Mean Field View of the Landscape of Two-Layers Neural Networks Song Mei Andrea Montanari Phan-Minh Nguyen MLT 109 863 0 18 Apr 2018
The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks Jonathan Frankle Michael Carbin 277 3,489 0 09 Mar 2018
Learning One Convolutional Layer with Overlapping Patches Surbhi Goel Adam R. Klivans Raghu Meka MLT 80 81 0 07 Feb 2018
MobileNetV2: Inverted Residuals and Linear Bottlenecks Mark Sandler Andrew G. Howard Menglong Zhu A. Zhmoginov Liang-Chieh Chen 218 19,353 0 13 Jan 2018
NISP: Pruning Networks using Neuron Importance Score Propagation Ruichi Yu Ang Li Chun-Fu Chen Jui-Hsin Lai Vlad I. Morariu Xintong Han M. Gao Ching-Yung Lin L. Davis 74 800 0 16 Nov 2017