Occam Gradient Descent
B. N. Kausik · 30 May 2024 · arXiv:2405.20194
Tags: ODL, VLM

Papers citing "Occam Gradient Descent" (23 of 23 shown)
1. Scaling Efficient LLMs · B. N. Kausik · 08 Jan 2025
2. Neural Network Pruning by Gradient Descent · Zhang Zhang, Ruyi Tao, Jiang Zhang · 21 Nov 2023
3. A Simple and Effective Pruning Approach for Large Language Models · Mingjie Sun, Zhuang Liu, Anna Bair, J. Zico Kolter · 20 Jun 2023
4. To Repeat or Not To Repeat: Insights from Scaling LLM under Token-Crisis · Fuzhao Xue, Yao Fu, Wangchunshu Zhou, Zangwei Zheng, Yang You · 22 May 2023
5. Neural Architecture Search: Insights from 1000 Papers · Colin White, Mahmoud Safari, R. Sukthanker, Binxin Ru, T. Elsken, Arber Zela, Debadeepta Dey, Frank Hutter · 20 Jan 2023 · Tags: 3DV, AI4CE
6. SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot · Elias Frantar, Dan Alistarh · 02 Jan 2023 · Tags: VLM
7. Training Compute-Optimal Large Language Models · Jordan Hoffmann, Sebastian Borgeaud, A. Mensch, Elena Buchatskaya, Trevor Cai, ..., Karen Simonyan, Erich Elsen, Jack W. Rae, Oriol Vinyals, Laurent Sifre · 29 Mar 2022 · Tags: AI4TS
8. Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model · Shaden Smith, M. Patwary, Brandon Norick, P. LeGresley, Samyam Rajbhandari, ..., Mohammad Shoeybi, Yuxiong He, Michael Houston, Saurabh Tiwary, Bryan Catanzaro · 28 Jan 2022 · Tags: MoE
9. LaMDA: Language Models for Dialog Applications · R. Thoppilan, Daniel De Freitas, Jamie Hall, Noam M. Shazeer, Apoorv Kulshreshtha, ..., Blaise Aguera-Arcas, Claire Cui, M. Croak, Ed H. Chi, Quoc Le · 20 Jan 2022 · Tags: ALM
10. Scaling Language Models: Methods, Analysis & Insights from Training Gopher · Jack W. Rae, Sebastian Borgeaud, Trevor Cai, Katie Millican, Jordan Hoffmann, ..., Jeff Stanway, L. Bennett, Demis Hassabis, Koray Kavukcuoglu, G. Irving · 08 Dec 2021
11. Sparsity in Deep Learning: Pruning and growth for efficient inference and training in neural networks · Torsten Hoefler, Dan Alistarh, Tal Ben-Nun, Nikoli Dryden, Alexandra Peste · 31 Jan 2021 · Tags: MQ
12. Provable Benefits of Overparameterization in Model Compression: From Double Descent to Pruning Neural Networks · Xiangyu Chang, Yingcong Li, Samet Oymak, Christos Thrampoulidis · 16 Dec 2020
13. Language Models are Few-Shot Learners · Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, ..., Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, Dario Amodei · 28 May 2020 · Tags: BDL
14. What is the State of Neural Network Pruning? · Davis W. Blalock, Jose Javier Gonzalez Ortiz, Jonathan Frankle, John Guttag · 06 Mar 2020
15. Scaling Laws for Neural Language Models · Jared Kaplan, Sam McCandlish, T. Henighan, Tom B. Brown, B. Chess, R. Child, Scott Gray, Alec Radford, Jeff Wu, Dario Amodei · 23 Jan 2020
16. Rethinking the Value of Network Pruning · Zhuang Liu, Mingjie Sun, Tinghui Zhou, Gao Huang, Trevor Darrell · 11 Oct 2018
17. The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks · Jonathan Frankle, Michael Carbin · 09 Mar 2018
18. Regularization for Deep Learning: A Taxonomy · J. Kukačka, Vladimir Golkov, Daniel Cremers · 29 Oct 2017
19. Scalable Training of Artificial Neural Networks with Adaptive Sparse Connectivity inspired by Network Science · Decebal Constantin Mocanu, Elena Mocanu, Peter Stone, Phuong H. Nguyen, M. Gibescu, A. Liotta · 15 Jul 2017
20. A Random Forest Guided Tour · Gérard Biau, Erwan Scornet · 18 Nov 2015 · Tags: AI4TS
21. Learning both Weights and Connections for Efficient Neural Networks · Song Han, Jeff Pool, J. Tran, W. Dally · 08 Jun 2015 · Tags: CVBM
22. Distilling the Knowledge in a Neural Network · Geoffrey E. Hinton, Oriol Vinyals, J. Dean · 09 Mar 2015 · Tags: FedML
23. Adam: A Method for Stochastic Optimization · Diederik P. Kingma, Jimmy Ba · 22 Dec 2014 · Tags: ODL