ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2405.03594
  4. Cited By
Enabling High-Sparsity Foundational Llama Models with Efficient
  Pretraining and Deployment

Enabling High-Sparsity Foundational Llama Models with Efficient Pretraining and Deployment

6 May 2024
Abhinav Agarwalla
Abhay Gupta
Alexandre Marques
Shubhra Pandit
Michael Goin
Eldar Kurtic
Kevin Leong
Tuan Nguyen
Mahmoud Salem
Dan Alistarh
Sean Lie
Mark Kurtz
    MoESyDa
ArXiv (abs)PDFHTML

Papers citing "Enabling High-Sparsity Foundational Llama Models with Efficient Pretraining and Deployment"

13 / 13 papers shown
Title
Sparse-to-Sparse Training of Diffusion Models
Sparse-to-Sparse Training of Diffusion Models
Inês Cardoso Oliveira
Decebal Constantin Mocanu
Luis A. Leiva
DiffM
155
0
0
30 Apr 2025
PLM: Efficient Peripheral Language Models Hardware-Co-Designed for Ubiquitous Computing
PLM: Efficient Peripheral Language Models Hardware-Co-Designed for Ubiquitous Computing
Cheng Deng
Luoyang Sun
Jiwen Jiang
Yongcheng Zeng
Xinjian Wu
...
Haoyang Li
Lei Chen
Lionel M. Ni
Jun Wang
Jun Wang
411
0
0
15 Mar 2025
Large Language Model Inference Acceleration: A Comprehensive Hardware Perspective
Large Language Model Inference Acceleration: A Comprehensive Hardware Perspective
Jinhao Li
Jiaming Xu
Shan Huang
Yonghua Chen
Wen Li
...
Jiayi Pan
Li Ding
Hao Zhou
Yu Wang
Guohao Dai
167
19
0
06 Oct 2024
Outlier Weighed Layerwise Sparsity (OWL): A Missing Secret Sauce for
  Pruning LLMs to High Sparsity
Outlier Weighed Layerwise Sparsity (OWL): A Missing Secret Sauce for Pruning LLMs to High Sparsity
Lu Yin
You Wu
Zhenyu Zhang
Cheng-Yu Hsieh
Yaqing Wang
...
Mykola Pechenizkiy
Yi Liang
Michael Bendersky
Zhangyang Wang
Shiwei Liu
128
102
0
08 Oct 2023
Estimating the Carbon Footprint of BLOOM, a 176B Parameter Language
  Model
Estimating the Carbon Footprint of BLOOM, a 176B Parameter Language Model
A. Luccioni
S. Viguier
Anne-Laure Ligozat
101
290
0
03 Nov 2022
LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale
LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale
Tim Dettmers
M. Lewis
Younes Belkada
Luke Zettlemoyer
MQ
119
664
0
15 Aug 2022
ZeroQuant: Efficient and Affordable Post-Training Quantization for
  Large-Scale Transformers
ZeroQuant: Efficient and Affordable Post-Training Quantization for Large-Scale Transformers
Z. Yao
Reza Yazdani Aminabadi
Minjia Zhang
Xiaoxia Wu
Conglong Li
Yuxiong He
VLMMQ
142
479
0
04 Jun 2022
The Optimal BERT Surgeon: Scalable and Accurate Second-Order Pruning for
  Large Language Models
The Optimal BERT Surgeon: Scalable and Accurate Second-Order Pruning for Large Language Models
Eldar Kurtic
Daniel Fernando Campos
Tuan Nguyen
Elias Frantar
Mark Kurtz
Ben Fineran
Michael Goin
Dan Alistarh
VLMMQMedIm
111
126
0
14 Mar 2022
Training Verifiers to Solve Math Word Problems
Training Verifiers to Solve Math Word Problems
K. Cobbe
V. Kosaraju
Mohammad Bavarian
Mark Chen
Heewoo Jun
...
Jerry Tworek
Jacob Hilton
Reiichiro Nakano
Christopher Hesse
John Schulman
ReLMOffRLLRM
367
4,598
0
27 Oct 2021
Evaluating Large Language Models Trained on Code
Evaluating Large Language Models Trained on Code
Mark Chen
Jerry Tworek
Heewoo Jun
Qiming Yuan
Henrique Pondé
...
Bob McGrew
Dario Amodei
Sam McCandlish
Ilya Sutskever
Wojciech Zaremba
ELMALM
238
5,675
0
07 Jul 2021
Sparsity in Deep Learning: Pruning and growth for efficient inference
  and training in neural networks
Sparsity in Deep Learning: Pruning and growth for efficient inference and training in neural networks
Torsten Hoefler
Dan Alistarh
Tal Ben-Nun
Nikoli Dryden
Alexandra Peste
MQ
323
728
0
31 Jan 2021
Analyzing Multi-Head Self-Attention: Specialized Heads Do the Heavy
  Lifting, the Rest Can Be Pruned
Analyzing Multi-Head Self-Attention: Specialized Heads Do the Heavy Lifting, the Rest Can Be Pruned
Elena Voita
David Talbot
F. Moiseev
Rico Sennrich
Ivan Titov
119
1,149
0
23 May 2019
Better Summarization Evaluation with Word Embeddings for ROUGE
Better Summarization Evaluation with Word Embeddings for ROUGE
Jun-Ping Ng
Viktoria Abrecht
63
172
0
25 Aug 2015
1