2410.09982
Cited By
v4 (latest)
Self-Data Distillation for Recovering Quality in Pruned Large Language Models
13 October 2024
Vithursan Thangarasa
Ganesh Venkatesh
Mike Lasby
Nish Sinnadurai
Sean Lie
SyDa
Papers citing
"Self-Data Distillation for Recovering Quality in Pruned Large Language Models"
50 / 54 papers shown
Title
MASSV: Multimodal Adaptation and Self-Data Distillation for Speculative Decoding of Vision-Language Models
Mugilan Ganesan
Siyang Song
Ankur Aggarwal
Nish Sinnadurai
Sean Lie
Vithursan Thangarasa
VLM
130
0
0
15 May 2025
Beware of Calibration Data for Pruning Large Language Models
Yixin Ji
Yang Xiang
Juntao Li
Qingrong Xia
Ping Li
Xinyu Duan
Zhefeng Wang
Min Zhang
84
2
0
23 Oct 2024
LLM Pruning and Distillation in Practice: The Minitron Approach
Sharath Turuvekere Sreenivas
Saurav Muralidharan
Raviraj Joshi
Marcin Chochowski
M. Patwary
Mohammad Shoeybi
Bryan Catanzaro
Jan Kautz
Pavlo Molchanov
96
36
0
21 Aug 2024
Transformer Layers as Painters
Qi Sun
Marc Pickett
Aakash Kumar Nain
Llion Jones
AI4CE
125
19
0
12 Jul 2024
The Unreasonable Ineffectiveness of the Deeper Layers
Andrey Gromov
Kushal Tirumala
Hassan Shapourian
Paolo Glorioso
Daniel A. Roberts
141
106
0
26 Mar 2024
ShortGPT: Layers in Large Language Models are More Redundant Than You Expect
Xin Men
Mingyu Xu
Qingyu Zhang
Bingning Wang
Hongyu Lin
Yaojie Lu
Xianpei Han
Weipeng Chen
115
140
0
06 Mar 2024
Self-Distillation Bridges Distribution Gap in Language Model Fine-Tuning
Zhaorui Yang
Tianyu Pang
Hao Feng
Han Wang
Wei Chen
Minfeng Zhu
Qian Liu
ALM
90
50
0
21 Feb 2024
OpenMathInstruct-1: A 1.8 Million Math Instruction Tuning Dataset
Shubham Toshniwal
Ivan Moshkov
Mehrzad Samadi
Daria Gitman
Fei Jia
Igor Gitman
89
97
0
15 Feb 2024
Unlocking Efficiency in Large Language Model Inference: A Comprehensive Survey of Speculative Decoding
Heming Xia
Zhe Yang
Qingxiu Dong
Peiyi Wang
Chak Tou Leong
Tao Ge
Tianyu Liu
Wenjie Li
Zhifang Sui
LRM
158
130
0
15 Jan 2024
Weight subcloning: direct initialization of transformers using larger pretrained ones
Mohammad Samragh
Mehrdad Farajtabar
Sachin Mehta
Raviteja Vemulapalli
Fartash Faghri
Devang Naik
Oncel Tuzel
Mohammad Rastegari
105
30
0
14 Dec 2023
Language Models are Super Mario: Absorbing Abilities from Homologous Models as a Free Lunch
Le Yu
Yu Bowen
Haiyang Yu
Fei Huang
Yongbin Li
MoMe
118
336
0
06 Nov 2023
Sheared LLaMA: Accelerating Language Model Pre-training via Structured Pruning
Mengzhou Xia
Tianyu Gao
Zhiyuan Zeng
Danqi Chen
125
309
0
10 Oct 2023
Understanding Catastrophic Forgetting in Language Models via Implicit Inference
Suhas Kotha
Jacob Mitchell Springer
Aditi Raghunathan
CLL
128
71
0
18 Sep 2023
Llama 2: Open Foundation and Fine-Tuned Chat Models
Hugo Touvron
Louis Martin
Kevin R. Stone
Peter Albert
Amjad Almahairi
...
Sharan Narang
Aurelien Rodriguez
Robert Stojnic
Sergey Edunov
Thomas Scialom
AI4MH
ALM
454
12,106
0
18 Jul 2023
On-Policy Distillation of Language Models: Learning from Self-Generated Mistakes
Rishabh Agarwal
Nino Vieillard
Yongchao Zhou
Piotr Stańczyk
Sabela Ramos
Matthieu Geist
Olivier Bachem
105
105
0
23 Jun 2023
Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena
Lianmin Zheng
Wei-Lin Chiang
Ying Sheng
Siyuan Zhuang
Zhanghao Wu
...
Dacheng Li
Eric Xing
Haotong Zhang
Joseph E. Gonzalez
Ion Stoica
ALM
OSLM
ELM
507
4,451
0
09 Jun 2023
TIES-Merging: Resolving Interference When Merging Models
Prateek Yadav
Derek Tam
Leshem Choshen
Colin Raffel
Joey Tianyi Zhou
MoMe
141
318
0
02 Jun 2023
LLM-Pruner: On the Structural Pruning of Large Language Models
Xinyin Ma
Gongfan Fang
Xinchao Wang
166
444
0
19 May 2023
Principle-Driven Self-Alignment of Language Models from Scratch with Minimal Human Supervision
Zhiqing Sun
Songlin Yang
Qinhong Zhou
Hongxin Zhang
Zhenfang Chen
David D. Cox
Yiming Yang
Chuang Gan
SyDa
ALM
115
339
0
04 May 2023
Sparse-IFT: Sparse Iso-FLOP Transformations for Maximizing Training Efficiency
Vithursan Thangarasa
Shreyas Saxena
Abhay Gupta
Sean Lie
108
5
0
21 Mar 2023
LLaMA: Open and Efficient Foundation Language Models
Hugo Touvron
Thibaut Lavril
Gautier Izacard
Xavier Martinet
Marie-Anne Lachaux
...
Faisal Azhar
Aurelien Rodriguez
Armand Joulin
Edouard Grave
Guillaume Lample
ALM
PILM
1.6K
13,520
0
27 Feb 2023
ZipLM: Inference-Aware Structured Pruning of Language Models
Eldar Kurtic
Elias Frantar
Dan Alistarh
MQ
92
26
0
07 Feb 2023
Accelerating Large Language Model Decoding with Speculative Sampling
Charlie Chen
Sebastian Borgeaud
G. Irving
Jean-Baptiste Lespiau
Laurent Sifre
J. Jumper
BDL
LRM
89
436
0
02 Feb 2023
Progressive Prompts: Continual Learning for Language Models
Anastasia Razdaibiedina
Yuning Mao
Rui Hou
Madian Khabsa
M. Lewis
Amjad Almahairi
VLM
KELM
CLL
121
142
0
29 Jan 2023
Self-Instruct: Aligning Language Models with Self-Generated Instructions
Yizhong Wang
Yeganeh Kordi
Swaroop Mishra
Alisa Liu
Noah A. Smith
Daniel Khashabi
Hannaneh Hajishirzi
ALM
SyDa
LRM
186
2,259
0
20 Dec 2022
Fast Inference from Transformers via Speculative Decoding
Yaniv Leviathan
Matan Kalman
Yossi Matias
LRM
151
738
0
30 Nov 2022
Automatic Chain of Thought Prompting in Large Language Models
Zhuosheng Zhang
Aston Zhang
Mu Li
Alexander J. Smola
ReLM
LRM
173
638
0
07 Oct 2022
Continual Learning with Foundation Models: An Empirical Study of Latent Replay
O. Ostapenko
Timothée Lesort
P. Rodríguez
Md Rifat Arefin
Arthur Douillard
Irina Rish
Laurent Charlin
96
53
0
30 Apr 2022
Speculative Decoding: Exploiting Speculative Execution for Accelerating Seq2seq Generation
Heming Xia
Tao Ge
Peiyi Wang
Si-Qing Chen
Furu Wei
Zhifang Sui
111
90
0
30 Mar 2022
Training Compute-Optimal Large Language Models
Jordan Hoffmann
Sebastian Borgeaud
A. Mensch
Elena Buchatskaya
Trevor Cai
...
Karen Simonyan
Erich Elsen
Jack W. Rae
Oriol Vinyals
Laurent Sifre
AI4TS
213
1,989
0
29 Mar 2022
STaR: Bootstrapping Reasoning With Reasoning
E. Zelikman
Yuhuai Wu
Jesse Mu
Noah D. Goodman
ReLM
LRM
152
511
0
28 Mar 2022
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
910
13,249
0
04 Mar 2022
Controlling Conditional Language Models without Catastrophic Forgetting
Tomasz Korbak
Hady ElSahar
Germán Kruszewski
Marc Dymetman
CLL
AI4CE
104
35
0
01 Dec 2021
Training Verifiers to Solve Math Word Problems
K. Cobbe
V. Kosaraju
Mohammad Bavarian
Mark Chen
Heewoo Jun
...
Jerry Tworek
Jacob Hilton
Reiichiro Nakano
Christopher Hesse
John Schulman
ReLM
OffRL
LRM
392
4,604
0
27 Oct 2021
Continual Learning for Text Classification with Information Disentanglement Based Regularization
Yufan Huang
Yanzhe Zhang
Jiaao Chen
Xuezhi Wang
Diyi Yang
CLL
66
113
0
12 Apr 2021
Measuring Mathematical Problem Solving With the MATH Dataset
Dan Hendrycks
Collin Burns
Saurav Kadavath
Akul Arora
Steven Basart
Eric Tang
Basel Alomair
Jacob Steinhardt
ReLM
FaML
216
2,410
0
05 Mar 2021
MiniLMv2: Multi-Head Self-Attention Relation Distillation for Compressing Pretrained Transformers
Wenhui Wang
Hangbo Bao
Shaohan Huang
Li Dong
Furu Wei
MQ
115
274
0
31 Dec 2020
Measuring Massive Multitask Language Understanding
Dan Hendrycks
Collin Burns
Steven Basart
Andy Zou
Mantas Mazeika
Basel Alomair
Jacob Steinhardt
ELM
RALM
197
4,580
0
07 Sep 2020
Dense Passage Retrieval for Open-Domain Question Answering
Vladimir Karpukhin
Barlas Oğuz
Sewon Min
Patrick Lewis
Ledell Yu Wu
Sergey Edunov
Danqi Chen
Wen-tau Yih
RALM
222
3,804
0
10 Apr 2020
Scaling Laws for Neural Language Models
Jared Kaplan
Sam McCandlish
T. Henighan
Tom B. Brown
B. Chess
R. Child
Scott Gray
Alec Radford
Jeff Wu
Dario Amodei
663
4,932
0
23 Jan 2020
PyTorch: An Imperative Style, High-Performance Deep Learning Library
Adam Paszke
Sam Gross
Francisco Massa
Adam Lerer
James Bradbury
...
Sasank Chilamkurthy
Benoit Steiner
Lu Fang
Junjie Bai
Soumith Chintala
ODL
609
42,753
0
03 Dec 2019
Fast Transformer Decoding: One Write-Head is All You Need
Noam M. Shazeer
163
478
0
06 Nov 2019
TinyBERT: Distilling BERT for Natural Language Understanding
Xiaoqi Jiao
Yichun Yin
Lifeng Shang
Xin Jiang
Xiao Chen
Linlin Li
F. Wang
Qun Liu
VLM
117
1,877
0
23 Sep 2019
LAMOL: LAnguage MOdeling for Lifelong Language Learning
Fan-Keng Sun
Cheng-Hao Ho
Hung-yi Lee
CLL
KELM
97
213
0
07 Sep 2019
Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks
Nils Reimers
Iryna Gurevych
1.3K
12,343
0
27 Aug 2019
Uniform convergence may be unable to explain generalization in deep learning
Vaishnavh Nagarajan
J. Zico Kolter
MoMe
AI4CE
98
317
0
13 Feb 2019
Experience Replay for Continual Learning
David Rolnick
Arun Ahuja
Jonathan Richard Schwarz
Timothy Lillicrap
Greg Wayne
CLL
121
1,174
0
28 Nov 2018
Blockwise Parallel Decoding for Deep Autoregressive Models
Mitchell Stern
Noam M. Shazeer
Ashley J. Llorens
83
238
0
07 Nov 2018
Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge
Peter Clark
Isaac Cowhey
Oren Etzioni
Tushar Khot
Ashish Sabharwal
Carissa Schoenick
Oyvind Tafjord
ELM
RALM
LRM
237
2,675
0
14 Mar 2018
Attention Is All You Need
Ashish Vaswani
Noam M. Shazeer
Niki Parmar
Jakob Uszkoreit
Llion Jones
Aidan Gomez
Lukasz Kaiser
Illia Polosukhin
3DV
847
132,963
0
12 Jun 2017