Towards a theory of model distillation
Enric Boix-Adserà
14 March 2024 · arXiv:2403.09053
FedML · VLM

Papers citing "Towards a theory of model distillation"

20 / 20 papers shown

Laughing Hyena Distillery: Extracting Compact Recurrences From Convolutions
Stefano Massaroli, Michael Poli, Daniel Y. Fu, Hermann Kumbong, Rom N. Parnichkun, ..., Atri Rudra, Ce Zhang, Christopher Ré, Stefano Ermon, Yoshua Bengio
52 · 20 · 0 · 28 Oct 2023

Properly Learning Decision Trees with Queries Is NP-Hard
Caleb M. Koch, Carmen Strassle, Li-Yang Tan
37 · 6 · 0 · 09 Jul 2023

LEACE: Perfect linear concept erasure in closed form
Nora Belrose, David Schneider-Joseph, Shauli Ravfogel, Ryan Cotterell, Edward Raff, Stella Biderman
KELM · MU
60 · 107 · 0 · 06 Jun 2023

Toy Models of Superposition
Nelson Elhage, Tristan Hume, Catherine Olsson, Nicholas Schiefer, T. Henighan, ..., Sam McCandlish, Jared Kaplan, Dario Amodei, Martin Wattenberg, C. Olah
AAML · MILM
160 · 351 · 0 · 21 Sep 2022

On the non-universality of deep learning: quantifying the cost of symmetry
Emmanuel Abbe, Enric Boix-Adserà
FedML · MLT
47 · 18 · 0 · 05 Aug 2022

Hidden Progress in Deep Learning: SGD Learns Parities Near the Computational Limit
Boaz Barak, Benjamin L. Edelman, Surbhi Goel, Sham Kakade, Eran Malach, Cyril Zhang
67 · 128 · 0 · 18 Jul 2022

High-dimensional Asymptotics of Feature Learning: How One Gradient Step Improves the Representation
Jimmy Ba, Murat A. Erdogdu, Taiji Suzuki, Zhichao Wang, Denny Wu, Greg Yang
MLT
73 · 122 · 0 · 03 May 2022

Inspecting the concept knowledge graph encoded by modern language models
Carlos Aspillaga, Marcelo Mendoza, Alvaro Soto
56 · 13 · 0 · 27 May 2021

Towards Understanding Knowledge Distillation
Mary Phuong, Christoph H. Lampert
58 · 314 · 0 · 27 May 2021

Knowledge Neurons in Pretrained Transformers
Damai Dai, Li Dong, Y. Hao, Zhifang Sui, Baobao Chang, Furu Wei
KELM · MU
66 · 440 · 0 · 18 Apr 2021

Interpretable Machine Learning: Fundamental Principles and 10 Grand Challenges
Cynthia Rudin, Chaofan Chen, Zhi Chen, Haiyang Huang, Lesia Semenova, Chudi Zhong
FaML · AI4CE · LRM
134 · 662 · 0 · 20 Mar 2021

Towards Understanding Ensemble, Knowledge Distillation and Self-Distillation in Deep Learning
Zeyuan Allen-Zhu, Yuanzhi Li
FedML
114 · 362 · 0 · 17 Dec 2020

Exploring the Linear Subspace Hypothesis in Gender Bias Mitigation
Francisco Vargas, Ryan Cotterell
55 · 29 · 0 · 20 Sep 2020

What is the State of Neural Network Pruning?
Davis W. Blalock, Jose Javier Gonzalez Ortiz, Jonathan Frankle, John Guttag
252 · 1,045 · 0 · 06 Mar 2020

Are Sixteen Heads Really Better than One?
Paul Michel, Omer Levy, Graham Neubig
MoE
95 · 1,051 · 0 · 25 May 2019

What you can cram into a single vector: Probing sentence embeddings for linguistic properties
Alexis Conneau, Germán Kruszewski, Guillaume Lample, Loïc Barrault, Marco Baroni
299 · 888 · 0 · 03 May 2018

The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks
Jonathan Frankle, Michael Carbin
187 · 3,433 · 0 · 09 Mar 2018

To prune, or not to prune: exploring the efficacy of pruning for model compression
Michael Zhu, Suyog Gupta
157 · 1,262 · 0 · 05 Oct 2017

Interpretability via Model Extraction
Osbert Bastani, Carolyn Kim, Hamsa Bastani
FAtt
47 · 129 · 0 · 29 Jun 2017
"Why Should I Trust You?": Explaining the Predictions of Any Classifier
"Why Should I Trust You?": Explaining the Predictions of Any Classifier
Marco Tulio Ribeiro
Sameer Singh
Carlos Guestrin
FAtt
FaML
798
16,828
0
16 Feb 2016