Weight Distillation: Transferring the Knowledge in Neural Network Parameters (arXiv:2009.09152)

19 September 2020
Ye Lin, Yanyang Li, Ziyang Wang, Bei Li, Quan Du, Tong Xiao, Jingbo Zhu

Papers citing "Weight Distillation: Transferring the Knowledge in Neural Network Parameters"

30 papers shown

Learning Evaluation Models from Large Language Models for Sequence Generation
Chenglong Wang, Hang Zhou, Kai-Chun Chang, Tongran Liu, Chunliang Zhang, Quan Du, Tong Xiao, Yue Zhang, Jingbo Zhu
08 Aug 2023 · ELM

The NiuTrans Machine Translation Systems for WMT21
Yuhao Zhang, Tao Zhou, Bin Wei, Runzhe Cao, Yongyu Mu, ..., Weiqiao Shan, Yinqiao Li, Bei Li, Tong Xiao, Jingbo Zhu
22 Sep 2021

Learning Light-Weight Translation Models from Deep Transformer
Bei Li, Ziyang Wang, Hui Liu, Quan Du, Tong Xiao, Chunliang Zhang, Jingbo Zhu
27 Dec 2020 · VLM

Towards Fully 8-bit Integer Inference for the Transformer Model
Ye Lin, Yanyang Li, Tengbo Liu, Tong Xiao, Tongran Liu, Jingbo Zhu
17 Sep 2020 · MQ

HAT: Hardware-Aware Transformers for Efficient Natural Language Processing
Hanrui Wang, Zhanghao Wu, Zhijian Liu, Han Cai, Ligeng Zhu, Chuang Gan, Song Han
28 May 2020

FastBERT: a Self-distilling BERT with Adaptive Inference Time
Weijie Liu, Peng Zhou, Zhe Zhao, Zhiruo Wang, Haotang Deng, Qi Ju
05 Apr 2020

Train Large, Then Compress: Rethinking Model Size for Efficient Training and Inference of Transformers
Zhuohan Li, Eric Wallace, Sheng Shen, Kevin Lin, Kurt Keutzer, Dan Klein, Joseph E. Gonzalez
26 Feb 2020

Neural Machine Translation with Joint Representation
Yanyang Li, Qiang Wang, Tong Xiao, Tongran Liu, Jingbo Zhu
16 Feb 2020

Explaining Sequence-Level Knowledge Distillation as Data-Augmentation for Neural Machine Translation
Mitchell A. Gordon, Kevin Duh
06 Dec 2019

DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter
Victor Sanh, Lysandre Debut, Julien Chaumond, Thomas Wolf
02 Oct 2019

TinyBERT: Distilling BERT for Natural Language Understanding
Xiaoqi Jiao, Yichun Yin, Lifeng Shang, Xin Jiang, Xiao Chen, Linlin Li, F. Wang, Qun Liu
23 Sep 2019 · VLM

Self-Knowledge Distillation in Natural Language Processing
Sangchul Hahn, Heeyoul Choi
02 Aug 2019

RoBERTa: A Robustly Optimized BERT Pretraining Approach
Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, M. Lewis, Luke Zettlemoyer, Veselin Stoyanov
26 Jul 2019 · AIMat

XLNet: Generalized Autoregressive Pretraining for Language Understanding
Zhilin Yang, Zihang Dai, Yiming Yang, J. Carbonell, Ruslan Salakhutdinov, Quoc V. Le
19 Jun 2019 · AI4CE

Learning Deep Transformer Models for Machine Translation
Qiang Wang, Bei Li, Tong Xiao, Jingbo Zhu, Changliang Li, Derek F. Wong, Lidia S. Chao
05 Jun 2019

fairseq: A Fast, Extensible Toolkit for Sequence Modeling
Myle Ott, Sergey Edunov, Alexei Baevski, Angela Fan, Sam Gross, Nathan Ng, David Grangier, Michael Auli
01 Apr 2019 · VLM, FaML

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova
11 Oct 2018 · VLM, SSL, SSeg

Contextual Parameter Generation for Universal Neural Machine Translation
Emmanouil Antonios Platanios, Mrinmaya Sachan, Graham Neubig, Tom Michael Mitchell
26 Aug 2018

Training Deeper Neural Machine Translation Models with Transparent Attention
Ankur Bapna, Mengzhao Chen, Orhan Firat, Yuan Cao, Yonghui Wu
22 Aug 2018

Born Again Neural Networks
Tommaso Furlanello, Zachary Chase Lipton, Michael Tschannen, Laurent Itti, Anima Anandkumar
12 May 2018

Accelerating Neural Transformer via an Average Attention Network
Biao Zhang, Deyi Xiong, Jinsong Su
02 May 2018

Deep contextualized word representations
Matthew E. Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee, Luke Zettlemoyer
15 Feb 2018 · NAI

Mixed Precision Training
Paulius Micikevicius, Sharan Narang, Jonah Alben, G. Diamos, Erich Elsen, ..., Boris Ginsburg, Michael Houston, Oleksii Kuchaiev, Ganesh Venkatesh, Hao Wu
10 Oct 2017

Attention Is All You Need
Ashish Vaswani, Noam M. Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan Gomez, Lukasz Kaiser, Illia Polosukhin
12 Jun 2017 · 3DV

Ensemble Distillation for Neural Machine Translation
Markus Freitag, Yaser Al-Onaizan, B. Sankaran
06 Feb 2017 · FedML

Sequence-Level Knowledge Distillation
Yoon Kim, Alexander M. Rush
25 Jun 2016

All you need is a good init
Dmytro Mishkin, Jirí Matas
19 Nov 2015 · ODL

Neural Machine Translation of Rare Words with Subword Units
Rico Sennrich, Barry Haddow, Alexandra Birch
31 Aug 2015

Distilling the Knowledge in a Neural Network
Geoffrey E. Hinton, Oriol Vinyals, J. Dean
09 Mar 2015 · FedML

FitNets: Hints for Thin Deep Nets
Adriana Romero, Nicolas Ballas, Samira Ebrahimi Kahou, Antoine Chassang, C. Gatta, Yoshua Bengio
19 Dec 2014 · FedML