Towards Understanding and Improving Knowledge Distillation for Neural Machine Translation

14 May 2023
Songming Zhang
Yunlong Liang
Shuaibo Wang
Wenjuan Han
Jian Liu
Jinan Xu
arXiv: 2305.08096

Papers citing "Towards Understanding and Improving Knowledge Distillation for Neural Machine Translation"

40 / 40 papers shown
Scheduled Multi-task Learning for Neural Chat Translation
Yunlong Liang
Fandong Meng
Jinan Xu
Jie Zhou
35
12
0
08 May 2022
Why Exposure Bias Matters: An Imitation Learning Perspective of Error Accumulation in Language Generation
Kushal Arora
Layla El Asri
Hareesh Bahuleyan
Jackie C.K. Cheung
52
81
0
03 Apr 2022
DeepNet: Scaling Transformers to 1,000 Layers
Hongyu Wang
Shuming Ma
Li Dong
Shaohan Huang
Dongdong Zhang
Furu Wei
MoE
AI4CE
103
162
0
01 Mar 2022
Confidence Based Bidirectional Global Context Aware Training Framework for Neural Machine Translation
Chulun Zhou
Fandong Meng
Jie Zhou
Hao Fei
Hongji Wang
Jinsong Su
26
15
0
28 Feb 2022
Towards Making the Most of Dialogue Characteristics for Neural Chat Translation
Yunlong Liang
Chulun Zhou
Fandong Meng
Jinan Xu
Jinsong Su
Jie Zhou
42
19
0
02 Sep 2021
Scheduled Sampling Based on Decoding Steps for Neural Machine Translation
Yijin Liu
Fandong Meng
Jinan Xu
Jie Zhou
51
16
0
30 Aug 2021
Modeling Bilingual Conversational Characteristics for Neural Chat Translation
Yunlong Liang
Fandong Meng
Jinan Xu
Jie Zhou
30
28
0
23 Jul 2021
Confidence-Aware Scheduled Sampling for Neural Machine Translation
Yijin Liu
Fandong Meng
Jinan Xu
Jie Zhou
55
14
0
22 Jul 2021
Guiding Teacher Forcing with Seer Forcing for Neural Machine Translation
Yang Feng
Shuhao Gu
Dengji Guo
Zhengxin Yang
Chenze Shao
33
13
0
12 Jun 2021
Selective Knowledge Distillation for Neural Machine Translation
Fusheng Wang
Jianhao Yan
Fandong Meng
Jie Zhou
41
60
0
27 May 2021
Annealing Knowledge Distillation
A. Jafari
Mehdi Rezagholizadeh
Pranav Sharma
A. Ghodsi
45
79
0
14 Apr 2021
Shallow-to-Deep Training for Neural Machine Translation
Bei Li
Ziyang Wang
Hui Liu
Yufan Jiang
Quan Du
Tong Xiao
Huizhen Wang
Jingbo Zhu
37
49
0
08 Oct 2020
TeaForN: Teacher-Forcing with N-grams
Sebastian Goodman
Nan Ding
Radu Soricut
46
19
0
07 Oct 2020
Why Skip If You Can Combine: A Simple Knowledge Distillation Technique for Intermediate Layers
Yimeng Wu
Peyman Passban
Mehdi Rezagholizadeh
Qun Liu
MoE
43
34
0
06 Oct 2020
COMET: A Neural Framework for MT Evaluation
Ricardo Rei
Craig Alan Stewart
Ana C. Farinha
A. Lavie
104
1,090
0
18 Sep 2020
Language Model Prior for Low-Resource Neural Machine Translation
Christos Baziotis
Barry Haddow
Alexandra Birch
38
53
0
30 Apr 2020
Multiscale Collaborative Deep Models for Neural Machine Translation
Xiangpeng Wei
Heng Yu
Yue Hu
Yue Zhang
Rongxiang Weng
Weihua Luo
44
28
0
29 Apr 2020
Knowledge Distillation for Multilingual Unsupervised Neural Machine Translation
Haipeng Sun
Rui Wang
Kehai Chen
Masao Utiyama
Eiichiro Sumita
Tiejun Zhao
AIMat
41
47
0
21 Apr 2020
Understanding the Difficulty of Training Transformers
Liyuan Liu
Xiaodong Liu
Jianfeng Gao
Weizhu Chen
Jiawei Han
AI4CE
55
254
0
17 Apr 2020
Understanding and Improving Knowledge Distillation
Jiaxi Tang
Rakesh Shivanna
Zhe Zhao
Dong Lin
Anima Singh
Ed H. Chi
Sagar Jain
60
131
0
10 Feb 2020
Understanding Knowledge Distillation in Non-autoregressive Machine Translation
Chunting Zhou
Graham Neubig
Jiatao Gu
54
220
0
07 Nov 2019
On the Efficacy of Knowledge Distillation
Jang Hyun Cho
Bharath Hariharan
90
605
0
03 Oct 2019
Bridging the Gap between Training and Inference for Neural Machine Translation
Wen Zhang
Yang Feng
Fandong Meng
Di You
Qun Liu
AIMat
68
241
0
06 Jun 2019
Levenshtein Transformer
Jiatao Gu
Changhan Wang
Jake Zhao
107
359
0
27 May 2019
Knowledge Distillation via Route Constrained Optimization
Xiao Jin
Baoyun Peng
Yichao Wu
Yu Liu
Jiaheng Liu
Ding Liang
Junjie Yan
Xiaolin Hu
61
170
0
19 Apr 2019
fairseq: A Fast, Extensible Toolkit for Sequence Modeling
Myle Ott
Sergey Edunov
Alexei Baevski
Angela Fan
Sam Gross
Nathan Ng
David Grangier
Michael Auli
VLM
FaML
95
3,147
0
01 Apr 2019
Multilingual Neural Machine Translation with Knowledge Distillation
Xu Tan
Yi Ren
Di He
Tao Qin
Zhou Zhao
Tie-Yan Liu
68
250
0
27 Feb 2019
Improved Knowledge Distillation via Teacher Assistant
Seyed Iman Mirzadeh
Mehrdad Farajtabar
Ang Li
Nir Levine
Akihiro Matsukawa
H. Ghasemzadeh
92
1,074
0
09 Feb 2019
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Jacob Devlin
Ming-Wei Chang
Kenton Lee
Kristina Toutanova
VLM
SSL
SSeg
1.5K
94,511
0
11 Oct 2018
Scaling Neural Machine Translation
Myle Ott
Sergey Edunov
David Grangier
Michael Auli
AIMat
169
614
0
01 Jun 2018
Non-Autoregressive Neural Machine Translation
Jiatao Gu
James Bradbury
Caiming Xiong
Victor O.K. Li
R. Socher
95
795
0
07 Nov 2017
Attention Is All You Need
Ashish Vaswani
Noam M. Shazeer
Niki Parmar
Jakob Uszkoreit
Llion Jones
Aidan Gomez
Lukasz Kaiser
Illia Polosukhin
3DV
636
130,942
0
12 Jun 2017
Convolutional Sequence to Sequence Learning
Jonas Gehring
Michael Auli
David Grangier
Denis Yarats
Yann N. Dauphin
AIMat
148
3,283
0
08 May 2017
SeqGAN: Sequence Generative Adversarial Nets with Policy Gradient
Lantao Yu
Weinan Zhang
Jun Wang
Yong Yu
GAN
62
2,396
0
18 Sep 2016
Sequence-Level Knowledge Distillation
Yoon Kim
Alexander M. Rush
107
1,114
0
25 Jun 2016
Neural Machine Translation of Rare Words with Subword Units
Rico Sennrich
Barry Haddow
Alexandra Birch
195
7,729
0
31 Aug 2015
Scheduled Sampling for Sequence Prediction with Recurrent Neural Networks
Samy Bengio
Oriol Vinyals
Navdeep Jaitly
Noam M. Shazeer
133
2,032
0
09 Jun 2015
Distilling the Knowledge in a Neural Network
Geoffrey E. Hinton
Oriol Vinyals
J. Dean
FedML
312
19,609
0
09 Mar 2015
Adam: A Method for Stochastic Optimization
Diederik P. Kingma
Jimmy Ba
ODL
1.5K
149,842
0
22 Dec 2014
Neural Machine Translation by Jointly Learning to Align and Translate
Dzmitry Bahdanau
Kyunghyun Cho
Yoshua Bengio
AIMat
507
27,263
0
01 Sep 2014