ResearchTrend.AI

Sequence-Level Knowledge Distillation (arXiv:1606.07947)
25 June 2016
Yoon Kim, Alexander M. Rush

Papers citing "Sequence-Level Knowledge Distillation"

50 / 244 papers shown
Finetuning Pretrained Transformers into RNNs
  Jungo Kasai, Hao Peng, Yizhe Zhang, Dani Yogatama, Gabriel Ilharco, Nikolaos Pappas, Yi Mao, Weizhu Chen, Noah A. Smith (24 Mar 2021)

Split Computing and Early Exiting for Deep Learning Applications: Survey and Research Challenges
  Yoshitomo Matsubara, Marco Levorato, Francesco Restuccia (08 Mar 2021)

An Efficient Transformer Decoder with Compressed Sub-layers
  Yanyang Li, Ye Lin, Tong Xiao, Jingbo Zhu (03 Jan 2021)

Improving Sequence-to-Sequence Pre-training via Sequence Span Rewriting
  Wangchunshu Zhou, Tao Ge, Canwen Xu, Ke Xu, Furu Wei (02 Jan 2021)

Fully Non-autoregressive Neural Machine Translation: Tricks of the Trade
  Jiatao Gu, X. Kong (31 Dec 2020)

Neural Machine Translation: A Review of Methods, Resources, and Tools
  Zhixing Tan, Shuo Wang, Zonghan Yang, Gang Chen, Xuancheng Huang, Maosong Sun, Yang Liu (31 Dec 2020)

Understanding and Improving Lexical Choice in Non-Autoregressive Translation
  Liang Ding, Longyue Wang, Xuebo Liu, Derek F. Wong, Dacheng Tao, Zhaopeng Tu (29 Dec 2020)

Learning Light-Weight Translation Models from Deep Transformer
  Bei Li, Ziyang Wang, Hui Liu, Quan Du, Tong Xiao, Chunliang Zhang, Jingbo Zhu (27 Dec 2020)

Reinforced Multi-Teacher Selection for Knowledge Distillation
  Fei Yuan, Linjun Shou, J. Pei, Wutao Lin, Ming Gong, Yan Fu, Daxin Jiang (11 Dec 2020)

Detecting Hallucinated Content in Conditional Neural Sequence Generation
  Chunting Zhou, Graham Neubig, Jiatao Gu, Mona T. Diab, P. Guzmán, Luke Zettlemoyer, Marjan Ghazvininejad (05 Nov 2020)

Pre-trained Summarization Distillation
  Sam Shleifer, Alexander M. Rush (24 Oct 2020)

Structural Knowledge Distillation: Tractably Distilling Information for Structured Predictor
  Xinyu Wang, Yong-jia Jiang, Zhaohui Yan, Zixia Jia, Nguyen Bach, Tao Wang, Zhongqiang Huang, Fei Huang, Kewei Tu (10 Oct 2020)

Automated Concatenation of Embeddings for Structured Prediction
  Xinyu Wang, Yong-jia Jiang, Nguyen Bach, Tao Wang, Zhongqiang Huang, Fei Huang, Kewei Tu (10 Oct 2020)

Lifelong Language Knowledge Distillation
  Yung-Sung Chuang, Shang-Yu Su, Yun-Nung Chen (05 Oct 2020)

WeChat Neural Machine Translation Systems for WMT20
  Fandong Meng, Jianhao Yan, Yijin Liu, Yuan Gao, Xia Zeng, ..., Peng Li, Ming Chen, Jie Zhou, Sifan Liu, Hao Zhou (01 Oct 2020)

Teacher-Critical Training Strategies for Image Captioning
  Yiqing Huang, Jiansheng Chen (30 Sep 2020)

TernaryBERT: Distillation-aware Ultra-low Bit BERT
  Wei Zhang, Lu Hou, Yichun Yin, Lifeng Shang, Xiao Chen, Xin Jiang, Qun Liu (27 Sep 2020)

Softmax Tempering for Training Neural Machine Translation Models
  Raj Dabre, Atsushi Fujita (20 Sep 2020)

Code-switching pre-training for neural machine translation
  Zhen Yang, Bojie Hu, Ambyera Han, Shen Huang, Qi Ju (17 Sep 2020)

LRSpeech: Extremely Low-Resource Speech Synthesis and Recognition
  Jin Xu, Xu Tan, Yi Ren, Tao Qin, Jian Li, Sheng Zhao, Tie-Yan Liu (09 Aug 2020)

Deep Encoder, Shallow Decoder: Reevaluating Non-autoregressive Machine Translation
  Jungo Kasai, Nikolaos Pappas, Hao Peng, James Cross, Noah A. Smith (18 Jun 2020)

Multi-fidelity Neural Architecture Search with Knowledge Distillation
  I. Trofimov, Nikita Klyuchnikov, Mikhail Salnikov, Alexander N. Filippov, Evgeny Burnaev (15 Jun 2020)

Knowledge Distillation: A Survey
  Jianping Gou, B. Yu, Stephen J. Maybank, Dacheng Tao (09 Jun 2020)

Self-Distillation as Instance-Specific Label Smoothing
  Zhilu Zhang, M. Sabuncu (09 Jun 2020)

An Overview of Neural Network Compression
  James O'Neill (05 Jun 2020)

End-to-End Speech-Translation with Knowledge Distillation: FBK@IWSLT2020
  Marco Gaido, Mattia Antonino Di Gangi, Matteo Negri, Marco Turchi (04 Jun 2020)

Language Models are Few-Shot Learners
  Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, ..., Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, Dario Amodei (28 May 2020)

Syntactic Structure Distillation Pretraining For Bidirectional Encoders
  A. Kuncoro, Lingpeng Kong, Daniel Fried, Dani Yogatama, Laura Rimell, Chris Dyer, Phil Blunsom (27 May 2020)

Glow-TTS: A Generative Flow for Text-to-Speech via Monotonic Alignment Search
  Jaehyeon Kim, Sungwon Kim, Jungil Kong, Sungroh Yoon (22 May 2020)

Non-Autoregressive Image Captioning with Counterfactuals-Critical Multi-Agent Learning
  Longteng Guo, Jing Liu, Xinxin Zhu, Xingjian He, Jie Jiang, Hanqing Lu (10 May 2020)

schuBERT: Optimizing Elements of BERT
  A. Khetan, Zohar Karnin (09 May 2020)

Improving Non-autoregressive Neural Machine Translation with Monolingual Data
  Jiawei Zhou, Phillip Keung (02 May 2020)

Distilling Knowledge for Fast Retrieval-based Chat-bots
  Amir Vakili Tahami, Kamyar Ghajar, A. Shakery (23 Apr 2020)

The Right Tool for the Job: Matching Model and Instance Complexities
  Roy Schwartz, Gabriel Stanovsky, Swabha Swayamdipta, Jesse Dodge, Noah A. Smith (16 Apr 2020)

Structure-Level Knowledge Distillation For Multilingual Sequence Labeling
  Xinyu Wang, Yong-jia Jiang, Nguyen Bach, Tao Wang, Fei Huang, Kewei Tu (08 Apr 2020)

Exploring Versatile Generative Language Model Via Parameter-Efficient Transfer Learning
  Zhaojiang Lin, Andrea Madotto, Pascale Fung (08 Apr 2020)

Aligned Cross Entropy for Non-Autoregressive Machine Translation
  Marjan Ghazvininejad, Vladimir Karpukhin, Luke Zettlemoyer, Omer Levy (03 Apr 2020)

Analysis of Knowledge Transfer in Kernel Regime
  Arman Rahbar, Ashkan Panahi, Chiranjib Bhattacharyya, Devdatt Dubhashi, M. Chehreghani (30 Mar 2020)

Understanding and Improving Knowledge Distillation
  Jiaxi Tang, Rakesh Shivanna, Zhe Zhao, Dong Lin, Anima Singh, Ed H. Chi, Sagar Jain (10 Feb 2020)

Teaching Machines to Converse
  Jiwei Li (31 Jan 2020)

Modeling Teacher-Student Techniques in Deep Neural Networks for Knowledge Distillation
  Sajjad Abbasi, M. Hajabdollahi, N. Karimi, S. Samavi (31 Dec 2019)

Neural Machine Translation: A Review and Survey
  Felix Stahlberg (04 Dec 2019)

Fine-Tuning by Curriculum Learning for Non-Autoregressive Neural Machine Translation
  Junliang Guo, Xu Tan, Linli Xu, Tao Qin, Enhong Chen, Tie-Yan Liu (20 Nov 2019)

A Simplified Fully Quantized Transformer for End-to-end Speech Recognition
  Alex Bie, Bharat Venkitesh, João Monteiro, Md. Akmal Haidar, Mehdi Rezagholizadeh (09 Nov 2019)

Domain Robustness in Neural Machine Translation
  Mathias Müller, Annette Rios Gonzales, Rico Sennrich (08 Nov 2019)

Microsoft Research Asia's Systems for WMT19
  Yingce Xia, Xu Tan, Fei Tian, Fei Gao, Weicong Chen, ..., Yiren Wang, Lijun Wu, Jinhua Zhu, Tao Qin, Tie-Yan Liu (07 Nov 2019)

Domain, Translationese and Noise in Synthetic Data for Neural Machine Translation
  Nikolay Bogoychev, Rico Sennrich (06 Nov 2019)

Fast Structured Decoding for Sequence Models
  Zhiqing Sun, Zhuohan Li, Haoqing Wang, Zi Lin, Di He, Zhihong Deng (25 Oct 2019)

Model Compression with Two-stage Multi-teacher Knowledge Distillation for Web Question Answering System
  Ze Yang, Linjun Shou, Ming Gong, Wutao Lin, Daxin Jiang (18 Oct 2019)

Soft-Label Dataset Distillation and Text Dataset Distillation
  Ilia Sucholutsky, Matthias Schonlau (06 Oct 2019)
06 Oct 2019