ResearchTrend.AI
Sequence-Level Knowledge Distillation

25 June 2016
Yoon Kim
Alexander M. Rush

Papers citing "Sequence-Level Knowledge Distillation"

50 / 244 papers shown
Revisiting Non-Autoregressive Translation at Scale
Zhihao Wang
Longyue Wang
Jinsong Su
Junfeng Yao
Zhaopeng Tu
36
3
0
25 May 2023
Just CHOP: Embarrassingly Simple LLM Compression
A. Jha
Tom Sherborne
Evan Pete Walsh
Dirk Groeneveld
Emma Strubell
Iz Beltagy
30
3
0
24 May 2023
Accelerating Transformer Inference for Translation via Parallel Decoding
Andrea Santilli
Silvio Severino
Emilian Postolache
Valentino Maiorca
Michele Mancusi
R. Marin
Emanuele Rodolà
33
79
0
17 May 2023
Target-Side Augmentation for Document-Level Machine Translation
Guangsheng Bao
Zhiyang Teng
Yue Zhang
46
10
0
08 May 2023
A Systematic Study of Knowledge Distillation for Natural Language Generation with Pseudo-Target Training
Nitay Calderon
Subhabrata Mukherjee
Roi Reichart
Amir Kantor
41
17
0
03 May 2023
Conditional Adapters: Parameter-efficient Transfer Learning with Fast Inference
Tao Lei
Junwen Bai
Siddhartha Brahma
Joshua Ainslie
Kenton Lee
...
Vincent Zhao
Yuexin Wu
Bo-wen Li
Yu Zhang
Ming-Wei Chang
BDL
AI4CE
30
55
0
11 Apr 2023
DSD$^2$: Can We Dodge Sparse Double Descent and Compress the Neural Network Worry-Free?
Victor Quétu
Enzo Tartaglione
32
7
0
02 Mar 2023
Towards domain generalisation in ASR with elitist sampling and ensemble knowledge distillation
Rehan Ahmad
Md. Asif Jalal
Muhammad Umar Farooq
A. Ollerenshaw
Thomas Hain
18
2
0
01 Mar 2023
A Reparameterized Discrete Diffusion Model for Text Generation
Lin Zheng
Jianbo Yuan
Lei Yu
Lingpeng Kong
DiffM
41
57
0
11 Feb 2023
N-Gram Nearest Neighbor Machine Translation
Rui Lv
Junliang Guo
Rui Wang
Xu Tan
Qi Liu
Tao Qin
23
2
0
30 Jan 2023
How Does Beam Search improve Span-Level Confidence Estimation in Generative Sequence Labeling?
Kazuma Hashimoto
Iftekhar Naim
K. Raman
UQLM
29
2
0
21 Dec 2022
Empowering Diffusion Models on the Embedding Space for Text Generation
Zhujin Gao
Junliang Guo
Xuejiao Tan
Yongxin Zhu
Fang Zhang
Jiang Bian
Linli Xu
DiffM
32
15
0
19 Dec 2022
WACO: Word-Aligned Contrastive Learning for Speech Translation
Siqi Ouyang
Rong Ye
Lei Li
32
25
0
19 Dec 2022
Imagen Editor and EditBench: Advancing and Evaluating Text-Guided Image Inpainting
Su Wang
Chitwan Saharia
Ceslee Montgomery
Jordi Pont-Tuset
Shai Noy
...
Radu Soricut
Jason Baldridge
Mohammad Norouzi
Peter Anderson
William Chan
35
176
0
13 Dec 2022
Life-long Learning for Multilingual Neural Machine Translation with Knowledge Distillation
Yang Zhao
Junnan Zhu
Lu Xiang
Jiajun Zhang
Yu Zhou
Feifei Zhai
Chengqing Zong
CLL
47
6
0
06 Dec 2022
Democratizing Neural Machine Translation with OPUS-MT
Jörg Tiedemann
Mikko Aulamo
Daria Bakshandaeva
M. Boggia
Stig-Arne Gronroos
Tommi Nieminen
Alessandro Raganato
Yves Scherrer
Raúl Vázquez
Sami Virpioja
18
28
0
04 Dec 2022
The RoyalFlush System for the WMT 2022 Efficiency Task
Bo Qin
Aixin Jia
Qiang Wang
Jian Lu
Shuqin Pan
Haibo Wang
Ming-Tso Chen
46
1
0
03 Dec 2022
Improving Simultaneous Machine Translation with Monolingual Data
Hexuan Deng
Liang Ding
Xuebo Liu
Meishan Zhang
Dacheng Tao
Min Zhang
35
12
0
02 Dec 2022
Inter-KD: Intermediate Knowledge Distillation for CTC-Based Automatic Speech Recognition
J. Yoon
Beom Jun Woo
Sunghwan Ahn
Hyeon Seung Lee
N. Kim
VLM
20
9
0
28 Nov 2022
Summer: WeChat Neural Machine Translation Systems for the WMT22 Biomedical Translation Task
Ernan Li
Fandong Meng
Jie Zhou
MedIm
10
1
0
28 Nov 2022
BJTU-WeChat's Systems for the WMT22 Chat Translation Task
Yunlong Liang
Fandong Meng
Jinan Xu
Jie Zhou
24
2
0
28 Nov 2022
Continual Learning of Neural Machine Translation within Low Forgetting Risk Regions
Shuhao Gu
Bojie Hu
Yang Feng
CLL
38
12
0
03 Nov 2022
Teacher-Student Architecture for Knowledge Learning: A Survey
Chengming Hu
Xuan Li
Dan Liu
Xi Chen
Ju Wang
Xue Liu
20
35
0
28 Oct 2022
Too Brittle To Touch: Comparing the Stability of Quantization and Distillation Towards Developing Lightweight Low-Resource MT Models
Harshita Diddee
Sandipan Dandapat
Monojit Choudhury
T. Ganu
Kalika Bali
31
5
0
27 Oct 2022
Referee: Reference-Free Sentence Summarization with Sharper Controllability through Symbolic Knowledge Distillation
Melanie Sclar
Peter West
Sachin Kumar
Yulia Tsvetkov
Yejin Choi
22
19
0
25 Oct 2022
SMaLL-100: Introducing Shallow Multilingual Machine Translation Model for Low-Resource Languages
Alireza Mohammadshahi
Vassilina Nikoulina
Alexandre Berard
Caroline Brun
James Henderson
Laurent Besacier
VLM
MoE
LRM
29
20
0
20 Oct 2022
A baseline revisited: Pushing the limits of multi-segment models for context-aware translation
Suvodeep Majumder
Stanislas Lauly
Maria Nadejde
Marcello Federico
Georgiana Dinu
38
13
0
19 Oct 2022
Viterbi Decoding of Directed Acyclic Transformer for Non-Autoregressive Machine Translation
Chenze Shao
Zhengrui Ma
Yang Feng
42
14
0
11 Oct 2022
Non-Monotonic Latent Alignments for CTC-Based Non-Autoregressive Machine Translation
Chenze Shao
Yang Feng
35
20
0
08 Oct 2022
Meta-Ensemble Parameter Learning
Zhengcong Fei
Shuman Tian
Junshi Huang
Xiaoming Wei
Xiaolin K. Wei
OOD
44
2
0
05 Oct 2022
Direct Speech Translation for Automatic Subtitling
Sara Papi
Marco Gaido
Alina Karakanta
Mauro Cettolo
Matteo Negri
Marco Turchi
54
11
0
27 Sep 2022
CATER: Intellectual Property Protection on Text Generation APIs via Conditional Watermarks
Xuanli He
Qiongkai Xu
Yi Zeng
Lingjuan Lyu
Fangzhao Wu
Jiwei Li
R. Jia
WaLM
188
72
0
19 Sep 2022
Efficient Methods for Natural Language Processing: A Survey
Marcos Vinícius Treviso
Ji-Ung Lee
Tianchu Ji
Betty van Aken
Qingqing Cao
...
Emma Strubell
Niranjan Balasubramanian
Leon Derczynski
Iryna Gurevych
Roy Schwartz
30
109
0
31 Aug 2022
Membership Inference Attacks by Exploiting Loss Trajectory
Yiyong Liu
Zhengyu Zhao
Michael Backes
Yang Zhang
27
98
0
31 Aug 2022
ProDiff: Progressive Fast Diffusion Model For High-Quality Text-to-Speech
Rongjie Huang
Zhou Zhao
Huadai Liu
Jinglin Liu
Chenye Cui
Yi Ren
DiffM
44
195
0
13 Jul 2022
Building Multilingual Machine Translation Systems That Serve Arbitrary X-Y Translations
Akiko Eriguchi
Shufang Xie
Tao Qin
Hany Awadalla
LRM
53
7
0
30 Jun 2022
Bridging the Gap Between Training and Inference of Bayesian Controllable Language Models
Han Liu
Bingning Wang
Ting Yao
Haijin Liang
Jianjin Xu
Xiaolin Hu
BDL
37
1
0
11 Jun 2022
What Do Compressed Multilingual Machine Translation Models Forget?
Alireza Mohammadshahi
Vassilina Nikoulina
Alexandre Berard
Caroline Brun
James Henderson
Laurent Besacier
AI4CE
42
9
0
22 May 2022
Twist Decoding: Diverse Generators Guide Each Other
Jungo Kasai
Keisuke Sakaguchi
Ronan Le Bras
Hao Peng
Ximing Lu
Dragomir R. Radev
Yejin Choi
Noah A. Smith
SyDa
27
4
0
19 May 2022
Building Machine Translation Systems for the Next Thousand Languages
Ankur Bapna
Isaac Caswell
Julia Kreutzer
Orhan Firat
D. Esch
...
Apurva Shah
Yanping Huang
Z. Chen
Yonghui Wu
Macduff Hughes
56
98
0
09 May 2022
Efficient yet Competitive Speech Translation: FBK@IWSLT2022
Marco Gaido
Sara Papi
Dennis Fucci
G. Fiameni
Matteo Negri
Marco Turchi
31
19
0
05 May 2022
Non-Autoregressive Machine Translation: It's Not as Fast as it Seems
Jindřich Helcl
Barry Haddow
Alexandra Birch
27
20
0
04 May 2022
Nearest Neighbor Knowledge Distillation for Neural Machine Translation
Zhixian Yang
Renliang Sun
Xiaojun Wan
18
12
0
01 May 2022
Prompt Consistency for Zero-Shot Task Generalization
Chunting Zhou
Junxian He
Xuezhe Ma
Taylor Berg-Kirkpatrick
Graham Neubig
VLM
26
74
0
29 Apr 2022
UniTE: Unified Translation Evaluation
Boyi Deng
Dayiheng Liu
Baosong Yang
Haibo Zhang
Boxing Chen
Derek F. Wong
Lidia S. Chao
41
41
0
28 Apr 2022
Multimodal Adaptive Distillation for Leveraging Unimodal Encoders for Vision-Language Tasks
Zhecan Wang
Noel Codella
Yen-Chun Chen
Luowei Zhou
Xiyang Dai
...
Jianwei Yang
Haoxuan You
Kai-Wei Chang
Shih-Fu Chang
Lu Yuan
VLM
OffRL
31
22
0
22 Apr 2022
Reducing Model Jitter: Stable Re-training of Semantic Parsers in Production Environments
Christopher Hidey
Fei Liu
Rahul Goel
27
4
0
10 Apr 2022
GigaST: A 10,000-hour Pseudo Speech Translation Corpus
Rong Ye
Chengqi Zhao
Tom Ko
Chutong Meng
Tao Wang
Mingxuan Wang
Jun Cao
9
23
0
08 Apr 2022
Does Simultaneous Speech Translation need Simultaneous Models?
Sara Papi
Marco Gaido
Matteo Negri
Marco Turchi
41
26
0
08 Apr 2022
latent-GLAT: Glancing at Latent Variables for Parallel Text Generation
Yu Bao
Hao Zhou
Shujian Huang
Dongqi Wang
Lihua Qian
Xinyu Dai
Jiajun Chen
Lei Li
31
38
0
05 Apr 2022