f-Divergence Minimization for Sequence-Level Knowledge Distillation
Yuqiao Wen, Zichao Li, Wenyu Du, Lili Mou
27 July 2023 · arXiv:2307.15190

Papers citing "f-Divergence Minimization for Sequence-Level Knowledge Distillation"

Showing 50 of 59 citing papers:
ABKD: Pursuing a Proper Allocation of the Probability Mass in Knowledge Distillation via $α$-$β$-Divergence
  Guanghui Wang, Zhiyong Yang, Ziyi Wang, Shi Wang, Qianqian Xu, Qingming Huang (07 May 2025)
Training Domain Draft Models for Speculative Decoding: Best Practices and Insights
  Fenglu Hong, Ravi Raju, Jonathan Li, Bo Li, Urmish Thakker, Avinash Ravichandran, Swayambhoo Jain, Changran Hu (10 Mar 2025)
SimPER: A Minimalist Approach to Preference Alignment without Hyperparameters
  Teng Xiao, Yige Yuan, Ziyang Chen, Mingxiao Li, Shangsong Liang, Zhaochun Ren, V. Honavar (21 Feb 2025)
Multi-Level Optimal Transport for Universal Cross-Tokenizer Knowledge Distillation on Language Models
  Xiao Cui, Mo Zhu, Yulei Qin, Liang Xie, Wengang Zhou, Haoyang Li (19 Dec 2024)
Relaxed Recursive Transformers: Effective Parameter Sharing with Layer-wise LoRA
  Sangmin Bae, Adam Fisch, Hrayr Harutyunyan, Ziwei Ji, Seungyeon Kim, Tal Schuster (28 Oct 2024)
Beyond Autoregression: Fast LLMs via Self-Distillation Through Time
  Justin Deschenaux, Çağlar Gülçehre (28 Oct 2024)
SWITCH: Studying with Teacher for Knowledge Distillation of Large Language Models
  Jahyun Koo, Yerin Hwang, Yongil Kim, Taegwan Kang, Hyunkyung Bae, Kyomin Jung (25 Oct 2024)
MiniPLM: Knowledge Distillation for Pre-Training Language Models
  Yuxian Gu, Hao Zhou, Fandong Meng, Jie Zhou, Minlie Huang (22 Oct 2024)
Efficient Inference for Large Language Model-based Generative Recommendation
  Xinyu Lin, Chaoqun Yang, Wenjie Wang, Yongqi Li, Cunxiao Du, Fuli Feng, See-Kiong Ng, Tat-Seng Chua (07 Oct 2024)
Direct Preference Knowledge Distillation for Large Language Models
  Yixing Li, Yuxian Gu, Li Dong, Dequan Wang, Yu Cheng, Furu Wei (28 Jun 2024)
EBBS: An Ensemble with Bi-Level Beam Search for Zero-Shot Machine Translation
  Yuqiao Wen, Behzad Shayegh, Chenyang Huang, Yanshuai Cao, Lili Mou (29 Feb 2024)
Robustness-Reinforced Knowledge Distillation with Correlation Distance and Network Pruning
  Seonghak Kim, Gyeongdo Ham, Yucheol Cho, Daeshik Kim (23 Nov 2023)
An Equal-Size Hard EM Algorithm for Diverse Dialogue Generation
  Yuqiao Wen, Yongchang Hao, Yanshuai Cao, Lili Mou (29 Sep 2022)
One Reference Is Not Enough: Diverse Distillation with Reference Selection for Non-Autoregressive Translation
  Chenze Shao, Xuanfu Wu, Yang Feng (28 May 2022)
Mosaicking to Distill: Knowledge Distillation from Out-of-Domain Data
  Gongfan Fang, Yifan Bao, Mingli Song, Xinchao Wang, Don Xie, Chengchao Shen, Xiuming Zhang (27 Oct 2021)
Non-Autoregressive Translation with Layer-Wise Prediction and Deep Supervision
  Chenyang Huang, Hao Zhou, Osmar R. Zaïane, Lili Mou, Lei Li (14 Oct 2021)
Commonsense-Focused Dialogues for Response Generation: An Empirical Study
  Pei Zhou, Karthik Gopalakrishnan, Behnam Hedayatnia, Seokhwan Kim, Jay Pujara, Xiang Ren, Yang Liu, Dilek Z. Hakkani-Tür (14 Sep 2021)
Sparse MLP for Image Recognition: Is Self-Attention Really Necessary?
  Chuanxin Tang, Yucheng Zhao, Guangting Wang, Chong Luo, Wenxuan Xie, Wenjun Zeng (12 Sep 2021)
Layer-wise Model Pruning based on Mutual Information
  Chun Fan, Jiwei Li, Xiang Ao, Leilei Gan, Yuxian Meng, Xiaofei Sun (28 Aug 2021)
One Teacher is Enough? Pre-trained Language Model Distillation from Multiple Teachers
  Chuhan Wu, Fangzhao Wu, Yongfeng Huang (02 Jun 2021)
Learning Noise Transition Matrix from Only Noisy Labels via Total Variation Regularization
  Yivan Zhang, Gang Niu, Masashi Sugiyama (04 Feb 2021)
Prefix-Tuning: Optimizing Continuous Prompts for Generation
  Xiang Lisa Li, Percy Liang (01 Jan 2021)
Improving Task-Agnostic BERT Distillation with Layer Mapping Search
  Xiaoqi Jiao, Huating Chang, Yichun Yin, Lifeng Shang, Xin Jiang, Xiao Chen, Linlin Li, Fang Wang, Qun Liu (11 Dec 2020)
Pre-trained Summarization Distillation
  Sam Shleifer, Alexander M. Rush (24 Oct 2020)
BERT-EMD: Many-to-Many Layer Mapping for BERT Compression with Earth Mover's Distance
  Jianquan Li, Xiaokang Liu, Honghong Zhao, Ruifeng Xu, Min Yang, Yaohong Jin (13 Oct 2020)
Autoregressive Knowledge Distillation through Imitation Learning
  Alexander Lin, Jeremy Wohlwend, Howard Chen, Tao Lei (15 Sep 2020)
Bridging Maximum Likelihood and Adversarial Learning via $α$-Divergence
  Miaoyun Zhao, Yulai Cong, Shuyang Dai, Lawrence Carin (13 Jul 2020)
DART: Open-Domain Structured Data Record to Text Generation
  Linyong Nan, Dragomir R. Radev, Rui Zhang, Amrit Rau, Abhinand Sivaprasad, ..., Y. Tan, Xi Lin, Caiming Xiong, R. Socher, Nazneen Rajani (06 Jul 2020)
Deep Encoder, Shallow Decoder: Reevaluating Non-autoregressive Machine Translation
  Jungo Kasai, Nikolaos Pappas, Hao Peng, James Cross, Noah A. Smith (18 Jun 2020)
ENGINE: Energy-Based Inference Networks for Non-Autoregressive Machine Translation
  Lifu Tu, Richard Yuanzhe Pang, Sam Wiseman, Kevin Gimpel (02 May 2020)
BLEURT: Learning Robust Metrics for Text Generation
  Thibault Sellam, Dipanjan Das, Ankur P. Parikh (09 Apr 2020)
Dreaming to Distill: Data-free Knowledge Transfer via DeepInversion
  Hongxu Yin, Pavlo Molchanov, Zhizhong Li, J. Álvarez, Arun Mallya, Derek Hoiem, N. Jha, Jan Kautz (18 Dec 2019)
PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization
  Jingqing Zhang, Yao-Min Zhao, Mohammad Saleh, Peter J. Liu (18 Dec 2019)
DialoGPT: Large-Scale Generative Pre-training for Conversational Response Generation
  Yizhe Zhang, Siqi Sun, Michel Galley, Yen-Chun Chen, Chris Brockett, Xiang Gao, Jianfeng Gao, Jingjing Liu, W. Dolan (01 Nov 2019)
BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension
  M. Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdel-rahman Mohamed, Omer Levy, Veselin Stoyanov, Luke Zettlemoyer (29 Oct 2019)
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
  Colin Raffel, Noam M. Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu (23 Oct 2019)
Model Compression with Two-stage Multi-teacher Knowledge Distillation for Web Question Answering System
  Ze Yang, Linjun Shou, Ming Gong, Wutao Lin, Daxin Jiang (18 Oct 2019)
PLATO: Pre-trained Dialogue Generation Model with Discrete Latent Variable
  Siqi Bao, H. He, Fan Wang, Hua Wu, Haifeng Wang (17 Oct 2019)
TinyBERT: Distilling BERT for Natural Language Understanding
  Xiaoqi Jiao, Yichun Yin, Lifeng Shang, Xin Jiang, Xiao Chen, Linlin Li, F. Wang, Qun Liu (23 Sep 2019)
MoverScore: Text Generation Evaluating with Contextualized Embeddings and Earth Mover Distance
  Wei Zhao, Maxime Peyrard, Fei Liu, Yang Gao, Christian M. Meyer, Steffen Eger (05 Sep 2019)
Patient Knowledge Distillation for BERT Model Compression
  S. Sun, Yu Cheng, Zhe Gan, Jingjing Liu (25 Aug 2019)
BERTScore: Evaluating Text Generation with BERT
  Tianyi Zhang, Varsha Kishore, Felix Wu, Kilian Q. Weinberger, Yoav Artzi (21 Apr 2019)
Distilling Task-Specific Knowledge from BERT into Simple Neural Networks
  Raphael Tang, Yao Lu, Linqing Liu, Lili Mou, Olga Vechtomova, Jimmy J. Lin (28 Mar 2019)
Don't Give Me the Details, Just the Summary! Topic-Aware Convolutional Neural Networks for Extreme Summarization
  Shashi Narayan, Shay B. Cohen, Mirella Lapata (27 Aug 2018)
Efficient Contextualized Representation: Language Model Pruning for Sequence Labeling
  Liyuan Liu, Xiang Ren, Jingbo Shang, Jian-wei Peng, Jiawei Han (20 Apr 2018)
Adafactor: Adaptive Learning Rates with Sublinear Memory Cost
  Noam M. Shazeer, Mitchell Stern (11 Apr 2018)
The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks
  Jonathan Frankle, Michael Carbin (09 Mar 2018)
Why Do Neural Dialog Systems Generate Short and Meaningless Replies? A Comparison between Dialog and Translation
  Bolin Wei, Shuai Lu, Lili Mou, Hao Zhou, Pascal Poupart, Ge Li, Zhi Jin (06 Dec 2017)
Learning Sparse Neural Networks through $L_0$ Regularization
  Christos Louizos, Max Welling, Diederik P. Kingma (04 Dec 2017)
Non-Autoregressive Neural Machine Translation
  Jiatao Gu, James Bradbury, Caiming Xiong, Victor O.K. Li, R. Socher (07 Nov 2017)