ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2305.12330
  4. Cited By
Task-agnostic Distillation of Encoder-Decoder Language Models

Task-agnostic Distillation of Encoder-Decoder Language Models

21 May 2023
Chen Zhang
Yang Yang
Jingang Wang
Dawei Song
ArXivPDFHTML

Papers citing "Task-agnostic Distillation of Encoder-Decoder Language Models"

18 / 18 papers shown
Title
HomoDistil: Homotopic Task-Agnostic Distillation of Pre-trained
  Transformers
HomoDistil: Homotopic Task-Agnostic Distillation of Pre-trained Transformers
Chen Liang
Haoming Jiang
Zheng Li
Xianfeng Tang
Bin Yin
Tuo Zhao
VLM
89
25
0
19 Feb 2023
Scaling Instruction-Finetuned Language Models
Scaling Instruction-Finetuned Language Models
Hyung Won Chung
Le Hou
Shayne Longpre
Barret Zoph
Yi Tay
...
Jacob Devlin
Adam Roberts
Denny Zhou
Quoc V. Le
Jason W. Wei
ReLM
LRM
181
3,117
0
20 Oct 2022
GLM-130B: An Open Bilingual Pre-trained Model
GLM-130B: An Open Bilingual Pre-trained Model
Aohan Zeng
Xiao Liu
Zhengxiao Du
Zihan Wang
Hanyu Lai
...
Jidong Zhai
Wenguang Chen
Peng Zhang
Yuxiao Dong
Jie Tang
BDL
LRM
344
1,091
0
05 Oct 2022
PROD: Progressive Distillation for Dense Retrieval
PROD: Progressive Distillation for Dense Retrieval
Zhenghao Lin
Yeyun Gong
Xiao Liu
Hang Zhang
Chen Lin
...
Jian Jiao
Jing Lu
Daxin Jiang
Rangan Majumder
Nan Duan
71
27
0
27 Sep 2022
Structured Pruning Learns Compact and Accurate Models
Structured Pruning Learns Compact and Accurate Models
Mengzhou Xia
Zexuan Zhong
Danqi Chen
VLM
51
184
0
01 Apr 2022
DQ-BART: Efficient Sequence-to-Sequence Model via Joint Distillation and
  Quantization
DQ-BART: Efficient Sequence-to-Sequence Model via Joint Distillation and Quantization
Zheng Li
Zijian Wang
Ming Tan
Ramesh Nallapati
Parminder Bhatia
Andrew O. Arnold
Bing Xiang
Dan Roth
MQ
43
43
0
21 Mar 2022
Compression of Generative Pre-trained Language Models via Quantization
Compression of Generative Pre-trained Language Models via Quantization
Chaofan Tao
Lu Hou
Wei Zhang
Lifeng Shang
Xin Jiang
Qun Liu
Ping Luo
Ngai Wong
MQ
62
104
0
21 Mar 2022
MiniLMv2: Multi-Head Self-Attention Relation Distillation for
  Compressing Pretrained Transformers
MiniLMv2: Multi-Head Self-Attention Relation Distillation for Compressing Pretrained Transformers
Wenhui Wang
Hangbo Bao
Shaohan Huang
Li Dong
Furu Wei
MQ
76
267
0
31 Dec 2020
MiniLM: Deep Self-Attention Distillation for Task-Agnostic Compression
  of Pre-Trained Transformers
MiniLM: Deep Self-Attention Distillation for Task-Agnostic Compression of Pre-Trained Transformers
Wenhui Wang
Furu Wei
Li Dong
Hangbo Bao
Nan Yang
Ming Zhou
VLM
139
1,265
0
25 Feb 2020
DistilBERT, a distilled version of BERT: smaller, faster, cheaper and
  lighter
DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter
Victor Sanh
Lysandre Debut
Julien Chaumond
Thomas Wolf
230
7,504
0
02 Oct 2019
TinyBERT: Distilling BERT for Natural Language Understanding
TinyBERT: Distilling BERT for Natural Language Understanding
Xiaoqi Jiao
Yichun Yin
Lifeng Shang
Xin Jiang
Xiao Chen
Linlin Li
F. Wang
Qun Liu
VLM
100
1,860
0
23 Sep 2019
RoBERTa: A Robustly Optimized BERT Pretraining Approach
RoBERTa: A Robustly Optimized BERT Pretraining Approach
Yinhan Liu
Myle Ott
Naman Goyal
Jingfei Du
Mandar Joshi
Danqi Chen
Omer Levy
M. Lewis
Luke Zettlemoyer
Veselin Stoyanov
AIMat
618
24,431
0
26 Jul 2019
Don't Give Me the Details, Just the Summary! Topic-Aware Convolutional
  Neural Networks for Extreme Summarization
Don't Give Me the Details, Just the Summary! Topic-Aware Convolutional Neural Networks for Extreme Summarization
Shashi Narayan
Shay B. Cohen
Mirella Lapata
AILaw
119
1,674
0
27 Aug 2018
Neural Network Acceptability Judgments
Neural Network Acceptability Judgments
Alex Warstadt
Amanpreet Singh
Samuel R. Bowman
230
1,407
0
31 May 2018
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language
  Understanding
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Alex Jinpeng Wang
Amanpreet Singh
Julian Michael
Felix Hill
Omer Levy
Samuel R. Bowman
ELM
1.1K
7,154
0
20 Apr 2018
A Broad-Coverage Challenge Corpus for Sentence Understanding through
  Inference
A Broad-Coverage Challenge Corpus for Sentence Understanding through Inference
Adina Williams
Nikita Nangia
Samuel R. Bowman
520
4,476
0
18 Apr 2017
Get To The Point: Summarization with Pointer-Generator Networks
Get To The Point: Summarization with Pointer-Generator Networks
A. See
Peter J. Liu
Christopher D. Manning
3DPC
293
4,019
0
14 Apr 2017
SQuAD: 100,000+ Questions for Machine Comprehension of Text
SQuAD: 100,000+ Questions for Machine Comprehension of Text
Pranav Rajpurkar
Jian Zhang
Konstantin Lopyrev
Percy Liang
RALM
274
8,127
0
16 Jun 2016
1