CascadeBERT: Accelerating Inference of Pre-trained Language Models via Calibrated Complete Models Cascade

29 December 2020 · arXiv:2012.14682

Lei Li, Yankai Lin, Deli Chen, Shuhuai Ren, Peng Li, Jie Zhou, Xu Sun
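For context on what the title describes: CascadeBERT runs a sequence of complete models of increasing size and answers with the first one whose calibrated confidence clears a threshold, so easy inputs never reach the large model. Below is a minimal sketch of that confidence-gated cascade; the toy models, the threshold value, and the max-probability confidence proxy are illustrative assumptions, not the paper's exact calibration method (the paper calibrates confidence with a difficulty-aware objective rather than using raw probabilities).

```python
# Minimal sketch of confidence-gated cascade inference, the general idea
# behind CascadeBERT. Models, threshold, and the max-probability confidence
# proxy are illustrative stand-ins, not the paper's exact method.

def cascade_predict(x, models, threshold=0.9):
    """Run models from cheapest to most expensive and return the first
    prediction whose confidence clears the threshold; the last (largest)
    model always answers if no earlier one is confident enough."""
    dist = None
    for model in models:
        dist = model(x)            # probability distribution over labels
        if max(dist) >= threshold:
            break                  # early exit: this model is confident
    return dist.index(max(dist))

# Toy stand-ins for a small and a large binary classifier.
small = lambda x: [0.95, 0.05] if len(x) < 20 else [0.55, 0.45]
large = lambda x: [0.10, 0.90]

print(cascade_predict("short text", [small, large]))                    # 0: small model exits early
print(cascade_predict("a much longer, harder input", [small, large]))   # 1: falls through to large
```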

Papers citing "CascadeBERT: Accelerating Inference of Pre-trained Language Models via Calibrated Complete Models Cascade" (30 papers)
  • A Unified Approach to Routing and Cascading for LLMs. Jasper Dekoninck, Maximilian Baader, Martin Vechev. 14 Oct 2024.
  • Accelerating BERT Inference for Sequence Labeling via Early-Exit. Xiaonan Li, Yunfan Shao, Tianxiang Sun, Hang Yan, Xipeng Qiu, Xuanjing Huang. 28 May 2021.
  • Early Exiting with Ensemble Internal Classifiers. Tianxiang Sun, Yunhua Zhou, Xiangyang Liu, Xinyu Zhang, Hao Jiang, Bo Zhao, Xuanjing Huang, Xipeng Qiu. 28 May 2021.
  • Learning to Cascade: Confidence Calibration for Improving the Accuracy and Computational Cost of Cascade Inference Systems. Shohei Enomoto, Takeharu Eda. 15 Apr 2021.
  • TernaryBERT: Distillation-aware Ultra-low Bit BERT. Wei Zhang, Lu Hou, Yichun Yin, Lifeng Shang, Xiao Chen, Xin Jiang, Qun Liu. 27 Sep 2020.
  • BERT Loses Patience: Fast and Robust Inference with Early Exit. Wangchunshu Zhou, Canwen Xu, Tao Ge, Julian McAuley, Ke Xu, Furu Wei. 07 Jun 2020.
  • The Cascade Transformer: an Application for Efficient Answer Sentence Selection. Luca Soldaini, Alessandro Moschitti. 05 May 2020.
  • What Happens To BERT Embeddings During Fine-tuning? Amil Merchant, Elahe Rahimtoroghi, Ellie Pavlick, Ian Tenney. 29 Apr 2020.
  • DeeBERT: Dynamic Early Exiting for Accelerating BERT Inference. Ji Xin, Raphael Tang, Jaejun Lee, Yaoliang Yu, Jimmy J. Lin. 27 Apr 2020.
  • The Right Tool for the Job: Matching Model and Instance Complexities. Roy Schwartz, Gabriel Stanovsky, Swabha Swayamdipta, Jesse Dodge, Noah A. Smith. 16 Apr 2020.
  • FastBERT: a Self-distilling BERT with Adaptive Inference Time. Weijie Liu, Peng Zhou, Zhe Zhao, Zhiruo Wang, Haotang Deng, Qi Ju. 05 Apr 2020.
  • BERT-of-Theseus: Compressing BERT by Progressive Module Replacing. Canwen Xu, Wangchunshu Zhou, Tao Ge, Furu Wei, Ming Zhou. 07 Feb 2020.
  • BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension. M. Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdel-rahman Mohamed, Omer Levy, Veselin Stoyanov, Luke Zettlemoyer. 29 Oct 2019.
  • Q8BERT: Quantized 8Bit BERT. Ofir Zafrir, Guy Boudoukh, Peter Izsak, Moshe Wasserblat. 14 Oct 2019.
  • DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter. Victor Sanh, Lysandre Debut, Julien Chaumond, Thomas Wolf. 02 Oct 2019.
  • Reducing Transformer Depth on Demand with Structured Dropout. Angela Fan, Edouard Grave, Armand Joulin. 25 Sep 2019.
  • TinyBERT: Distilling BERT for Natural Language Understanding. Xiaoqi Jiao, Yichun Yin, Lifeng Shang, Xin Jiang, Xiao Chen, Linlin Li, F. Wang, Qun Liu. 23 Sep 2019.
  • Patient Knowledge Distillation for BERT Model Compression. S. Sun, Yu Cheng, Zhe Gan, Jingjing Liu. 25 Aug 2019.
  • Well-Read Students Learn Better: On the Importance of Pre-training Compact Models. Iulia Turc, Ming-Wei Chang, Kenton Lee, Kristina Toutanova. 23 Aug 2019.
  • RoBERTa: A Robustly Optimized BERT Pretraining Approach. Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, M. Lewis, Luke Zettlemoyer, Veselin Stoyanov. 26 Jul 2019.
  • Are Sixteen Heads Really Better than One? Paul Michel, Omer Levy, Graham Neubig. 25 May 2019.
  • Analyzing Multi-Head Self-Attention: Specialized Heads Do the Heavy Lifting, the Rest Can Be Pruned. Elena Voita, David Talbot, F. Moiseev, Rico Sennrich, Ivan Titov. 23 May 2019.
  • BERT Rediscovers the Classical NLP Pipeline. Ian Tenney, Dipanjan Das, Ellie Pavlick. 15 May 2019.
  • BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova. 11 Oct 2018.
  • GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding. Alex Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, Samuel R. Bowman. 20 Apr 2018.
  • On Calibration of Modern Neural Networks. Chuan Guo, Geoff Pleiss, Yu Sun, Kilian Q. Weinberger. 14 Jun 2017.
  • A Broad-Coverage Challenge Corpus for Sentence Understanding through Inference. Adina Williams, Nikita Nangia, Samuel R. Bowman. 18 Apr 2017.
  • SQuAD: 100,000+ Questions for Machine Comprehension of Text. Pranav Rajpurkar, Jian Zhang, Konstantin Lopyrev, Percy Liang. 16 Jun 2016.
  • Adaptive Computation Time for Recurrent Neural Networks. Alex Graves. 29 Mar 2016.
  • Distilling the Knowledge in a Neural Network. Geoffrey E. Hinton, Oriol Vinyals, J. Dean. 09 Mar 2015.