ResearchTrend.AI
DeeBERT: Dynamic Early Exiting for Accelerating BERT Inference

27 April 2020
Ji Xin
Raphael Tang
Jaejun Lee
Yaoliang Yu
Jimmy J. Lin

Papers citing "DeeBERT: Dynamic Early Exiting for Accelerating BERT Inference"

50 / 76 papers shown
Accelerating Adaptive Retrieval Augmented Generation via Instruction-Driven Representation Reduction of Retrieval Overlaps
Jie Ou
Jinyu Guo
Shuaihong Jiang
Zhaokun Wang
Libo Qin
Shunyu Yao
Wenhong Tian
3DV
12
0
0
19 May 2025
Balcony: A Lightweight Approach to Dynamic Inference of Generative Language Models
Benyamin Jamialahmadi
Parsa Kavehzadeh
Mehdi Rezagholizadeh
Parsa Farinneya
Hossein Rajabzadeh
A. Jafari
Boxing Chen
Marzieh S. Tahaei
42
0
0
06 Mar 2025
EvoP: Robust LLM Inference via Evolutionary Pruning
Shangyu Wu
Hongchao Du
Ying Xiong
Shuai Chen
Tei-Wei Kuo
Nan Guan
Chun Jason Xue
34
1
0
19 Feb 2025
Language Models Can Predict Their Own Behavior
Dhananjay Ashok
Jonathan May
ReLM
AI4TS
LRM
63
0
0
18 Feb 2025
BEEM: Boosting Performance of Early Exit DNNs using Multi-Exit Classifiers as Experts
Divya J. Bajpai
M. Hanawal
76
0
0
02 Feb 2025
MrT5: Dynamic Token Merging for Efficient Byte-level Language Models
Julie Kallini
Shikhar Murty
Christopher D. Manning
Christopher Potts
Róbert Csordás
40
2
0
28 Oct 2024
Hyper-multi-step: The Truth Behind Difficult Long-context Tasks
Yijiong Yu
Ma Xiufa
Fang Jianwei
Zhi-liang Xu
Su Guangyao
...
Zhixiao Qi
Wei Wang
Wei Liu
Ran Chen
Ji Pei
LRM
RALM
29
0
0
06 Oct 2024
GraphEx: A Graph-based Extraction Method for Advertiser Keyphrase Recommendation
Ashirbad Mishra
Soumik Dey
Marshall Wu
Jinyu Zhao
He Yu
Kaichen Ni
Binbin Li
Kamesh Madduri
57
1
0
05 Sep 2024
Membership Inference Attack Against Masked Image Modeling
Zehan Li
Xinlei He
Ning Yu
Yang Zhang
42
1
0
13 Aug 2024
Accelerating Large Language Model Inference with Self-Supervised Early Exits
Florian Valade
LRM
44
1
0
30 Jul 2024
Mobile Edge Intelligence for Large Language Models: A Contemporary Survey
Guanqiao Qu
Qiyuan Chen
Wei Wei
Zheng Lin
Xianhao Chen
Kaibin Huang
42
43
0
09 Jul 2024
Benchmarking Uncertainty Quantification Methods for Large Language Models with LM-Polygraph
Roman Vashurin
Ekaterina Fadeeva
Artem Vazhentsev
Akim Tsvigun
Daniil Vasilev
...
Timothy Baldwin
Maxim Panov
Artem Shelmanov
HILM
68
9
0
21 Jun 2024
DAISY: Data Adaptive Self-Supervised Early Exit for Speech Representation Models
T. Lin
Hung-yi Lee
Hao Tang
40
1
0
08 Jun 2024
On the Impact of Black-box Deployment Strategies for Edge AI on Latency and Model Performance
Jaskirat Singh
Emad Fallahzadeh
Bram Adams
Ahmed E. Hassan
MQ
40
3
0
25 Mar 2024
DE$^3$-BERT: Distance-Enhanced Early Exiting for BERT based on Prototypical Networks
Jianing He
Qi Zhang
Weiping Ding
Duoqian Miao
Jun Zhao
Liang Hu
LongBing Cao
38
3
0
03 Feb 2024
EE-LLM: Large-Scale Training and Inference of Early-Exit Large Language Models with 3D Parallelism
Yanxi Chen
Xuchen Pan
Yaliang Li
Bolin Ding
Jingren Zhou
LRM
41
31
0
08 Dec 2023
PAUMER: Patch Pausing Transformer for Semantic Segmentation
Evann Courdier
Prabhu Teja Sivaprasad
F. Fleuret
37
2
0
01 Nov 2023
DecoderLens: Layerwise Interpretation of Encoder-Decoder Transformers
Anna Langedijk
Hosein Mohebbi
Gabriele Sarti
Willem H. Zuidema
Jaap Jumelet
32
10
0
05 Oct 2023
SplitEE: Early Exit in Deep Neural Networks with Split Computing
Divya J. Bajpai
Vivek K. Trivedi
S. L. Yadav
M. Hanawal
28
5
0
17 Sep 2023
Using Early Exits for Fast Inference in Automatic Modulation Classification
E. Mohammed
Omar Mashaal
H. Abou-zeid
21
3
0
22 Aug 2023
F-PABEE: Flexible-patience-based Early Exiting for Single-label and Multi-label text Classification Tasks
Xiangxiang Gao
Wei-wei Zhu
Jiasheng Gao
Congrui Yin
VLM
26
12
0
21 May 2023
Lifting the Curse of Capacity Gap in Distilling Language Models
Chen Zhang
Yang Yang
Jiahao Liu
Jingang Wang
Yunsen Xian
Benyou Wang
Dawei Song
MoE
32
19
0
20 May 2023
Let's Sample Step by Step: Adaptive-Consistency for Efficient Reasoning and Coding with LLMs
Pranjal Aggarwal
Aman Madaan
Yiming Yang
Mausam
LRM
33
38
0
19 May 2023
MoT: Memory-of-Thought Enables ChatGPT to Self-Improve
Xiaonan Li
Xipeng Qiu
ReLM
KELM
LRM
AI4MH
26
32
0
09 May 2023
Revisiting Single-gated Mixtures of Experts
Amelie Royer
I. Karmanov
Andrii Skliar
B. Bejnordi
Tijmen Blankevoort
MoE
MoMe
38
6
0
11 Apr 2023
oBERTa: Improving Sparse Transfer Learning via improved initialization, distillation, and pruning regimes
Daniel Fernando Campos
Alexandre Marques
Mark Kurtz
Chengxiang Zhai
VLM
AAML
13
2
0
30 Mar 2023
Eliciting Latent Predictions from Transformers with the Tuned Lens
Nora Belrose
Zach Furman
Logan Smith
Danny Halawi
Igor V. Ostrovsky
Lev McKinney
Stella Biderman
Jacob Steinhardt
22
193
0
14 Mar 2023
Gradient-Free Structured Pruning with Unlabeled Data
Azade Nova
H. Dai
Dale Schuurmans
SyDa
40
20
0
07 Mar 2023
Full Stack Optimization of Transformer Inference: a Survey
Sehoon Kim
Coleman Hooper
Thanakul Wattanawong
Minwoo Kang
Ruohan Yan
...
Qijing Huang
Kurt Keutzer
Michael W. Mahoney
Y. Shao
A. Gholami
MQ
36
101
0
27 Feb 2023
AdaEnsemble: Learning Adaptively Sparse Structured Ensemble Network for Click-Through Rate Prediction
Yachen Yan
Liubo Li
14
3
0
06 Jan 2023
Gradient-based Intra-attention Pruning on Pre-trained Language Models
Ziqing Yang
Yiming Cui
Xin Yao
Shijin Wang
VLM
37
8
0
15 Dec 2022
Vision Transformer Computation and Resilience for Dynamic Inference
Kavya Sreedhar
Jason Clemons
Rangharajan Venkatesan
S. Keckler
M. Horowitz
26
2
0
06 Dec 2022
Understanding the Robustness of Multi-Exit Models under Common Corruptions
Akshay Mehra
Skyler Seto
Navdeep Jaitly
B. Theobald
AAML
16
3
0
03 Dec 2022
Can Open-Domain QA Reader Utilize External Knowledge Efficiently like Humans?
Neeraj Varshney
Man Luo
Chitta Baral
RALM
21
11
0
23 Nov 2022
You Need Multiple Exiting: Dynamic Early Exiting for Accelerating Unified Vision Language Model
Sheng Tang
Yaqing Wang
Zhenglun Kong
Tianchi Zhang
Yao Li
Caiwen Ding
Yanzhi Wang
Yi Liang
Dongkuan Xu
30
31
0
21 Nov 2022
Fast and Accurate FSA System Using ELBERT: An Efficient and Lightweight BERT
Siyuan Lu
Chenchen Zhou
Keli Xie
Jun Lin
Zhongfeng Wang
24
1
0
16 Nov 2022
Avoid Overthinking in Self-Supervised Models for Speech Recognition
Dan Berrebbi
Brian Yan
Shinji Watanabe
LRM
23
4
0
01 Nov 2022
COST-EFF: Collaborative Optimization of Spatial and Temporal Efficiency with Slenderized Multi-exit Language Models
Bowen Shen
Zheng Lin
Yuanxin Liu
Zhengxiao Liu
Lei Wang
Weiping Wang
VLM
47
4
0
27 Oct 2022
ELMER: A Non-Autoregressive Pre-trained Language Model for Efficient and Effective Text Generation
Junyi Li
Tianyi Tang
Wayne Xin Zhao
J. Nie
Ji-Rong Wen
22
17
0
24 Oct 2022
Hidden State Variability of Pretrained Language Models Can Guide Computation Reduction for Transfer Learning
Shuo Xie
Jiahao Qiu
Ankita Pasad
Li Du
Qing Qu
Hongyuan Mei
35
16
0
18 Oct 2022
Efficiently Controlling Multiple Risks with Pareto Testing
Bracha Laufer-Goldshtein
Adam Fisch
Regina Barzilay
Tommi Jaakkola
36
16
0
14 Oct 2022
Fluid Batching: Exit-Aware Preemptive Serving of Early-Exit Neural Networks on Edge NPUs
Alexandros Kouris
Stylianos I. Venieris
Stefanos Laskaridis
Nicholas D. Lane
42
8
0
27 Sep 2022
Unsupervised Early Exit in DNNs with Multiple Exits
U. HariNarayanN
M. Hanawal
Avinash Bhardwaj
29
10
0
20 Sep 2022
Efficient Methods for Natural Language Processing: A Survey
Marcos Vinícius Treviso
Ji-Ung Lee
Tianchu Ji
Betty van Aken
Qingqing Cao
...
Emma Strubell
Niranjan Balasubramanian
Leon Derczynski
Iryna Gurevych
Roy Schwartz
30
109
0
31 Aug 2022
Building an Efficiency Pipeline: Commutativity and Cumulativeness of Efficiency Operators for Transformers
Ji Xin
Raphael Tang
Zhiying Jiang
Yaoliang Yu
Jimmy J. Lin
18
1
0
31 Jul 2022
Answer Fast: Accelerating BERT on the Tensor Streaming Processor
I. Ahmed
Sahil Parmar
Matthew Boyd
Michael Beidler
Kris Kang
Bill Liu
Kyle Roach
John Kim
D. Abts
LLMAG
20
6
0
22 Jun 2022
Certified Error Control of Candidate Set Pruning for Two-Stage Relevance Ranking
Minghan Li
Xinyu Crystina Zhang
Ji Xin
Hongyang R. Zhang
Jimmy J. Lin
38
6
0
19 May 2022
PALBERT: Teaching ALBERT to Ponder
Nikita Balagansky
Daniil Gavrilov
MoE
26
6
0
07 Apr 2022
Structured Pruning Learns Compact and Accurate Models
Mengzhou Xia
Zexuan Zhong
Danqi Chen
VLM
9
177
0
01 Apr 2022
Pyramid-BERT: Reducing Complexity via Successive Core-set based Token Selection
Xin Huang
A. Khetan
Rene Bidart
Zohar Karnin
19
14
0
27 Mar 2022