Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2004.12993
Cited By
DeeBERT: Dynamic Early Exiting for Accelerating BERT Inference
27 April 2020
Ji Xin
Raphael Tang
Jaejun Lee
Yaoliang Yu
Jimmy J. Lin
Re-assign community
ArXiv
PDF
HTML
Papers citing
"DeeBERT: Dynamic Early Exiting for Accelerating BERT Inference"
50 / 76 papers shown
Title
Accelerating Adaptive Retrieval Augmented Generation via Instruction-Driven Representation Reduction of Retrieval Overlaps
Jie Ou
Jinyu Guo
Shuaihong Jiang
Zhaokun Wang
Libo Qin
Shunyu Yao
Wenhong Tian
3DV
12
0
0
19 May 2025
Balcony: A Lightweight Approach to Dynamic Inference of Generative Language Models
Benyamin Jamialahmadi
Parsa Kavehzadeh
Mehdi Rezagholizadeh
Parsa Farinneya
Hossein Rajabzadeh
A. Jafari
Boxing Chen
Marzieh S. Tahaei
42
0
0
06 Mar 2025
EvoP: Robust LLM Inference via Evolutionary Pruning
Shangyu Wu
Hongchao Du
Ying Xiong
Shuai Chen
Tei-Wei Kuo
Nan Guan
Chun Jason Xue
34
1
0
19 Feb 2025
Language Models Can Predict Their Own Behavior
Dhananjay Ashok
Jonathan May
ReLM
AI4TS
LRM
63
0
0
18 Feb 2025
BEEM: Boosting Performance of Early Exit DNNs using Multi-Exit Classifiers as Experts
Divya J. Bajpai
M. Hanawal
76
0
0
02 Feb 2025
MrT5: Dynamic Token Merging for Efficient Byte-level Language Models
Julie Kallini
Shikhar Murty
Christopher D. Manning
Christopher Potts
Róbert Csordás
40
2
0
28 Oct 2024
Hyper-multi-step: The Truth Behind Difficult Long-context Tasks
Yijiong Yu
Ma Xiufa
Fang Jianwei
Zhi-liang Xu
Su Guangyao
...
Zhixiao Qi
Wei Wang
Wei Liu
Ran Chen
Ji Pei
LRM
RALM
29
0
0
06 Oct 2024
GraphEx: A Graph-based Extraction Method for Advertiser Keyphrase Recommendation
Ashirbad Mishra
Soumik Dey
Marshall Wu
Jinyu Zhao
He Yu
Kaichen Ni
Binbin Li
Kamesh Madduri
57
1
0
05 Sep 2024
Membership Inference Attack Against Masked Image Modeling
Zehan Li
Xinlei He
Ning Yu
Yang Zhang
42
1
0
13 Aug 2024
Accelerating Large Language Model Inference with Self-Supervised Early Exits
Florian Valade
LRM
44
1
0
30 Jul 2024
Mobile Edge Intelligence for Large Language Models: A Contemporary Survey
Guanqiao Qu
Qiyuan Chen
Wei Wei
Zheng Lin
Xianhao Chen
Kaibin Huang
42
43
0
09 Jul 2024
Benchmarking Uncertainty Quantification Methods for Large Language Models with LM-Polygraph
Roman Vashurin
Ekaterina Fadeeva
Artem Vazhentsev
Akim Tsvigun
Daniil Vasilev
...
Timothy Baldwin
Timothy Baldwin
Maxim Panov
Artem Shelmanov
Artem Shelmanov
HILM
68
9
0
21 Jun 2024
DAISY: Data Adaptive Self-Supervised Early Exit for Speech Representation Models
T. Lin
Hung-yi Lee
Hao Tang
40
1
0
08 Jun 2024
On the Impact of Black-box Deployment Strategies for Edge AI on Latency and Model Performance
Jaskirat Singh
Emad Fallahzadeh
Bram Adams
Ahmed E. Hassan
MQ
40
3
0
25 Mar 2024
DE
3
^3
3
-BERT: Distance-Enhanced Early Exiting for BERT based on Prototypical Networks
Jianing He
Qi Zhang
Weiping Ding
Duoqian Miao
Jun Zhao
Liang Hu
LongBing Cao
38
3
0
03 Feb 2024
EE-LLM: Large-Scale Training and Inference of Early-Exit Large Language Models with 3D Parallelism
Yanxi Chen
Xuchen Pan
Yaliang Li
Bolin Ding
Jingren Zhou
LRM
41
31
0
08 Dec 2023
PAUMER: Patch Pausing Transformer for Semantic Segmentation
Evann Courdier
Prabhu Teja Sivaprasad
F. Fleuret
37
2
0
01 Nov 2023
DecoderLens: Layerwise Interpretation of Encoder-Decoder Transformers
Anna Langedijk
Hosein Mohebbi
Gabriele Sarti
Willem H. Zuidema
Jaap Jumelet
32
10
0
05 Oct 2023
SplitEE: Early Exit in Deep Neural Networks with Split Computing
Divya J. Bajpai
Vivek K. Trivedi
S. L. Yadav
M. Hanawal
28
5
0
17 Sep 2023
Using Early Exits for Fast Inference in Automatic Modulation Classification
E. Mohammed
Omar Mashaal
H. Abou-zeid
21
3
0
22 Aug 2023
F-PABEE: Flexible-patience-based Early Exiting for Single-label and Multi-label text Classification Tasks
Xiangxiang Gao
Wei-wei Zhu
Jiasheng Gao
Congrui Yin
VLM
26
12
0
21 May 2023
Lifting the Curse of Capacity Gap in Distilling Language Models
Chen Zhang
Yang Yang
Jiahao Liu
Jingang Wang
Yunsen Xian
Benyou Wang
Dawei Song
MoE
32
19
0
20 May 2023
Let's Sample Step by Step: Adaptive-Consistency for Efficient Reasoning and Coding with LLMs
Pranjal Aggarwal
Aman Madaan
Yiming Yang
Mausam
LRM
33
38
0
19 May 2023
MoT: Memory-of-Thought Enables ChatGPT to Self-Improve
Xiaonan Li
Xipeng Qiu
ReLM
KELM
LRM
AI4MH
26
32
0
09 May 2023
Revisiting Single-gated Mixtures of Experts
Amelie Royer
I. Karmanov
Andrii Skliar
B. Bejnordi
Tijmen Blankevoort
MoE
MoMe
38
6
0
11 Apr 2023
oBERTa: Improving Sparse Transfer Learning via improved initialization, distillation, and pruning regimes
Daniel Fernando Campos
Alexandre Marques
Mark Kurtz
Chengxiang Zhai
VLM
AAML
13
2
0
30 Mar 2023
Eliciting Latent Predictions from Transformers with the Tuned Lens
Nora Belrose
Zach Furman
Logan Smith
Danny Halawi
Igor V. Ostrovsky
Lev McKinney
Stella Biderman
Jacob Steinhardt
22
193
0
14 Mar 2023
Gradient-Free Structured Pruning with Unlabeled Data
Azade Nova
H. Dai
Dale Schuurmans
SyDa
40
20
0
07 Mar 2023
Full Stack Optimization of Transformer Inference: a Survey
Sehoon Kim
Coleman Hooper
Thanakul Wattanawong
Minwoo Kang
Ruohan Yan
...
Qijing Huang
Kurt Keutzer
Michael W. Mahoney
Y. Shao
A. Gholami
MQ
36
101
0
27 Feb 2023
AdaEnsemble: Learning Adaptively Sparse Structured Ensemble Network for Click-Through Rate Prediction
Yachen Yan
Liubo Li
14
3
0
06 Jan 2023
Gradient-based Intra-attention Pruning on Pre-trained Language Models
Ziqing Yang
Yiming Cui
Xin Yao
Shijin Wang
VLM
37
8
0
15 Dec 2022
Vision Transformer Computation and Resilience for Dynamic Inference
Kavya Sreedhar
Jason Clemons
Rangharajan Venkatesan
S. Keckler
M. Horowitz
26
2
0
06 Dec 2022
Understanding the Robustness of Multi-Exit Models under Common Corruptions
Akshay Mehra
Skyler Seto
Navdeep Jaitly
B. Theobald
AAML
16
3
0
03 Dec 2022
Can Open-Domain QA Reader Utilize External Knowledge Efficiently like Humans?
Neeraj Varshney
Man Luo
Chitta Baral
RALM
21
11
0
23 Nov 2022
You Need Multiple Exiting: Dynamic Early Exiting for Accelerating Unified Vision Language Model
Sheng Tang
Yaqing Wang
Zhenglun Kong
Tianchi Zhang
Yao Li
Caiwen Ding
Yanzhi Wang
Yi Liang
Dongkuan Xu
30
31
0
21 Nov 2022
Fast and Accurate FSA System Using ELBERT: An Efficient and Lightweight BERT
Siyuan Lu
Chenchen Zhou
Keli Xie
Jun Lin
Zhongfeng Wang
24
1
0
16 Nov 2022
Avoid Overthinking in Self-Supervised Models for Speech Recognition
Dan Berrebbi
Brian Yan
Shinji Watanabe
LRM
23
4
0
01 Nov 2022
COST-EFF: Collaborative Optimization of Spatial and Temporal Efficiency with Slenderized Multi-exit Language Models
Bowen Shen
Zheng Lin
Yuanxin Liu
Zhengxiao Liu
Lei Wang
Weiping Wang
VLM
47
4
0
27 Oct 2022
ELMER: A Non-Autoregressive Pre-trained Language Model for Efficient and Effective Text Generation
Junyi Li
Tianyi Tang
Wayne Xin Zhao
J. Nie
Ji-Rong Wen
22
17
0
24 Oct 2022
Hidden State Variability of Pretrained Language Models Can Guide Computation Reduction for Transfer Learning
Shuo Xie
Jiahao Qiu
Ankita Pasad
Li Du
Qing Qu
Hongyuan Mei
35
16
0
18 Oct 2022
Efficiently Controlling Multiple Risks with Pareto Testing
Bracha Laufer-Goldshtein
Adam Fisch
Regina Barzilay
Tommi Jaakkola
36
16
0
14 Oct 2022
Fluid Batching: Exit-Aware Preemptive Serving of Early-Exit Neural Networks on Edge NPUs
Alexandros Kouris
Stylianos I. Venieris
Stefanos Laskaridis
Nicholas D. Lane
42
8
0
27 Sep 2022
Unsupervised Early Exit in DNNs with Multiple Exits
U. HariNarayanN
M. Hanawal
Avinash Bhardwaj
29
10
0
20 Sep 2022
Efficient Methods for Natural Language Processing: A Survey
Marcos Vinícius Treviso
Ji-Ung Lee
Tianchu Ji
Betty van Aken
Qingqing Cao
...
Emma Strubell
Niranjan Balasubramanian
Leon Derczynski
Iryna Gurevych
Roy Schwartz
30
109
0
31 Aug 2022
Building an Efficiency Pipeline: Commutativity and Cumulativeness of Efficiency Operators for Transformers
Ji Xin
Raphael Tang
Zhiying Jiang
Yaoliang Yu
Jimmy J. Lin
18
1
0
31 Jul 2022
Answer Fast: Accelerating BERT on the Tensor Streaming Processor
I. Ahmed
Sahil Parmar
Matthew Boyd
Michael Beidler
Kris Kang
Bill Liu
Kyle Roach
John Kim
D. Abts
LLMAG
20
6
0
22 Jun 2022
Certified Error Control of Candidate Set Pruning for Two-Stage Relevance Ranking
Minghan Li
Xinyu Crystina Zhang
Ji Xin
Hongyang R. Zhang
Jimmy J. Lin
38
6
0
19 May 2022
PALBERT: Teaching ALBERT to Ponder
Nikita Balagansky
Daniil Gavrilov
MoE
26
6
0
07 Apr 2022
Structured Pruning Learns Compact and Accurate Models
Mengzhou Xia
Zexuan Zhong
Danqi Chen
VLM
9
177
0
01 Apr 2022
Pyramid-BERT: Reducing Complexity via Successive Core-set based Token Selection
Xin Huang
A. Khetan
Rene Bidart
Zohar Karnin
19
14
0
27 Mar 2022
1
2
Next