
Constraint-aware and Ranking-distilled Token Pruning for Efficient Transformer Inference
Junyan Li
Li Zhang
Jiahang Xu
Yujing Wang
Shaoguang Yan
Yunqing Xia
Yuqing Yang
Ting Cao
Hao Sun
Weiwei Deng
Qi Zhang
Mao Yang