Constraint-aware and Ranking-distilled Token Pruning for Efficient Transformer Inference

Papers citing "Constraint-aware and Ranking-distilled Token Pruning for Efficient Transformer Inference"