ResearchTrend.AI
DynamicViT: Efficient Vision Transformers with Dynamic Token Sparsification

3 June 2021
Yongming Rao
Wenliang Zhao
Benlin Liu
Jiwen Lu
Jie Zhou
Cho-Jui Hsieh
    ViT
ArXiv (abs), PDF, HTML, GitHub (608★)

Papers citing "DynamicViT: Efficient Vision Transformers with Dynamic Token Sparsification"

50 / 444 papers shown
TempMe: Video Temporal Token Merging for Efficient Text-Video Retrieval
Leqi Shen
Tianxiang Hao
Tao He
Sicheng Zhao
Pengzhang Liu
Yongjun Bao
Guiguang Ding
264
15
0
02 Sep 2024
Make Your ViT-based Multi-view 3D Detectors Faster via Token Compression
Dingyuan Zhang
Dingkang Liang
Zichang Tan
Xiaoqing Ye
Cheng Zhang
Jingdong Wang
Xiang Bai
ViT
95
2
0
01 Sep 2024
Vote&Mix: Plug-and-Play Token Reduction for Efficient Vision Transformer
Shuai Peng
Di Fu
Baole Wei
Yong Cao
Liangcai Gao
Zhi Tang
ViT
69
1
0
30 Aug 2024
Flexible Control in Symbolic Music Generation via Musical Metadata
Sangjun Han
Jiwon Ham
Chaeeun Lee
Heejin Kim
Soojong Do
Sihyuk Yi
Jun Seo
Seoyoon Kim
Yountae Jung
Woohyung Lim
74
0
0
28 Aug 2024
Hierarchical Graph Interaction Transformer with Dynamic Token Clustering for Camouflaged Object Detection
Siyuan Yao
Hao Sun
Tian-Zhu Xiang
Xiao Wang
Xiaochun Cao
75
12
0
27 Aug 2024
TReX- Reusing Vision Transformer's Attention for Efficient Xbar-based Computing
Abhishek Moitra
Abhiroop Bhattacharjee
Youngeun Kim
Priyadarshini Panda
ViT
66
2
0
22 Aug 2024
Adaptive Layer Selection for Efficient Vision Transformer Fine-Tuning
Alessio Devoto
Federico Alvetreti
Jary Pomponi
P. Lorenzo
Pasquale Minervini
Simone Scardapane
94
3
0
16 Aug 2024
Dynamic and Compressive Adaptation of Transformers From Images to Videos
Guozhen Zhang
Jingyu Liu
Shengming Cao
Xiaotong Zhao
Kevin Zhao
Kai Ma
Limin Wang
ViT
84
1
0
13 Aug 2024
Token Compensator: Altering Inference Cost of Vision Transformer without Re-Tuning
Shibo Jie
Yehui Tang
Jianyuan Guo
Zhi-Hong Deng
Kai Han
Yunhe Wang
VLM
60
4
0
13 Aug 2024
Self-Introspective Decoding: Alleviating Hallucinations for Large Vision-Language Models
Fushuo Huo
Wenchao Xu
Zhong Zhang
Yining Qi
Zhicheng Chen
Peilin Zhao
VLM, MLLM
206
31
0
04 Aug 2024
Sparse Refinement for Efficient High-Resolution Semantic Segmentation
Zhijian Liu
Zhuoyang Zhang
Samir Khaki
Shang Yang
Haotian Tang
Chenfeng Xu
Kurt Keutzer
Song Han
SSeg
84
1
0
26 Jul 2024
Efficient Visual Transformer by Learnable Token Merging
Yancheng Wang
Yingzhen Yang
ViT
108
2
0
21 Jul 2024
Token-level Correlation-guided Compression for Efficient Multimodal Document Understanding
Renshan Zhang
Yibo Lyu
Rui Shao
Gongwei Chen
Weili Guan
Liqiang Nie
74
10
0
19 Jul 2024
Turbo: Informativity-Driven Acceleration Plug-In for Vision-Language Large Models
Chen Ju
Haicheng Wang
Haozhe Cheng
Xu Chen
Zhonghua Zhai
Weilin Huang
Jinsong Lan
Shuai Xiao
Bo Zheng
VLM
96
6
0
16 Jul 2024
TCFormer: Visual Recognition via Token Clustering Transformer
Wang Zeng
Sheng Jin
Lumin Xu
Wentao Liu
Chao Qian
Wanli Ouyang
Ping Luo
Xiaogang Wang
74
5
0
16 Jul 2024
Learning Motion Blur Robust Vision Transformers with Dynamic Early Exit for Real-Time UAV Tracking
You Wu
Xucheng Wang
Dan Zeng
Hengzhou Ye
Xiaolan Xie
Qijun Zhao
Shuiwang Li
94
3
0
07 Jul 2024
PECTP: Parameter-Efficient Cross-Task Prompts for Incremental Vision Transformer
Qian Feng
Hanbin Zhao
Chao Zhang
Jiahua Dong
Henghui Ding
Yu-Gang Jiang
Hui Qian
VLM
66
5
0
04 Jul 2024
Efficient Sparse Attention needs Adaptive Token Release
Chaoran Zhang
Lixin Zou
Dan Luo
Min Tang
Xiangyang Luo
Zihao Li
Chenliang Li
98
5
0
02 Jul 2024
Pruning One More Token is Enough: Leveraging Latency-Workload Non-Linearities for Vision Transformers on the Edge
Nick Eliopoulos
Purvish Jajal
James Davis
Gaowen Liu
George K. Thiravathukal
Yung-Hsiang Lu
66
1
0
01 Jul 2024
Speeding Up Image Classifiers with Little Companions
Yang Liu
Kowshik Thopalli
Jayaraman J. Thiagarajan
VLM
65
0
0
24 Jun 2024
A Comprehensive Study of Structural Pruning for Vision Models
Haoling Li
Mengqi Xue
Gongfan Fang
Sheng Zhou
Zunlei Feng
Huiqiong Wang
Mingli Song
Lechao Cheng
VLM
37
0
0
18 Jun 2024
ClawMachine: Learning to Fetch Visual Tokens for Referential Comprehension
Tianren Ma
Lingxi Xie
Yunjie Tian
Boyu Yang
Yuan Zhang
77
0
0
17 Jun 2024
ALGM: Adaptive Local-then-Global Token Merging for Efficient Semantic Segmentation with Plain Vision Transformers
Narges Norouzi
Svetlana Orlova
Daan de Geus
Gijs Dubbelman
ViT, FedML
66
5
0
14 Jun 2024
SViTT-Ego: A Sparse Video-Text Transformer for Egocentric Video
Hector A. Valdez
Kyle Min
Subarna Tripathi
VLM
87
2
0
13 Jun 2024
DiTFastAttn: Attention Compression for Diffusion Transformer Models
Zhihang Yuan
Pu Lu
Hanling Zhang
Xuefei Ning
Linfeng Zhang
Tianchen Zhao
Shengen Yan
Guohao Dai
Yu Wang
113
33
0
12 Jun 2024
Adaptively Bypassing Vision Transformer Blocks for Efficient Visual Tracking
Xiangyang Yang
Dan Zeng
Xucheng Wang
You Wu
Hengzhou Ye
Qijun Zhao
Shuiwang Li
108
4
0
12 Jun 2024
Flextron: Many-in-One Flexible Large Language Model
Ruisi Cai
Saurav Muralidharan
Greg Heinrich
Hongxu Yin
Zhangyang Wang
Jan Kautz
Pavlo Molchanov
85
14
0
11 Jun 2024
Parameter-Inverted Image Pyramid Networks
Xizhou Zhu
Xue Yang
Zhaokai Wang
Hao Li
Wenhan Dou
Junqi Ge
Lewei Lu
Ping Luo
Jifeng Dai
77
0
0
06 Jun 2024
Focus on the Core: Efficient Attention via Pruned Token Compression for Document Classification
Jungmin Yun
Mihyeon Kim
Youngbin Kim
108
9
0
03 Jun 2024
You Only Need Less Attention at Each Stage in Vision Transformers
Shuoxi Zhang
Hanpeng Liu
Stephen Lin
Kun He
76
5
0
01 Jun 2024
Sharing Key Semantics in Transformer Makes Efficient Image Restoration
Bin Ren
Yawei Li
Christos Sakaridis
Rakesh Ranjan
Mengyuan Liu
Rita Cucchiara
Luc Van Gool
Ming-Hsuan Yang
N. Sebe
115
7
0
30 May 2024
Matryoshka Query Transformer for Large Vision-Language Models
Wenbo Hu
Zi-Yi Dou
Liunian Harold Li
Amita Kamath
Nanyun Peng
Kai-Wei Chang
MLLM
112
10
0
29 May 2024
Accelerating Transformers with Spectrum-Preserving Token Merging
Hoai-Chau Tran
D. M. Nguyen
Duy M. Nguyen
Trung Thanh Nguyen
Ngan Le
Pengtao Xie
Daniel Sonntag
James Y. Zou
Binh T. Nguyen
Mathias Niepert
106
13
0
25 May 2024
Sparse-Tuning: Adapting Vision Transformers with Efficient Fine-tuning and Inference
Ting Liu
Xuyang Liu
Liangtao Shi
Zunnan Xu
Siteng Huang
Yi Xin
Quanjun Yin
83
8
0
23 May 2024
Segformer++: Efficient Token-Merging Strategies for High-Resolution Semantic Segmentation
Daniel Kienzle
Marco Kantonis
Robin Schon
Rainer Lienhart
55
3
0
23 May 2024
Vision Transformer with Sparse Scan Prior
Qihang Fan
Huaibo Huang
Mingrui Chen
Ran He
ViT
80
6
0
22 May 2024
Semantic Equitable Clustering: A Simple and Effective Strategy for Clustering Vision Tokens
Qihang Fan
Huaibo Huang
Mingrui Chen
Ran He
77
3
0
22 May 2024
Efficient Multimodal Large Language Models: A Survey
Yizhang Jin
Jian Li
Yexin Liu
Tianjun Gu
Kai Wu
...
Xin Tan
Zhenye Gan
Yabiao Wang
Chengjie Wang
Lizhuang Ma
LRM
108
58
0
17 May 2024
LOTUS: Improving Transformer Efficiency with Sparsity Pruning and Data Lottery Tickets
Ojasw Upadhyay
61
0
0
01 May 2024
Raformer: Redundancy-Aware Transformer for Video Wire Inpainting
Zhong Ji
Yimu Su
Yan Zhang
Jiacheng Hou
Yanwei Pang
Jungong Han
65
2
0
24 Apr 2024
Data-independent Module-aware Pruning for Hierarchical Vision Transformers
Yang He
Qiufeng Wang
ViT
76
5
0
21 Apr 2024
Comprehensive Survey of Model Compression and Speed up for Vision Transformers
Feiyang Chen
Ziqian Luo
Lisang Zhou
Xueting Pan
Ying Jiang
56
25
0
16 Apr 2024
Leveraging Temporal Contextualization for Video Action Recognition
Minji Kim
Dongyoon Han
Taekyung Kim
Bohyung Han
90
2
0
15 Apr 2024
Self-Selected Attention Span for Accelerating Large Language Model Inference
Tian Jin
W. Yazar
Zifei Xu
Sayeh Sharify
Xin Eric Wang
LRM
49
1
0
14 Apr 2024
Arena: A Patch-of-Interest ViT Inference Acceleration System for Edge-Assisted Video Analytics
Haosong Peng
Wei Feng
Hao Li
Yufeng Zhan
Qihua Zhou
Yuanqing Xia
51
3
0
14 Apr 2024
Progressive Semantic-Guided Vision Transformer for Zero-Shot Learning
Shiming Chen
W. Hou
Salman Khan
Fahad Shahbaz Khan
VLM, ViT
94
15
0
11 Apr 2024
HRVDA: High-Resolution Visual Document Assistant
Chaohu Liu
Kun Yin
Haoyu Cao
Xinghua Jiang
Xin Li
Yinsong Liu
Deqiang Jiang
Xing Sun
Linli Xu
VLM
99
26
0
10 Apr 2024
MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding
Bo He
Hengduo Li
Young Kyun Jang
Menglin Jia
Xuefei Cao
Ashish Shah
Abhinav Shrivastava
Ser-Nam Lim
MLLM
133
101
0
08 Apr 2024
MLP Can Be A Good Transformer Learner
Sihao Lin
Pumeng Lyu
Dongrui Liu
Tao Tang
Xiaodan Liang
Andy Song
Xiaojun Chang
ViT
105
12
0
08 Apr 2024
MULTIFLOW: Shifting Towards Task-Agnostic Vision-Language Pruning
Matteo Farina
Massimiliano Mancini
Elia Cunegatti
Gaowen Liu
Giovanni Iacca
Elisa Ricci
VLM
71
2
0
08 Apr 2024