ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2202.07800
  4. Cited By
Not All Patches are What You Need: Expediting Vision Transformers via
  Token Reorganizations

Not All Patches are What You Need: Expediting Vision Transformers via Token Reorganizations

16 February 2022
Youwei Liang
Chongjian Ge
Zhan Tong
Yibing Song
Jue Wang
P. Xie
    ViT
ArXivPDFHTML

Papers citing "Not All Patches are What You Need: Expediting Vision Transformers via Token Reorganizations"

50 / 57 papers shown
Title
Image Recognition with Online Lightweight Vision Transformer: A Survey
Image Recognition with Online Lightweight Vision Transformer: A Survey
Zherui Zhang
Rongtao Xu
Jie Zhou
Changwei Wang
Xingtian Pei
...
Jiguang Zhang
Li Guo
Longxiang Gao
Wenyuan Xu
Shibiao Xu
ViT
148
0
0
06 May 2025
Vision Mamba in Remote Sensing: A Comprehensive Survey of Techniques, Applications and Outlook
Vision Mamba in Remote Sensing: A Comprehensive Survey of Techniques, Applications and Outlook
Muyi Bao
Shuchang Lyu
Zhaoyang Xu
Huiyu Zhou
Jinchang Ren
Shiming Xiang
Xiaomeng Li
Guangliang Cheng
Mamba
87
0
0
01 May 2025
VCM: Vision Concept Modeling Based on Implicit Contrastive Learning with Vision-Language Instruction Fine-Tuning
VCM: Vision Concept Modeling Based on Implicit Contrastive Learning with Vision-Language Instruction Fine-Tuning
Run Luo
Renke Shan
Longze Chen
Ziqiang Liu
Lu Wang
Min Yang
Xiaobo Xia
MLLM
VLM
99
0
0
28 Apr 2025
Back to Fundamentals: Low-Level Visual Features Guided Progressive Token Pruning
Back to Fundamentals: Low-Level Visual Features Guided Progressive Token Pruning
Yuanbing Ouyang
Yizhuo Liang
Qingpeng Li
Xinfei Guo
Yiming Luo
Di Wu
Hao Wang
Yushan Pan
ViT
VLM
73
0
0
25 Apr 2025
DyDiT++: Dynamic Diffusion Transformers for Efficient Visual Generation
DyDiT++: Dynamic Diffusion Transformers for Efficient Visual Generation
Wangbo Zhao
Yizeng Han
Jiasheng Tang
Kai Wang
Hao Luo
Yibing Song
Gao Huang
Fan Wang
Yang You
71
0
0
09 Apr 2025
HGFormer: Topology-Aware Vision Transformer with HyperGraph Learning
HGFormer: Topology-Aware Vision Transformer with HyperGraph Learning
Hao Wang
Shuo Zhang
Biao Leng
ViT
82
0
0
03 Apr 2025
Dynamic Pyramid Network for Efficient Multimodal Large Language Model
Dynamic Pyramid Network for Efficient Multimodal Large Language Model
Hao Ai
Kunyi Wang
Zezhou Wang
H. Lu
Jin Tian
Yaxin Luo
Peng-Fei Xing
Jen-Yuan Huang
Huaxia Li
Gen Luo
MLLM
VLM
110
0
0
26 Mar 2025
Fast and Accurate Gigapixel Pathological Image Classification with Hierarchical Distillation Multi-Instance Learning
Fast and Accurate Gigapixel Pathological Image Classification with Hierarchical Distillation Multi-Instance Learning
Jiuyang Dong
Junjun Jiang
Kui Jiang
Jiahan Li
Yongbing Zhang
45
0
0
28 Feb 2025
FOLDER: Accelerating Multi-modal Large Language Models with Enhanced Performance
FOLDER: Accelerating Multi-modal Large Language Models with Enhanced Performance
Haicheng Wang
Zhemeng Yu
Gabriele Spadaro
Chen Ju
Victor Quétu
Enzo Tartaglione
Enzo Tartaglione
VLM
109
3
0
05 Jan 2025
Training Noise Token Pruning
Training Noise Token Pruning
Mingxing Rao
Bohan Jiang
Daniel Moyer
ViT
74
0
0
27 Nov 2024
Breaking the Low-Rank Dilemma of Linear Attention
Breaking the Low-Rank Dilemma of Linear Attention
Qihang Fan
Huaibo Huang
Ran He
45
1
0
12 Nov 2024
PyramidDrop: Accelerating Your Large Vision-Language Models via Pyramid Visual Redundancy Reduction
PyramidDrop: Accelerating Your Large Vision-Language Models via Pyramid Visual Redundancy Reduction
Long Xing
Qidong Huang
Xiaoyi Dong
Jiajie Lu
Pan Zhang
...
Yuhang Cao
Conghui He
Jiaqi Wang
Feng Wu
Dahua Lin
VLM
48
26
0
22 Oct 2024
Token Turing Machines are Efficient Vision Models
Token Turing Machines are Efficient Vision Models
Purvish Jajal
Nick Eliopoulos
Benjamin Shiue-Hal Chou
George K. Thiravathukal
James C. Davis
Yung-Hsiang Lu
98
0
0
11 Sep 2024
Brain-Inspired Stepwise Patch Merging for Vision Transformers
Brain-Inspired Stepwise Patch Merging for Vision Transformers
Yonghao Yu
Dongcheng Zhao
Guobin Shen
Yiting Dong
Yi Zeng
58
0
0
11 Sep 2024
TempMe: Video Temporal Token Merging for Efficient Text-Video Retrieval
TempMe: Video Temporal Token Merging for Efficient Text-Video Retrieval
Leqi Shen
Tianxiang Hao
Tao He
Sicheng Zhao
Pengzhang Liu
Yongjun Bao
Guiguang Ding
Guiguang Ding
135
7
0
02 Sep 2024
Mobile Edge Intelligence for Large Language Models: A Contemporary Survey
Mobile Edge Intelligence for Large Language Models: A Contemporary Survey
Guanqiao Qu
Qiyuan Chen
Wei Wei
Zheng Lin
Xianhao Chen
Kaibin Huang
42
43
0
09 Jul 2024
Key Patches Are All You Need: A Multiple Instance Learning Framework For
  Robust Medical Diagnosis
Key Patches Are All You Need: A Multiple Instance Learning Framework For Robust Medical Diagnosis
Diogo J. Araújo
M. R. Verdelho
Alceu Bissoto
Jacinto C. Nascimento
Carlos Santiago
Catarina Barata
32
1
0
02 May 2024
Arena: A Patch-of-Interest ViT Inference Acceleration System for
  Edge-Assisted Video Analytics
Arena: A Patch-of-Interest ViT Inference Acceleration System for Edge-Assisted Video Analytics
Haosong Peng
Wei Feng
Hao Li
Yufeng Zhan
Qihua Zhou
Yuanqing Xia
30
2
0
14 Apr 2024
Benchmarking the Robustness of Temporal Action Detection Models Against
  Temporal Corruptions
Benchmarking the Robustness of Temporal Action Detection Models Against Temporal Corruptions
Runhao Zeng
Xiaoyong Chen
Jiaming Liang
Huisi Wu
Guangzhong Cao
Yong Guo
AAML
39
3
0
29 Mar 2024
Agile-Quant: Activation-Guided Quantization for Faster Inference of LLMs on the Edge
Agile-Quant: Activation-Guided Quantization for Faster Inference of LLMs on the Edge
Xuan Shen
Peiyan Dong
Lei Lu
Zhenglun Kong
Zhengang Li
Ming Lin
Chao Wu
Yanzhi Wang
MQ
39
24
0
09 Dec 2023
CAIT: Triple-Win Compression towards High Accuracy, Fast Inference, and
  Favorable Transferability For ViTs
CAIT: Triple-Win Compression towards High Accuracy, Fast Inference, and Favorable Transferability For ViTs
Ao Wang
Hui Chen
Zijia Lin
Sicheng Zhao
J. Han
Guiguang Ding
ViT
31
6
0
27 Sep 2023
ProPainter: Improving Propagation and Transformer for Video Inpainting
ProPainter: Improving Propagation and Transformer for Video Inpainting
Shangchen Zhou
Chongyi Li
Kelvin C. K. Chan
Chen Change Loy
ViT
35
90
0
07 Sep 2023
Exchanging-based Multimodal Fusion with Transformer
Exchanging-based Multimodal Fusion with Transformer
Renyu Zhu
Chengcheng Han
Yong Qian
Qiushi Sun
Xiang Li
Ming Gao
Xuezhi Cao
Yunsen Xian
38
2
0
05 Sep 2023
Which Transformer to Favor: A Comparative Analysis of Efficiency in Vision Transformers
Which Transformer to Favor: A Comparative Analysis of Efficiency in Vision Transformers
Tobias Christian Nauen
Sebastián M. Palacio
Federico Raue
Andreas Dengel
42
3
0
18 Aug 2023
Do We Really Need a Large Number of Visual Prompts?
Do We Really Need a Large Number of Visual Prompts?
Youngeun Kim
Yuhang Li
Abhishek Moitra
Ruokai Yin
Priyadarshini Panda
VLM
VPVLM
40
5
0
26 May 2023
AutoFocusFormer: Image Segmentation off the Grid
AutoFocusFormer: Image Segmentation off the Grid
Chen Ziwen
K. Patnaik
Shuangfei Zhai
Alvin Wan
Zhile Ren
A. Schwing
Alex Colburn
Li Fuxin
22
9
0
24 Apr 2023
Efficient Video Action Detection with Token Dropout and Context
  Refinement
Efficient Video Action Detection with Token Dropout and Context Refinement
Lei Chen
Zhan Tong
Yibing Song
Gangshan Wu
Limin Wang
36
14
0
17 Apr 2023
Learning Dynamic Style Kernels for Artistic Style Transfer
Learning Dynamic Style Kernels for Artistic Style Transfer
Wenju Xu
Chengjiang Long
Yongwei Nie
23
14
0
02 Apr 2023
APPT : Asymmetric Parallel Point Transformer for 3D Point Cloud
  Understanding
APPT : Asymmetric Parallel Point Transformer for 3D Point Cloud Understanding
Hengjia Li
Tu Zheng
Zhihao Chi
Zheng Yang
Wenxiao Wang
Boxi Wu
Binbin Lin
Deng Cai
3DPC
48
1
0
31 Mar 2023
Task-specific Fine-tuning via Variational Information Bottleneck for
  Weakly-supervised Pathology Whole Slide Image Classification
Task-specific Fine-tuning via Variational Information Bottleneck for Weakly-supervised Pathology Whole Slide Image Classification
Honglin Li
Chenglu Zhu
Yunlong Zhang
Yuxuan Sun
Zhongyi Shui
Wenwei Kuang
S. Zheng
L. Yang
66
57
0
15 Mar 2023
Human MotionFormer: Transferring Human Motions with Vision Transformers
Human MotionFormer: Transferring Human Motions with Vision Transformers
Hongyu Liu
Xintong Han
Chengbin Jin
Lihui Qian
Huawei Wei
...
Faqiang Wang
Haoye Dong
Yibing Song
Jia Xu
Qifeng Chen
16
10
0
22 Feb 2023
Simple Hardware-Efficient Long Convolutions for Sequence Modeling
Simple Hardware-Efficient Long Convolutions for Sequence Modeling
Daniel Y. Fu
Elliot L. Epstein
Eric N. D. Nguyen
A. Thomas
Michael Zhang
Tri Dao
Atri Rudra
Christopher Ré
16
52
0
13 Feb 2023
A Theoretical Understanding of Shallow Vision Transformers: Learning,
  Generalization, and Sample Complexity
A Theoretical Understanding of Shallow Vision Transformers: Learning, Generalization, and Sample Complexity
Hongkang Li
Hao Wu
Sijia Liu
Pin-Yu Chen
ViT
MLT
37
57
0
12 Feb 2023
SMMix: Self-Motivated Image Mixing for Vision Transformers
SMMix: Self-Motivated Image Mixing for Vision Transformers
Mengzhao Chen
Mingbao Lin
Zhihang Lin
Yu-xin Zhang
Rongrong Ji
Rongrong Ji
53
10
0
26 Dec 2022
DeepSpeed Data Efficiency: Improving Deep Learning Model Quality and
  Training Efficiency via Efficient Data Sampling and Routing
DeepSpeed Data Efficiency: Improving Deep Learning Model Quality and Training Efficiency via Efficient Data Sampling and Routing
Conglong Li
Z. Yao
Xiaoxia Wu
Minjia Zhang
Connor Holmes
Cheng Li
Yuxiong He
21
24
0
07 Dec 2022
Rethinking Video ViTs: Sparse Video Tubes for Joint Image and Video
  Learning
Rethinking Video ViTs: Sparse Video Tubes for Joint Image and Video Learning
A. Piergiovanni
Weicheng Kuo
A. Angelova
ViT
36
54
0
06 Dec 2022
Dynamic Feature Pruning and Consolidation for Occluded Person
  Re-Identification
Dynamic Feature Pruning and Consolidation for Occluded Person Re-Identification
Yuteng Ye
Hang Zhou
Jiale Cai
Chenxing Gao
Youjia Zhang
Junle Wang
Qiang Hu
Junqing Yu
Wei Yang
23
6
0
27 Nov 2022
SMAUG: Sparse Masked Autoencoder for Efficient Video-Language
  Pre-training
SMAUG: Sparse Masked Autoencoder for Efficient Video-Language Pre-training
Yuanze Lin
Chen Wei
Huiyu Wang
Alan Yuille
Cihang Xie
3DGS
31
15
0
21 Nov 2022
Beyond Attentive Tokens: Incorporating Token Importance and Diversity
  for Efficient Vision Transformers
Beyond Attentive Tokens: Incorporating Token Importance and Diversity for Efficient Vision Transformers
Sifan Long
Z. Zhao
Jimin Pi
Sheng-sheng Wang
Jingdong Wang
22
29
0
21 Nov 2022
You Need Multiple Exiting: Dynamic Early Exiting for Accelerating
  Unified Vision Language Model
You Need Multiple Exiting: Dynamic Early Exiting for Accelerating Unified Vision Language Model
Sheng Tang
Yaqing Wang
Zhenglun Kong
Tianchi Zhang
Yao Li
Caiwen Ding
Yanzhi Wang
Yi Liang
Dongkuan Xu
25
31
0
21 Nov 2022
TORE: Token Reduction for Efficient Human Mesh Recovery with Transformer
TORE: Token Reduction for Efficient Human Mesh Recovery with Transformer
Zhiyang Dou
Qingxuan Wu
Chu-Hsing Lin
Zeyu Cao
Qiangqiang Wu
Weilin Wan
Taku Komura
Wenping Wang
24
39
0
19 Nov 2022
Token Merging: Your ViT But Faster
Token Merging: Your ViT But Faster
Daniel Bolya
Cheng-Yang Fu
Xiaoliang Dai
Peizhao Zhang
Christoph Feichtenhofer
Judy Hoffman
MoMe
48
417
0
17 Oct 2022
One Model to Edit Them All: Free-Form Text-Driven Image Manipulation
  with Semantic Modulations
One Model to Edit Them All: Free-Form Text-Driven Image Manipulation with Semantic Modulations
Yi-Chun Zhu
Hongyu Liu
Yibing Song
Ziyang Yuan
Xintong Han
Chun Yuan
Qifeng Chen
Jue Wang
VLM
DiffM
34
30
0
14 Oct 2022
Bridging the Gap Between Vision Transformers and Convolutional Neural
  Networks on Small Datasets
Bridging the Gap Between Vision Transformers and Convolutional Neural Networks on Small Datasets
Zhiying Lu
Hongtao Xie
Chuanbin Liu
Yongdong Zhang
ViT
25
57
0
12 Oct 2022
Dilated Neighborhood Attention Transformer
Dilated Neighborhood Attention Transformer
Ali Hassani
Humphrey Shi
ViT
MedIm
33
68
0
29 Sep 2022
Graph Reasoning Transformer for Image Parsing
Graph Reasoning Transformer for Image Parsing
Dong Zhang
Jinhui Tang
Kwang-Ting Cheng
ViT
24
16
0
20 Sep 2022
Super Vision Transformer
Super Vision Transformer
Mingbao Lin
Mengzhao Chen
Yu-xin Zhang
Yunhang Shen
Rongrong Ji
Liujuan Cao
ViT
43
20
0
23 May 2022
Compact Model Training by Low-Rank Projection with Energy Transfer
Compact Model Training by Low-Rank Projection with Energy Transfer
K. Guo
Zhenquan Lin
Xiaofen Xing
Fang Liu
Xiangmin Xu
27
2
0
12 Apr 2022
Dynamic Focus-aware Positional Queries for Semantic Segmentation
Dynamic Focus-aware Positional Queries for Semantic Segmentation
Haoyu He
Jianfei Cai
Zizheng Pan
Jing Liu
Jing Zhang
Dacheng Tao
Bohan Zhuang
34
17
0
04 Apr 2022
DynaMixer: A Vision MLP Architecture with Dynamic Mixing
DynaMixer: A Vision MLP Architecture with Dynamic Mixing
Ziyu Wang
Wenhao Jiang
Yiming Zhu
Li Yuan
Yibing Song
Wei Liu
43
44
0
28 Jan 2022
12
Next