ResearchTrend.AI
Transformer in Transformer

27 February 2021
Kai Han
An Xiao
Enhua Wu
Jianyuan Guo
Chunjing Xu
Yunhe Wang
    ViT
ArXiv (abs) | PDF | HTML | Github (4228★)

Papers citing "Transformer in Transformer"

50 / 558 papers shown
Gold-YOLO: Efficient Object Detector via Gather-and-Distribute Mechanism
Chengcheng Wang
Wei He
Ying Nie
Jianyuan Guo
Chuanjian Liu
Kai Han
Yunhe Wang
ObjD
124
244
0
20 Sep 2023
RMT: Retentive Networks Meet Vision Transformers
Qihang Fan
Huaibo Huang
Mingrui Chen
Hongmin Liu
Ran He
ViT
166
91
0
20 Sep 2023
SCT: A Simple Baseline for Parameter-Efficient Fine-Tuning via Salient Channels
Henry Hengyuan Zhao
Pichao Wang
Yuyang Zhao
Hao Luo
F. Wang
Mike Zheng Shou
ViT
115
14
0
15 Sep 2023
Dynamic Spectrum Mixer for Visual Recognition
Zhiqiang Hu
Tao Yu
58
3
0
13 Sep 2023
Interdisciplinary Fairness in Imbalanced Research Proposal Topic Inference: A Hierarchical Transformer-based Method with Selective Interpolation
Meng Xiao
Min-Ying Wu
Ziyue Qiao
Yanjie Fu
Zhiyuan Ning
Yi Du
Yuanchun Zhou
108
9
0
04 Sep 2023
DAT++: Spatially Dynamic Vision Transformer with Deformable Attention
Zhuofan Xia
Xuran Pan
Shiji Song
Li Erran Li
Gao Huang
ViT
93
27
0
04 Sep 2023
Refined Temporal Pyramidal Compression-and-Amplification Transformer for 3D Human Pose Estimation
Hanbing Li
Wangmeng Xiang
Ju He
Zhi-Qi Cheng
Bin Luo
Yifeng Geng
Xuansong Xie
ViT
151
2
0
04 Sep 2023
Computation-efficient Deep Learning for Computer Vision: A Survey
Yulin Wang
Yizeng Han
Chaofei Wang
Shiji Song
Qi Tian
Gao Huang
VLM
134
21
0
27 Aug 2023
Benchmarking Data Efficiency and Computational Efficiency of Temporal Action Localization Models
Jan Warchocki
Teodor Oprescu
Yunhan Wang
Alexandru Damacus
Paul Misterka
Robert-Jan Bruintjes
A. Lengyel
Ombretta Strafforello
Jan van Gemert
38
2
0
24 Aug 2023
Learning Heavily-Degraded Prior for Underwater Object Detection
C. Fu
Xin-Yue Fan
Jiewen Xiao
Wanqi Yuan
Risheng Liu
Zhongxuan Luo
62
25
0
24 Aug 2023
SG-Former: Self-guided Transformer with Evolving Token Reallocation
Sucheng Ren
Xingyi Yang
Songhua Liu
Xinchao Wang
ViT
80
43
0
23 Aug 2023
SPANet: Frequency-balancing Token Mixer using Spectral Pooling Aggregation Modulation
Guhnoo Yun
J. Yoo
Kijung Kim
Jeongho Lee
Dong Hwan Kim
MoE
70
9
0
22 Aug 2023
Vision Transformer Pruning Via Matrix Decomposition
Tianyi Sun
51
0
0
21 Aug 2023
Patch Is Not All You Need
Chang-bo Li
Jie Zhang
Yang Wei
Zhilong Ji
Jinfeng Bai
Shiguang Shan
ViT
69
2
0
21 Aug 2023
LDCSF: Local depth convolution-based Swim framework for classifying multi-label histopathology images
Liangrui Pan
Yutao Dou
Zhichao Feng
Liwen Xu
Shaoliang Peng
MedIm
46
3
0
21 Aug 2023
TransFace: Calibrating Transformer Training for Face Recognition from a Data-Centric Perspective
Jun Dan
Yang Liu
Haoyu Xie
Jiankang Deng
H. Xie
Xuansong Xie
Baigui Sun
ViT
104
25
0
20 Aug 2023
PoSynDA: Multi-Hypothesis Pose Synthesis Domain Adaptation for Robust 3D Human Pose Estimation
Han Liu
Ju He
Zhi-Qi Cheng
Wangmeng Xiang
Q. Yang
...
Gaoang Wang
Xueting Bao
Bin Luo
Yifeng Geng
Xuansong Xie
DiffM
62
20
0
18 Aug 2023
Diverse Cotraining Makes Strong Semi-Supervised Segmentor
Yijiang Li
Xinjiang Wang
Lihe Yang
Xue Jiang
Wayne Zhang
Ying Gao
89
19
0
18 Aug 2023
Computer vision-enriched discrete choice models, with an application to residential location choice
Sander van Cranenburgh
Francisco Garrido-Valenzuela
57
2
0
16 Aug 2023
Low-Light Image Enhancement with Illumination-Aware Gamma Correction and Complete Image Modelling Network
Yinglong Wang
Ziqiang Liu
Jianzhuang Liu
Songcen Xu
Shuaicheng Liu
3DV
62
28
0
16 Aug 2023
AttMOT: Improving Multiple-Object Tracking by Introducing Auxiliary Pedestrian Attributes
Yunhao Li
Zhen Xiao
Ling Yang
Dan Meng
Xin Zhou
Hengrui Fan
Libo Zhang
77
5
0
15 Aug 2023
ViLP: Knowledge Exploration using Vision, Language, and Pose Embeddings for Video Action Recognition
S. Chaudhuri
Saumik Bhattacharya
80
3
0
07 Aug 2023
DiT: Efficient Vision Transformers with Dynamic Token Routing
Yuchen Ma
Zhengcong Fei
Junshi Huang
ViT
57
2
0
07 Aug 2023
Novel Physics-Based Machine-Learning Models for Indoor Air Quality Approximations
Ahmad Mohammadshirazi
Aida Nadafian
Amin Karimi Monsefi
M. Rafiei
R. Ramnath
AI4CE
52
3
0
02 Aug 2023
ViT2EEG: Leveraging Hybrid Pretrained Vision Transformers for EEG Data
Ruiqi Yang
Eric Modesitt
ViT
88
12
0
01 Aug 2023
Visual Prompt Flexible-Modal Face Anti-Spoofing
Zitong Yu
Rizhao Cai
Yawen Cui
Ajian Liu
Changsheng Chen
107
6
0
26 Jul 2023
Sparse then Prune: Toward Efficient Vision Transformers
Yogi Prasetyo
N. Yudistira
A. Widodo
VLM
ViT
60
2
0
22 Jul 2023
Efficient Beam Tree Recursion
Jishnu Ray Chowdhury
Cornelia Caragea
81
3
0
20 Jul 2023
What Can Simple Arithmetic Operations Do for Temporal Modeling?
Wenhao Wu
Yuxin Song
Zhun Sun
Jingdong Wang
Chang Xu
Wanli Ouyang
96
11
0
18 Jul 2023
Complementary Frequency-Varying Awareness Network for Open-Set Fine-Grained Image Recognition
Qiulei Dong
Hong Wang
Qiulei Dong
106
0
0
14 Jul 2023
DEDUCE: Multi-head attention decoupled contrastive learning to discover cancer subtypes based on multi-omics data
Liangrui Pan
Da Liu
Yutao Dou
Lian-min Wang
Zhichao Feng
Pengfei Rong
Liwen Xu
Shaoliang Peng
33
1
0
09 Jul 2023
Random Position Adversarial Patch for Vision Transformers
Mingzhen Shao
ViT
AAML
74
2
0
09 Jul 2023
Make A Long Image Short: Adaptive Token Length for Vision Transformers
Yuqin Zhu
Yichen Zhu
ViT
123
17
0
05 Jul 2023
Review of Large Vision Models and Visual Prompt Engineering
Jiaqi Wang
Zheng Liu
Lin Zhao
Zihao Wu
Chong Ma
...
Bao Ge
Yixuan Yuan
Dinggang Shen
Tianming Liu
Shu Zhang
VLM
LRM
157
163
0
03 Jul 2023
ImDiffusion: Imputed Diffusion Models for Multivariate Time Series Anomaly Detection
Yuhang Chen
Chen Zhang
Minghua Ma
Yudong Liu
Ruomeng Ding
Yue Liu
Shilin He
Saravan Rajmohan
Qingwei Lin
Dongmei Zhang
DiffM
95
52
0
03 Jul 2023
X-MLP: A Patch Embedding-Free MLP Architecture for Vision
Xinyue Wang
Zhicheng Cai
Chenglei Peng
ViT
90
5
0
02 Jul 2023
STTracker: Spatio-Temporal Tracker for 3D Single Object Tracking
Yubo Cui
Zhiheng Li
Zheng Fang
3DPC
74
12
0
30 Jun 2023
ParameterNet: Parameters Are All You Need
Kai Han
Yunhe Wang
Jianyuan Guo
Enhua Wu
VLM
AI4CE
75
31
0
26 Jun 2023
Unfolding Framework with Prior of Convolution-Transformer Mixture and Uncertainty Estimation for Video Snapshot Compressive Imaging
Siming Zheng
Xin Yuan
ViT
MedIm
61
5
0
20 Jun 2023
SegT: A Novel Separated Edge-guidance Transformer Network for Polyp Segmentation
Feiyu Chen
Haiping Ma
Weijia Zhang
ViT
MedIm
91
7
0
19 Jun 2023
Seeing the Pose in the Pixels: Learning Pose-Aware Representations in Vision Transformers
Dominick Reilly
Aman Chadha
Srijan Das
ViT
83
4
0
15 Jun 2023
Revisiting Token Pruning for Object Detection and Instance Segmentation
Yifei Liu
Mathias Gehrig
Nico Messikommer
Marco Cannici
Davide Scaramuzza
ViT
VLM
112
27
0
12 Jun 2023
A Comprehensive Survey on Applications of Transformers for Deep Learning Tasks
Saidul Islam
Hanae Elmekki
Ahmed Elsebai
Jamal Bentahar
Najat Drawel
Gaith Rjoub
Witold Pedrycz
ViT
MedIm
94
212
0
11 Jun 2023
FasterViT: Fast Vision Transformers with Hierarchical Attention
Ali Hatamizadeh
Greg Heinrich
Hongxu Yin
Andrew Tao
J. Álvarez
Jan Kautz
Pavlo Molchanov
ViT
122
72
0
09 Jun 2023
InvPT++: Inverted Pyramid Multi-Task Transformer for Visual Scene Understanding
Hanrong Ye
Dan Xu
ViT
108
13
0
08 Jun 2023
Human-imperceptible, Machine-recognizable Images
Fusheng Hao
Fengxiang He
Yikai Wang
Fuxiang Wu
Jing Zhang
Jun Cheng
Dacheng Tao
AAML
80
2
0
06 Jun 2023
Recent Advances of Local Mechanisms in Computer Vision: A Survey and Outlook of Recent Work
Qiangchang Wang
Yilong Yin
102
0
0
02 Jun 2023
Learning Local to Global Feature Aggregation for Speech Emotion Recognition
Cheng Lu
Hailun Lian
Wenming Zheng
Yuan Zong
Yan Zhao
Sunan Li
ViT
52
7
0
02 Jun 2023
Lightweight Vision Transformer with Bidirectional Interaction
Qihang Fan
Huaibo Huang
Xiaoqiang Zhou
Ran He
ViT
163
29
0
01 Jun 2023
Are Large Kernels Better Teachers than Transformers for ConvNets?
Tianjin Huang
Lu Yin
Zhenyu Zhang
Lijuan Shen
Meng Fang
Mykola Pechenizkiy
Zhangyang Wang
Shiwei Liu
95
13
0
30 May 2023