Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2108.00154
Cited By
CrossFormer: A Versatile Vision Transformer Hinging on Cross-scale Attention
31 July 2021
Wenxiao Wang
Lulian Yao
Long Chen
Binbin Lin
Deng Cai
Xiaofei He
Wei Liu
Re-assign community
ArXiv
PDF
HTML
Papers citing
"CrossFormer: A Versatile Vision Transformer Hinging on Cross-scale Attention"
50 / 50 papers shown
Title
Mamba-Adaptor: State Space Model Adaptor for Visual Recognition
Fei Xie
Jiahao Nie
Yujin Tang
W. Zhang
Hongshen Zhao
Mamba
13
0
0
19 May 2025
Rethinking Irregular Time Series Forecasting: A Simple yet Effective Baseline
Xvyuan Liu
Xiangfei Qiu
Xingjian Wu
Zhengyu Li
Chenjuan Guo
Jiaxi Hu
Bin Yang
AI4TS
22
0
0
16 May 2025
OverLoCK: An Overview-first-Look-Closely-next ConvNet with Context-Mixing Dynamic Kernels
Meng Lou
Yizhou Yu
118
1
0
27 Feb 2025
Geometric Distortion Guided Transformer for Omnidirectional Image Super-Resolution
Cuixin Yang
Rongkang Dong
Jun Xiao
Cong Zhang
Kin-Man Lam
Fei Zhou
Guoping Qiu
97
1
0
17 Jan 2025
Breaking the Low-Rank Dilemma of Linear Attention
Qihang Fan
Huaibo Huang
Ran He
55
1
0
12 Nov 2024
Sample-Efficient Diffusion for Text-To-Speech Synthesis
Justin Lovelace
Soham Ray
Kwangyoun Kim
Kilian Q. Weinberger
Felix Wu
36
2
0
01 Sep 2024
Can Transformers Do Enumerative Geometry?
Baran Hashemi
Roderic G. Corominas
Alessandro Giacchetto
46
2
0
27 Aug 2024
HiT-SR: Hierarchical Transformer for Efficient Image Super-Resolution
Xiang Zhang
Yulun Zhang
Fisher Yu
50
15
0
08 Jul 2024
Vision Transformer with Sparse Scan Prior
Qihang Fan
Huaibo Huang
Mingrui Chen
Ran He
ViT
48
5
0
22 May 2024
Enhancing Efficiency in Vision Transformer Networks: Design Techniques and Insights
Moein Heidari
Reza Azad
Sina Ghorbani Kolahi
René Arimond
Leon Niggemeier
...
Afshin Bozorgpour
Ehsan Khodapanah Aghdam
A. Kazerouni
I. Hacihaliloglu
Dorit Merhof
53
7
0
28 Mar 2024
ResoNet: Robust and Explainable ENSO Forecasts with Hybrid Convolution and Transformer Networks
Pumeng Lyu
Tao Tang
Fenghua Ling
Jing-Jia Luo
Niklas Boers
Wanli Ouyang
Lei Bai
30
5
0
16 Dec 2023
SCHEME: Scalable Channel Mixer for Vision Transformers
Deepak Sridhar
Yunsheng Li
Nuno Vasconcelos
49
0
0
01 Dec 2023
TransXNet: Learning Both Global and Local Dynamics with a Dual Dynamic Token Mixer for Visual Recognition
Meng Lou
Hong-Yu Zhou
Sibei Yang
Yizhou Yu
Chuan Wu
Yizhou Yu
ViT
49
36
0
30 Oct 2023
EViT: An Eagle Vision Transformer with Bi-Fovea Self-Attention
Yulong Shi
Mingwei Sun
Yongshuai Wang
Hui Sun
Zengqiang Chen
39
4
0
10 Oct 2023
Vision Backbone Enhancement via Multi-Stage Cross-Scale Attention
Liang Shang
Yanli Liu
Zhengyang Lou
Shuxue Quan
N. Adluru
Bochen Guan
W. Sethares
39
2
0
10 Aug 2023
Dual Aggregation Transformer for Image Super-Resolution
Zheng Chen
Yulun Zhang
Jinjin Gu
Lingyu Kong
Xiaokang Yang
Feng Yu
ViT
34
169
0
07 Aug 2023
PVG: Progressive Vision Graph for Vision Recognition
Jiafu Wu
Jian Li
Jiangning Zhang
Boshen Zhang
M. Chi
Yabiao Wang
Chengjie Wang
ViT
38
13
0
01 Aug 2023
Multiscale Memory Comparator Transformer for Few-Shot Video Segmentation
Mennatullah Siam
R. Karim
Henghui Zhao
Richard P. Wildes
VOS
38
2
0
15 Jul 2023
Leveraging Cross-Utterance Context For ASR Decoding
Robert Flynn
Anton Ragni
33
1
0
29 Jun 2023
Lightweight Vision Transformer with Bidirectional Interaction
Qihang Fan
Huaibo Huang
Xiaoqiang Zhou
Ran He
ViT
57
28
0
01 Jun 2023
SparseViT: Revisiting Activation Sparsity for Efficient High-Resolution Vision Transformer
Xuanyao Chen
Zhijian Liu
Haotian Tang
Li Yi
Hang Zhao
Song Han
ViT
29
47
0
30 Mar 2023
Cascaded Local Implicit Transformer for Arbitrary-Scale Super-Resolution
Haoming Chen
Yu-Syuan Xu
Minui Hong
Yi-Min Tsai
Hsien-Kai Kuo
Chun-Yi Lee
OffRL
39
46
0
29 Mar 2023
Vision Transformer with Quadrangle Attention
Qiming Zhang
Jing Zhang
Yufei Xu
Dacheng Tao
ViT
29
38
0
27 Mar 2023
AMD-HookNet for Glacier Front Segmentation
Fei Wu
Nora Gourmelon
T. Seehaus
Jianlin Zhang
M. Braun
Andreas Maier
Vincent Christlein
24
9
0
06 Feb 2023
HDFormer: High-order Directed Transformer for 3D Human Pose Estimation
Hanyuan Chen
Ju He
Wangmeng Xiang
Zhi-Qi Cheng
Wen Liu
Han-Wen Liu
Bin Luo
Yifeng Geng
Xuansong Xie
ViT
31
31
0
03 Feb 2023
Rethinking Vision Transformers for MobileNet Size and Speed
Yanyu Li
Ju Hu
Yang Wen
Georgios Evangelidis
Kamyar Salahi
Yanzhi Wang
Sergey Tulyakov
Jian Ren
ViT
40
161
0
15 Dec 2022
ViTPose++: Vision Transformer for Generic Body Pose Estimation
Yufei Xu
Jing Zhang
Qiming Zhang
Dacheng Tao
ViT
42
41
0
07 Dec 2022
Concealed Object Detection for Passive Millimeter-Wave Security Imaging Based on Task-Aligned Detection Transformer
Cheng Guo
Fei-hu Hu
Yan Hu
ViT
21
15
0
01 Dec 2022
Degenerate Swin to Win: Plain Window-based Transformer without Sophisticated Operations
Tan Yu
Ping Li
ViT
46
5
0
25 Nov 2022
Token Transformer: Can class token help window-based transformer build better long-range interactions?
Jia-ju Mao
Yuan Chang
Xuesong Yin
34
0
0
11 Nov 2022
Towards Efficient Adversarial Training on Vision Transformers
Boxi Wu
Jindong Gu
Zhifeng Li
Deng Cai
Xiaofei He
Wei Liu
ViT
AAML
46
38
0
21 Jul 2022
Improving Semantic Segmentation in Transformers using Hierarchical Inter-Level Attention
Gary Leung
Jun Gao
Fangyin Wei
Sanja Fidler
21
3
0
05 Jul 2022
Learning Cross-Image Object Semantic Relation in Transformer for Few-Shot Fine-Grained Image Classification
Bo Zhang
Jiakang Yuan
Baopu Li
Tao Chen
Jiayuan Fan
Botian Shi
ViT
31
31
0
02 Jul 2022
EATFormer: Improving Vision Transformer Inspired by Evolutionary Algorithm
Jiangning Zhang
Xiangtai Li
Yabiao Wang
Chengjie Wang
Yibo Yang
Yong Liu
Dacheng Tao
ViT
39
32
0
19 Jun 2022
The Devil is in the Labels: Noisy Label Correction for Robust Scene Graph Generation
Lin Li
Long Chen
Yifeng Huang
Zhimeng Zhang
Songyang Zhang
Jun Xiao
NoLa
36
72
0
07 Jun 2022
EfficientFormer: Vision Transformers at MobileNet Speed
Yanyu Li
Geng Yuan
Yang Wen
Eric Hu
Georgios Evangelidis
Sergey Tulyakov
Yanzhi Wang
Jian Ren
ViT
26
348
0
02 Jun 2022
Multi-Agent Reinforcement Learning is a Sequence Modeling Problem
Muning Wen
J. Kuba
Runji Lin
Weinan Zhang
Ying Wen
Jun Wang
Yaodong Yang
26
179
0
30 May 2022
ViTPose: Simple Vision Transformer Baselines for Human Pose Estimation
Yufei Xu
Jing Zhang
Qiming Zhang
Dacheng Tao
ViT
28
515
0
26 Apr 2022
Deeper Insights into the Robustness of ViTs towards Common Corruptions
Rui Tian
Zuxuan Wu
Qi Dai
Han Hu
Yu-Gang Jiang
ViT
AAML
26
4
0
26 Apr 2022
VSA: Learning Varied-Size Window Attention in Vision Transformers
Qiming Zhang
Yufei Xu
Jing Zhang
Dacheng Tao
22
53
0
18 Apr 2022
Improving Vision Transformers by Revisiting High-frequency Components
Jiawang Bai
Liuliang Yuan
Shutao Xia
Shuicheng Yan
Zhifeng Li
Wen Liu
ViT
16
90
0
03 Apr 2022
ScalableViT: Rethinking the Context-oriented Generalization of Vision Transformer
Rui Yang
Hailong Ma
Jie Wu
Yansong Tang
Xuefeng Xiao
Min Zheng
Xiu Li
ViT
19
53
0
21 Mar 2022
Masked Autoencoders for Point Cloud Self-supervised Learning
Yatian Pang
Wenxiao Wang
Francis E. H. Tay
Wen Liu
Yonghong Tian
Liuliang Yuan
3DPC
ViT
33
454
0
13 Mar 2022
Dynamic Group Transformer: A General Vision Transformer Backbone with Dynamic Group Attention
Kai Liu
Tianyi Wu
Cong Liu
Guodong Guo
ViT
41
17
0
08 Mar 2022
BOAT: Bilateral Local Attention Vision Transformer
Tan Yu
Gangming Zhao
Ping Li
Yizhou Yu
ViT
33
27
0
31 Jan 2022
DynaMixer: A Vision MLP Architecture with Dynamic Mixing
Ziyu Wang
Wenhao Jiang
Yiming Zhu
Li Yuan
Yibing Song
Wei Liu
43
44
0
28 Jan 2022
Classification-Then-Grounding: Reformulating Video Scene Graphs as Temporal Bipartite Graphs
Kaifeng Gao
Long Chen
Yulei Niu
Jian Shao
Jun Xiao
15
29
0
08 Dec 2021
Transformer in Transformer
Kai Han
An Xiao
Enhua Wu
Jianyuan Guo
Chunjing Xu
Yunhe Wang
ViT
319
1,525
0
27 Feb 2021
Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions
Wenhai Wang
Enze Xie
Xiang Li
Deng-Ping Fan
Kaitao Song
Ding Liang
Tong Lu
Ping Luo
Ling Shao
ViT
316
3,633
0
24 Feb 2021
Bottleneck Transformers for Visual Recognition
A. Srinivas
Nayeon Lee
Niki Parmar
Jonathon Shlens
Pieter Abbeel
Ashish Vaswani
SLR
290
980
0
27 Jan 2021
1