ResearchTrend.AI
CrossFormer: A Versatile Vision Transformer Hinging on Cross-scale Attention

31 July 2021
Wenxiao Wang
Lu Yao
Long Chen
Binbin Lin
Deng Cai
Xiaofei He
Wei Liu

Papers citing "CrossFormer: A Versatile Vision Transformer Hinging on Cross-scale Attention"

50 / 50 papers shown
Mamba-Adaptor: State Space Model Adaptor for Visual Recognition
Fei Xie
Jiahao Nie
Yujin Tang
W. Zhang
Hongshen Zhao
Mamba
13
0
0
19 May 2025
Rethinking Irregular Time Series Forecasting: A Simple yet Effective Baseline
Xvyuan Liu
Xiangfei Qiu
Xingjian Wu
Zhengyu Li
Chenjuan Guo
Jiaxi Hu
Bin Yang
AI4TS
22
0
0
16 May 2025
OverLoCK: An Overview-first-Look-Closely-next ConvNet with Context-Mixing Dynamic Kernels
Meng Lou
Yizhou Yu
118
1
0
27 Feb 2025
Geometric Distortion Guided Transformer for Omnidirectional Image Super-Resolution
Cuixin Yang
Rongkang Dong
Jun Xiao
Cong Zhang
Kin-Man Lam
Fei Zhou
Guoping Qiu
97
1
0
17 Jan 2025
Breaking the Low-Rank Dilemma of Linear Attention
Qihang Fan
Huaibo Huang
Ran He
55
1
0
12 Nov 2024
Sample-Efficient Diffusion for Text-To-Speech Synthesis
Justin Lovelace
Soham Ray
Kwangyoun Kim
Kilian Q. Weinberger
Felix Wu
36
2
0
01 Sep 2024
Can Transformers Do Enumerative Geometry?
Baran Hashemi
Roderic G. Corominas
Alessandro Giacchetto
46
2
0
27 Aug 2024
HiT-SR: Hierarchical Transformer for Efficient Image Super-Resolution
Xiang Zhang
Yulun Zhang
Fisher Yu
50
15
0
08 Jul 2024
Vision Transformer with Sparse Scan Prior
Qihang Fan
Huaibo Huang
Mingrui Chen
Ran He
ViT
48
5
0
22 May 2024
Enhancing Efficiency in Vision Transformer Networks: Design Techniques and Insights
Moein Heidari
Reza Azad
Sina Ghorbani Kolahi
René Arimond
Leon Niggemeier
...
Afshin Bozorgpour
Ehsan Khodapanah Aghdam
A. Kazerouni
I. Hacihaliloglu
Dorit Merhof
53
7
0
28 Mar 2024
ResoNet: Robust and Explainable ENSO Forecasts with Hybrid Convolution and Transformer Networks
Pumeng Lyu
Tao Tang
Fenghua Ling
Jing-Jia Luo
Niklas Boers
Wanli Ouyang
Lei Bai
30
5
0
16 Dec 2023
SCHEME: Scalable Channel Mixer for Vision Transformers
Deepak Sridhar
Yunsheng Li
Nuno Vasconcelos
49
0
0
01 Dec 2023
TransXNet: Learning Both Global and Local Dynamics with a Dual Dynamic Token Mixer for Visual Recognition
Meng Lou
Hong-Yu Zhou
Sibei Yang
Chuan Wu
Yizhou Yu
ViT
49
36
0
30 Oct 2023
EViT: An Eagle Vision Transformer with Bi-Fovea Self-Attention
Yulong Shi
Mingwei Sun
Yongshuai Wang
Hui Sun
Zengqiang Chen
39
4
0
10 Oct 2023
Vision Backbone Enhancement via Multi-Stage Cross-Scale Attention
Liang Shang
Yanli Liu
Zhengyang Lou
Shuxue Quan
N. Adluru
Bochen Guan
W. Sethares
39
2
0
10 Aug 2023
Dual Aggregation Transformer for Image Super-Resolution
Zheng Chen
Yulun Zhang
Jinjin Gu
Lingyu Kong
Xiaokang Yang
Feng Yu
ViT
34
169
0
07 Aug 2023
PVG: Progressive Vision Graph for Vision Recognition
Jiafu Wu
Jian Li
Jiangning Zhang
Boshen Zhang
M. Chi
Yabiao Wang
Chengjie Wang
ViT
38
13
0
01 Aug 2023
Multiscale Memory Comparator Transformer for Few-Shot Video Segmentation
Mennatullah Siam
R. Karim
Henghui Zhao
Richard P. Wildes
VOS
38
2
0
15 Jul 2023
Leveraging Cross-Utterance Context For ASR Decoding
Robert Flynn
Anton Ragni
33
1
0
29 Jun 2023
Lightweight Vision Transformer with Bidirectional Interaction
Qihang Fan
Huaibo Huang
Xiaoqiang Zhou
Ran He
ViT
57
28
0
01 Jun 2023
SparseViT: Revisiting Activation Sparsity for Efficient High-Resolution Vision Transformer
Xuanyao Chen
Zhijian Liu
Haotian Tang
Li Yi
Hang Zhao
Song Han
ViT
29
47
0
30 Mar 2023
Cascaded Local Implicit Transformer for Arbitrary-Scale Super-Resolution
Haoming Chen
Yu-Syuan Xu
Minui Hong
Yi-Min Tsai
Hsien-Kai Kuo
Chun-Yi Lee
OffRL
39
46
0
29 Mar 2023
Vision Transformer with Quadrangle Attention
Qiming Zhang
Jing Zhang
Yufei Xu
Dacheng Tao
ViT
29
38
0
27 Mar 2023
AMD-HookNet for Glacier Front Segmentation
Fei Wu
Nora Gourmelon
T. Seehaus
Jianlin Zhang
M. Braun
Andreas Maier
Vincent Christlein
24
9
0
06 Feb 2023
HDFormer: High-order Directed Transformer for 3D Human Pose Estimation
Hanyuan Chen
Ju He
Wangmeng Xiang
Zhi-Qi Cheng
Wen Liu
Han-Wen Liu
Bin Luo
Yifeng Geng
Xuansong Xie
ViT
31
31
0
03 Feb 2023
Rethinking Vision Transformers for MobileNet Size and Speed
Yanyu Li
Ju Hu
Yang Wen
Georgios Evangelidis
Kamyar Salahi
Yanzhi Wang
Sergey Tulyakov
Jian Ren
ViT
40
161
0
15 Dec 2022
ViTPose++: Vision Transformer for Generic Body Pose Estimation
Yufei Xu
Jing Zhang
Qiming Zhang
Dacheng Tao
ViT
42
41
0
07 Dec 2022
Concealed Object Detection for Passive Millimeter-Wave Security Imaging Based on Task-Aligned Detection Transformer
Cheng Guo
Fei-hu Hu
Yan Hu
ViT
21
15
0
01 Dec 2022
Degenerate Swin to Win: Plain Window-based Transformer without Sophisticated Operations
Tan Yu
Ping Li
ViT
46
5
0
25 Nov 2022
Token Transformer: Can class token help window-based transformer build better long-range interactions?
Jia-ju Mao
Yuan Chang
Xuesong Yin
34
0
0
11 Nov 2022
Towards Efficient Adversarial Training on Vision Transformers
Boxi Wu
Jindong Gu
Zhifeng Li
Deng Cai
Xiaofei He
Wei Liu
ViT
AAML
46
38
0
21 Jul 2022
Improving Semantic Segmentation in Transformers using Hierarchical Inter-Level Attention
Gary Leung
Jun Gao
Fangyin Wei
Sanja Fidler
21
3
0
05 Jul 2022
Learning Cross-Image Object Semantic Relation in Transformer for Few-Shot Fine-Grained Image Classification
Bo Zhang
Jiakang Yuan
Baopu Li
Tao Chen
Jiayuan Fan
Botian Shi
ViT
31
31
0
02 Jul 2022
EATFormer: Improving Vision Transformer Inspired by Evolutionary Algorithm
Jiangning Zhang
Xiangtai Li
Yabiao Wang
Chengjie Wang
Yibo Yang
Yong Liu
Dacheng Tao
ViT
39
32
0
19 Jun 2022
The Devil is in the Labels: Noisy Label Correction for Robust Scene Graph Generation
Lin Li
Long Chen
Yifeng Huang
Zhimeng Zhang
Songyang Zhang
Jun Xiao
NoLa
36
72
0
07 Jun 2022
EfficientFormer: Vision Transformers at MobileNet Speed
Yanyu Li
Geng Yuan
Yang Wen
Eric Hu
Georgios Evangelidis
Sergey Tulyakov
Yanzhi Wang
Jian Ren
ViT
26
348
0
02 Jun 2022
Multi-Agent Reinforcement Learning is a Sequence Modeling Problem
Muning Wen
J. Kuba
Runji Lin
Weinan Zhang
Ying Wen
Jun Wang
Yaodong Yang
26
179
0
30 May 2022
ViTPose: Simple Vision Transformer Baselines for Human Pose Estimation
Yufei Xu
Jing Zhang
Qiming Zhang
Dacheng Tao
ViT
28
515
0
26 Apr 2022
Deeper Insights into the Robustness of ViTs towards Common Corruptions
Rui Tian
Zuxuan Wu
Qi Dai
Han Hu
Yu-Gang Jiang
ViT
AAML
26
4
0
26 Apr 2022
VSA: Learning Varied-Size Window Attention in Vision Transformers
Qiming Zhang
Yufei Xu
Jing Zhang
Dacheng Tao
22
53
0
18 Apr 2022
Improving Vision Transformers by Revisiting High-frequency Components
Jiawang Bai
Li Yuan
Shutao Xia
Shuicheng Yan
Zhifeng Li
Wei Liu
ViT
16
90
0
03 Apr 2022
ScalableViT: Rethinking the Context-oriented Generalization of Vision Transformer
Rui Yang
Hailong Ma
Jie Wu
Yansong Tang
Xuefeng Xiao
Min Zheng
Xiu Li
ViT
19
53
0
21 Mar 2022
Masked Autoencoders for Point Cloud Self-supervised Learning
Yatian Pang
Wenxiao Wang
Francis E. H. Tay
Wei Liu
Yonghong Tian
Li Yuan
3DPC
ViT
33
454
0
13 Mar 2022
Dynamic Group Transformer: A General Vision Transformer Backbone with Dynamic Group Attention
Kai Liu
Tianyi Wu
Cong Liu
Guodong Guo
ViT
41
17
0
08 Mar 2022
BOAT: Bilateral Local Attention Vision Transformer
Tan Yu
Gangming Zhao
Ping Li
Yizhou Yu
ViT
33
27
0
31 Jan 2022
DynaMixer: A Vision MLP Architecture with Dynamic Mixing
Ziyu Wang
Wenhao Jiang
Yiming Zhu
Li Yuan
Yibing Song
Wei Liu
43
44
0
28 Jan 2022
Classification-Then-Grounding: Reformulating Video Scene Graphs as Temporal Bipartite Graphs
Kaifeng Gao
Long Chen
Yulei Niu
Jian Shao
Jun Xiao
15
29
0
08 Dec 2021
Transformer in Transformer
Kai Han
An Xiao
Enhua Wu
Jianyuan Guo
Chunjing Xu
Yunhe Wang
ViT
319
1,525
0
27 Feb 2021
Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions
Wenhai Wang
Enze Xie
Xiang Li
Deng-Ping Fan
Kaitao Song
Ding Liang
Tong Lu
Ping Luo
Ling Shao
ViT
316
3,633
0
24 Feb 2021
Bottleneck Transformers for Visual Recognition
A. Srinivas
Nayeon Lee
Niki Parmar
Jonathon Shlens
Pieter Abbeel
Ashish Vaswani
SLR
290
980
0
27 Jan 2021