ResearchTrend.AI
UniFormer: Unifying Convolution and Self-attention for Visual Recognition

24 January 2022
Kunchang Li
Yali Wang
Junhao Zhang
Peng Gao
Guanglu Song
Yu Liu
Hongsheng Li
Yu Qiao
    ViT

Papers citing "UniFormer: Unifying Convolution and Self-attention for Visual Recognition"

50 / 164 papers shown
Title
MFogHub: Bridging Multi-Regional and Multi-Satellite Data for Global Marine Fog Detection and Forecasting
Mengqiu Xu
Kaixin Chen
Heng Guo
Yixiang Huang
Ming Wu
Zhenwei Shi
Chuang Zhang
Jun Guo
29
0
0
15 May 2025
Learning Streaming Video Representation via Multitask Training
Yibin Yan
Jilan Xu
Shangzhe Di
Yikun Liu
Yudi Shi
Qirui Chen
Zeqian Li
Yifei Huang
Weidi Xie
CLL
84
0
0
28 Apr 2025
VisualCloze: A Universal Image Generation Framework via Visual In-Context Learning
Zhong-Yu Li
Ruoyi Du
Juncheng Yan
Le Zhuo
Zhen Li
Peng Gao
Zhanyu Ma
Ming-Ming Cheng
VLM
68
2
0
10 Apr 2025
Audio-visual Event Localization on Portrait Mode Short Videos
Wuyang Liu
Yi Chai
Yongpeng Yan
Yanzhen Ren
21
0
0
09 Apr 2025
HGFormer: Topology-Aware Vision Transformer with HyperGraph Learning
Hao Wang
Shuo Zhang
Biao Leng
ViT
82
0
0
03 Apr 2025
Spectral-Adaptive Modulation Networks for Visual Perception
Guhnoo Yun
J. Yoo
Kijung Kim
Jeongho Lee
Paul Hongsuck Seo
Dong Hwan Kim
42
0
0
31 Mar 2025
VTD-CLIP: Video-to-Text Discretization via Prompting CLIP
Wencheng Zhu
Yuexin Wang
Hongxuan Li
Pengfei Zhu
Q. Hu
CLIP
48
0
0
24 Mar 2025
Stitch-a-Recipe: Video Demonstration from Multistep Descriptions
Chi Hsuan Wu
Kumar Ashutosh
Kristen Grauman
DiffM
63
0
0
18 Mar 2025
Underlying Semantic Diffusion for Effective and Efficient In-Context Learning
Zhong Ji
Weilong Cao
Yan Zhang
Yanwei Pang
Jungong Han
X. Li
DiffM
VLM
47
0
0
06 Mar 2025
OverLoCK: An Overview-first-Look-Closely-next ConvNet with Context-Mixing Dynamic Kernels
Meng Lou
Yizhou Yu
115
1
0
27 Feb 2025
InternVQA: Advancing Compressed Video Quality Assessment with Distilling Large Foundation Model
Fengbin Guan
Zihao Yu
Yiting Lu
Xin Li
Zhibo Chen
65
1
0
26 Feb 2025
RT-DEMT: A hybrid real-time acupoint detection model combining mamba and transformer
Shilong Yang
Qi Zang
Chulong Zhang
Lingfeng Huang
Yaoqin Xie
Mamba
63
1
0
16 Feb 2025
Slicing Vision Transformer for Flexible Inference
Yitian Zhang
Huseyin Coskun
Xu Ma
Huan Wang
Ke Ma
Xi Chen
Derek Hao Hu
Y. Fu
ViT
76
0
0
06 Dec 2024
AM Flow: Adapters for Temporal Processing in Action Recognition
Tanay Agrawal
Abid Ali
A. Dantcheva
François Brémond
39
0
0
04 Nov 2024
UTSRMorph: A Unified Transformer and Superresolution Network for Unsupervised Medical Image Registration
Runshi Zhang
Hao Mo
Junchen Wang
Bimeng Jie
Yang He
Nenghao Jin
Liang Zhu
ViT
MedIm
28
3
0
27 Oct 2024
TAS: Distilling Arbitrary Teacher and Student via a Hybrid Assistant
Guopeng Li
Qiang Wang
K. Yan
Shouhong Ding
Yuan Gao
Gui-Song Xia
38
0
0
16 Oct 2024
MoH: Multi-Head Attention as Mixture-of-Head Attention
Peng Jin
Bo Zhu
Li Yuan
Shuicheng Yan
MoE
31
13
0
15 Oct 2024
Transforming Game Play: A Comparative Study of DCQN and DTQN Architectures in Reinforcement Learning
William A. Stigall
56
0
0
14 Oct 2024
Continual Learning Improves Zero-Shot Action Recognition
Shreyank N. Gowda
Davide Moltisanti
Laura Sevilla-Lara
BDL
VLM
CLL
27
1
0
14 Oct 2024
Multi-modal Vision Pre-training for Medical Image Analysis
Shaohao Rui
Lingzhi Chen
Zhenyu Tang
Lilong Wang
M. Liu
S. Zhang
Xiaosong Wang
32
0
0
14 Oct 2024
Generating Intermediate Representations for Compositional Text-To-Image Generation
Ran Galun
Sagie Benaim
23
0
0
13 Oct 2024
Polyp-SES: Automatic Polyp Segmentation with Self-Enriched Semantic Model
Quang Vinh Nguyen
Thanh Hoang Son Vo
Sae-Ryung Kang
Soo-Hyung Kim
29
0
0
02 Oct 2024
Progressive Representation Learning for Real-Time UAV Tracking
Changhong Fu
Xiang Lei
Haobo Zuo
L. Yao
Guangze Zheng
Jia-Yu Pan
AI4TS
32
4
0
25 Sep 2024
Resolving Inconsistent Semantics in Multi-Dataset Image Segmentation
Qilong Zhangli
Di Liu
Abhishek Aich
Dimitris Metaxas
S. Schulter
33
0
0
15 Sep 2024
SparX: A Sparse Cross-Layer Connection Mechanism for Hierarchical Vision Mamba and Transformer Networks
Meng Lou
Yunxiang Fu
Yizhou Yu
Mamba
55
5
0
15 Sep 2024
MVTN: A Multiscale Video Transformer Network for Hand Gesture Recognition
Mallika Garg
Debashis Ghosh
P. M. Pradhan
ViT
28
1
0
05 Sep 2024
Towards Student Actions in Classroom Scenes: New Dataset and Baseline
Zhuolin Tan
Chenqiang Gao
Anyong Qin
Ruixin Chen
Tiecheng Song
Feng Yang
Deyu Meng
29
0
0
02 Sep 2024
MTMamba++: Enhancing Multi-Task Dense Scene Understanding via Mamba-Based Decoders
Baijiong Lin
Weisen Jiang
Pengguang Chen
Shu Liu
Ying-Cong Chen
Mamba
37
1
0
27 Aug 2024
E-Bench: Subjective-Aligned Benchmark Suite for Text-Driven Video Editing Quality Assessment
Shangkun Sun
Xiaoyu Liang
S. Fan
Wenxu Gao
Wei-Nan Gao
DiffM
56
0
0
21 Aug 2024
Flatten: Video Action Recognition is an Image Classification task
Junlin Chen
Chengcheng Xu
Yangfan Xu
Jian Yang
Jun Yu Li
Zhiping Shi
31
1
0
17 Aug 2024
OmniCLIP: Adapting CLIP for Video Recognition with Spatial-Temporal Omni-Scale Feature Learning
Mushui Liu
Bozheng Li
Yunlong Yu
VLM
23
9
0
12 Aug 2024
e-Health CSIRO at RRG24: Entropy-Augmented Self-Critical Sequence Training for Radiology Report Generation
Aaron Nicolson
Jinghui Liu
Jason Dowling
Anthony N. Nguyen
Bevan Koopman
36
3
0
07 Aug 2024
SwinSF: Image Reconstruction from Spatial-Temporal Spike Streams
Liangyan Jiang
Chuang Zhu
Yanxu Chen
50
2
0
22 Jul 2024
Improved Esophageal Varices Assessment from Non-Contrast CT Scans
Chunli Li
Xiaoming Zhang
Yuan Gao
Xiaoli Yin
Le Lu
Ling Zhang
Ke Yan
Yu Shi
43
0
0
18 Jul 2024
AFIDAF: Alternating Fourier and Image Domain Adaptive Filters as an Efficient Alternative to Attention in ViTs
Yunling Zheng
Zeyi Xu
Fanghui Xue
Biao Yang
Jiancheng Lyu
Shuai Zhang
Y. Qi
Jack Xin
48
0
0
16 Jul 2024
iiANET: Inception Inspired Attention Hybrid Network for efficient Long-Range Dependency
Haruna Yunusa
Qin Shiyin
Abdulrahman Hamman Adama Chukkol
Isah Bello
A. Lawan
39
4
0
10 Jul 2024
PosMLP-Video: Spatial and Temporal Relative Position Encoding for Efficient Video Recognition
Y. Hao
Diansong Zhou
Zhicai Wang
Chong-Wah Ngo
Meng Wang
ViT
32
4
0
03 Jul 2024
Video Inpainting Localization with Contrastive Learning
Zijie Lou
Gang Cao
Man Lin
42
1
0
25 Jun 2024
SVFormer: A Direct Training Spiking Transformer for Efficient Video Action Recognition
Liutao Yu
Liwei Huang
Chenlin Zhou
Han Zhang
Zhengyu Ma
Huihui Zhou
Yonghong Tian
ViT
44
4
0
21 Jun 2024
Accessible, At-Home Detection of Parkinson's Disease via Multi-task Video Analysis
Md. Saiful Islam
Tariq Adnan
Jan Freyberg
Sangwu Lee
Abdelrahman Abdelkader
...
Cathe Schwartz
Karen Jaffe
Ruth B. Schneider
E. R. Dorsey
Ehsan Hoque
70
0
0
21 Jun 2024
Trusted Video Inpainting Localization via Deep Attentive Noise Learning
Zijie Lou
Gang Cao
Man Lin
AAML
44
3
0
19 Jun 2024
M4Fog: A Global Multi-Regional, Multi-Modal, and Multi-Stage Dataset for Marine Fog Detection and Forecasting to Bridge Ocean and Atmosphere
Mengqiu Xu
Ming Wu
Kaixin Chen
Yixiang Huang
Mingrui Xu
...
Dongliang Chang
Zhenwei Shi
Chuang Zhang
Zhanyu Ma
Jun Guo
33
1
0
19 Jun 2024
The Impact of Auxiliary Patient Data on Automated Chest X-Ray Report Generation and How to Incorporate It
Aaron Nicolson
Shengyao Zhuang
Jason Dowling
Bevan Koopman
34
1
0
19 Jun 2024
GVT2RPM: An Empirical Study for General Video Transformer Adaptation to Remote Physiological Measurement
Hao Wang
E. Ahn
Jinman Kim
40
0
0
19 Jun 2024
ViG: Linear-complexity Visual Sequence Learning with Gated Linear Attention
Bencheng Liao
Xinggang Wang
Lianghui Zhu
Qian Zhang
Chang Huang
54
4
0
28 May 2024
The SkatingVerse Workshop & Challenge: Methods and Results
Jian Zhao
Lei Jin
Jianshu Li
Zheng Zhu
Yinglei Teng
...
Shin'ichi Satoh
Yandong Guo
Cewu Lu
Junliang Xing
Jane Shengmei Shen
AI4TS
30
0
0
27 May 2024
GestFormer: Multiscale Wavelet Pooling Transformer Network for Dynamic Hand Gesture Recognition
Mallika Garg
Debashis Ghosh
P. M. Pradhan
SLR
ViT
36
2
0
18 May 2024
MambaOut: Do We Really Need Mamba for Vision?
Weihao Yu
Xinchao Wang
Mamba
45
48
0
13 May 2024
PromptCIR: Blind Compressed Image Restoration with Prompt Learning
Bingchen Li
Xin Li
Yiting Lu
Ruoyu Feng
Mengxi Guo
Shijie Zhao
Li Zhang
Zhibo Chen
36
13
0
26 Apr 2024
Mamba-360: Survey of State Space Models as Transformer Alternative for Long Sequence Modelling: Methods, Applications, and Challenges
Badri N. Patro
Vijay Srinivas Agneeswaran
Mamba
46
38
0
24 Apr 2024