2201.09450
UniFormer: Unifying Convolution and Self-attention for Visual Recognition
24 January 2022
Kunchang Li
Yali Wang
Junhao Zhang
Peng Gao
Guanglu Song
Yu Liu
Hongsheng Li
Yu Qiao
ViT
Papers citing
"UniFormer: Unifying Convolution and Self-attention for Visual Recognition"
50 / 164 papers shown
Title
DIR-AS: Decoupling Individual Identification and Temporal Reasoning for Action Segmentation
Peiyao Wang
Haibin Ling
15
2
0
04 Apr 2023
Frame Flexible Network
Yitian Zhang
Yue Bai
Chang Liu
Huan Wang
Sheng R. Li
Yun Fu
11
4
0
26 Mar 2023
A Contrastive Learning Scheme with Transformer Innate Patches
S. Jyhne
Per-Arne Andersen
Morten Goodwin Olsen
ViT
24
0
0
26 Mar 2023
MECPformer: Multi-estimations Complementary Patch with CNN-Transformers for Weakly Supervised Semantic Segmentation
Chunmeng Liu
Guang-pu Li
Yao Shen
Ruiqi Wang
ViT
27
7
0
19 Mar 2023
Video Action Recognition with Attentive Semantic Units
Yifei Chen
Dapeng Chen
Ruijin Liu
Hao Li
Wei Peng
19
11
0
17 Mar 2023
Mimic before Reconstruct: Enhancing Masked Autoencoders with Feature Mimicking
Peng Gao
Renrui Zhang
Rongyao Fang
Ziyi Lin
Hongyang Li
Hongsheng Li
Qiao Yu
19
18
0
09 Mar 2023
Visual ChatGPT: Talking, Drawing and Editing with Visual Foundation Models
Chenfei Wu
Sheng-Kai Yin
Weizhen Qi
Xiaodong Wang
Zecheng Tang
Nan Duan
MLLM
LRM
39
614
0
08 Mar 2023
SPARTAN: Self-supervised Spatiotemporal Transformers Approach to Group Activity Recognition
N. V. R. Chappa
Pha Nguyen
Alec Nelson
Han-Seok Seo
Xin Li
P. Dobbs
Khoa Luu
ViT
40
14
0
06 Mar 2023
Efficiency 360: Efficient Vision Transformers
Badri N. Patro
Vijay Srinivas Agneeswaran
26
6
0
16 Feb 2023
Efficient Attention via Control Variates
Lin Zheng
Jianbo Yuan
Chong-Jun Wang
Lingpeng Kong
28
18
0
09 Feb 2023
DilateFormer: Multi-Scale Dilated Transformer for Visual Recognition
Jiayu Jiao
Yuyao Tang
Kun-Li Channing Lin
Yipeng Gao
Jinhua Ma
Yaowei Wang
Wei-Shi Zheng
MedIm
ViT
24
136
0
03 Feb 2023
Revisiting the Spatial and Temporal Modeling for Few-shot Action Recognition
Jiazheng Xing
Mengmeng Wang
Yong-Jin Liu
B. Mu
ViT
14
33
0
19 Jan 2023
HierVL: Learning Hierarchical Video-Language Embeddings
Kumar Ashutosh
Rohit Girdhar
Lorenzo Torresani
Kristen Grauman
VLM
AI4TS
22
51
0
05 Jan 2023
VideoCoCa: Video-Text Modeling with Zero-Shot Transfer from Contrastive Captioners
Shen Yan
Tao Zhu
Zirui Wang
Yuan Cao
Mi Zhang
Soham Ghosh
Yonghui Wu
Jiahui Yu
VLM
VGen
29
46
0
09 Dec 2022
Fine-tuned CLIP Models are Efficient Video Learners
H. Rasheed
Muhammad Uzair Khattak
Muhammad Maaz
Salman Khan
F. Khan
CLIP
VLM
26
148
0
06 Dec 2022
CMC v2: Towards More Accurate COVID-19 Detection with Discriminative Video Priors
Junlin Hou
Jilan Xu
Nan Zhang
Yi Wang
Yuejie Zhang
X. Zhang
Rui Feng
18
2
0
26 Nov 2022
UniFormerV2: Spatiotemporal Learning by Arming Image ViTs with Video UniFormer
Kunchang Li
Yali Wang
Yinan He
Yizhuo Li
Yi Wang
Limin Wang
Yu Qiao
ViT
27
106
0
17 Nov 2022
InternVideo-Ego4D: A Pack of Champion Solutions to Ego4D Challenges
Guo Chen
Sen Xing
Zhe Chen
Yi Wang
Kunchang Li
...
Hongjie Zhang
Tong Lu
Yali Wang
Limin Wang
Yu Qiao
35
46
0
17 Nov 2022
PatchBlender: A Motion Prior for Video Transformers
Gabriele Prato
Yale Song
Janarthanan Rajendran
R. Devon Hjelm
Neel Joshi
Sarath Chandar
ViT
19
0
0
11 Nov 2022
MogaNet: Multi-order Gated Aggregation Network
Siyuan Li
Zedong Wang
Zicheng Liu
Cheng Tan
Haitao Lin
Di Wu
Zhiyuan Chen
Jiangbin Zheng
Stan Z. Li
26
56
0
07 Nov 2022
Efficient Image Super-Resolution using Vast-Receptive-Field Attention
Ling Zhou
Haoming Cai
Jinjin Gu
Zheyu Li
Yingqi Liu
Xiangyu Chen
Yu Qiao
Chao Dong
SupR
18
57
0
12 Oct 2022
CALIP: Zero-Shot Enhancement of CLIP with Parameter-free Attention
Ziyu Guo
Renrui Zhang
Longtian Qiu
Xianzheng Ma
Xupeng Miao
Xuming He
Bin Cui
VLM
AAML
59
109
0
28 Sep 2022
HiFuse: Hierarchical Multi-Scale Feature Fusion Network for Medical Image Classification
Xiangzuo Huo
Gang Sun
Sheng Tian
Yan Wang
Long Yu
Jun Long
Wendong Zhang
Aolun Li
28
100
0
21 Sep 2022
On the Shift Invariance of Max Pooling Feature Maps in Convolutional Neural Networks
Hubert Leterme
K. Polisano
V. Perrier
Alahari Karteek
FAtt
38
2
0
19 Sep 2022
On the Surprising Effectiveness of Transformers in Low-Labeled Video Recognition
Farrukh Rahman
Ömer Mubarek
Z. Kira
ViT
12
2
0
15 Sep 2022
MAFormer: A Transformer Network with Multi-scale Attention Fusion for Visual Recognition
Y. Wang
H. Sun
Xiaodi Wang
Bin Zhang
Chaonan Li
Ying Xin
Baochang Zhang
Errui Ding
Shumin Han
ViT
23
9
0
31 Aug 2022
Exploring Resolution and Degradation Clues as Self-supervised Signal for Low Quality Object Detection
Ziteng Cui
Ying J. Zhu
Lin Gu
Guo-Jun Qi
Xiaoxiao Li
Renrui Zhang
Zenghui Zhang
Tatsuya Harada
29
19
0
05 Aug 2022
Expanding Language-Image Pretrained Models for General Video Recognition
Bolin Ni
Houwen Peng
Minghao Chen
Songyang Zhang
Gaofeng Meng
Jianlong Fu
Shiming Xiang
Haibin Ling
VLM
CLIP
ViT
25
313
0
04 Aug 2022
Next-ViT: Next Generation Vision Transformer for Efficient Deployment in Realistic Industrial Scenarios
Jiashi Li
Xin Xia
W. Li
Huixia Li
Xing Wang
Xuefeng Xiao
Rui Wang
Min Zheng
Xin Pan
ViT
17
149
0
12 Jul 2022
MVP: Robust Multi-View Practice for Driving Action Localization
Jingjie Shang
Kunchang Li
Kaibin Tian
Haisheng Su
Yangguang Li
29
3
0
05 Jul 2022
ST-Adapter: Parameter-Efficient Image-to-Video Transfer Learning
Junting Pan
Ziyi Lin
Xiatian Zhu
Jing Shao
Hongsheng Li
19
190
0
27 Jun 2022
Parallel Pre-trained Transformers (PPT) for Synthetic Data-based Instance Segmentation
Ming Li
Jie Wu
Jin Cai
J. Qin
Yuxi Ren
Xu Xiao
Min Zheng
Rui Wang
X. Pan
ViT
11
2
0
22 Jun 2022
EATFormer: Improving Vision Transformer Inspired by Evolutionary Algorithm
Jiangning Zhang
Xiangtai Li
Yabiao Wang
Chengjie Wang
Yibo Yang
Yong Liu
Dacheng Tao
ViT
34
32
0
19 Jun 2022
Peripheral Vision Transformer
Juhong Min
Yucheng Zhao
Chong Luo
Minsu Cho
ViT
MDE
26
30
0
14 Jun 2022
You Only Need 90K Parameters to Adapt Light: A Light Weight Transformer for Image Enhancement and Exposure Correction
Ziteng Cui
Kunchang Li
Lin Gu
Sheng Su
Peng Gao
Zhengkai Jiang
Yu Qiao
Tatsuya Harada
ViT
79
129
0
30 May 2022
Point-M2AE: Multi-scale Masked Autoencoders for Hierarchical Point Cloud Pre-training
Renrui Zhang
Ziyu Guo
Rongyao Fang
Bingyan Zhao
Dong Wang
Yu Qiao
Hongsheng Li
Peng Gao
3DPC
178
244
0
28 May 2022
A Unified and Biologically-Plausible Relational Graph Representation of Vision Transformers
Yuzhong Chen
Yu Du
Zhe Xiao
Lin Zhao
Lu Zhang
...
Dajiang Zhu
Tuo Zhang
Xintao Hu
Tianming Liu
Xi Jiang
ViT
19
5
0
20 May 2022
Vision Transformer Adapter for Dense Predictions
Zhe Chen
Yuchen Duan
Wenhai Wang
Junjun He
Tong Lu
Jifeng Dai
Yu Qiao
43
542
0
17 May 2022
NTIRE 2022 Challenge on Efficient Super-Resolution: Methods and Results
Yawei Li
K. Zhang
Radu Timofte
Luc Van Gool
F. Kong
...
Deng-Guang Zhou
Kun Zeng
Han-Yuan Lin
Xinyu Chen
Jin-Tao Fang
SupR
36
77
0
11 May 2022
Deep fusion of gray level co-occurrence matrices for lung nodule classification
A. Saihood
Hossein Karshenas
A. Naghsh-Nilchi
20
14
0
10 May 2022
Activating More Pixels in Image Super-Resolution Transformer
Xiangyu Chen
Xintao Wang
Jiantao Zhou
Yu Qiao
Chao Dong
ViT
61
600
0
09 May 2022
ConvMAE: Masked Convolution Meets Masked Autoencoders
Peng Gao
Teli Ma
Hongsheng Li
Ziyi Lin
Jifeng Dai
Yu Qiao
ViT
19
121
0
08 May 2022
DaViT: Dual Attention Vision Transformers
Mingyu Ding
Bin Xiao
Noel Codella
Ping Luo
Jingdong Wang
Lu Yuan
ViT
30
240
0
07 Apr 2022
Dual-AI: Dual-path Actor Interaction Learning for Group Activity Recognition
Mingfei Han
David Junhao Zhang
Yali Wang
Rui Yan
L. Yao
Xiaojun Chang
Yu Qiao
19
55
0
05 Apr 2022
Movie Genre Classification by Language Augmentation and Shot Sampling
Zhongping Zhang
Yiwen Gu
Bryan A. Plummer
Xin Miao
Jiayi Liu
Huayan Wang
VLM
CLIP
16
1
0
24 Mar 2022
CMX: Cross-Modal Fusion for RGB-X Semantic Segmentation with Transformers
Jiaming Zhang
Huayao Liu
Kailun Yang
Xinxin Hu
Ruiping Liu
Rainer Stiefelhagen
ViT
23
297
0
09 Mar 2022
TransKD: Transformer Knowledge Distillation for Efficient Semantic Segmentation
R. Liu
Kailun Yang
Alina Roitberg
Jiaming Zhang
Kunyu Peng
Huayao Liu
Yaonan Wang
Rainer Stiefelhagen
ViT
39
36
0
27 Feb 2022
Hilbert Flattening: a Locality-Preserving Matrix Unfolding Method for Visual Discrimination
Qingsong Zhao
Shuguang Dou
Zhipeng Zhou
Yangguang Li
Yin Wang
Yu Qiao
Cairong Zhao
20
3
0
21 Feb 2022
Self-slimmed Vision Transformer
Zhuofan Zong
Kunchang Li
Guanglu Song
Yali Wang
Yu Qiao
B. Leng
Yu Liu
ViT
21
30
0
24 Nov 2021
MobileViT: Light-weight, General-purpose, and Mobile-friendly Vision Transformer
Sachin Mehta
Mohammad Rastegari
ViT
209
1,212
0
05 Oct 2021