2104.11227
Multiscale Vision Transformers
22 April 2021
Haoqi Fan
Bo Xiong
K. Mangalam
Yanghao Li
Zhicheng Yan
Jitendra Malik
Christoph Feichtenhofer
ViT
Papers citing
"Multiscale Vision Transformers"
50 / 736 papers shown
Title
Telling Stories for Common Sense Zero-Shot Action Recognition
Shreyank N. Gowda
Carolina Scarton
LM&Ro
22
2
0
29 Sep 2023
Training a Large Video Model on a Single Machine in a Day
Yue Zhao
Philipp Krahenbuhl
VLM
29
15
0
28 Sep 2023
Local Compressed Video Stream Learning for Generic Event Boundary Detection
Libo Zhang
Xin Gu
Congcong Li
Tiejian Luo
Hengrui Fan
23
3
0
27 Sep 2023
Advancements in 3D Lane Detection Using LiDAR Point Clouds: From Data Collection to Model Development
Runkai Zhao
Yuwen Heng
Heng Wang
Yuanda Gao
Shilei Liu
Changhao Yao
Jiawen Chen
Weidong (Tom) Cai
3DPC
18
3
0
24 Sep 2023
CINFormer: Transformer network with multi-stage CNN feature injection for surface defect segmentation
Xiaoheng Jiang
Kaiyi Guo
Yang Lu
Feng Yan
Hao Liu
Jiale Cao
Mingliang Xu
Dacheng Tao
MedIm
ViT
UQCV
18
1
0
22 Sep 2023
Revisiting Kernel Temporal Segmentation as an Adaptive Tokenizer for Long-form Video Understanding
Mohamed Afham
Satya Narayan Shukla
Omid Poursaeed
Pengchuan Zhang
Ashish Shah
Sernam Lim
VLM
24
2
0
20 Sep 2023
SkeleTR: Towards Skeleton-based Action Recognition in the Wild
Haodong Duan
Mingze Xu
Bing Shuai
Davide Modolo
Zhuowen Tu
Joseph Tighe
Alessandro Bergamo
ViT
32
1
0
20 Sep 2023
RMT: Retentive Networks Meet Vision Transformers
Qihang Fan
Huaibo Huang
Mingrui Chen
Hongmin Liu
Ran He
ViT
38
75
0
20 Sep 2023
Selective Volume Mixup for Video Action Recognition
Yi Tan
Zhaofan Qiu
Y. Hao
Ting Yao
Xiangnan He
Tao Mei
ViT
28
2
0
18 Sep 2023
MMST-ViT: Climate Change-aware Crop Yield Prediction via Multi-Modal Spatial-Temporal Vision Transformer
Fudong Lin
Summer Crawford
Kaleb Guillot
Yihe Zhang
Yan Chen
...
Tri Setiyono
B. Tubana
Lu Peng
Magdy A. Bayoumi
N. Tzeng
42
20
0
16 Sep 2023
SCT: A Simple Baseline for Parameter-Efficient Fine-Tuning via Salient Channels
Henry Hengyuan Zhao
Pichao Wang
Yuyang Zhao
Hao Luo
F. Wang
Mike Zheng Shou
ViT
34
14
0
15 Sep 2023
UniST: Towards Unifying Saliency Transformer for Video Saliency Prediction and Detection
Jun Xiong
Peng Zhang
Chuanyue Li
Wei Huang
Yufei Zha
Tao You
ViT
30
2
0
15 Sep 2023
Disentangling Spatial and Temporal Learning for Efficient Image-to-Video Transfer Learning
Zhiwu Qing
Shiwei Zhang
Ziyuan Huang
Yingya Zhang
Changxin Gao
Deli Zhao
Nong Sang
21
18
0
14 Sep 2023
Co-Salient Object Detection with Semantic-Level Consensus Extraction and Dispersion
Peiran Xu
Yadong Mu
26
7
0
14 Sep 2023
Efficient Emotional Adaptation for Audio-Driven Talking-Head Generation
Yuan Gan
Zongxin Yang
Xihang Yue
Lingyun Sun
Yezhou Yang
25
57
0
10 Sep 2023
EgoPCA: A New Framework for Egocentric Hand-Object Interaction Understanding
Yue Xu
Yong-Lu Li
Zhemin Huang
Michael Xu Liu
Cewu Lu
Yu-Wing Tai
Chi-Keung Tang
EgoV
25
9
0
05 Sep 2023
A survey on efficient vision transformers: algorithms, techniques, and performance benchmarking
Lorenzo Papa
Paolo Russo
Irene Amerini
Luping Zhou
25
42
0
05 Sep 2023
DAT++: Spatially Dynamic Vision Transformer with Deformable Attention
Zhuofan Xia
Xuran Pan
Shiji Song
Li Erran Li
Gao Huang
ViT
21
24
0
04 Sep 2023
Self-Supervised Video Transformers for Isolated Sign Language Recognition
Marcelo Sandoval-Castaneda
Yanhong Li
D. Brentari
Karen Livescu
Gregory Shakhnarovich
SLR
20
2
0
02 Sep 2023
Transformers as Support Vector Machines
Davoud Ataee Tarzanagh
Yingcong Li
Christos Thrampoulidis
Samet Oymak
48
43
0
31 Aug 2023
PanoSwin: a Pano-style Swin Transformer for Panorama Understanding
Zhixin Ling
Zhen Xing
Xiangdong Zhou
Manliang Cao
G. Zhou
ViT
26
17
0
28 Aug 2023
Eventful Transformers: Leveraging Temporal Redundancy in Vision Transformers
Matthew Dutson
Yin Li
M. Gupta
ViT
35
8
0
25 Aug 2023
Unlocking Fine-Grained Details with Wavelet-based High-Frequency Enhancement in Transformers
Reza Azad
A. Kazerouni
Alaa Sulaiman
Afshin Bozorgpour
Ehsan Khodapanah Aghdam
Abin Jose
Dorit Merhof
ViT
MedIm
21
9
0
25 Aug 2023
Motion-Guided Masking for Spatiotemporal Representation Learning
D. Fan
Jue Wang
Shuai Liao
Yi Zhu
Vimal Bhat
H. Santos-Villalobos
M. Rohith
Xinyu Li
VGen
26
19
0
24 Aug 2023
MOFO: MOtion FOcused Self-Supervision for Video Understanding
Mona Ahmadian
Frank Guerin
Andrew Gilbert
34
2
0
23 Aug 2023
Towards Privacy-Supporting Fall Detection via Deep Unsupervised RGB2Depth Adaptation
Hejun Xiao
Kunyu Peng
Xiangsheng Huang
Alina Roitberg
Hao Li
Zhao Wang
Rainer Stiefelhagen
18
3
0
23 Aug 2023
TurboViT: Generating Fast Vision Transformers via Generative Architecture Search
Alexander Wong
Saad Abbasi
Saeejith Nair
ViT
27
1
0
22 Aug 2023
How Much Temporal Long-Term Context is Needed for Action Segmentation?
Emad Bahrami Rad
Gianpiero Francesca
Juergen Gall
ViT
11
25
0
22 Aug 2023
MGMAE: Motion Guided Masking for Video Masked Autoencoding
Bingkun Huang
Zhiyu Zhao
Guozhen Zhang
Yu Qiao
Limin Wang
28
30
0
21 Aug 2023
Spatial-Temporal Alignment Network for Action Recognition
Jinhui Ye
Junwei Liang
3DPC
21
1
0
19 Aug 2023
Boosting Few-shot Action Recognition with Graph-guided Hybrid Matching
Jiazheng Xing
Mengmeng Wang
Yudi Ruan
Bofan Chen
Yaowei Guo
B. Mu
Guangwen Dai
Jingdong Wang
Yong-Jin Liu
22
18
0
18 Aug 2023
Memory-and-Anticipation Transformer for Online Action Understanding
Jiahao Wang
Guo Chen
Yifei Huang
Liming Wang
Tong Lu
OffRL
56
37
0
15 Aug 2023
Vision Backbone Enhancement via Multi-Stage Cross-Scale Attention
Liang Shang
Yanli Liu
Zhengyang Lou
Shuxue Quan
N. Adluru
Bochen Guan
W. Sethares
19
2
0
10 Aug 2023
Temporally-Adaptive Models for Efficient Video Understanding
Ziyuan Huang
Shiwei Zhang
Liang Pan
Zhiwu Qing
Yingya Zhang
Ziwei Liu
Marcelo H. Ang
28
9
0
10 Aug 2023
Seeing in Flowing: Adapting CLIP for Action Recognition with Motion Prompts Learning
Qianqian Wang
Junlong Du
Ke Yan
Shouhong Ding
VLM
35
17
0
09 Aug 2023
Prune Spatio-temporal Tokens by Semantic-aware Temporal Accumulation
Shuangrui Ding
Peisen Zhao
Xiaopeng Zhang
Rui Qian
H. Xiong
Qi Tian
ViT
24
16
0
08 Aug 2023
ViLP: Knowledge Exploration using Vision, Language, and Pose Embeddings for Video Action Recognition
S. Chaudhuri
Saumik Bhattacharya
25
3
0
07 Aug 2023
Deepfake Detection: A Comparative Analysis
Sohail Ahmed Khan
Duc-Tien Dang-Nguyen
34
2
0
07 Aug 2023
M2Former: Multi-Scale Patch Selection for Fine-Grained Visual Recognition
Ji-Hee Moon
Junseok K. Lee
Yu-Ling Lee
Seongsik Park
27
4
0
04 Aug 2023
Capturing Co-existing Distortions in User-Generated Content for No-reference Video Quality Assessment
Kun Yuan
Zishang Kong
Chuanchuan Zheng
Ming-Ting Sun
Xingsen Wen
ViT
27
14
0
31 Jul 2023
DPMix: Mixture of Depth and Point Cloud Video Experts for 4D Action Segmentation
Yue Zhang
Hehe Fan
Yi Yang
Mohan S. Kankanhalli
3DPC
16
1
0
31 Jul 2023
Select2Col: Leveraging Spatial-Temporal Importance of Semantic Information for Efficient Collaborative Perception
Yuntao Liu
Qian Huang
Rongpeng Li
Xianfu Chen
Zhifeng Zhao
Shuyuan Zhao
Yongdong Zhu
Honggang Zhang
21
11
0
31 Jul 2023
Sample Less, Learn More: Efficient Action Recognition via Frame Feature Restoration
Harry Cheng
Yangyang Guo
Liqiang Nie
Zhiyong Cheng
Mohan S. Kankanhalli
37
7
0
27 Jul 2023
Pretrained Deep 2.5D Models for Efficient Predictive Modeling from Retinal OCT
T. Emre
Marzieh Oghbaie
A. Chakravarty
Antoine Rivail
Sophie Riedl
...
S. Sivaprasad
Daniel Rueckert
A. Lotery
U. Schmidt-Erfurth
Hrvoje Bogunović
MedIm
15
1
0
25 Jul 2023
Multiscale Video Pretraining for Long-Term Activity Forecasting
Reuben Tan
Matthias De Lange
Michael L. Iuzzolino
Bryan A. Plummer
Kate Saenko
Karl Ridgeway
Lorenzo Torresani
AI4TS
11
6
0
24 Jul 2023
Two-stream Multi-level Dynamic Point Transformer for Two-person Interaction Recognition
Yao Liu
Gangfeng Cui
Jiahui Luo
Xiaojun Chang
L. Yao
ViT
3DPC
16
5
0
22 Jul 2023
Actor-agnostic Multi-label Action Recognition with Multi-modal Query
Anindya Mondal
Sauradip Nag
J. Prada
Xiatian Zhu
Anjan Dutta
21
9
0
20 Jul 2023
FlexiAST: Flexibility is What AST Needs
Jiu Feng
Mehmet Hamza Erol
Joon Son Chung
Arda Senocak
23
3
0
18 Jul 2023
What Can Simple Arithmetic Operations Do for Temporal Modeling?
Wenhao Wu
Yuxin Song
Zhun Sun
Jingdong Wang
Chang Xu
Wanli Ouyang
40
8
0
18 Jul 2023
Study of Vision Transformers for Covid-19 Detection from Chest X-rays
S. Angara
S. Thirunagaru
ViT
MedIm
16
1
0
17 Jul 2023