2104.11227
Cited By
Multiscale Vision Transformers
22 April 2021
Haoqi Fan
Bo Xiong
K. Mangalam
Yanghao Li
Zhicheng Yan
Jitendra Malik
Christoph Feichtenhofer
ViT
Papers citing
"Multiscale Vision Transformers"
50 / 736 papers shown
Title
Multiscale Memory Comparator Transformer for Few-Shot Video Segmentation
Mennatullah Siam
R. Karim
Henghui Zhao
Richard P. Wildes
VOS
33
2
0
15 Jul 2023
Video-FocalNets: Spatio-Temporal Focal Modulation for Video Action Recognition
Syed Talal Wasim
Muhammad Uzair Khattak
Muzammal Naseer
Salman Khan
M. Shah
F. Khan
ViT
48
19
0
13 Jul 2023
Patch n' Pack: NaViT, a Vision Transformer for any Aspect Ratio and Resolution
Mostafa Dehghani
Basil Mustafa
Josip Djolonga
Jonathan Heek
Matthias Minderer
...
Avital Oliver
Piotr Padlewski
A. Gritsenko
Mario Lučić
N. Houlsby
ViT
18
104
0
12 Jul 2023
EgoAdapt: A multi-stream evaluation study of adaptation to real-world egocentric user video
Matthias De Lange
H. Eghbalzadeh
Reuben Tan
Michael L. Iuzzolino
Franziska Meier
Karl Ridgeway
EgoV
17
1
0
11 Jul 2023
EgoVLPv2: Egocentric Video-Language Pre-training with Fusion in the Backbone
Shraman Pramanick
Yale Song
Sayan Nag
Kevin Qinghong Lin
Hardik Shah
Mike Zheng Shou
Ramalingam Chellappa
Pengchuan Zhang
VLM
39
86
0
11 Jul 2023
Distill-SODA: Distilling Self-Supervised Vision Transformer for Source-Free Open-Set Domain Adaptation in Computational Pathology
Guillaume Vray
Devavrat Tomar
Jean-Philippe Thiran
Behzad Bozorgtabar
MedIm
26
0
0
10 Jul 2023
VideoGLUE: Video General Understanding Evaluation of Foundation Models
Liangzhe Yuan
N. B. Gundavarapu
Long Zhao
Hao Zhou
Yin Cui
...
Florian Schroff
Hartwig Adam
Ming Yang
Ting Liu
Boqing Gong
ELM
37
9
0
06 Jul 2023
MSViT: Dynamic Mixed-Scale Tokenization for Vision Transformers
Jakob Drachmann Havtorn
Amelie Royer
Tijmen Blankevoort
B. Bejnordi
30
8
0
05 Jul 2023
Make A Long Image Short: Adaptive Token Length for Vision Transformers
Yuqin Zhu
Yichen Zhu
ViT
64
17
0
05 Jul 2023
Differentially Private Video Activity Recognition
Zelun Luo
Yuliang Zou
Yijin Yang
Zane Durante
De-An Huang
Zhiding Yu
Chaowei Xiao
L. Fei-Fei
Anima Anandkumar
PICV
29
3
0
27 Jun 2023
Efficient Online Processing with Deep Neural Networks
Lukas Hedegaard
18
0
0
23 Jun 2023
How can objects help action recognition?
Xingyi Zhou
Anurag Arnab
Chen Sun
Cordelia Schmid
35
14
0
20 Jun 2023
Vision-Language Models can Identify Distracted Driver Behavior from Naturalistic Videos
Md Zahid Hasan
Jiajing Chen
Jiyang Wang
Mohammed Shaiqur Rahman
Ameya Joshi
Senem Velipasalar
C. Hegde
Anuj Sharma
S. Sarkar
VLM
44
18
0
16 Jun 2023
PaReprop: Fast Parallelized Reversible Backpropagation
Tyler Lixuan Zhu
K. Mangalam
17
1
0
15 Jun 2023
Seeing the Pose in the Pixels: Learning Pose-Aware Representations in Vision Transformers
Dominick Reilly
Aman Chadha
Srijan Das
ViT
20
4
0
15 Jun 2023
COSA: Concatenated Sample Pretrained Vision-Language Foundation Model
Sihan Chen
Xingjian He
Handong Li
Xiaojie Jin
Jiashi Feng
J. Liu
VLM
CLIP
22
8
0
15 Jun 2023
MOFI: Learning Image Representations from Noisy Entity Annotated Images
Wentao Wu
Aleksei Timofeev
Chen Chen
Bowen Zhang
Kun Duan
...
Yantao Zheng
Jonathon Shlens
Xianzhi Du
Zhe Gan
Yinfei Yang
VLM
18
7
0
13 Jun 2023
E2E-LOAD: End-to-End Long-form Online Action Detection
Shuyuan Cao
Weihua Luo
Bairui Wang
Wei Emma Zhang
Lin Ma
25
5
0
13 Jun 2023
Boosting Breast Ultrasound Video Classification by the Guidance of Keyframe Feature Centers
AnLan Sun
Zhao Zhang
Meng Lei
Yuting Dai
Dong Wang
Liwei Wang
26
5
0
12 Jun 2023
ShiftAddViT: Mixture of Multiplication Primitives Towards Efficient Vision Transformer
Haoran You
Huihong Shi
Yipin Guo
Yingyan Lin
26
16
0
10 Jun 2023
FasterViT: Fast Vision Transformers with Hierarchical Attention
Ali Hatamizadeh
Greg Heinrich
Hongxu Yin
Andrew Tao
J. Álvarez
Jan Kautz
Pavlo Molchanov
ViT
20
66
0
09 Jun 2023
A Large-Scale Analysis on Self-Supervised Video Representation Learning
Akash Kumar
Ashlesha Kumar
Vibhav Vineet
Y. S. Rawat
SSL
20
3
0
09 Jun 2023
Optimizing ViViT Training: Time and Memory Reduction for Action Recognition
Shreyank N. Gowda
Anurag Arnab
Jonathan Huang
ViT
18
4
0
07 Jun 2023
Inflated 3D Convolution-Transformer for Weakly-supervised Carotid Stenosis Grading with Ultrasound Videos
Xinrui Zhou
Yuhao Huang
Wufeng Xue
Xin Yang
Yuxin Zou
Qilong Ying
Yuanji Zhang
Jia Liu
Jie Jessie Ren
Dong Ni
ViT
MedIm
28
4
0
05 Jun 2023
Masked Autoencoder for Unsupervised Video Summarization
Minho Shim
Taeoh Kim
Jinhyung Kim
Dongyoon Wee
23
1
0
02 Jun 2023
Collect-and-Distribute Transformer for 3D Point Cloud Analysis
Haibo Qiu
Baosheng Yu
Dacheng Tao
3DPC
ViT
22
6
0
02 Jun 2023
Neural Ideal Large Eddy Simulation: Modeling Turbulence with Neural Stochastic Differential Equations
Anudhyan Boral
Z. Y. Wan
Leonardo Zepeda-Núñez
James Lottes
Qing Wang
Yi-fan Chen
John R. Anderson
Fei Sha
AI4CE
PINN
19
11
0
01 Jun 2023
Hiera: A Hierarchical Vision Transformer without the Bells-and-Whistles
Chaitanya K. Ryali
Yuan-Ting Hu
Daniel Bolya
Chen Wei
Haoqi Fan
...
Omid Poursaeed
Judy Hoffman
Jitendra Malik
Yanghao Li
Christoph Feichtenhofer
3DH
43
159
0
01 Jun 2023
Masked Autoencoders with Multi-Window Local-Global Attention Are Better Audio Learners
Sarthak Yadav
Sergios Theodoridis
Lars Kai Hansen
Z. Tan
20
7
0
01 Jun 2023
Humans in 4D: Reconstructing and Tracking Humans with Transformers
Shubham Goel
Georgios Pavlakos
Jathushan Rajasegaran
Angjoo Kanazawa
Jitendra Malik
3DH
33
177
0
31 May 2023
Primal-Attention: Self-attention through Asymmetric Kernel SVD in Primal Representation
Yingyi Chen
Qinghua Tao
F. Tonin
Johan A. K. Suykens
34
19
0
31 May 2023
Making Vision Transformers Truly Shift-Equivariant
Renan A. Rojas-Gomez
Teck-Yian Lim
Minh N. Do
Raymond A. Yeh
ViT
22
7
0
25 May 2023
Multi-scale Efficient Graph-Transformer for Whole Slide Image Classification
Saisai Ding
Juncheng Li
Jun Wang
Shihui Ying
Jun Shi
ViT
MedIm
30
7
0
25 May 2023
Cross-view Action Recognition Understanding From Exocentric to Egocentric Perspective
Thanh-Dat Truong
Khoa Luu
EgoV
27
10
0
25 May 2023
ViTMatte: Boosting Image Matting with Pretrained Plain Vision Transformers
J. Yao
Xinggang Wang
Shusheng Yang
Baoyuan Wang
ViT
30
57
0
24 May 2023
Slovo: Russian Sign Language Dataset
A. Kapitanov
Karina Kvanchiani
A.M. Nagaev
Elizaveta Petrova
SLR
13
10
0
23 May 2023
VideoLLM: Modeling Video Sequence with Large Language Models
Guo Chen
Yin-Dong Zheng
Jiahao Wang
Jilan Xu
Yifei Huang
...
Yi Wang
Yali Wang
Yu Qiao
Tong Lu
Limin Wang
MLLM
94
76
0
22 May 2023
Enhancing Next Active Object-based Egocentric Action Anticipation with Guided Attention
Sanket Thakur
Cigdem Beyan
Pietro Morerio
Vittorio Murino
Alessio Del Bue
31
6
0
22 May 2023
Learning Sequence Descriptor based on Spatio-Temporal Attention for Visual Place Recognition
Junqiao Zhao
Fenglin Zhang
Yingfeng Cai
Geng Tian
Wenjie Mu
Chen Ye
Tiantian Feng
15
4
0
19 May 2023
CageViT: Convolutional Activation Guided Efficient Vision Transformer
Hao Zheng
Jinbao Wang
Xiantong Zhen
H. Chen
Jingkuan Song
Feng Zheng
ViT
17
0
0
17 May 2023
Exploring Few-Shot Adaptation for Activity Recognition on Diverse Domains
Kunyu Peng
Di Wen
David Schneider
Jiaming Zhang
Kailun Yang
M. Sarfraz
Rainer Stiefelhagen
Alina Roitberg
28
2
0
15 May 2023
M²DAR: Multi-View Multi-Scale Driver Action Recognition with Vision Transformer
Yunsheng Ma
Liangqi Yuan
Amr Abdelraouf
Kyungtae Han
Rohit Gupta
Zihao Li
Ziran Wang
104
9
0
13 May 2023
Lightweight Delivery Detection on Doorbell Cameras
Pirazh Khorramshahi
Zhe Wu
Tianchen Wang
Luke Deluccia
Hongcheng Wang
8
0
0
13 May 2023
MMG-Ego4D: Multi-Modal Generalization in Egocentric Action Recognition
Xinyu Gong
S. Mohan
Naina Dhingra
Jean-Charles Bazin
Yilei Li
Zhangyang Wang
Rakesh Ranjan
EgoV
54
17
0
12 May 2023
Listen to Look into the Future: Audio-Visual Egocentric Gaze Anticipation
Bolin Lai
Fiona Ryan
Wenqi Jia
Miao Liu
James M. Rehg
EgoV
24
8
0
06 May 2023
Reduction of Class Activation Uncertainty with Background Information
H. M. D. Kabir
21
9
0
05 May 2023
Modelling Spatio-Temporal Interactions for Compositional Action Recognition
Ramanathan Rajendiran
Debaditya Roy
Basura Fernando
43
1
0
04 May 2023
MMViT: Multiscale Multiview Vision Transformers
Yuchen Liu
Natasha Ong
Kaiyan Peng
Bo Xiong
Qifan Wang
...
Madian Khabsa
Kaiyue Yang
David C. Liu
Donald Williamson
Hanchao Yu
ViT
22
4
0
28 Apr 2023
TempEE: Temporal-Spatial Parallel Transformer for Radar Echo Extrapolation Beyond Auto-Regression
Shengchao Chen
Ting Shu
Huani Zhao
Guo Zhong
Xunlai Chen
29
17
0
27 Apr 2023
SoGAR: Self-supervised Spatiotemporal Attention-based Social Group Activity Recognition
N. V. R. Chappa
Pha Nguyen
Alec Nelson
Han-Seok Seo
Xin Li
P. Dobbs
Khoa Luu
ViT
29
8
0
27 Apr 2023