ResearchTrend.AI
MViTv2: Improved Multiscale Vision Transformers for Classification and Detection
arXiv: 2112.01526

2 December 2021
Yanghao Li
Chao-Yuan Wu
Haoqi Fan
K. Mangalam
Bo Xiong
Jitendra Malik
Christoph Feichtenhofer
    ViT

Papers citing "MViTv2: Improved Multiscale Vision Transformers for Classification and Detection"

50 / 398 papers shown
Title
Towards Privacy-Supporting Fall Detection via Deep Unsupervised RGB2Depth Adaptation
Hejun Xiao
Kunyu Peng
Xiangsheng Huang
Alina Roitberg
Hao Li
Zhao Wang
Rainer Stiefelhagen
18
3
0
23 Aug 2023
How Much Temporal Long-Term Context is Needed for Action Segmentation?
Emad Bahrami Rad
Gianpiero Francesca
Juergen Gall
ViT
24
25
0
22 Aug 2023
TeD-SPAD: Temporal Distinctiveness for Self-supervised Privacy-preservation for video Anomaly Detection
Joe Fioresi
I. Dave
M. Shah
39
18
0
21 Aug 2023
Spatial-Temporal Alignment Network for Action Recognition
Jinhui Ye
Junwei Liang
3DPC
29
1
0
19 Aug 2023
Vision Backbone Enhancement via Multi-Stage Cross-Scale Attention
Liang Shang
Yanli Liu
Zhengyang Lou
Shuxue Quan
N. Adluru
Bochen Guan
W. Sethares
24
2
0
10 Aug 2023
Temporally-Adaptive Models for Efficient Video Understanding
Ziyuan Huang
Shiwei Zhang
Liang Pan
Zhiwu Qing
Yingya Zhang
Ziwei Liu
Marcelo H. Ang
35
9
0
10 Aug 2023
Prune Spatio-temporal Tokens by Semantic-aware Temporal Accumulation
Shuangrui Ding
Peisen Zhao
Xiaopeng Zhang
Rui Qian
H. Xiong
Qi Tian
ViT
29
16
0
08 Aug 2023
SSTFormer: Bridging Spiking Neural Network and Memory Support Transformer for Frame-Event based Recognition
Tianlin Li
Zong-Yao Wu
Yao Rong
Lin Zhu
Bowei Jiang
Jin Tang
Yonghong Tian
ViT
74
17
0
08 Aug 2023
DiT: Efficient Vision Transformers with Dynamic Token Routing
Yuchen Ma
Zhengcong Fei
Junshi Huang
ViT
26
2
0
07 Aug 2023
A Hybrid CNN-Transformer Architecture with Frequency Domain Contrastive Learning for Image Deraining
Cheng-i Wang
Wei Li
39
0
0
07 Aug 2023
M2Former: Multi-Scale Patch Selection for Fine-Grained Visual Recognition
Ji-Hee Moon
Junseok K. Lee
Yu-Ling Lee
Seongsik Park
35
4
0
04 Aug 2023
Revisiting DETR Pre-training for Object Detection
Yan Ma
Weicong Liang
Bo-Ying Chen
Yiduo Hao
Bojian Hou
Xiangyu Yue
Chao Zhang
Yuhui Yuan
VLM
ViT
35
4
0
02 Aug 2023
Sample Less, Learn More: Efficient Action Recognition via Frame Feature Restoration
Harry Cheng
Yangyang Guo
Liqiang Nie
Zhiyong Cheng
Mohan S. Kankanhalli
37
7
0
27 Jul 2023
IML-ViT: Benchmarking Image Manipulation Localization by Vision Transformer
Xiaochen Ma
Bo Du
Zhuohang Jiang
Ahmed Y. Al Hammadi
Jizhe Zhou
16
7
0
27 Jul 2023
Causal reasoning in typical computer vision tasks
Kexuan Zhang
Qiyu Sun
Chaoqiang Zhao
Yang Tang
CML
26
11
0
26 Jul 2023
E^2VPT: An Effective and Efficient Approach for Visual Prompt Tuning
Cheng Han
Qifan Wang
Yiming Cui
Zhiwen Cao
Wenguan Wang
Siyuan Qi
Dongfang Liu
VPVLM
VLM
25
47
0
25 Jul 2023
FlexiAST: Flexibility is What AST Needs
Jiu Feng
Mehmet Hamza Erol
Joon Son Chung
Arda Senocak
23
3
0
18 Jul 2023
RepViT: Revisiting Mobile CNN From ViT Perspective
Ao Wang
Hui Chen
Zijia Lin
Hengjun Pu
Guiguang Ding
34
177
0
18 Jul 2023
The Effects of Mixed Sample Data Augmentation are Class Dependent
Haeil Lee
Han S. Lee
Junmo Kim
37
1
0
18 Jul 2023
Video-Mined Task Graphs for Keystep Recognition in Instructional Videos
Kumar Ashutosh
Santhosh Kumar Ramakrishnan
Triantafyllos Afouras
Kristen Grauman
26
24
0
17 Jul 2023
Multiscale Memory Comparator Transformer for Few-Shot Video Segmentation
Mennatullah Siam
R. Karim
Henghui Zhao
Richard P. Wildes
VOS
38
2
0
15 Jul 2023
Multimodal Distillation for Egocentric Action Recognition
Gorjan Radevski
Dusan Grujicic
Marie-Francine Moens
Matthew Blaschko
Tinne Tuytelaars
EgoV
23
23
0
14 Jul 2023
Video-FocalNets: Spatio-Temporal Focal Modulation for Video Action Recognition
Syed Talal Wasim
Muhammad Uzair Khattak
Muzammal Naseer
Salman Khan
M. Shah
F. Khan
ViT
54
19
0
13 Jul 2023
A Study on Differentiable Logic and LLMs for EPIC-KITCHENS-100 Unsupervised Domain Adaptation Challenge for Action Recognition 2023
Yi Cheng
Ziwei Xu
Fen Fang
Dongyun Lin
Hehe Fan
Yongkang Wong
Ying Sun
Mohan S. Kankanhalli
26
0
0
13 Jul 2023
Patch n' Pack: NaViT, a Vision Transformer for any Aspect Ratio and Resolution
Mostafa Dehghani
Basil Mustafa
Josip Djolonga
Jonathan Heek
Matthias Minderer
...
Avital Oliver
Piotr Padlewski
A. Gritsenko
Mario Lučić
N. Houlsby
ViT
23
105
0
12 Jul 2023
HA-ViD: A Human Assembly Video Dataset for Comprehensive Assembly Knowledge Understanding
Hao Zheng
R. Lee
Yuqian Lu
VGen
17
16
0
09 Jul 2023
Efficient Online Processing with Deep Neural Networks
Lukas Hedegaard
23
0
0
23 Jun 2023
How can objects help action recognition?
Xingyi Zhou
Anurag Arnab
Chen Sun
Cordelia Schmid
35
14
0
20 Jun 2023
PaReprop: Fast Parallelized Reversible Backpropagation
Tyler Lixuan Zhu
K. Mangalam
17
1
0
15 Jun 2023
Seeing the Pose in the Pixels: Learning Pose-Aware Representations in Vision Transformers
Dominick Reilly
Aman Chadha
Srijan Das
ViT
25
4
0
15 Jun 2023
E2E-LOAD: End-to-End Long-form Online Action Detection
Shuyuan Cao
Weihua Luo
Bairui Wang
Wei Emma Zhang
Lin Ma
30
5
0
13 Jun 2023
Mitigating Transformer Overconfidence via Lipschitz Regularization
Wenqian Ye
Yunsheng Ma
Xu Cao
Kun Tang
23
13
0
12 Jun 2023
ShiftAddViT: Mixture of Multiplication Primitives Towards Efficient Vision Transformer
Haoran You
Huihong Shi
Yipin Guo
Yingyan Lin
Lin
34
16
0
10 Jun 2023
MoviePuzzle: Visual Narrative Reasoning through Multimodal Order Learning
Jianghui Wang
Yuxuan Wang
Dongyan Zhao
Zilong Zheng
46
1
0
04 Jun 2023
Collect-and-Distribute Transformer for 3D Point Cloud Analysis
Haibo Qiu
Baosheng Yu
Dacheng Tao
3DPC
ViT
27
6
0
02 Jun 2023
Neural Ideal Large Eddy Simulation: Modeling Turbulence with Neural Stochastic Differential Equations
Anudhyan Boral
Z. Y. Wan
Leonardo Zepeda-Núñez
James Lottes
Qing Wang
Yi-fan Chen
John R. Anderson
Fei Sha
AI4CE
PINN
27
11
0
01 Jun 2023
Hiera: A Hierarchical Vision Transformer without the Bells-and-Whistles
Chaitanya K. Ryali
Yuan-Ting Hu
Daniel Bolya
Chen Wei
Haoqi Fan
...
Omid Poursaeed
Judy Hoffman
Jitendra Malik
Yanghao Li
Christoph Feichtenhofer
3DH
45
160
0
01 Jun 2023
MammalNet: A Large-scale Video Benchmark for Mammal Recognition and Behavior Understanding
Jun Chen
Ming Hu
D. Coker
M. Berumen
Blair R. Costelloe
Sara Beery
Anna Rohrbach
Mohamed Elhoseiny
35
22
0
01 Jun 2023
Masked Autoencoders with Multi-Window Local-Global Attention Are Better Audio Learners
Sarthak Yadav
Sergios Theodoridis
Lars Kai Hansen
Zheng-Hua Tan
28
7
0
01 Jun 2023
Vision Transformers for Mobile Applications: A Short Survey
Nahid Alam
Steven Kolawole
S. Sethi
Nishant Bansali
Karina Nguyen
ViT
31
3
0
30 May 2023
Making Vision Transformers Truly Shift-Equivariant
Renan A. Rojas-Gomez
Teck-Yian Lim
Minh N. Do
Raymond A. Yeh
ViT
36
7
0
25 May 2023
Cross-view Action Recognition Understanding From Exocentric to Egocentric Perspective
Thanh-Dat Truong
Khoa Luu
EgoV
29
10
0
25 May 2023
Slovo: Russian Sign Language Dataset
A. Kapitanov
Karina Kvanchiani
A.M. Nagaev
Elizaveta Petrova
SLR
13
10
0
23 May 2023
Enhancing Next Active Object-based Egocentric Action Anticipation with Guided Attention
Sanket Thakur
Cigdem Beyan
Pietro Morerio
Vittorio Murino
Alessio Del Bue
35
6
0
22 May 2023
Learning Sequence Descriptor based on Spatio-Temporal Attention for Visual Place Recognition
Junqiao Zhao
Fenglin Zhang
Yingfeng Cai
Geng Tian
Wenjie Mu
Chen Ye
Tiantian Feng
23
4
0
19 May 2023
Laughing Matters: Introducing Laughing-Face Generation using Diffusion Models
Antoni Bigata Casademunt
Rodrigo Mira
Nikita Drobyshev
Konstantinos Vougioukas
Stavros Petridis
M. Pantic
DiffM
64
2
0
15 May 2023
CEMFormer: Learning to Predict Driver Intentions from In-Cabin and External Cameras via Spatial-Temporal Transformers
Yunsheng Ma
Wenqian Ye
Xu Cao
Amr Abdelraouf
Kyungtae Han
Rohit Gupta
Ziran Wang
43
11
0
13 May 2023
M^2DAR: Multi-View Multi-Scale Driver Action Recognition with Vision Transformer
Yunsheng Ma
Liangqi Yuan
Amr Abdelraouf
Kyungtae Han
Rohit Gupta
Zihao Li
Ziran Wang
109
9
0
13 May 2023
OneCAD: One Classifier for All image Datasets using multimodal learning
S. Wadekar
Eugenio Culurciello
40
0
0
11 May 2023
A Survey on the Robustness of Computer Vision Models against Common Corruptions
Shunxin Wang
Raymond N. J. Veldhuis
Christoph Brune
N. Strisciuglio
OOD
VLM
27
11
0
10 May 2023