ResearchTrend.AI

© 2025 ResearchTrend.AI, All rights reserved.

Multiscale Vision Transformers

22 April 2021
Haoqi Fan
Bo Xiong
K. Mangalam
Yanghao Li
Zhicheng Yan
Jitendra Malik
Christoph Feichtenhofer
    ViT

Papers citing "Multiscale Vision Transformers"

50 / 736 papers shown
RTMPose: Real-Time Multi-Person Pose Estimation based on MMPose
Tao Jiang
Peng Lu
Li Zhang
Ning Ma
Rui Han
Chengqi Lyu
Yining Li
Kai-xiang Chen
3DH
42
158
0
13 Mar 2023
Masked Image Modeling with Local Multi-Scale Reconstruction
Haoqing Wang
Yehui Tang
Yunhe Wang
Jianyuan Guo
Zhiwei Deng
Kai Han
61
46
0
09 Mar 2023
VOCALExplore: Pay-as-You-Go Video Data Exploration and Model Building [Technical Report]
Maureen Daum
Enhao Zhang
Dong He
Stephen Mussmann
Brandon Haynes
Ranjay Krishna
Magdalena Balazinska
27
4
0
07 Mar 2023
SPARTAN: Self-supervised Spatiotemporal Transformers Approach to Group Activity Recognition
N. V. R. Chappa
Pha Nguyen
Alec Nelson
Han-Seok Seo
Xin Li
P. Dobbs
Khoa Luu
ViT
42
14
0
06 Mar 2023
MITFAS: Mutual Information based Temporal Feature Alignment and Sampling for Aerial Video Action Recognition
Ruiqi Xian
Xijun Wang
Dinesh Manocha
21
10
0
05 Mar 2023
Fine-tuning of sign language recognition models: a technical report
Maxim Novopoltsev
L. Verkhovtsev
R. Murtazin
Dmitriy Milevich
Iuliia Zemtsova
SLR
14
15
0
15 Feb 2023
Reversible Vision Transformers
K. Mangalam
Haoqi Fan
Yanghao Li
Chao-Yuan Wu
Bo Xiong
Christoph Feichtenhofer
Jitendra Malik
ViT
11
45
0
09 Feb 2023
PhysFormer++: Facial Video-based Physiological Measurement with SlowFast Temporal Difference Transformer
Zitong Yu
Yuming Shen
Jingang Shi
Hengshuang Zhao
Yawen Cui
Jiehua Zhang
Philip H. S. Torr
Guoying Zhao
ViT
MedIm
29
80
0
07 Feb 2023
AIM: Adapting Image Models for Efficient Video Action Recognition
Taojiannan Yang
Yi Zhu
Yusheng Xie
Aston Zhang
C. L. P. Chen
Mu Li
ViT
49
144
0
06 Feb 2023
DilateFormer: Multi-Scale Dilated Transformer for Visual Recognition
Jiayu Jiao
Yuyao Tang
Kun-Li Channing Lin
Yipeng Gao
Jinhua Ma
Yaowei Wang
Wei-Shi Zheng
MedIm
ViT
24
136
0
03 Feb 2023
ShadowFormer: Global Context Helps Image Shadow Removal
Lanqing Guo
Siyu Huang
Dingshuo Liu
Hao Cheng
B. Wen
ViT
52
45
0
03 Feb 2023
CancerUniT: Towards a Single Unified Model for Effective Detection, Segmentation, and Diagnosis of Eight Major Cancers Using a Large Collection of CT Scans
Jieneng Chen
Yingda Xia
Jiawen Yao
K. Yan
Jianpeng Zhang
...
Xin Chen
Jingren Zhou
Alan Yuille
Zai-De Liu
Ling Zhang
ViT
MedIm
28
15
0
28 Jan 2023
ClimaX: A foundation model for weather and climate
Tung Nguyen
Johannes Brandstetter
Ashish Kapoor
Jayesh K. Gupta
Aditya Grover
AI4Cl
AI4CE
11
244
0
24 Jan 2023
Exploiting Optical Flow Guidance for Transformer-Based Video Inpainting
Kaiwen Zhang
Jialun Peng
Jingjing Fu
Dong Liu
ViT
27
8
0
24 Jan 2023
Building Scalable Video Understanding Benchmarks through Sports
Aniket Agarwal
Alex Zhang
Karthik Narasimhan
Igor Gilitschenski
Vishvak Murahari
Yash Kant
19
1
0
17 Jan 2023
CMAE-V: Contrastive Masked Autoencoders for Video Action Recognition
Cheng Lu
Xiaojie Jin
Zhicheng Huang
Qibin Hou
Ming-Ming Cheng
Jiashi Feng
37
8
0
15 Jan 2023
Ancilia: Scalable Intelligent Video Surveillance for the Artificial Intelligence of Things
Armin Danesh Pazho
Christopher Neff
Ghazal Alinezhad Noghre
B. R. Ardabili
S. Yao
Mohammadreza Baharani
Hamed Tabkhi
11
38
0
09 Jan 2023
EgoDistill: Egocentric Head Motion Distillation for Efficient Video Understanding
Shuhan Tan
Tushar Nagarajan
Kristen Grauman
18
21
0
05 Jan 2023
Audio-Visual Efficient Conformer for Robust Speech Recognition
Maxime Burchi
Radu Timofte
VLM
11
33
0
04 Jan 2023
Ego-Only: Egocentric Action Detection without Exocentric Transferring
Huiyu Wang
Mitesh Singh
Lorenzo Torresani
EgoV
72
23
0
03 Jan 2023
Look, Listen, and Attack: Backdoor Attacks Against Video Action Recognition
Hasan Hammoud
Shuming Liu
Mohammad Alkhrashi
Fahad Albalawi
Bernard Ghanem
AAML
32
8
0
03 Jan 2023
ConvNeXt V2: Co-designing and Scaling ConvNets with Masked Autoencoders
Sanghyun Woo
Shoubhik Debnath
Ronghang Hu
Xinlei Chen
Zhuang Liu
In So Kweon
Saining Xie
SyDa
82
726
0
02 Jan 2023
Edge Enhanced Image Style Transfer via Transformers
Chi Zhang
Jun Yang
Zaiyan Dai
Peng-Xia Cao
11
10
0
02 Jan 2023
Bidirectional Cross-Modal Knowledge Exploration for Video Recognition with Pre-trained Vision-Language Models
Wenhao Wu
Xiaohan Wang
Haipeng Luo
Jingdong Wang
Yi Yang
Wanli Ouyang
98
48
0
31 Dec 2022
Scale-MAE: A Scale-Aware Masked Autoencoder for Multiscale Geospatial Representation Learning
Colorado Reed
Ritwik Gupta
Shufan Li
S. Brockman
Christopher Funk
Brian Clipp
Kurt Keutzer
Salvatore Candido
M. Uyttendaele
Trevor Darrell
121
169
0
30 Dec 2022
Transformers in Action Recognition: A Review on Temporal Modeling
Elham Shabaninia
Hossein Nezamabadi-pour
Fatemeh Shafizadegan
ViT
24
8
0
29 Dec 2022
Exploring Vision Transformers as Diffusion Learners
He Cao
Jianan Wang
Tianhe Ren
Xianbiao Qi
Yihao Chen
Yuan Yao
L. Zhang
36
10
0
28 Dec 2022
Reversible Column Networks
Yuxuan Cai
Yi Zhou
Qi Han
Jianjian Sun
Xiangwen Kong
Jun Yu Li
Xiangyu Zhang
VLM
31
53
0
22 Dec 2022
What Makes for Good Tokenizers in Vision Transformer?
Shengju Qian
Yi Zhu
Wenbo Li
Mu Li
Jiaya Jia
ViT
34
14
0
21 Dec 2022
Werewolf Among Us: A Multimodal Dataset for Modeling Persuasion Behaviors in Social Deduction Games
Bolin Lai
Hongxin Zhang
Miao Liu
Aryan Pariani
Fiona Ryan
Wenqi Jia
Shirley Anugrah Hayati
James M. Rehg
Diyi Yang
13
7
0
16 Dec 2022
MAViL: Masked Audio-Video Learners
Po-Yao (Bernie) Huang
Vasu Sharma
Hu Xu
Chaitanya K. Ryali
Haoqi Fan
Yanghao Li
Shang-Wen Li
Gargi Ghosh
Jitendra Malik
Christoph Feichtenhofer
19
51
0
15 Dec 2022
Full Contextual Attention for Multi-resolution Transformers in Semantic Segmentation
Loic Themyr
Clément Rambour
Nicolas Thome
Toby Collins
Alexandre Hostettler
ViT
27
10
0
15 Dec 2022
GPViT: A High Resolution Non-Hierarchical Vision Transformer with Group Propagation
Chenhongyi Yang
Jiarui Xu
Shalini De Mello
Elliot J. Crowley
X. Wang
ViT
32
21
0
13 Dec 2022
OAMixer: Object-aware Mixing Layer for Vision Transformers
H. Kang
Sangwoo Mo
Jinwoo Shin
VLM
39
4
0
13 Dec 2022
Egocentric Video Task Translation
Zihui Xue
Yale Song
Kristen Grauman
Lorenzo Torresani
EgoV
26
13
0
13 Dec 2022
Video Prediction by Efficient Transformers
Xi Ye
Guillaume-Alexandre Bilodeau
ViT
36
33
0
12 Dec 2022
VindLU: A Recipe for Effective Video-and-Language Pretraining
Feng Cheng
Xizi Wang
Jie Lei
David J. Crandall
Mohit Bansal
Gedas Bertasius
VLM
27
78
0
09 Dec 2022
Towards Holistic Surgical Scene Understanding
Natalia Valderrama
Paola Ruiz Puentes
Isabela Hernández
Nicolás Ayobi
Mathilde Verlyck
J. Santander
J. Caicedo
Nicolás Fernández
Pablo Arbelaez
20
31
0
08 Dec 2022
Masked Video Distillation: Rethinking Masked Feature Modeling for Self-supervised Video Representation Learning
Rui Wang
Dongdong Chen
Zuxuan Wu
Yinpeng Chen
Xiyang Dai
Mengchen Liu
Lu Yuan
Yu-Gang Jiang
VGen
32
87
0
08 Dec 2022
PromptonomyViT: Multi-Task Prompt Learning Improves Video Transformers using Synthetic Scene Data
Roei Herzig
Ofir Abramovich
Elad Ben-Avraham
Assaf Arbelle
Leonid Karlinsky
Ariel Shamir
Trevor Darrell
Amir Globerson
36
16
0
08 Dec 2022
OFASys: A Multi-Modal Multi-Task Learning System for Building Generalist Models
Jinze Bai
Rui Men
Han Yang
Xuancheng Ren
Kai Dang
...
Wenhang Ge
Jianxin Ma
Junyang Lin
Jingren Zhou
Chang Zhou
37
15
0
08 Dec 2022
Multimodal Vision Transformers with Forced Attention for Behavior Analysis
Tanay Agrawal
Michal Balazia
Philippe Muller
François Brémond
ViT
23
9
0
07 Dec 2022
SimVTP: Simple Video Text Pre-training with Masked Autoencoders
Yue Ma
Tianyu Yang
Yin Shan
Xiu Li
35
27
0
07 Dec 2022
Rethinking Video ViTs: Sparse Video Tubes for Joint Image and Video Learning
A. Piergiovanni
Weicheng Kuo
A. Angelova
ViT
33
54
0
06 Dec 2022
VLG: General Video Recognition with Web Textual Knowledge
Jintao Lin
Zhaoyang Liu
Wenhai Wang
Wayne Wu
Limin Wang
39
0
0
03 Dec 2022
ResFormer: Scaling ViTs with Multi-Resolution Training
Rui Tian
Zuxuan Wu
Qiuju Dai
Hang-Rui Hu
Yu Qiao
Yu-Gang Jiang
ViT
19
31
0
01 Dec 2022
Lightweight Structure-Aware Attention for Visual Understanding
Heeseung Kwon
F. M. Castro
M. Marín-Jiménez
N. Guil
Karteek Alahari
28
2
0
29 Nov 2022
A Light Touch Approach to Teaching Transformers Multi-view Geometry
Yash Bhalgat
Joao F. Henriques
Andrew Zisserman
ViT
16
6
0
28 Nov 2022
CMC v2: Towards More Accurate COVID-19 Detection with Discriminative Video Priors
Junlin Hou
Jilan Xu
Nan Zhang
Yi Wang
Yuejie Zhang
X. Zhang
Rui Feng
18
2
0
26 Nov 2022
RbA: Segmenting Unknown Regions Rejected by All
Nazir Nayal
Mısra Yavuz
João F. Henriques
Fatma Guney
UQCV
19
46
0
25 Nov 2022