Multiscale Vision Transformers


22 April 2021
Haoqi Fan
Bo Xiong
K. Mangalam
Yanghao Li
Zhicheng Yan
Jitendra Malik
Christoph Feichtenhofer
    ViT

Papers citing "Multiscale Vision Transformers"

50 / 736 papers shown
ViTamin: Designing Scalable Vision Models in the Vision-Language Era
Jieneng Chen
Qihang Yu
Xiaohui Shen
Alan L. Yuille
Liang-Chieh Chen
3DV
VLM
33
24
0
02 Apr 2024
Bidirectional Multi-Scale Implicit Neural Representations for Image Deraining
Xiang Chen
Jinshan Pan
Jiangxin Dong
AI4CE
31
22
0
02 Apr 2024
Enhancing Efficiency in Vision Transformer Networks: Design Techniques and Insights
Moein Heidari
Reza Azad
Sina Ghorbani Kolahi
René Arimond
Leon Niggemeier
...
Afshin Bozorgpour
Ehsan Khodapanah Aghdam
A. Kazerouni
I. Hacihaliloglu
Dorit Merhof
43
7
0
28 Mar 2024
OmniVid: A Generative Framework for Universal Video Understanding
Junke Wang
Dongdong Chen
Chong Luo
Bo He
Lu Yuan
Zuxuan Wu
Yu-Gang Jiang
VLM
VGen
69
14
0
26 Mar 2024
PlainMamba: Improving Non-Hierarchical Mamba in Visual Recognition
Chenhongyi Yang
Zehui Chen
Miguel Espinosa
Linus Ericsson
Zhenyu Wang
Jiaming Liu
Elliot J. Crowley
Mamba
36
86
0
26 Mar 2024
Enhancing Video Transformers for Action Understanding with VLM-aided Training
Hui Lu
Hu Jian
Ronald Poppe
A. A. Salah
36
1
0
24 Mar 2024
PaPr: Training-Free One-Step Patch Pruning with Lightweight ConvNets for Faster Inference
Tanvir Mahmud
Burhaneddin Yaman
Chun-Hao Liu
Diana Marculescu
38
2
0
24 Mar 2024
EgoExoLearn: A Dataset for Bridging Asynchronous Ego- and Exo-centric View of Procedural Activities in Real World
Yifei Huang
Guo Chen
Jilan Xu
Mingfang Zhang
Lijin Yang
...
Hongjie Zhang
Lu Dong
Yali Wang
Limin Wang
Yu Qiao
EgoV
60
36
0
24 Mar 2024
LiFT: A Surprisingly Simple Lightweight Feature Transform for Dense ViT Descriptors
Saksham Suri
Matthew Walmer
Kamal Gupta
Abhinav Shrivastava
35
4
0
21 Mar 2024
When Do We Not Need Larger Vision Models?
Baifeng Shi
Ziyang Wu
Maolin Mao
Xin Wang
Trevor Darrell
VLM
LRM
54
40
0
19 Mar 2024
Selective, Interpretable, and Motion Consistent Privacy Attribute Obfuscation for Action Recognition
Filip Ilic
Henghui Zhao
T. Pock
Richard P. Wildes
PICV
AAML
31
2
0
19 Mar 2024
Video Mamba Suite: State Space Model as a Versatile Alternative for Video Understanding
Guo Chen
Yifei Huang
Jilan Xu
Baoqi Pei
Zhe Chen
Zhiqi Li
Jiahao Wang
Kunchang Li
Tong Lu
Limin Wang
Mamba
64
73
0
14 Mar 2024
Don't Judge by the Look: Towards Motion Coherent Video Representation
Yitian Zhang
Yue Bai
Huan Wang
Yizhou Wang
Yun Fu
33
0
0
14 Mar 2024
MIM4D: Masked Modeling with Multi-View Video for Autonomous Driving Representation Learning
Jialv Zou
Bencheng Liao
Qian Zhang
Wenyu Liu
Xinggang Wang
46
2
0
13 Mar 2024
VideoMamba: State Space Model for Efficient Video Understanding
Kunchang Li
Xinhao Li
Yi Wang
Yinan He
Yali Wang
Limin Wang
Yu Qiao
Mamba
37
180
0
11 Mar 2024
Transformer-based Fusion of 2D-pose and Spatio-temporal Embeddings for Distracted Driver Action Recognition
Erkut Akdag
Zeqi Zhu
Egor Bondarev
Peter H. N. de With
ViT
29
5
0
11 Mar 2024
POV: Prompt-Oriented View-Agnostic Learning for Egocentric Hand-Object Interaction in the Multi-View World
Boshen Xu
Sipeng Zheng
Qin Jin
44
7
0
09 Mar 2024
Rethinking Transformers Pre-training for Multi-Spectral Satellite Imagery
Mubashir Noman
Muzammal Naseer
Hisham Cholakkal
Rao Muhammad Anwar
Salman Khan
Fahad Shahbaz Khan
ViT
36
35
0
08 Mar 2024
Efficient LoFTR: Semi-Dense Local Feature Matching with Sparse-Like Speed
Yifan Wang
Xingyi He
Sida Peng
Dongli Tan
Xiaowei Zhou
3DV
28
41
0
07 Mar 2024
Modeling Multimodal Social Interactions: New Challenges and Baselines with Densely Aligned Representations
Sangmin Lee
Bolin Lai
Fiona Ryan
Bikram Boote
James M. Rehg
28
8
0
04 Mar 2024
Label-efficient multi-organ segmentation with a diffusion model
Yongzhi Huang
Jinxin Zhu
Haseeb Hassan
Liyilei Su
Jingyu Li
Binding Huang
Yun Peng
Jingyu Li
Jun Ma
Bingding Huang
DiffM
MedIm
31
0
0
23 Feb 2024
LLMs Meet Long Video: Advancing Long Video Comprehension with An Interactive Visual Adapter in LLMs
Yunxin Li
Xinyu Chen
Baotian Hu
Min-Ling Zhang
40
9
0
21 Feb 2024
Multi-scale Spatio-temporal Transformer-based Imbalanced Longitudinal Learning for Glaucoma Forecasting from Irregular Time Series Images
Xikai Yang
Jian Wu
Xi Wang
Yuchen Yuan
N. Wang
Pheng-Ann Heng
AI4TS
MedIm
25
0
0
21 Feb 2024
FViT: A Focal Vision Transformer with Gabor Filter
Yulong Shi
Mingwei Sun
Yongshuai Wang
Rui Wang
52
4
0
17 Feb 2024
Advancing Human Action Recognition with Foundation Models trained on Unlabeled Public Videos
Yang Qian
Yinan Sun
A. Kargarandehkordi
Parnian Azizian
O. Mutlu
Saimourya Surabhi
Pingyi Chen
Zain Jabbar
Dennis Paul Wall
Peter Washington
OffRL
21
1
0
14 Feb 2024
SISP: A Benchmark Dataset for Fine-grained Ship Instance Segmentation in Panchromatic Satellite Images
Pengming Feng
Mingjie Xie
Hongning Liu
Xuanjia Zhao
Guangjun He
Xueliang Zhang
Jian Guan
21
1
0
06 Feb 2024
Computer Vision for Primate Behavior Analysis in the Wild
Richard Vogg
Timo Lüddecke
Jonathan Henrich
Sharmita Dey
Matthias Nuske
...
Alexander Gail
Stefan Treue
H. Scherberger
F. Worgotter
Alexander S. Ecker
28
3
0
29 Jan 2024
WiMANS: A Benchmark Dataset for WiFi-based Multi-user Activity Sensing
Shuokang Huang
Kaihan Li
Di You
Yichong Chen
Arvin Lin
Siying Liu
Xiaohui Li
Julie A. McCann
24
6
0
24 Jan 2024
SGTR+: End-to-end Scene Graph Generation with Transformer
Rongjie Li
Songyang Zhang
Xuming He
ViT
29
2
0
23 Jan 2024
ActionHub: A Large-scale Action Video Description Dataset for Zero-shot Action Recognition
Jiaming Zhou
Junwei Liang
Kun-Yu Lin
Jinrui Yang
Wei-Shi Zheng
VLM
19
8
0
22 Jan 2024
Adversarial Augmentation Training Makes Action Recognition Models More Robust to Realistic Video Distribution Shifts
Kiyoon Kim
Shreyank N. Gowda
Panagiotis Eustratiadis
Antreas Antoniou
Robert B Fisher
39
2
0
21 Jan 2024
Pixel-Wise Recognition for Holistic Surgical Scene Understanding
Nicolás Ayobi
Santiago Rodríguez
Alejandra Pérez
Isabela Hernández
Nicolás Aparicio
...
Sebastián Pena
J. Santander
J. Caicedo
Nicolás Fernández
Pablo Arbelaez
ViT
MedIm
29
9
0
20 Jan 2024
GPT4Ego: Unleashing the Potential of Pre-trained Models for Zero-Shot Egocentric Action Recognition
Guangzhao Dai
Xiangbo Shu
Wenhao Wu
Rui Yan
Jiachao Zhang
VLM
19
5
0
18 Jan 2024
Multitask Learning in Minimally Invasive Surgical Vision: A Review
Oluwatosin O. Alabi
Tom Kamiel Magda Vercauteren
Miaojing Shi
20
1
0
16 Jan 2024
Transformer-CNN Fused Architecture for Enhanced Skin Lesion Segmentation
Siddharth Tiwari
MedIm
ViT
36
0
0
10 Jan 2024
Motion Guided Token Compression for Efficient Masked Video Modeling
Yukun Feng
Yangming Shi
Fengze Liu
Tan Yan
35
0
0
10 Jan 2024
MST: Adaptive Multi-Scale Tokens Guided Interactive Segmentation
Long Xu
Shanghong Li
Yongquan Chen
Jun Luo
Shiwu Lai
21
0
0
09 Jan 2024
Efficient Multiscale Multimodal Bottleneck Transformer for Audio-Video Classification
Wentao Zhu
25
5
0
08 Jan 2024
Efficient Selective Audio Masked Multimodal Bottleneck Transformer for Audio-Video Classification
Wentao Zhu
27
4
0
08 Jan 2024
SAR-RARP50: Segmentation of surgical instrumentation and Action Recognition on Robot-Assisted Radical Prostatectomy Challenge
Dimitrios Psychogyios
Emanuele Colleoni
Beatrice van Amsterdam
Chih-Yang Li
Shu-Yu Huang
...
Santiago Rodriguez
Juanita Puentes
Pablo Arbelaez
Omid Mohareri
Danail Stoyanov
40
24
0
31 Dec 2023
SVFAP: Self-supervised Video Facial Affect Perceiver
Licai Sun
Zheng Lian
Kexin Wang
Yu He
Ming Xu
Haiyang Sun
Bin Liu
Jianhua Tao
56
14
0
31 Dec 2023
Multiscale Vision Transformers meet Bipartite Matching for efficient single-stage Action Localization
Ioanna Ntinou
Enrique Sanchez
Georgios Tzimiropoulos
47
4
0
29 Dec 2023
Video Understanding with Large Language Models: A Survey
Yunlong Tang
Jing Bi
Siting Xu
Luchuan Song
Susan Liang
...
Feng Zheng
Jianguo Zhang
Ping Luo
Jiebo Luo
Chenliang Xu
VLM
52
82
0
29 Dec 2023
ROI-Aware Multiscale Cross-Attention Vision Transformer for Pest Image Identification
Ga-Eun Kim
Chang-Hwan Son
21
1
0
28 Dec 2023
Video Recognition in Portrait Mode
Mingfei Han
Linjie Yang
Xiaojie Jin
Jiashi Feng
Xiaojun Chang
Heng Wang
30
3
0
21 Dec 2023
Bootstrap Masked Visual Modeling via Hard Patches Mining
Haochen Wang
Junsong Fan
Yuxi Wang
Kaiyou Song
Tiancai Wang
Xiangyu Zhang
Zhaoxiang Zhang
36
5
0
21 Dec 2023
Early Action Recognition with Action Prototypes
G. Camporese
Alessandro Bergamo
Xunyu Lin
Joseph Tighe
Davide Modolo
EgoV
16
0
0
11 Dec 2023
TPRNN: A Top-Down Pyramidal Recurrent Neural Network for Time Series Forecasting
Ling Chen
Jiahua Cui
AI4TS
22
1
0
11 Dec 2023
MaskConver: Revisiting Pure Convolution Model for Panoptic Segmentation
Abdullah Rashwan
Jiageng Zhang
A. Taalimi
Fan Yang
Xingyi Zhou
Chaochao Yan
Liang-Chieh Chen
Yeqing Li
ViT
28
5
0
11 Dec 2023
MGTR: Multi-Granular Transformer for Motion Prediction with LiDAR
Yi Gan
Hao Xiao
Yizhe Zhao
Ethan Zhang
Zhe Huang
Xin Ye
Lingting Ge
28
15
0
05 Dec 2023