Multiscale Vision Transformers (arXiv:2104.11227)

22 April 2021
Haoqi Fan
Bo Xiong
K. Mangalam
Yanghao Li
Zhicheng Yan
Jitendra Malik
Christoph Feichtenhofer
    ViT

Papers citing "Multiscale Vision Transformers"

50 / 736 papers shown
GCI-ViTAL: Gradual Confidence Improvement with Vision Transformers for Active Learning on Label Noise
Moseli Motsóehli
Kyungim Baek
31
1
0
08 Nov 2024
Don't Look Twice: Faster Video Transformers with Run-Length Tokenization
Rohan Choudhury
Guanglei Zhu
Sihan Liu
Koichiro Niinuma
Kris M. Kitani
László A. Jeni
26
9
0
07 Nov 2024
AM Flow: Adapters for Temporal Processing in Action Recognition
Tanay Agrawal
Abid Ali
A. Dantcheva
François Brémond
39
0
0
04 Nov 2024
ROAD-Waymo: Action Awareness at Scale for Autonomous Driving
Salman Khan
Izzeddin Teeti
Reza Javanmard Alitappeh
Mihaela C. Stoian
Eleonora Giunchiglia
Gurkirt Singh
Andrew Bradley
Fabio Cuzzolin
40
0
0
03 Nov 2024
Video Token Merging for Long-form Video Understanding
Seon-Ho Lee
Jue Wang
Zhikang Zhang
D. Fan
Xinyu Li
40
5
0
31 Oct 2024
EchoFM: Foundation Model for Generalizable Echocardiogram Analysis
Sekeun Kim
Pengfei Jin
S. Song
Cheng Chen
Yiwei Li
Hui Ren
Xiang Li
Tianming Liu
Quanzheng Li
39
0
0
30 Oct 2024
HRPVT: High-Resolution Pyramid Vision Transformer for medium and small-scale human pose estimation
Zhoujie Xu
ViT
3DH
36
2
0
29 Oct 2024
Enhancing Action Recognition by Leveraging the Hierarchical Structure of Actions and Textual Context
Manuel Benavent-Lledo
David Mulero-Pérez
David Ortiz-Perez
José García Rodríguez
Antonis Argyros
24
0
0
28 Oct 2024
Prompting Continual Person Search
Pengcheng Zhang
Xiaohan Yu
Xiao Bai
Jin Zheng
X. Ning
CLL
VLM
36
1
0
25 Oct 2024
Detecting Adversarial Examples
Furkan Mumcu
Yasin Yilmaz
AAML
18
1
0
22 Oct 2024
Masked Differential Privacy
David Schneider
Sina Sajadmanesh
Vikash Sehwag
Saquib Sarfraz
Rainer Stiefelhagen
Lingjuan Lyu
Vivek Sharma
28
0
0
22 Oct 2024
VidCompress: Memory-Enhanced Temporal Compression for Video Understanding in Large Language Models
Xiaohan Lan
Yitian Yuan
Zequn Jie
Lin Ma
VLM
21
2
0
15 Oct 2024
MoTE: Reconciling Generalization with Specialization for Visual-Language to Video Knowledge Transfer
Minghao Zhu
Zhengpu Wang
Mengxian Hu
Ronghao Dang
Xiao Lin
Xun Zhou
Chengju Liu
Qijun Chen
32
1
0
14 Oct 2024
Hybrid Transformer for Early Alzheimer's Detection: Integration of Handwriting-Based 2D Images and 1D Signal Features
Changqing Gong
Huafeng Qin
M. El-Yacoubi
24
0
0
14 Oct 2024
EchoPrime: A Multi-Video View-Informed Vision-Language Model for Comprehensive Echocardiography Interpretation
Milos Vukadinovic
Xiu Tang
N. Yuan
Paul Cheng
Debiao Li
Susan Cheng
B. He
David Ouyang
32
11
0
13 Oct 2024
System 2 Reasoning Capabilities Are Nigh
Scott C. Lowe
VLM
LRM
40
0
0
04 Oct 2024
MMFNet: Multi-Scale Frequency Masking Neural Network for Multivariate Time Series Forecasting
Aitian Ma
Dongsheng Luo
Mo Sha
AI4TS
23
0
0
02 Oct 2024
Tracking objects that change in appearance with phase synchrony
Sabine Muzellec
Drew Linsley
A. Ashok
E. Mingolla
Girik Malik
Rufin VanRullen
Thomas Serre
31
1
0
02 Oct 2024
Advancing Medical Radiograph Representation Learning: A Hybrid Pre-training Paradigm with Multilevel Semantic Granularity
Hanqi Jiang
Xixuan Hao
Yuzhou Huang
Chong Ma
Jiaxun Zhang
Yi Pan
Ruimao Zhang
MedIm
37
0
0
01 Oct 2024
Loose Social-Interaction Recognition in Real-world Therapy Scenarios
Abid Ali
Rui Dai
Ashish Marisetty
Guillaume Astruc
Monique Thonnat
J. Odobez
Susanne Thümmler
Francois Bremond
34
1
0
30 Sep 2024
Self-supervised Auxiliary Learning for Texture and Model-based Hybrid Robust and Fair Featuring in Face Analysis
Shukesh Reddy
Nishit Poddar
Srijan Das
Abhijit Das
CVBM
30
0
0
29 Sep 2024
SOAR: Self-supervision Optimized UAV Action Recognition with Efficient Object-Aware Pretraining
Ruiqi Xian
Xiyang Wu
Tianrui Guan
Xijun Wang
Boqing Gong
Dinesh Manocha
ViT
34
0
0
26 Sep 2024
Reference Dataset and Benchmark for Reconstructing Laser Parameters from On-axis Video in Powder Bed Fusion of Bulk Stainless Steel
Cyril Blanc
Ayyoub Ahar
Kurt De Grave
22
0
0
19 Sep 2024
SimMAT: Exploring Transferability from Vision Foundation Models to Any Image Modality
Chenyang Lei
Liyi Chen
Jun Cen
Xiao Chen
Zhen Lei
Felix Heide
Ziwei Liu
Qifeng Chen
Zhaoxiang Zhang
47
0
0
12 Sep 2024
Token Turing Machines are Efficient Vision Models
Purvish Jajal
Nick Eliopoulos
Benjamin Shiue-Hal Chou
George K. Thiravathukal
James C. Davis
Yung-Hsiang Lu
90
0
0
11 Sep 2024
MVTN: A Multiscale Video Transformer Network for Hand Gesture Recognition
Mallika Garg
Debashis Ghosh
P. M. Pradhan
ViT
28
1
0
05 Sep 2024
LowFormer: Hardware Efficient Design for Convolutional Transformer Backbones
Moritz Nottebaum
Matteo Dunnhofer
C. Micheloni
ViT
33
1
0
05 Sep 2024
ReSpike: Residual Frames-based Hybrid Spiking Neural Networks for Efficient Action Recognition
Shiting Xiao
Yuhang Li
Youngeun Kim
Donghyun Lee
Priyadarshini Panda
36
1
0
03 Sep 2024
3D-LSPTM: An Automatic Framework with 3D-Large-Scale Pretrained Model for Laryngeal Cancer Detection Using Laryngoscopic Videos
Meiyu Qiu
Y. Li
Wenjun Huang
Haoyun Zhang
Weiping Zheng
Wenbin Lei
Xiaomao Fan
26
0
0
02 Sep 2024
Real-time Accident Anticipation for Autonomous Driving Through Monocular Depth-Enhanced 3D Modeling
Haicheng Liao
Yongkang Li
Chengyue Wang
Songning Lai
Zhenning Li
Zilin Bian
Jaeyoung Lee
Zhiyong Cui
Guohui Zhang
Chengzhong Xu
36
8
0
02 Sep 2024
PitVis-2023 Challenge: Workflow Recognition in videos of Endoscopic Pituitary Surgery
Adrito Das
Danyal Z. Khan
Dimitrios Psychogyios
Yitong Zhang
John G. Hanrahan
...
Santiago Rodriguez
Pablo Arbelaez
Danail Stoyanov
Hani J. Marcus
Sophia Bano
36
5
0
02 Sep 2024
Towards Student Actions in Classroom Scenes: New Dataset and Baseline
Zhuolin Tan
Chenqiang Gao
Anyong Qin
Ruixin Chen
Tiecheng Song
Feng Yang
Deyu Meng
29
0
0
02 Sep 2024
TDS-CLIP: Temporal Difference Side Network for Image-to-Video Transfer Learning
Bin Wang
Wenqian Wang
VLM
29
1
0
20 Aug 2024
Dynamic and Compressive Adaptation of Transformers From Images to Videos
Guozhen Zhang
Jingyu Liu
Shengming Cao
Xiaotong Zhao
Kevin Zhao
Kai Ma
Limin Wang
ViT
29
1
0
13 Aug 2024
MacFormer: Semantic Segmentation with Fine Object Boundaries
Guoan Xu
Wenfeng Huang
Tao Wu
Ligeng Chen
Wenjing Jia
Guangwei Gao
Xiatian Zhu
Stuart W. Perry
40
0
0
11 Aug 2024
Personalizing Federated Instrument Segmentation with Visual Trait Priors in Robotic Surgery
Jialang Xu
Jiacheng Wang
Lequan Yu
Danail Stoyanov
Yueming Jin
E. Mazomenos
26
1
0
06 Aug 2024
MPT-PAR: Mix-Parameters Transformer for Panoramic Activity Recognition
Wenqing Gan
Yaoyu Li
Jian Li
Zhangang Lin
ViT
30
0
0
01 Aug 2024
DNTextSpotter: Arbitrary-Shaped Scene Text Spotting via Improved Denoising Training
Xi Chen
Qian Qiao
Jun Gao
Tianxiang Wu
Rahul Bhadani
...
Ziqiang Cao
Larry Head
Yue Zhang
Jielei Zhang
Huyang Sun
DiffM
28
5
0
01 Aug 2024
PEAR: Phrase-Based Hand-Object Interaction Anticipation
Zichen Zhang
Hongcheng Luo
Wei Zhai
N. A. Ushakov
Yu Kang
40
5
0
31 Jul 2024
SpotFormer: Multi-Scale Spatio-Temporal Transformer for Facial Expression Spotting
Yicheng Deng
Hideaki Hayashi
Hajime Nagahara
32
1
0
30 Jul 2024
Octave-YOLO: Cross frequency detection network with octave convolution
Sangjune Shin
Dongkun Shin
ObjD
33
0
0
29 Jul 2024
Sparse Refinement for Efficient High-Resolution Semantic Segmentation
Zhijian Liu
Zhuoyang Zhang
Samir Khaki
Shang Yang
Haotian Tang
Chenfeng Xu
Kurt Keutzer
Song Han
SSeg
51
1
0
26 Jul 2024
MARINE: A Computer Vision Model for Detecting Rare Predator-Prey Interactions in Animal Videos
Zsófia Katona
Seyed Sahand Mohamadi Ziabari
F. Karimi Nejadasl
24
0
0
25 Jul 2024
CRASH: Crash Recognition and Anticipation System Harnessing with Context-Aware and Temporal Focus Attentions
Haicheng Liao
Haoyu Sun
Huanming Shen
Chengyue Wang
Kahou Tam
Chunlin Tian
Li Li
Chengzhong Xu
Zhenning Li
26
5
0
25 Jul 2024
MuST: Multi-Scale Transformers for Surgical Phase Recognition
Alejandra Pérez
Santiago Rodríguez
Nicolás Ayobi
Nicolás Aparicio
Eugénie Dessevres
Pablo Arbelaez
MedIm
26
1
0
24 Jul 2024
Embedding-Free Transformer with Inference Spatial Reduction for Efficient Semantic Segmentation
Hyunwoo Yu
Yubin Cho
Beoungwoo Kang
Seunghun Moon
Kyeongbo Kong
Suk-Ju Kang
30
3
0
24 Jul 2024
Accelerating Pre-training of Multimodal LLMs via Chain-of-Sight
Ziyuan Huang
Kaixiang Ji
Biao Gong
Zhiwu Qing
Qinglong Zhang
Kecheng Zheng
Jian Wang
Jingdong Chen
Ming Yang
LRM
34
1
0
22 Jul 2024
SIGMA: Sinkhorn-Guided Masked Video Modeling
Mohammadreza Salehi
Michael Dorkenwald
Fida Mohammad Thoker
E. Gavves
Cees G. M. Snoek
Yuki M. Asano
49
3
0
22 Jul 2024
Towards Robust Vision Transformer via Masked Adaptive Ensemble
Fudong Lin
Jiadong Lou
Xu Yuan
Nianfeng Tzeng
ViT
AAML
28
1
0
22 Jul 2024
DuoFormer: Leveraging Hierarchical Visual Representations by Local and Global Attention
Xiaoya Tang
Bodong Zhang
Beatrice S. Knudsen
Tolga Tasdizen
ViT
MedIm
45
1
0
18 Jul 2024