2104.11227
Cited By
Multiscale Vision Transformers
22 April 2021
Haoqi Fan
Bo Xiong
K. Mangalam
Yanghao Li
Zhicheng Yan
Jitendra Malik
Christoph Feichtenhofer
ViT
Papers citing
"Multiscale Vision Transformers"
50 / 736 papers shown
Title
Unsupervised Video Domain Adaptation with Masked Pre-Training and Collaborative Self-Training
Arun V. Reddy
William Paul
Corban Rivera
Ketul Shah
Celso M. de Melo
Rama Chellappa
37
4
0
05 Dec 2023
SRTransGAN: Image Super-Resolution using Transformer based Generative Adversarial Network
Neeraj Baghel
S. Dubey
Satish Kumar Singh
ViT
22
2
0
04 Dec 2023
Bootstrapping SparseFormers from Vision Foundation Models
Ziteng Gao
Zhan Tong
K. Lin
Joya Chen
Mike Zheng Shou
33
0
0
04 Dec 2023
Adapting Short-Term Transformers for Action Detection in Untrimmed Videos
Min Yang
Huan Gao
Ping Guo
Limin Wang
ViT
28
5
0
04 Dec 2023
Behind the Magic, MERLIM: Multi-modal Evaluation Benchmark for Large Image-Language Models
Andrés Villa
Juan Carlos León Alcázar
Alvaro Soto
Bernard Ghanem
MLLM
VLM
18
9
0
03 Dec 2023
Token Fusion: Bridging the Gap between Token Pruning and Token Merging
Minchul Kim
Shangqian Gao
Yen-Chang Hsu
Yilin Shen
Hongxia Jin
23
29
0
02 Dec 2023
EfficientSAM: Leveraged Masked Image Pretraining for Efficient Segment Anything
Yunyang Xiong
Bala Varadarajan
Lemeng Wu
Xiaoyu Xiang
Fanyi Xiao
...
Dilin Wang
Fei Sun
Forrest N. Iandola
Raghuraman Krishnamoorthi
Vikas Chandra
VLM
40
139
0
01 Dec 2023
A Generalizable Deep Learning System for Cardiac MRI
R. Shad
C. Zakka
Dhamanpreet Kaur
R. Fong
R. Filice
...
Victor Ferrari
Euan A. Ashley
Michael A. Acker
Curt P. Langlotz
W. Hiesinger
MedIm
46
1
0
01 Dec 2023
Just Add π! Pose Induced Video Transformers for Understanding Activities of Daily Living
Dominick Reilly
Srijan Das
ViT
25
17
0
30 Nov 2023
CAST: Cross-Attention in Space and Time for Video Action Recognition
Dongho Lee
Jongseo Lee
Jinwoo Choi
EgoV
35
12
0
30 Nov 2023
DEVIAS: Learning Disentangled Video Representations of Action and Scene for Holistic Video Understanding
Kyungho Bae
Geo Ahn
Youngrae Kim
Jinwoo Choi
23
2
0
30 Nov 2023
OST: Refining Text Knowledge with Optimal Spatio-Temporal Descriptor for General Video Recognition
Tom Tongjia Chen
Hongshan Yu
Zhengeng Yang
Zechuan Li
Wei Sun
Chen Chen
15
7
0
30 Nov 2023
A Simple Video Segmenter by Tracking Objects Along Axial Trajectories
Ju He
Qihang Yu
Inkyu Shin
XueQing Deng
Alan L. Yuille
Xiaohui Shen
Liang-Chieh Chen
VOS
30
2
0
30 Nov 2023
PViT-6D: Overclocking Vision Transformers for 6D Pose Estimation with Confidence-Level Prediction and Pose Tokens
Sebastian Stapf
Tobias Bauernfeind
Marco Riboldi
ViT
22
1
0
29 Nov 2023
Action-slot: Visual Action-centric Representations for Multi-label Atomic Activity Recognition in Traffic Scenes
Chi-Hsi Kung
Shu-Wei Lu
Yi-Hsuan Tsai
Yi-Ting Chen
35
6
0
29 Nov 2023
PALM: Predicting Actions through Language Models
Sanghwan Kim
Daoji Huang
Yongqin Xian
Otmar Hilliges
Luc Van Gool
Xi Wang
VLM
19
10
0
29 Nov 2023
Object-based (yet Class-agnostic) Video Domain Adaptation
Dantong Niu
Amir Bar
Roei Herzig
Trevor Darrell
Anna Rohrbach
22
1
0
29 Nov 2023
End-to-End Temporal Action Detection with 1B Parameters Across 1000 Frames
Shuming Liu
Chen-Da Liu-Zhang
Chen Zhao
Bernard Ghanem
33
25
0
28 Nov 2023
Towards Weakly Supervised End-to-end Learning for Long-video Action Recognition
Jiaming Zhou
Hanjun Li
Kun-Yu Lin
Junwei Liang
23
1
0
28 Nov 2023
REACT: Recognize Every Action Everywhere All At Once
N. V. R. Chappa
Pha Nguyen
P. Dobbs
Khoa Luu
36
6
0
27 Nov 2023
Align before Adapt: Leveraging Entity-to-Region Alignments for Generalizable Video Action Recognition
Yifei Chen
Dapeng Chen
Ruijin Liu
Sai Zhou
Wenyuan Xue
Wei Peng
25
6
0
27 Nov 2023
Advancing Vision Transformers with Group-Mix Attention
Chongjian Ge
Xiaohan Ding
Zhan Tong
Li Yuan
Jiangliu Wang
Yibing Song
Ping Luo
112
16
0
26 Nov 2023
Mug-STAN: Adapting Image-Language Pretrained Models for General Video Understanding
Ruyang Liu
Jingjia Huang
Wei-Nan Gao
Thomas H. Li
Ge Li
VLM
29
3
0
25 Nov 2023
MDFL: Multi-domain Diffusion-driven Feature Learning
Daixun Li
Weiying Xie
Jiaqing Zhang
Yunsong Li
DiffM
37
9
0
16 Nov 2023
FedDiff: Diffusion Model Driven Federated Learning for Multi-Modal and Multi-Clients
Daixun Li
Weiying Xie
Zixuan Wang
YiBing Lu
Yunsong Li
Leyuan Fang
FedML
DiffM
41
18
0
16 Nov 2023
Window Attention is Bugged: How not to Interpolate Position Embeddings
Daniel Bolya
Chaitanya K. Ryali
Judy Hoffman
Christoph Feichtenhofer
43
10
0
09 Nov 2023
A Single 2D Pose with Context is Worth Hundreds for 3D Human Pose Estimation
Qi-jun Zhao
Ce Zheng
Mengyuan Liu
C. L. P. Chen
33
14
0
06 Nov 2023
Asymmetric Masked Distillation for Pre-Training Small Foundation Models
Zhiyu Zhao
Bingkun Huang
Sen Xing
Gangshan Wu
Yu Qiao
Limin Wang
34
5
0
06 Nov 2023
CCMR: High Resolution Optical Flow Estimation via Coarse-to-Fine Context-Guided Motion Reasoning
Azin Jahedi
Maximilian Luz
Marc Rivinius
Andrés Bruhn
22
2
0
05 Nov 2023
P-Age: Pexels Dataset for Robust Spatio-Temporal Apparent Age Classification
Abid Ali
Ashish Marisetty
François Brémond
27
6
0
04 Nov 2023
Beyond still images: Temporal features and input variance resilience
AmirHosein Fadaei
M. Dehaqani
35
0
0
01 Nov 2023
ProBio: A Protocol-guided Multimodal Dataset for Molecular Biology Lab
Jieming Cui
Ziren Gong
Baoxiong Jia
Siyuan Huang
Zilong Zheng
Jianzhu Ma
Yixin Zhu
31
3
0
01 Nov 2023
Object-centric Video Representation for Long-term Action Anticipation
Ce Zhang
Changcheng Fu
Shijie Wang
Nakul Agarwal
Kwonjoon Lee
Chiho Choi
Chen Sun
15
14
0
31 Oct 2023
Limited Data, Unlimited Potential: A Study on ViTs Augmented by Masked Autoencoders
Srijan Das
Tanmay Jain
Dominick Reilly
P. Balaji
Soumyajit Karmakar
Shyam Marjit
Xiang Li
Abhijit Das
Michael S. Ryoo
32
16
0
31 Oct 2023
Self-Supervised Pre-Training for Precipitation Post-Processor
Sojung An
Junha Lee
Jiyeon Jang
Inchae Na
Wooyeon Park
Sujeong You
AI4Cl
21
1
0
31 Oct 2023
MM-VID: Advancing Video Understanding with GPT-4V(ision)
Kevin Qinghong Lin
Faisal Ahmed
Linjie Li
Chung-Ching Lin
E. Azarnasab
...
Lin Liang
Zicheng Liu
Yumao Lu
Ce Liu
Lijuan Wang
MLLM
28
63
0
30 Oct 2023
Triplet Attention Transformer for Spatiotemporal Predictive Learning
Xuesong Nie
Xi Chen
Haoyuan Jin
Zhihang Zhu
Yunfeng Yan
Donglian Qi
ViT
14
10
0
28 Oct 2023
MCUFormer: Deploying Vision Transformers on Microcontrollers with Limited Memory
Yinan Liang
Ziwei Wang
Xiuwei Xu
Yansong Tang
Jie Zhou
Jiwen Lu
21
9
0
25 Oct 2023
Multimodal Transformer Using Cross-Channel attention for Object Detection in Remote Sensing Images
Bissmella Bahaduri
Zuheng Ming
Fangchen Feng
Anissa Mokraou
21
1
0
21 Oct 2023
Cross-attention Spatio-temporal Context Transformer for Semantic Segmentation of Historical Maps
Sidi Wu
Yizi Chen
Konrad Schindler
L. Hurni
19
2
0
19 Oct 2023
USDC: Unified Static and Dynamic Compression for Visual Transformer
Huan Yuan
Chao Liao
Jianchao Tan
Peng Yao
Jiyuan Jia
Bin Chen
Chengru Song
Di Zhang
ViT
20
0
0
17 Oct 2023
RoboLLM: Robotic Vision Tasks Grounded on Multimodal Large Language Models
Zijun Long
George Killick
R. McCreadie
Gerardo Aragon Camarasa
VLM
27
11
0
16 Oct 2023
Large Models for Time Series and Spatio-Temporal Data: A Survey and Outlook
Ming Jin
Qingsong Wen
Yuxuan Liang
Chaoli Zhang
Siqiao Xue
...
Shirui Pan
Vincent S. Tseng
Yu Zheng
Lei Chen
Hui Xiong
AI4TS
SyDa
35
117
0
16 Oct 2023
Selectivity Drives Productivity: Efficient Dataset Pruning for Enhanced Transfer Learning
Yihua Zhang
Yimeng Zhang
Aochuan Chen
Jinghan Jia
Jiancheng Liu
Gaowen Liu
Min-Fong Hong
Shiyu Chang
Sijia Liu
AAML
29
8
0
13 Oct 2023
EViT: An Eagle Vision Transformer with Bi-Fovea Self-Attention
Yulong Shi
Mingwei Sun
Yongshuai Wang
Hui Sun
Zengqiang Chen
31
4
0
10 Oct 2023
Data efficient deep learning for medical image analysis: A survey
Suruchi Kumari
Pravendra Singh
37
12
0
10 Oct 2023
Low-Resolution Self-Attention for Semantic Segmentation
Yu-Huan Wu
Shi-Chen Zhang
Yun-Hai Liu
Le Zhang
Xin Zhan
Daquan Zhou
Jiashi Feng
Ming-Ming Cheng
Liangli Zhen
ViT
40
3
0
08 Oct 2023
TiC: Exploring Vision Transformer in Convolution
Song Zhang
Qingzhong Wang
Jiang Bian
Haoyi Xiong
ViT
29
1
0
06 Oct 2023
ZeroI2V: Zero-Cost Adaptation of Pre-trained Transformers from Image to Video
Xinhao Li
Yuhan Zhu
Limin Wang
VLM
32
8
0
02 Oct 2023
Win-Win: Training High-Resolution Vision Transformers from Two Windows
Vincent Leroy
Jérôme Revaud
Thomas Lucas
Philippe Weinzaepfel
ViT
34
2
0
01 Oct 2023