arXiv:2306.00989
Hiera: A Hierarchical Vision Transformer without the Bells-and-Whistles
1 June 2023
Chaitanya K. Ryali
Yuan-Ting Hu
Daniel Bolya
Chen Wei
Haoqi Fan
Po-Yao (Bernie) Huang
Vaibhav Aggarwal
Arkabandhu Chowdhury
Omid Poursaeed
Judy Hoffman
Jitendra Malik
Yanghao Li
Christoph Feichtenhofer
3DH
Papers citing "Hiera: A Hierarchical Vision Transformer without the Bells-and-Whistles"
38 / 38 papers shown
C3R: Channel Conditioned Cell Representations for unified evaluation in microscopy imaging
Umar Marikkar
Syed Sameed Husain
Muhammad Awais
Sara Atito
5
0
0
24 May 2025
Towards Visuospatial Cognition via Hierarchical Fusion of Visual Experts
Qi Feng
LRM
7
0
0
18 May 2025
ReSurgSAM2: Referring Segment Anything in Surgical Video via Credible Long-term Tracking
Haofeng Liu
Mingqi Gao
Xuxiao Luo
Ziyue Wang
Guanyi Qin
Jinlin Wu
Yueming Jin
50
0
0
13 May 2025
TiMo: Spatiotemporal Foundation Model for Satellite Image Time Series
Xiaolei Qin
Di Wang
Jing Zhang
Fengxiang Wang
Xin Su
Bo Du
Liangpei Zhang
AI4TS
54
0
0
13 May 2025
ABS-Mamba: SAM2-Driven Bidirectional Spiral Mamba Network for Medical Image Translation
Feng Yuan
Yifan Gao
Wenbin Wu
Keqing Wu
Xiaotong Guo
Jie Jiang
Xin Gao
Mamba
56
0
0
12 May 2025
H³DP: Triply-Hierarchical Diffusion Policy for Visuomotor Learning
Yiyang Lu
Yufeng Tian
Zhecheng Yuan
Xinyu Wang
Pu Hua
Zhengrong Xue
Huazhe Xu
36
0
0
12 May 2025
ORXE: Orchestrating Experts for Dynamically Configurable Efficiency
Qingyuan Wang
Guoxin Wang
B. Cardiff
Deepu John
43
0
0
07 May 2025
Image Recognition with Online Lightweight Vision Transformer: A Survey
Zherui Zhang
Rongtao Xu
Jie Zhou
Changwei Wang
Xingtian Pei
...
Jiguang Zhang
Li Guo
Longxiang Gao
Wenyuan Xu
Shibiao Xu
ViT
301
0
0
06 May 2025
Corner Cases: How Size and Position of Objects Challenge ImageNet-Trained Models
Mishal Fatima
Steffen Jung
Margret Keuper
45
0
0
06 May 2025
FocusedAD: Character-centric Movie Audio Description
Xiaojun Ye
C. Wang
Yiren Song
Sheng Zhou
Liangcheng Li
Jiajun Bu
VGen
63
0
0
16 Apr 2025
IMPACT: A Generic Semantic Loss for Multimodal Medical Image Registration
Valentin Boussot
Cédric Hémon
Jean-Claude Nunes
Jason Downling
Simon Rouzé
Caroline Lafond
Anaïs Barateau
Jean-Louis Dillenseger
56
0
0
31 Mar 2025
Feature4X: Bridging Any Monocular Video to 4D Agentic AI with Versatile Gaussian Feature Fields
Shijie Zhou
Hui Ren
Yijia Weng
Shuwang Zhang
Zhen Wang
...
Zhiwen Fan
Suya You
Ziyi Wang
Leonidas Guibas
A. Kadambi
VGen
3DGS
98
0
0
26 Mar 2025
Segment Any-Quality Images with Generative Latent Space Enhancement
Guangqian Guo
Yoong Guo
Xuehui Yu
Wenbo Li
Yaoxing Wang
Shan Gao
VLM
94
0
0
16 Mar 2025
Customized SAM 2 for Referring Remote Sensing Image Segmentation
Fu Rong
Meng Lan
Qian Zhang
Lefei Zhang
54
0
0
10 Mar 2025
MemorySAM: Memorize Modalities and Semantics with Segment Anything Model 2 for Multi-modal Semantic Segmentation
Chenfei Liao
Xu Zheng
Yuanhuiyi Lyu
Haiwei Xue
Yihong Cao
Jiawen Wang
Kailun Yang
Xuming Hu
VLM
72
8
0
09 Mar 2025
Boltzmann Attention Sampling for Image Analysis with Small Objects
Theodore Zhao
Sid Kiblawi
Naoto Usuyama
Ho Hin Lee
Sam Preston
Hoifung Poon
Mu-Hsin Wei
MedIm
98
0
0
04 Mar 2025
Simpler Fast Vision Transformers with a Jumbo CLS Token
A. Fuller
Yousef Yassin
Daniel G. Kyrollos
Evan Shelhamer
James R. Green
97
0
0
20 Feb 2025
Spectral-factorized Positive-definite Curvature Learning for NN Training
Wu Lin
Felix Dangel
Runa Eschenhagen
Juhan Bae
Richard E. Turner
Roger B. Grosse
74
0
0
10 Feb 2025
Few-Shot Adaptation of Training-Free Foundation Model for 3D Medical Image Segmentation
Xingxin He
Yifan Hu
Zhaoye Zhou
Mohamed Jarraya
Fang Liu
VLM
MedIm
71
2
0
17 Jan 2025
Is Segment Anything Model 2 All You Need for Surgery Video Segmentation? A Systematic Evaluation
Cheng Yuan
Jian Jiang
Kunyi Yang
Lv Wu
Rui Wang
...
Yifan Zhou
Wanli Song
Haoran Wang
Qi Dou
Yutong Ban
49
1
0
03 Jan 2025
Referring Video Object Segmentation via Language-aligned Track Selection
Seongchan Kim
Woojeong Jin
Sangbeom Lim
Heeji Yoon
Hyunwook Choi
Seungryong Kim
VOS
110
0
0
02 Dec 2024
SAMWISE: Infusing Wisdom in SAM2 for Text-Driven Video Segmentation
Claudia Cuttano
Gabriele Trivigno
Gabriele Rosi
Carlo Masone
Giuseppe Averta
VOS
119
2
0
26 Nov 2024
System 2 Reasoning Capabilities Are Nigh
Scott C. Lowe
VLM
LRM
57
0
0
04 Oct 2024
Prithvi WxC: Foundation Model for Weather and Climate
J. Schmude
Sujit Roy
Will Trojak
Johannes Jakubik
Daniel Salles Civitarese
...
Campbell Watson
M. Maskey
Tsengdar J Lee
Juan Bernabé-Moreno
Rahul Ramachandran
VLM
AI4Cl
67
10
0
20 Sep 2024
Mamba Fusion: Learning Actions Through Questioning
Zhikang Dong
Apoorva Beedu
Jason Sheinkopf
Irfan Essa
Mamba
70
2
0
17 Sep 2024
Retrieval-augmented Few-shot Medical Image Segmentation with Foundation Models
Lin Zhao
Xiao Chen
Eric Z. Chen
Yikang Liu
Terrence Chen
Shanhui Sun
VLM
62
5
0
16 Aug 2024
SignMusketeers: An Efficient Multi-Stream Approach for Sign Language Translation at Scale
Shester Gueuwou
Xiaodan Du
Greg Shakhnarovich
Karen Livescu
SLR
54
3
0
11 Jun 2024
Enhancing Small Object Encoding in Deep Neural Networks: Introducing Fast&Focused-Net with Volume-wise Dot Product Layer
Tofik Ali
Partha Pratim Roy
ObjD
45
2
0
18 Jan 2024
CHAMMI: A benchmark for channel-adaptive models in microscopy imaging
Zitong S. Chen
Chau Pham
Siqi Wang
Michael Doron
Nikita Moshkov
Bryan A. Plummer
Juan C. Caicedo
45
12
0
30 Oct 2023
RT-GAN: Recurrent Temporal GAN for Adding Lightweight Temporal Consistency to Frame-Based Domain Translation Approaches
Shawn Mathew
Saad Nadeem
Alvin C. Goh
Arie Kaufman
MedIm
54
0
0
02 Oct 2023
TurboViT: Generating Fast Vision Transformers via Generative Architecture Search
Alexander Wong
Saad Abbasi
Saeejith Nair
ViT
38
1
0
22 Aug 2023
On the Benefits of 3D Pose and Tracking for Human Action Recognition
Jathushan Rajasegaran
Georgios Pavlakos
Angjoo Kanazawa
Christoph Feichtenhofer
Jitendra Malik
55
33
0
03 Apr 2023
Uniform Masking: Enabling MAE Pre-training for Pyramid-based Vision Transformers with Locality
Xiang Li
Wenhai Wang
Lingfeng Yang
Jian Yang
119
73
0
20 May 2022
Self-Supervised and Interpretable Anomaly Detection using Network Transformers
Daniel L. Marino
Chathurika S. Wickramasinghe
C. Rieger
Milos Manic
34
8
0
25 Feb 2022
Masked Autoencoders Are Scalable Vision Learners
Kaiming He
Xinlei Chen
Saining Xie
Yanghao Li
Piotr Dollár
Ross B. Girshick
ViT
TPM
327
7,544
0
11 Nov 2021
Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions
Wenhai Wang
Enze Xie
Xiang Li
Deng-Ping Fan
Kaitao Song
Ding Liang
Tong Lu
Ping Luo
Ling Shao
ViT
323
3,648
0
24 Feb 2021
Is Space-Time Attention All You Need for Video Understanding?
Gedas Bertasius
Heng Wang
Lorenzo Torresani
ViT
283
2,003
0
09 Feb 2021
Simple Copy-Paste is a Strong Data Augmentation Method for Instance Segmentation
Golnaz Ghiasi
Huayu Chen
A. Srinivas
Rui Qian
Nayeon Lee
E. D. Cubuk
Quoc V. Le
Barret Zoph
ISeg
256
972
0
13 Dec 2020