2306.00989
Cited By
Hiera: A Hierarchical Vision Transformer without the Bells-and-Whistles
International Conference on Machine Learning (ICML), 2023
1 June 2023
Chaitanya K. Ryali
Yuan-Ting Hu
Daniel Bolya
Chen Wei
Haoqi Fan
Po-Yao (Bernie) Huang
Vaibhav Aggarwal
Arkabandhu Chowdhury
Omid Poursaeed
Judy Hoffman
Jitendra Malik
Yanghao Li
Christoph Feichtenhofer
3DH
HuggingFace (1 upvote)
Github (985★)
Papers citing
"Hiera: A Hierarchical Vision Transformer without the Bells-and-Whistles"
50 / 171 papers shown
Systematic Evaluation and Guidelines for Segment Anything Model in Surgical Video Analysis
Cheng Yuan
Jian Jiang
Kunyi Yang
Lv Wu
Rui Wang
...
Wanli Song
Jian Shu
Yueming Jin
Qi Dou
Yutong Ban
256
2
0
31 Dec 2024
DINO-Foresight: Looking into the Future with DINO
Efstathios Karypidis
Ioannis Kakogeorgiou
Spyros Gidaris
N. Komodakis
AI4CE
492
11
0
16 Dec 2024
One-Shot Multilingual Font Generation Via ViT
Zhiheng Wang
Jiarui Liu
VLM
234
0
0
15 Dec 2024
Customize Segment Anything Model for Multi-Modal Semantic Segmentation with Mixture of LoRA Experts
Chenyang Zhu
Bin Xiao
Lin Shi
Shoukun Xu
Xu Zheng
MoE
300
20
0
05 Dec 2024
Referring Video Object Segmentation via Language-aligned Track Selection
Seongchan Kim
Woojeong Jin
Sangbeom Lim
Heeji Yoon
Hyunwook Choi
Seungryong Kim
VOS
361
4
0
02 Dec 2024
SAMa: Material-aware 3D Selection and Segmentation
Michael Fischer
Iliyan Georgiev
Thibault Groueix
Vladimir G. Kim
Tobias Ritschel
Valentin Deschaintre
3DV
276
4
0
28 Nov 2024
A Distractor-Aware Memory for Visual Object Tracking with SAM2
Computer Vision and Pattern Recognition (CVPR), 2024
Jovana Videnovic
A. Lukežič
Matej Kristan
VLM
300
32
0
26 Nov 2024
SAMWISE: Infusing Wisdom in SAM2 for Text-Driven Video Segmentation
Computer Vision and Pattern Recognition (CVPR), 2024
Claudia Cuttano
Gabriele Trivigno
Gabriele Rosi
Carlo Masone
Giuseppe Averta
VOS
369
22
0
26 Nov 2024
There is no SAMantics! Exploring SAM as a Backbone for Visual Understanding Tasks
Miguel Espinosa
Chenhongyi Yang
Linus Ericsson
Jingyu Sun
Elliot J. Crowley
VLM
225
3
0
22 Nov 2024
Exploiting VLM Localizability and Semantics for Open Vocabulary Action Detection
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2024
Wentao Bao
Keqin Li
Yuxiao Chen
Deep Patel
Martin Renqiang Min
Yu Kong
VLM
ObjD
232
7
0
17 Nov 2024
CorrCLIP: Reconstructing Patch Correlations in CLIP for Open-Vocabulary Semantic Segmentation
Dengke Zhang
Fagui Liu
Quan Tang
VLM
538
2
0
15 Nov 2024
CapeLLM: Support-Free Category-Agnostic Pose Estimation with Multimodal Large Language Models
Junho Kim
Hyungjin Chung
Byung-Hoon Kim
VLM
374
1
0
11 Nov 2024
Don't Look Twice: Faster Video Transformers with Run-Length Tokenization
Neural Information Processing Systems (NeurIPS), 2024
Rohan Choudhury
Guanglei Zhu
Sihan Liu
Koichiro Niinuma
Kishore Venkateshan
László A. Jeni
205
24
0
07 Nov 2024
AM Flow: Adapters for Temporal Processing in Action Recognition
Tanay Agrawal
Abid Ali
A. Dantcheva
François Brémond
195
0
0
04 Nov 2024
ZIM: Zero-Shot Image Matting for Anything
Beomyoung Kim
Chanyong Shin
Joonhyun Jeong
Hyungsik Jung
Se Yun Lee
Sewhan Chun
Dong-Hyun Hwang
Joonsang Yu
VLM
230
7
0
01 Nov 2024
Real-Time Localization and Bimodal Point Pattern Analysis of Palms Using UAV Imagery
Kangning Cui
Wei Tang
Rongkun Zhu
Manqi Wang
Gregory Larsen
...
Jordan Karubian
Raymond H. Chan
R. Plemmons
Jean-Michel Morel
Miles Silman
128
7
0
14 Oct 2024
On Efficient Variants of Segment Anything Model: A Survey
International Journal of Computer Vision (IJCV), 2024
Xiaorui Sun
Jing Liu
Jikang Cheng
Xiaofeng Zhu
Ping Hu
VLM
393
16
0
07 Oct 2024
System 2 Reasoning Capabilities Are Nigh
Scott C. Lowe
VLM
LRM
139
2
0
04 Oct 2024
Solution for OOD-CV Workshop SSB Challenge 2024 (Open-Set Recognition Track)
Mingxu Feng
Dian Chao
Peng Zheng
Yang Yang
120
0
0
30 Sep 2024
1st Place Solution of Multiview Egocentric Hand Tracking Challenge ECCV2024
Minqiang Zou
Zhi Lv
Riqiang Jin
Tian Zhan
Mochen Yu
Yao Tang
Jiajun Liang
EgoV
225
0
0
28 Sep 2024
Prithvi WxC: Foundation Model for Weather and Climate
J. Schmude
Sujit Roy
Will Trojak
Johannes Jakubik
Daniel Salles Civitarese
...
Campbell Watson
M. Maskey
Tsengdar J Lee
Juan Bernabé-Moreno
Rahul Ramachandran
VLM
AI4Cl
258
20
0
20 Sep 2024
Mamba Fusion: Learning Actions Through Questioning
Zhikang Dong
Apoorva Beedu
Jason Sheinkopf
Irfan Essa
Mamba
328
7
0
17 Sep 2024
ChildPlay-Hand: A Dataset of Hand Manipulations in the Wild
Arya Farkhondeh
Samy Tafasca
J. Odobez
158
0
0
14 Sep 2024
VMAS: Video-to-Music Generation via Semantic Alignment in Web Music Videos
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2024
Yan-Bo Lin
Yu Tian
L. Yang
Gedas Bertasius
Heng Wang
VGen
182
13
0
11 Sep 2024
GenRec: Unifying Video Generation and Recognition with Diffusion Models
Neural Information Processing Systems (NeurIPS), 2024
Zejia Weng
Xitong Yang
Zhen Xing
Zuxuan Wu
Yu-Gang Jiang
VGen
DiffM
264
13
0
27 Aug 2024
SAM2-UNet: Segment Anything 2 Makes Strong Encoder for Natural and Medical Image Segmentation
Xinyu Xiong
Zihuang Wu
Shuangyi Tan
Wenxue Li
Feilong Tang
Ying Chen
Siying Li
Jie Ma
Guanbin Li
VLM
172
74
0
16 Aug 2024
Retrieval-augmented Few-shot Medical Image Segmentation with Foundation Models
IEEE Transactions on Neural Networks and Learning Systems (TNNLS), 2024
Lin Zhao
Xiao Chen
Eric Z. Chen
Yikang Liu
Terrence Chen
Shanhui Sun
VLM
232
18
0
16 Aug 2024
Novel adaptation of video segmentation to 3D MRI: efficient zero-shot knee segmentation with SAM2
A. Yu
Mohsen Hariri
Xuecen Zhang
Mingrui Yang
Vipin Chaudhary
Xiaojuan Li
VGen
MedIm
86
6
0
08 Aug 2024
Path-SAM2: Transfer SAM2 for digital pathology semantic segmentation
Mingya Zhang
Liang Wang
Zhihao Chen
Yiyuan Ge
Xianping Tao
VLM
MedIm
162
6
0
07 Aug 2024
JARViS: Detecting Actions in Video Using Unified Actor-Scene Context Relation Modeling
Seok Hwan Lee
Taein Son
Soo Won Seo
Jisong Kim
Jun Won Choi
215
1
0
07 Aug 2024
Segment Anything in Medical Images and Videos: Benchmark and Deployment
Jun Ma
Sumin Kim
Feifei Li
Mohammed Baharoon
Reza Asakereh
Hongwei Lyu
Bo Wang
VLM
MedIm
254
58
0
06 Aug 2024
YOWOv3: An Efficient and Generalized Framework for Human Action Detection and Recognition
Duc Manh Nguyen Dang
Viet-Hang Duong
Jia Ching Wang
Nhan Bui Duc
117
6
0
05 Aug 2024
SAM 2: Segment Anything in Images and Videos
International Conference on Learning Representations (ICLR), 2024
Nikhila Ravi
Valentin Gabeur
Yuan-Ting Hu
Ronghang Hu
Chaitanya K. Ryali
...
Nicolas Carion
Chao-Yuan Wu
Ross B. Girshick
Piotr Dollár
Christoph Feichtenhofer
VLM
MLLM
418
1,967
0
01 Aug 2024
Multi-label Sewer Pipe Defect Recognition with Mask Attention Feature Enhancement and Label Correlation Learning
Xin Zuo
Yu Sheng
Jifeng Shen
Yongwei Shan
138
0
0
01 Aug 2024
ADAPT: Multimodal Learning for Detecting Physiological Changes under Missing Modalities
Julie Mordacq
Léo Milecki
Maria Vakalopoulou
Steve Oudot
Vicky Kalogeiton
OffRL
MedIm
129
7
0
04 Jul 2024
SignMusketeers: An Efficient Multi-Stream Approach for Sign Language Translation at Scale
Shester Gueuwou
Xiaodan Du
G. Shakhnarovich
Karen Livescu
SLR
256
9
0
11 Jun 2024
Enhancing Feature Diversity Boosts Channel-Adaptive Vision Transformers
Chau Pham
Bryan A. Plummer
193
7
0
26 May 2024
A Survey on Backbones for Deep Video Action Recognition
Zixuan Tang
Youjun Zhao
Yuhang Wen
Mengyuan Liu
144
2
0
09 May 2024
AdaFPP: Adapt-Focused Bi-Propagating Prototype Learning for Panoramic Activity Recognition
ACM Multimedia (MM), 2024
Meiqi Cao
Rui Yan
Xiangbo Shu
Guangzhao Dai
Yazhou Yao
Guo-Sen Xie
193
1
0
04 May 2024
Training a high-performance retinal foundation model with half-the-data and 400 times less compute
Justin Engelmann
Miguel O. Bernabeu
MedIm
OOD
332
5
0
30 Apr 2024
SFMViT: SlowFast Meet ViT in Chaotic World
Jiaying Lin
Jiajun Wen
Mengyuan Liu
Jinfu Liu
Baiqiao Yin
Yue Li
ViT
138
1
0
25 Apr 2024
An Experimental Study on Exploring Strong Lightweight Vision Transformers via Masked Image Modeling Pre-Training
Jin Gao
Shubo Lin
Shaoru Wang
Yutong Kou
Zeming Li
Liang Li
Congxuan Zhang
Xiaoqin Zhang
Yizheng Wang
Weiming Hu
216
5
0
18 Apr 2024
Streaming Dense Video Captioning
Xingyi Zhou
Anurag Arnab
Shyamal Buch
Shen Yan
Austin Myers
Xuehan Xiong
Arsha Nagrani
Cordelia Schmid
VLM
221
71
0
01 Apr 2024
DailyMAE: Towards Pretraining Masked Autoencoders in One Day
Jiantao Wu
Shentong Mo
Sara Atito
Zhenhua Feng
Josef Kittler
Muhammad Awais
178
4
0
31 Mar 2024
Benchmarks and Challenges in Pose Estimation for Egocentric Hand Interactions with Objects
Zicong Fan
Takehiko Ohkawa
Linlin Yang
Nie Lin
Zhishan Zhou
...
Kun He
Yoichi Sato
Otmar Hilliges
Hyung Jin Chang
Angela Yao
208
29
0
25 Mar 2024
Enhancing Video Transformers for Action Understanding with VLM-aided Training
Hui Lu
Hu Jian
Ronald Poppe
A. A. Salah
166
5
0
24 Mar 2024
VidLA: Video-Language Alignment at Scale
Computer Vision and Pattern Recognition (CVPR), 2024
Mamshad Nayeem Rizve
Fan Fei
Jayakrishnan Unnikrishnan
Son Tran
Benjamin Z. Yao
Belinda Zeng
Mubarak Shah
Trishul Chilimbi
VLM
AI4TS
172
8
0
21 Mar 2024
MULDE: Multiscale Log-Density Estimation via Denoising Score Matching for Video Anomaly Detection
Jakub Micorek
Horst Possegger
Dominik Narnhofer
Horst Bischof
Mateusz Koziński
213
22
0
21 Mar 2024
When Do We Not Need Larger Vision Models?
Baifeng Shi
Ziyang Wu
Maolin Mao
Xin Wang
Trevor Darrell
VLM
LRM
281
68
0
19 Mar 2024
xT: Nested Tokenization for Larger Context in Large Images
Ritwik Gupta
Shufan Li
Tyler Lixuan Zhu
Jitendra Malik
Trevor Darrell
K. Mangalam
ViT
150
7
0
04 Mar 2024