Hiera: A Hierarchical Vision Transformer without the Bells-and-Whistles

International Conference on Machine Learning (ICML), 2023
1 June 2023
Chaitanya K. Ryali
Yuan-Ting Hu
Daniel Bolya
Chen Wei
Haoqi Fan
Po-Yao (Bernie) Huang
Vaibhav Aggarwal
Arkabandhu Chowdhury
Omid Poursaeed
Judy Hoffman
Jitendra Malik
Yanghao Li
Christoph Feichtenhofer
    3DH
ArXiv (abs) | PDF | HTML | HuggingFace (1 upvote) | GitHub (985★)

Papers citing "Hiera: A Hierarchical Vision Transformer without the Bells-and-Whistles"

50 / 171 papers shown
Systematic Evaluation and Guidelines for Segment Anything Model in Surgical Video Analysis
Cheng Yuan
Jian Jiang
Kunyi Yang
Lv Wu
Rui Wang
...
Wanli Song
Jian Shu
Yueming Jin
Qi Dou
Yutong Ban
256
2
0
31 Dec 2024
DINO-Foresight: Looking into the Future with DINO
Efstathios Karypidis
Ioannis Kakogeorgiou
Spyros Gidaris
N. Komodakis
AI4CE
492
11
0
16 Dec 2024
One-Shot Multilingual Font Generation Via ViT
Zhiheng Wang
Jiarui Liu
VLM
234
0
0
15 Dec 2024
Customize Segment Anything Model for Multi-Modal Semantic Segmentation with Mixture of LoRA Experts
Chenyang Zhu
Bin Xiao
Lin Shi
Shoukun Xu
Xu Zheng
MoE
300
20
0
05 Dec 2024
Referring Video Object Segmentation via Language-aligned Track Selection
Seongchan Kim
Woojeong Jin
Sangbeom Lim
Heeji Yoon
Hyunwook Choi
Seungryong Kim
VOS
361
4
0
02 Dec 2024
SAMa: Material-aware 3D Selection and Segmentation
Michael Fischer
Iliyan Georgiev
Thibault Groueix
Vladimir G. Kim
Tobias Ritschel
Valentin Deschaintre
3DV
276
4
0
28 Nov 2024
A Distractor-Aware Memory for Visual Object Tracking with SAM2
Computer Vision and Pattern Recognition (CVPR), 2024
Jovana Videnovic
A. Lukežič
Matej Kristan
VLM
300
32
0
26 Nov 2024
SAMWISE: Infusing Wisdom in SAM2 for Text-Driven Video Segmentation
Computer Vision and Pattern Recognition (CVPR), 2024
Claudia Cuttano
Gabriele Trivigno
Gabriele Rosi
Carlo Masone
Giuseppe Averta
VOS
369
22
0
26 Nov 2024
There is no SAMantics! Exploring SAM as a Backbone for Visual Understanding Tasks
Miguel Espinosa
Chenhongyi Yang
Linus Ericsson
Jingyu Sun
Elliot J. Crowley
VLM
225
3
0
22 Nov 2024
Exploiting VLM Localizability and Semantics for Open Vocabulary Action Detection
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2024
Wentao Bao
Keqin Li
Yuxiao Chen
Deep Patel
Martin Renqiang Min
Yu Kong
VLM, ObjD
232
7
0
17 Nov 2024
CorrCLIP: Reconstructing Patch Correlations in CLIP for Open-Vocabulary Semantic Segmentation
Dengke Zhang
Fagui Liu
Quan Tang
VLM
538
2
0
15 Nov 2024
CapeLLM: Support-Free Category-Agnostic Pose Estimation with Multimodal Large Language Models
Junho Kim
Hyungjin Chung
Byung-Hoon Kim
VLM
374
1
0
11 Nov 2024
Don't Look Twice: Faster Video Transformers with Run-Length Tokenization
Neural Information Processing Systems (NeurIPS), 2024
Rohan Choudhury
Guanglei Zhu
Sihan Liu
Koichiro Niinuma
Kishore Venkateshan
László A. Jeni
205
24
0
07 Nov 2024
AM Flow: Adapters for Temporal Processing in Action Recognition
Tanay Agrawal
Abid Ali
A. Dantcheva
François Brémond
195
0
0
04 Nov 2024
ZIM: Zero-Shot Image Matting for Anything
Beomyoung Kim
Chanyong Shin
Joonhyun Jeong
Hyungsik Jung
Se Yun Lee
Sewhan Chun
Dong-Hyun Hwang
Joonsang Yu
VLM
230
7
0
01 Nov 2024
Real-Time Localization and Bimodal Point Pattern Analysis of Palms Using UAV Imagery
Kangning Cui
Wei Tang
Rongkun Zhu
Manqi Wang
Gregory Larsen
...
Jordan Karubian
Raymond H. Chan
R. Plemmons
Jean-Michel Morel
Miles Silman
128
7
0
14 Oct 2024
On Efficient Variants of Segment Anything Model: A Survey
International Journal of Computer Vision (IJCV), 2024
Xiaorui Sun
Jing Liu
Jikang Cheng
Xiaofeng Zhu
Ping Hu
VLM
393
16
0
07 Oct 2024
System 2 Reasoning Capabilities Are Nigh
Scott C. Lowe
VLM, LRM
139
2
0
04 Oct 2024
Solution for OOD-CV Workshop SSB Challenge 2024 (Open-Set Recognition Track)
Mingxu Feng
Dian Chao
Peng Zheng
Yang Yang
120
0
0
30 Sep 2024
1st Place Solution of Multiview Egocentric Hand Tracking Challenge ECCV2024
Minqiang Zou
Zhi Lv
Riqiang Jin
Tian Zhan
Mochen Yu
Yao Tang
Jiajun Liang
EgoV
225
0
0
28 Sep 2024
Prithvi WxC: Foundation Model for Weather and Climate
J. Schmude
Sujit Roy
Will Trojak
Johannes Jakubik
Daniel Salles Civitarese
...
Campbell Watson
M. Maskey
Tsengdar J Lee
Juan Bernabé-Moreno
Rahul Ramachandran
VLM, AI4Cl
258
20
0
20 Sep 2024
Mamba Fusion: Learning Actions Through Questioning
Zhikang Dong
Apoorva Beedu
Jason Sheinkopf
Irfan Essa
Mamba
328
7
0
17 Sep 2024
ChildPlay-Hand: A Dataset of Hand Manipulations in the Wild
Arya Farkhondeh
Samy Tafasca
J. Odobez
158
0
0
14 Sep 2024
VMAS: Video-to-Music Generation via Semantic Alignment in Web Music Videos
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2024
Yan-Bo Lin
Yu Tian
L. Yang
Gedas Bertasius
Heng Wang
VGen
182
13
0
11 Sep 2024
GenRec: Unifying Video Generation and Recognition with Diffusion Models
Neural Information Processing Systems (NeurIPS), 2024
Zejia Weng
Xitong Yang
Zhen Xing
Zuxuan Wu
Yu-Gang Jiang
VGen, DiffM
264
13
0
27 Aug 2024
SAM2-UNet: Segment Anything 2 Makes Strong Encoder for Natural and Medical Image Segmentation
Xinyu Xiong
Zihuang Wu
Shuangyi Tan
Wenxue Li
Feilong Tang
Ying Chen
Siying Li
Jie Ma
Guanbin Li
VLM
172
74
0
16 Aug 2024
Retrieval-augmented Few-shot Medical Image Segmentation with Foundation Models
IEEE Transactions on Neural Networks and Learning Systems (TNNLS), 2024
Lin Zhao
Xiao Chen
Eric Z. Chen
Yikang Liu
Terrence Chen
Shanhui Sun
VLM
232
18
0
16 Aug 2024
Novel adaptation of video segmentation to 3D MRI: efficient zero-shot knee segmentation with SAM2
A. Yu
Mohsen Hariri
Xuecen Zhang
Mingrui Yang
Vipin Chaudhary
Xiaojuan Li
VGen, MedIm
86
6
0
08 Aug 2024
Path-SAM2: Transfer SAM2 for digital pathology semantic segmentation
Mingya Zhang
Liang Wang
Zhihao Chen
Yiyuan Ge
Xianping Tao
VLM, MedIm
162
6
0
07 Aug 2024
JARViS: Detecting Actions in Video Using Unified Actor-Scene Context Relation Modeling
Seok Hwan Lee
Taein Son
Soo Won Seo
Jisong Kim
Jun Won Choi
215
1
0
07 Aug 2024
Segment Anything in Medical Images and Videos: Benchmark and Deployment
Jun Ma
Sumin Kim
Feifei Li
Mohammed Baharoon
Reza Asakereh
Hongwei Lyu
Bo Wang
VLM, MedIm
254
58
0
06 Aug 2024
YOWOv3: An Efficient and Generalized Framework for Human Action Detection and Recognition
Duc Manh Nguyen Dang
Viet-Hang Duong
Jia Ching Wang
Nhan Bui Duc
117
6
0
05 Aug 2024
SAM 2: Segment Anything in Images and Videos
International Conference on Learning Representations (ICLR), 2024
Nikhila Ravi
Valentin Gabeur
Yuan-Ting Hu
Ronghang Hu
Chaitanya K. Ryali
...
Nicolas Carion
Chao-Yuan Wu
Ross B. Girshick
Piotr Dollár
Christoph Feichtenhofer
VLM, MLLM
418
1,967
0
01 Aug 2024
Multi-label Sewer Pipe Defect Recognition with Mask Attention Feature Enhancement and Label Correlation Learning
Xin Zuo
Yu Sheng
Jifeng Shen
Yongwei Shan
138
0
0
01 Aug 2024
ADAPT: Multimodal Learning for Detecting Physiological Changes under Missing Modalities
Julie Mordacq
Léo Milecki
Maria Vakalopoulou
Steve Oudot
Vicky Kalogeiton
OffRL, MedIm
129
7
0
04 Jul 2024
SignMusketeers: An Efficient Multi-Stream Approach for Sign Language Translation at Scale
Shester Gueuwou
Xiaodan Du
G. Shakhnarovich
Karen Livescu
SLR
256
9
0
11 Jun 2024
Enhancing Feature Diversity Boosts Channel-Adaptive Vision Transformers
Chau Pham
Bryan A. Plummer
193
7
0
26 May 2024
A Survey on Backbones for Deep Video Action Recognition
Zixuan Tang
Youjun Zhao
Yuhang Wen
Mengyuan Liu
144
2
0
09 May 2024
AdaFPP: Adapt-Focused Bi-Propagating Prototype Learning for Panoramic Activity Recognition
ACM Multimedia (MM), 2024
Meiqi Cao
Rui Yan
Xiangbo Shu
Guangzhao Dai
Yazhou Yao
Guo-Sen Xie
193
1
0
04 May 2024
Training a high-performance retinal foundation model with half-the-data and 400 times less compute
Justin Engelmann
Miguel O. Bernabeu
MedIm, OOD
332
5
0
30 Apr 2024
SFMViT: SlowFast Meet ViT in Chaotic World
Jiaying Lin
Jiajun Wen
Mengyuan Liu
Jinfu Liu
Baiqiao Yin
Yue Li
ViT
138
1
0
25 Apr 2024
An Experimental Study on Exploring Strong Lightweight Vision Transformers via Masked Image Modeling Pre-Training
Jin Gao
Shubo Lin
Shaoru Wang
Yutong Kou
Zeming Li
Liang Li
Congxuan Zhang
Xiaoqin Zhang
Yizheng Wang
Weiming Hu
216
5
0
18 Apr 2024
Streaming Dense Video Captioning
Xingyi Zhou
Anurag Arnab
Shyamal Buch
Shen Yan
Austin Myers
Xuehan Xiong
Arsha Nagrani
Cordelia Schmid
VLM
221
71
0
01 Apr 2024
DailyMAE: Towards Pretraining Masked Autoencoders in One Day
Jiantao Wu
Shentong Mo
Sara Atito
Zhenhua Feng
Josef Kittler
Muhammad Awais
178
4
0
31 Mar 2024
Benchmarks and Challenges in Pose Estimation for Egocentric Hand Interactions with Objects
Zicong Fan
Takehiko Ohkawa
Linlin Yang
Nie Lin
Zhishan Zhou
...
Kun He
Yoichi Sato
Otmar Hilliges
Hyung Jin Chang
Angela Yao
208
29
0
25 Mar 2024
Enhancing Video Transformers for Action Understanding with VLM-aided Training
Hui Lu
Hu Jian
Ronald Poppe
A. A. Salah
166
5
0
24 Mar 2024
VidLA: Video-Language Alignment at Scale
Computer Vision and Pattern Recognition (CVPR), 2024
Mamshad Nayeem Rizve
Fan Fei
Jayakrishnan Unnikrishnan
Son Tran
Benjamin Z. Yao
Belinda Zeng
Mubarak Shah
Trishul Chilimbi
VLM, AI4TS
172
8
0
21 Mar 2024
MULDE: Multiscale Log-Density Estimation via Denoising Score Matching for Video Anomaly Detection
Jakub Micorek
Horst Possegger
Dominik Narnhofer
Horst Bischof
Mateusz Koziński
213
22
0
21 Mar 2024
When Do We Not Need Larger Vision Models?
Baifeng Shi
Ziyang Wu
Maolin Mao
Xin Wang
Trevor Darrell
VLM, LRM
281
68
0
19 Mar 2024
xT: Nested Tokenization for Larger Context in Large Images
Ritwik Gupta
Shufan Li
Tyler Lixuan Zhu
Jitendra Malik
Trevor Darrell
K. Mangalam
ViT
150
7
0
04 Mar 2024