ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2505.14062
  4. Cited By
Scaling Vision Mamba Across Resolutions via Fractal Traversal

Scaling Vision Mamba Across Resolutions via Fractal Traversal

20 May 2025
Bo Li
Haoke Xiao
Lv Tang
    Mamba
ArXiv (abs)PDFHTML

Papers citing "Scaling Vision Mamba Across Resolutions via Fractal Traversal"

37 / 37 papers shown
Title
DefMamba: Deformable Visual State Space Model
DefMamba: Deformable Visual State Space Model
Leiye Liu
Miao Zhang
Jihao Yin
Tingwei Liu
Wei Ji
Yongri Piao
Huchuan Lu
Mamba
111
1
0
08 Apr 2025
VARGPT-v1.1: Improve Visual Autoregressive Large Unified Model via Iterative Instruction Tuning and Reinforcement Learning
VARGPT-v1.1: Improve Visual Autoregressive Large Unified Model via Iterative Instruction Tuning and Reinforcement Learning
Xianwei Zhuang
Yuxin Xie
Yufan Deng
Dongchao Yang
Liming Liang
Jinghan Ru
Yuguo Yin
Yuexian Zou
121
5
0
03 Apr 2025
MedVLM-R1: Incentivizing Medical Reasoning Capability of Vision-Language Models (VLMs) via Reinforcement Learning
MedVLM-R1: Incentivizing Medical Reasoning Capability of Vision-Language Models (VLMs) via Reinforcement Learning
Jiazhen Pan
Che Liu
Junde Wu
Fenglin Liu
Jiayuan Zhu
Hongwei Bran Li
Chen Chen
Cheng Ouyang
Daniel Rueckert
LRMLM&MAVLM
136
40
0
26 Feb 2025
VMamba: Visual State Space Model
VMamba: Visual State Space Model
Yue Liu
Yunjie Tian
Yuzhong Zhao
Hongtian Yu
Lingxi Xie
Yaowei Wang
Qixiang Ye
Jianbin Jiao
Yunfan Liu
Mamba
285
696
0
31 Dec 2024
Spatial-Mamba: Effective Visual State Space Models via Structure-aware State Fusion
Spatial-Mamba: Effective Visual State Space Models via Structure-aware State Fusion
Chaodong Xiao
Minghan Li
Zhengqiang Zhang
Deyu Meng
Lei Zhang
Mamba
122
5
0
19 Oct 2024
Scalable Autoregressive Image Generation with Mamba
Scalable Autoregressive Image Generation with Mamba
Haopeng Li
Jinyue Yang
Kexin Wang
Xuerui Qiu
Yuhong Chou
Xin Li
Guoqi Li
Mamba
125
15
0
22 Aug 2024
MambaVision: A Hybrid Mamba-Transformer Vision Backbone
MambaVision: A Hybrid Mamba-Transformer Vision Backbone
Ali Hatamizadeh
Jan Kautz
Mamba
129
70
0
10 Jul 2024
Transformers are SSMs: Generalized Models and Efficient Algorithms
  Through Structured State Space Duality
Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality
Tri Dao
Albert Gu
Mamba
114
532
0
31 May 2024
Mamba-R: Vision Mamba ALSO Needs Registers
Mamba-R: Vision Mamba ALSO Needs Registers
Feng Wang
Jiahao Wang
Sucheng Ren
Guoyizhe Wei
J. Mei
Wei Shao
Yuyin Zhou
Alan Yuille
Cihang Xie
Mamba
67
23
0
23 May 2024
How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal
  Models with Open-Source Suites
How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites
Zhe Chen
Weiyun Wang
Hao Tian
Shenglong Ye
Zhangwei Gao
...
Tong Lu
Dahua Lin
Yu Qiao
Jifeng Dai
Wenhai Wang
MLLMVLM
113
637
0
25 Apr 2024
PlainMamba: Improving Non-Hierarchical Mamba in Visual Recognition
PlainMamba: Improving Non-Hierarchical Mamba in Visual Recognition
Chenhongyi Yang
Zehui Chen
Miguel Espinosa
Linus Ericsson
Zhenyu Wang
Jiaming Liu
Elliot J. Crowley
Mamba
87
96
0
26 Mar 2024
Cobra: Extending Mamba to Multi-Modal Large Language Model for Efficient Inference
Cobra: Extending Mamba to Multi-Modal Large Language Model for Efficient Inference
Han Zhao
Min Zhang
Wei Zhao
Pengxiang Ding
Siteng Huang
Donglin Wang
Mamba
101
74
0
21 Mar 2024
LocalMamba: Visual State Space Model with Windowed Selective Scan
LocalMamba: Visual State Space Model with Windowed Selective Scan
Tao Huang
Xiaohuan Pei
Shan You
Fei Wang
Chao Qian
Chang Xu
Mamba
84
151
0
14 Mar 2024
Visual Instruction Tuning
Visual Instruction Tuning
Haotian Liu
Chunyuan Li
Qingyang Wu
Yong Jae Lee
SyDaVLMMLLM
569
4,910
0
17 Apr 2023
GPT-4 Technical Report
GPT-4 Technical Report
OpenAI OpenAI
OpenAI Josh Achiam
Steven Adler
Sandhini Agarwal
Lama Ahmad
...
Shengjia Zhao
Tianhao Zheng
Juntang Zhuang
William Zhuk
Barret Zoph
LLMAGMLLM
1.5K
14,699
0
15 Mar 2023
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image
  Encoders and Large Language Models
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
Junnan Li
Dongxu Li
Silvio Savarese
Steven C. H. Hoi
VLMMLLM
429
4,641
0
30 Jan 2023
ResFormer: Scaling ViTs with Multi-Resolution Training
ResFormer: Scaling ViTs with Multi-Resolution Training
Rui Tian
Zuxuan Wu
Qiuju Dai
Hang-Rui Hu
Yu Qiao
Yu-Gang Jiang
ViT
79
35
0
01 Dec 2022
DeiT III: Revenge of the ViT
DeiT III: Revenge of the ViT
Hugo Touvron
Matthieu Cord
Hervé Jégou
ViT
127
414
0
14 Apr 2022
PaLM: Scaling Language Modeling with Pathways
PaLM: Scaling Language Modeling with Pathways
Aakanksha Chowdhery
Sharan Narang
Jacob Devlin
Maarten Bosma
Gaurav Mishra
...
Kathy Meier-Hellstern
Douglas Eck
J. Dean
Slav Petrov
Noah Fiedel
PILMLRM
517
6,293
0
05 Apr 2022
BLIP: Bootstrapping Language-Image Pre-training for Unified
  Vision-Language Understanding and Generation
BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
Junnan Li
Dongxu Li
Caiming Xiong
Guosheng Lin
MLLMBDLVLMCLIP
552
4,409
0
28 Jan 2022
Efficiently Modeling Long Sequences with Structured State Spaces
Efficiently Modeling Long Sequences with Structured State Spaces
Albert Gu
Karan Goel
Christopher Ré
217
1,821
0
31 Oct 2021
RoFormer: Enhanced Transformer with Rotary Position Embedding
RoFormer: Enhanced Transformer with Rotary Position Embedding
Jianlin Su
Yu Lu
Shengfeng Pan
Ahmed Murtadha
Bo Wen
Yunfeng Liu
284
2,521
0
20 Apr 2021
Swin Transformer: Hierarchical Vision Transformer using Shifted Windows
Swin Transformer: Hierarchical Vision Transformer using Shifted Windows
Ze Liu
Yutong Lin
Yue Cao
Han Hu
Yixuan Wei
Zheng Zhang
Stephen Lin
B. Guo
ViT
463
21,564
0
25 Mar 2021
Learning Transferable Visual Models From Natural Language Supervision
Learning Transferable Visual Models From Natural Language Supervision
Alec Radford
Jong Wook Kim
Chris Hallacy
Aditya A. Ramesh
Gabriel Goh
...
Amanda Askell
Pamela Mishkin
Jack Clark
Gretchen Krueger
Ilya Sutskever
CLIPVLM
967
29,810
0
26 Feb 2021
Training data-efficient image transformers & distillation through
  attention
Training data-efficient image transformers & distillation through attention
Hugo Touvron
Matthieu Cord
Matthijs Douze
Francisco Massa
Alexandre Sablayrolles
Hervé Jégou
ViT
389
6,802
0
23 Dec 2020
An Image is Worth 16x16 Words: Transformers for Image Recognition at
  Scale
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
Alexey Dosovitskiy
Lucas Beyer
Alexander Kolesnikov
Dirk Weissenborn
Xiaohua Zhai
...
Matthias Minderer
G. Heigold
Sylvain Gelly
Jakob Uszkoreit
N. Houlsby
ViT
670
41,430
0
22 Oct 2020
Designing Network Design Spaces
Designing Network Design Spaces
Ilija Radosavovic
Raj Prateek Kosaraju
Ross B. Girshick
Kaiming He
Piotr Dollár
GNN
102
1,692
0
30 Mar 2020
Exploring the Limits of Transfer Learning with a Unified Text-to-Text
  Transformer
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
Colin Raffel
Noam M. Shazeer
Adam Roberts
Katherine Lee
Sharan Narang
Michael Matena
Yanqi Zhou
Wei Li
Peter J. Liu
AIMat
462
20,317
0
23 Oct 2019
MMDetection: Open MMLab Detection Toolbox and Benchmark
MMDetection: Open MMLab Detection Toolbox and Benchmark
Kai-xiang Chen
Jiaqi Wang
Jiangmiao Pang
Yuhang Cao
Yu Xiong
...
Jingdong Wang
Jianping Shi
Wanli Ouyang
Chen Change Loy
Dahua Lin
VOS
169
2,878
0
17 Jun 2019
BERT: Pre-training of Deep Bidirectional Transformers for Language
  Understanding
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Jacob Devlin
Ming-Wei Chang
Kenton Lee
Kristina Toutanova
VLMSSLSSeg
1.8K
95,175
0
11 Oct 2018
Attention Is All You Need
Attention Is All You Need
Ashish Vaswani
Noam M. Shazeer
Niki Parmar
Jakob Uszkoreit
Llion Jones
Aidan Gomez
Lukasz Kaiser
Illia Polosukhin
3DV
730
132,363
0
12 Jun 2017
MobileNets: Efficient Convolutional Neural Networks for Mobile Vision
  Applications
MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications
Andrew G. Howard
Menglong Zhu
Bo Chen
Dmitry Kalenichenko
Weijun Wang
Tobias Weyand
M. Andreetto
Hartwig Adam
3DH
1.2K
20,880
0
17 Apr 2017
Mask R-CNN
Mask R-CNN
Kaiming He
Georgia Gkioxari
Piotr Dollár
Ross B. Girshick
ObjD
360
27,244
0
20 Mar 2017
Semantic Understanding of Scenes through the ADE20K Dataset
Semantic Understanding of Scenes through the ADE20K Dataset
Bolei Zhou
Hang Zhao
Xavier Puig
Tete Xiao
Sanja Fidler
Adela Barriuso
Antonio Torralba
SSeg
404
1,890
0
18 Aug 2016
Deep Residual Learning for Image Recognition
Deep Residual Learning for Image Recognition
Kaiming He
Xinming Zhang
Shaoqing Ren
Jian Sun
MedIm
2.2K
194,426
0
10 Dec 2015
Very Deep Convolutional Networks for Large-Scale Image Recognition
Very Deep Convolutional Networks for Large-Scale Image Recognition
Karen Simonyan
Andrew Zisserman
FAttMDE
1.7K
100,508
0
04 Sep 2014
Microsoft COCO: Common Objects in Context
Microsoft COCO: Common Objects in Context
Nayeon Lee
Michael Maire
Serge J. Belongie
Lubomir Bourdev
Ross B. Girshick
James Hays
Pietro Perona
Deva Ramanan
C. L. Zitnick
Piotr Dollár
ObjD
424
43,814
0
01 May 2014
1