Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2505.14062
Cited By
Scaling Vision Mamba Across Resolutions via Fractal Traversal
20 May 2025
Bo Li
Haoke Xiao
Lv Tang
Mamba
Re-assign community
ArXiv (abs)
PDF
HTML
Papers citing
"Scaling Vision Mamba Across Resolutions via Fractal Traversal"
37 / 37 papers shown
Title
DefMamba: Deformable Visual State Space Model
Leiye Liu
Miao Zhang
Jihao Yin
Tingwei Liu
Wei Ji
Yongri Piao
Huchuan Lu
Mamba
109
1
0
08 Apr 2025
VARGPT-v1.1: Improve Visual Autoregressive Large Unified Model via Iterative Instruction Tuning and Reinforcement Learning
Xianwei Zhuang
Yuxin Xie
Yufan Deng
Dongchao Yang
Liming Liang
Jinghan Ru
Yuguo Yin
Yuexian Zou
121
5
0
03 Apr 2025
MedVLM-R1: Incentivizing Medical Reasoning Capability of Vision-Language Models (VLMs) via Reinforcement Learning
Jiazhen Pan
Che Liu
Junde Wu
Fenglin Liu
Jiayuan Zhu
Hongwei Bran Li
Chen Chen
Cheng Ouyang
Daniel Rueckert
LRM
LM&MA
VLM
136
40
0
26 Feb 2025
VMamba: Visual State Space Model
Yue Liu
Yunjie Tian
Yuzhong Zhao
Hongtian Yu
Lingxi Xie
Yaowei Wang
Qixiang Ye
Jianbin Jiao
Yunfan Liu
Mamba
282
696
0
31 Dec 2024
Spatial-Mamba: Effective Visual State Space Models via Structure-aware State Fusion
Chaodong Xiao
Minghan Li
Zhengqiang Zhang
Deyu Meng
Lei Zhang
Mamba
122
5
0
19 Oct 2024
Scalable Autoregressive Image Generation with Mamba
Haopeng Li
Jinyue Yang
Kexin Wang
Xuerui Qiu
Yuhong Chou
Xin Li
Guoqi Li
Mamba
125
15
0
22 Aug 2024
MambaVision: A Hybrid Mamba-Transformer Vision Backbone
Ali Hatamizadeh
Jan Kautz
Mamba
129
70
0
10 Jul 2024
Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality
Tri Dao
Albert Gu
Mamba
114
532
0
31 May 2024
Mamba-R: Vision Mamba ALSO Needs Registers
Feng Wang
Jiahao Wang
Sucheng Ren
Guoyizhe Wei
J. Mei
Wei Shao
Yuyin Zhou
Alan Yuille
Cihang Xie
Mamba
67
23
0
23 May 2024
How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites
Zhe Chen
Weiyun Wang
Hao Tian
Shenglong Ye
Zhangwei Gao
...
Tong Lu
Dahua Lin
Yu Qiao
Jifeng Dai
Wenhai Wang
MLLM
VLM
113
637
0
25 Apr 2024
PlainMamba: Improving Non-Hierarchical Mamba in Visual Recognition
Chenhongyi Yang
Zehui Chen
Miguel Espinosa
Linus Ericsson
Zhenyu Wang
Jiaming Liu
Elliot J. Crowley
Mamba
87
96
0
26 Mar 2024
Cobra: Extending Mamba to Multi-Modal Large Language Model for Efficient Inference
Han Zhao
Min Zhang
Wei Zhao
Pengxiang Ding
Siteng Huang
Donglin Wang
Mamba
101
74
0
21 Mar 2024
LocalMamba: Visual State Space Model with Windowed Selective Scan
Tao Huang
Xiaohuan Pei
Shan You
Fei Wang
Chao Qian
Chang Xu
Mamba
84
151
0
14 Mar 2024
Visual Instruction Tuning
Haotian Liu
Chunyuan Li
Qingyang Wu
Yong Jae Lee
SyDa
VLM
MLLM
567
4,910
0
17 Apr 2023
GPT-4 Technical Report
OpenAI OpenAI
OpenAI Josh Achiam
Steven Adler
Sandhini Agarwal
Lama Ahmad
...
Shengjia Zhao
Tianhao Zheng
Juntang Zhuang
William Zhuk
Barret Zoph
LLMAG
MLLM
1.5K
14,699
0
15 Mar 2023
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
Junnan Li
Dongxu Li
Silvio Savarese
Steven C. H. Hoi
VLM
MLLM
429
4,641
0
30 Jan 2023
ResFormer: Scaling ViTs with Multi-Resolution Training
Rui Tian
Zuxuan Wu
Qiuju Dai
Hang-Rui Hu
Yu Qiao
Yu-Gang Jiang
ViT
79
35
0
01 Dec 2022
DeiT III: Revenge of the ViT
Hugo Touvron
Matthieu Cord
Hervé Jégou
ViT
127
414
0
14 Apr 2022
PaLM: Scaling Language Modeling with Pathways
Aakanksha Chowdhery
Sharan Narang
Jacob Devlin
Maarten Bosma
Gaurav Mishra
...
Kathy Meier-Hellstern
Douglas Eck
J. Dean
Slav Petrov
Noah Fiedel
PILM
LRM
515
6,293
0
05 Apr 2022
BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
Junnan Li
Dongxu Li
Caiming Xiong
Guosheng Lin
MLLM
BDL
VLM
CLIP
549
4,409
0
28 Jan 2022
Efficiently Modeling Long Sequences with Structured State Spaces
Albert Gu
Karan Goel
Christopher Ré
217
1,814
0
31 Oct 2021
RoFormer: Enhanced Transformer with Rotary Position Embedding
Jianlin Su
Yu Lu
Shengfeng Pan
Ahmed Murtadha
Bo Wen
Yunfeng Liu
284
2,521
0
20 Apr 2021
Swin Transformer: Hierarchical Vision Transformer using Shifted Windows
Ze Liu
Yutong Lin
Yue Cao
Han Hu
Yixuan Wei
Zheng Zhang
Stephen Lin
B. Guo
ViT
463
21,564
0
25 Mar 2021
Learning Transferable Visual Models From Natural Language Supervision
Alec Radford
Jong Wook Kim
Chris Hallacy
Aditya A. Ramesh
Gabriel Goh
...
Amanda Askell
Pamela Mishkin
Jack Clark
Gretchen Krueger
Ilya Sutskever
CLIP
VLM
967
29,810
0
26 Feb 2021
Training data-efficient image transformers & distillation through attention
Hugo Touvron
Matthieu Cord
Matthijs Douze
Francisco Massa
Alexandre Sablayrolles
Hervé Jégou
ViT
389
6,802
0
23 Dec 2020
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
Alexey Dosovitskiy
Lucas Beyer
Alexander Kolesnikov
Dirk Weissenborn
Xiaohua Zhai
...
Matthias Minderer
G. Heigold
Sylvain Gelly
Jakob Uszkoreit
N. Houlsby
ViT
670
41,369
0
22 Oct 2020
Designing Network Design Spaces
Ilija Radosavovic
Raj Prateek Kosaraju
Ross B. Girshick
Kaiming He
Piotr Dollár
GNN
102
1,692
0
30 Mar 2020
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
Colin Raffel
Noam M. Shazeer
Adam Roberts
Katherine Lee
Sharan Narang
Michael Matena
Yanqi Zhou
Wei Li
Peter J. Liu
AIMat
459
20,317
0
23 Oct 2019
MMDetection: Open MMLab Detection Toolbox and Benchmark
Kai-xiang Chen
Jiaqi Wang
Jiangmiao Pang
Yuhang Cao
Yu Xiong
...
Jingdong Wang
Jianping Shi
Wanli Ouyang
Chen Change Loy
Dahua Lin
VOS
169
2,871
0
17 Jun 2019
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Jacob Devlin
Ming-Wei Chang
Kenton Lee
Kristina Toutanova
VLM
SSL
SSeg
1.8K
95,175
0
11 Oct 2018
Attention Is All You Need
Ashish Vaswani
Noam M. Shazeer
Niki Parmar
Jakob Uszkoreit
Llion Jones
Aidan Gomez
Lukasz Kaiser
Illia Polosukhin
3DV
730
132,363
0
12 Jun 2017
MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications
Andrew G. Howard
Menglong Zhu
Bo Chen
Dmitry Kalenichenko
Weijun Wang
Tobias Weyand
M. Andreetto
Hartwig Adam
3DH
1.2K
20,880
0
17 Apr 2017
Mask R-CNN
Kaiming He
Georgia Gkioxari
Piotr Dollár
Ross B. Girshick
ObjD
360
27,244
0
20 Mar 2017
Semantic Understanding of Scenes through the ADE20K Dataset
Bolei Zhou
Hang Zhao
Xavier Puig
Tete Xiao
Sanja Fidler
Adela Barriuso
Antonio Torralba
SSeg
404
1,889
0
18 Aug 2016
Deep Residual Learning for Image Recognition
Kaiming He
Xinming Zhang
Shaoqing Ren
Jian Sun
MedIm
2.2K
194,426
0
10 Dec 2015
Very Deep Convolutional Networks for Large-Scale Image Recognition
Karen Simonyan
Andrew Zisserman
FAtt
MDE
1.7K
100,508
0
04 Sep 2014
Microsoft COCO: Common Objects in Context
Nayeon Lee
Michael Maire
Serge J. Belongie
Lubomir Bourdev
Ross B. Girshick
James Hays
Pietro Perona
Deva Ramanan
C. L. Zitnick
Piotr Dollár
ObjD
424
43,814
0
01 May 2014
1