Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2503.06446
Cited By
M
3
^3
3
amba: CLIP-driven Mamba Model for Multi-modal Remote Sensing Classification
9 March 2025
Mingxiang Cao
Weiying Xie
Xin Zhang
Jiaqing Zhang
Kai Jiang
Jie Lei
Yunsong Li
Mamba
Re-assign community
ArXiv
PDF
HTML
Papers citing
"M$^3$amba: CLIP-driven Mamba Model for Multi-modal Remote Sensing Classification"
16 / 16 papers shown
Title
VMamba: Visual State Space Model
Yue Liu
Yunjie Tian
Yuzhong Zhao
Hongtian Yu
Lingxi Xie
Yaowei Wang
Qixiang Ye
Jianbin Jiao
Yunfan Liu
Mamba
228
675
0
31 Dec 2024
FusionMamba: Dynamic Feature Enhancement for Multimodal Image Fusion with Mamba
Xinyu Xie
Yawen Cui
Chio-in Ieong
Tao Tan
Xiaozhi Zhang
Mamba
91
44
0
15 Apr 2024
FedDiff: Diffusion Model Driven Federated Learning for Multi-Modal and Multi-Clients
Daixun Li
Weiying Xie
Zixuan Wang
YiBing Lu
Yunsong Li
Leyuan Fang
FedML
DiffM
95
20
0
16 Nov 2023
Towards Generalisable Video Moment Retrieval: Visual-Dynamic Injection to Image-Text Pre-Training
Dezhao Luo
Jiabo Huang
S. Gong
Hailin Jin
Yang Liu
VGen
43
29
0
28 Feb 2023
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
Junnan Li
Dongxu Li
Silvio Savarese
Steven C. H. Hoi
VLM
MLLM
401
4,508
0
30 Jan 2023
Dream3D: Zero-Shot Text-to-3D Synthesis Using 3D Shape Prior and Text-to-Image Diffusion Models
Jiale Xu
Xintao Wang
Weihao Cheng
Yan-Pei Cao
Ying Shan
Xiaohu Qie
Shenghua Gao
210
163
0
28 Dec 2022
Hierarchical Text-Conditional Image Generation with CLIP Latents
Aditya A. Ramesh
Prafulla Dhariwal
Alex Nichol
Casey Chu
Mark Chen
VLM
DiffM
317
6,827
0
13 Apr 2022
Masked Autoencoders Are Scalable Vision Learners
Kaiming He
Xinlei Chen
Saining Xie
Yanghao Li
Piotr Dollár
Ross B. Girshick
ViT
TPM
414
7,697
0
11 Nov 2021
CLIP-Adapter: Better Vision-Language Models with Feature Adapters
Peng Gao
Shijie Geng
Renrui Zhang
Teli Ma
Rongyao Fang
Yongfeng Zhang
Hongsheng Li
Yu Qiao
VLM
CLIP
225
1,033
0
09 Oct 2021
CLIP-Forge: Towards Zero-Shot Text-to-Shape Generation
Aditya Sanghi
Hang Chu
Joseph G. Lambourne
Ye Wang
Chin-Yi Cheng
Marco Fumero
Kamal Rahimi Malekshan
CLIP
87
293
0
06 Oct 2021
Scaling Vision with Sparse Mixture of Experts
C. Riquelme
J. Puigcerver
Basil Mustafa
Maxim Neumann
Rodolphe Jenatton
André Susano Pinto
Daniel Keysers
N. Houlsby
MoE
69
597
0
10 Jun 2021
Multimodal Remote Sensing Benchmark Datasets for Land Cover Classification with A Shared and Specific Feature Learning Model
Danfeng Hong
Jingliang Hu
Jing Yao
Jocelyn Chanussot
Xiaoxiang Zhu
36
217
0
21 May 2021
CrossViT: Cross-Attention Multi-Scale Vision Transformer for Image Classification
Chun-Fu Chen
Quanfu Fan
Yikang Shen
ViT
64
1,468
0
27 Mar 2021
Learning Transferable Visual Models From Natural Language Supervision
Alec Radford
Jong Wook Kim
Chris Hallacy
Aditya A. Ramesh
Gabriel Goh
...
Amanda Askell
Pamela Mishkin
Jack Clark
Gretchen Krueger
Ilya Sutskever
CLIP
VLM
784
29,127
0
26 Feb 2021
Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision
Chao Jia
Yinfei Yang
Ye Xia
Yi-Ting Chen
Zarana Parekh
Hieu H. Pham
Quoc V. Le
Yun-hsuan Sung
Zhen Li
Tom Duerig
VLM
CLIP
409
3,824
0
11 Feb 2021
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Jacob Devlin
Ming-Wei Chang
Kenton Lee
Kristina Toutanova
VLM
SSL
SSeg
1.3K
94,479
0
11 Oct 2018
1