ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2502.15130
  4. Cited By
TransMamba: Fast Universal Architecture Adaption from Transformers to Mamba

TransMamba: Fast Universal Architecture Adaption from Transformers to Mamba

24 February 2025
Xiuwei Chen
Sihao Lin
Xiao Dong
Zhenpeng Chen
Meng Cao
Jiawei Han
Hang Xu
Xiaodan Liang
    Mamba
ArXivPDFHTML

Papers citing "TransMamba: Fast Universal Architecture Adaption from Transformers to Mamba"

48 / 48 papers shown
Title
MUSE: Mamba is Efficient Multi-scale Learner for Text-video Retrieval
MUSE: Mamba is Efficient Multi-scale Learner for Text-video Retrieval
Haoran Tang
Meng Cao
Jinfa Huang
Ruyang Liu
Peng Jin
Ge Li
Xiaodan Liang
Mamba
128
4
0
24 Feb 2025
VMamba: Visual State Space Model
VMamba: Visual State Space Model
Yue Liu
Yunjie Tian
Yuzhong Zhao
Hongtian Yu
Lingxi Xie
Yaowei Wang
Qixiang Ye
Jianbin Jiao
Yunfan Liu
Mamba
195
657
0
31 Dec 2024
Falcon Mamba: The First Competitive Attention-free 7B Language Model
Falcon Mamba: The First Competitive Attention-free 7B Language Model
Jingwei Zuo
Maksim Velikanov
Dhia Eddine Rhaiem
Ilyas Chahed
Younes Belkada
Guillaume Kunsch
Hakim Hacid
ALM
60
15
0
07 Oct 2024
Jamba-1.5: Hybrid Transformer-Mamba Models at Scale
Jamba-1.5: Hybrid Transformer-Mamba Models at Scale
Jamba Team
Barak Lenz
Alan Arazi
Amir Bergman
Avshalom Manevich
...
Yehoshua Cohen
Yonatan Belinkov
Y. Globerson
Yuval Peleg Levy
Y. Shoham
57
32
0
22 Aug 2024
Transformers to SSMs: Distilling Quadratic Knowledge to Subquadratic Models
Transformers to SSMs: Distilling Quadratic Knowledge to Subquadratic Models
Aviv Bick
Kevin Y. Li
Eric P. Xing
J. Zico Kolter
Albert Gu
Mamba
78
27
0
19 Aug 2024
Aligning in a Compact Space: Contrastive Knowledge Distillation between
  Heterogeneous Architectures
Aligning in a Compact Space: Contrastive Knowledge Distillation between Heterogeneous Architectures
Hongjun Wu
Li Xiao
Xingkuo Zhang
Yining Miao
84
1
0
28 May 2024
DVMSR: Distillated Vision Mamba for Efficient Super-Resolution
DVMSR: Distillated Vision Mamba for Efficient Super-Resolution
Xiaoyan Lei
Wenlong Zhang
Weifeng Cao
49
13
0
05 May 2024
PlainMamba: Improving Non-Hierarchical Mamba in Visual Recognition
PlainMamba: Improving Non-Hierarchical Mamba in Visual Recognition
Chenhongyi Yang
Zehui Chen
Miguel Espinosa
Linus Ericsson
Zhenyu Wang
Jiaming Liu
Elliot J. Crowley
Mamba
65
91
0
26 Mar 2024
ZigMa: A DiT-style Zigzag Mamba Diffusion Model
ZigMa: A DiT-style Zigzag Mamba Diffusion Model
Vincent Tao Hu
S. A. Baumann
Ming Gui
Olga Grebenkova
Pingchuan Ma
Johannes S. Fischer
Bjorn Ommer
57
44
0
20 Mar 2024
VL-Mamba: Exploring State Space Models for Multimodal Learning
VL-Mamba: Exploring State Space Models for Multimodal Learning
Yanyuan Qiao
Zheng Yu
Longteng Guo
Sihan Chen
Zijia Zhao
Mingzhen Sun
Qi Wu
Jing Liu
Mamba
43
68
0
20 Mar 2024
VideoMamba: State Space Model for Efficient Video Understanding
VideoMamba: State Space Model for Efficient Video Understanding
Kunchang Li
Xinhao Li
Yi Wang
Yinan He
Yali Wang
Limin Wang
Yu Qiao
Mamba
49
192
0
11 Mar 2024
Vision-RWKV: Efficient and Scalable Visual Perception with RWKV-Like Architectures
Vision-RWKV: Efficient and Scalable Visual Perception with RWKV-Like Architectures
Yuchen Duan
Weiyun Wang
Zhe Chen
Xizhou Zhu
Lewei Lu
Tong Lu
Yu Qiao
Hongsheng Li
Jifeng Dai
Wenhai Wang
ViT
54
44
0
04 Mar 2024
Vision Mamba: Efficient Visual Representation Learning with
  Bidirectional State Space Model
Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model
Lianghui Zhu
Bencheng Liao
Qian Zhang
Xinlong Wang
Wenyu Liu
Xinggang Wang
Mamba
66
736
0
17 Jan 2024
U-Mamba: Enhancing Long-range Dependency for Biomedical Image
  Segmentation
U-Mamba: Enhancing Long-range Dependency for Biomedical Image Segmentation
Jun Ma
Feifei Li
Bo Wang
Mamba
102
350
0
09 Jan 2024
Weight subcloning: direct initialization of transformers using larger
  pretrained ones
Weight subcloning: direct initialization of transformers using larger pretrained ones
Mohammad Samragh
Mehrdad Farajtabar
Sachin Mehta
Raviteja Vemulapalli
Fartash Faghri
Devang Naik
Oncel Tuzel
Mohammad Rastegari
63
27
0
14 Dec 2023
Mamba: Linear-Time Sequence Modeling with Selective State Spaces
Mamba: Linear-Time Sequence Modeling with Selective State Spaces
Albert Gu
Tri Dao
Mamba
58
2,552
0
01 Dec 2023
One-for-All: Bridge the Gap Between Heterogeneous Architectures in
  Knowledge Distillation
One-for-All: Bridge the Gap Between Heterogeneous Architectures in Knowledge Distillation
Zhiwei Hao
Jianyuan Guo
Kai Han
Yehui Tang
Han Hu
Yunhe Wang
Chang Xu
71
60
0
30 Oct 2023
PixArt-$α$: Fast Training of Diffusion Transformer for
  Photorealistic Text-to-Image Synthesis
PixArt-ααα: Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis
Junsong Chen
Jincheng Yu
Chongjian Ge
Lewei Yao
Enze Xie
...
Zhongdao Wang
James T. Kwok
Ping Luo
Huchuan Lu
Zhenguo Li
DiffM
70
414
0
30 Sep 2023
Dynamic Residual Classifier for Class Incremental Learning
Dynamic Residual Classifier for Class Incremental Learning
Xiu-yan Chen
Xiaobin Chang
44
17
0
25 Aug 2023
MME: A Comprehensive Evaluation Benchmark for Multimodal Large Language
  Models
MME: A Comprehensive Evaluation Benchmark for Multimodal Large Language Models
Chaoyou Fu
Peixian Chen
Yunhang Shen
Yulei Qin
Mengdan Zhang
...
Xiawu Zheng
Ke Li
Xing Sun
Zhenyu Qiu
Rongrong Ji
ELM
MLLM
55
806
0
23 Jun 2023
RWKV: Reinventing RNNs for the Transformer Era
RWKV: Reinventing RNNs for the Transformer Era
Bo Peng
Eric Alcaide
Quentin G. Anthony
Alon Albalak
Samuel Arcadinho
...
Qihang Zhao
P. Zhou
Qinghua Zhou
Jian Zhu
Rui-Jie Zhu
158
581
0
22 May 2023
Semantic Segmentation using Vision Transformers: A survey
Semantic Segmentation using Vision Transformers: A survey
Hans Thisanke
Chamli Deshan
K. Chamith
Sachith Seneviratne
Rajith Vidanaarachchi
Damayanthi Herath
ViT
47
150
0
05 May 2023
Visual Instruction Tuning
Visual Instruction Tuning
Haotian Liu
Chunyuan Li
Qingyang Wu
Yong Jae Lee
SyDa
VLM
MLLM
332
4,506
0
17 Apr 2023
Hungry Hungry Hippos: Towards Language Modeling with State Space Models
Hungry Hungry Hippos: Towards Language Modeling with State Space Models
Daniel Y. Fu
Tri Dao
Khaled Kamal Saab
A. Thomas
Atri Rudra
Christopher Ré
80
383
0
28 Dec 2022
Cross-Architecture Knowledge Distillation
Cross-Architecture Knowledge Distillation
Yufan Liu
Jiajiong Cao
Bing Li
Weiming Hu
Jin-Fei Ding
Liang Li
25
41
0
12 Jul 2022
Long Range Language Modeling via Gated State Spaces
Long Range Language Modeling via Gated State Spaces
Harsh Mehta
Ankit Gupta
Ashok Cutkosky
Behnam Neyshabur
Mamba
54
234
0
27 Jun 2022
How to Train Your HiPPO: State Space Models with Generalized Orthogonal
  Basis Projections
How to Train Your HiPPO: State Space Models with Generalized Orthogonal Basis Projections
Albert Gu
Isys Johnson
Aman Timalsina
Atri Rudra
Christopher Ré
Mamba
123
93
0
24 Jun 2022
On the Parameterization and Initialization of Diagonal State Space
  Models
On the Parameterization and Initialization of Diagonal State Space Models
Albert Gu
Ankit Gupta
Karan Goel
Christopher Ré
51
308
0
23 Jun 2022
CMKD: CNN/Transformer-Based Cross-Model Knowledge Distillation for Audio
  Classification
CMKD: CNN/Transformer-Based Cross-Model Knowledge Distillation for Audio Classification
Yuan Gong
Sameer Khurana
Andrew Rouditchenko
James R. Glass
VLM
35
29
0
13 Mar 2022
Efficiently Modeling Long Sequences with Structured State Spaces
Efficiently Modeling Long Sequences with Structured State Spaces
Albert Gu
Karan Goel
Christopher Ré
162
1,719
0
31 Oct 2021
On Pursuit of Designing Multi-modal Transformer for Video Grounding
On Pursuit of Designing Multi-modal Transformer for Video Grounding
Meng Cao
Long Chen
Mike Zheng Shou
Can Zhang
Yuexian Zou
47
81
0
13 Sep 2021
CSWin Transformer: A General Vision Transformer Backbone with
  Cross-Shaped Windows
CSWin Transformer: A General Vision Transformer Backbone with Cross-Shaped Windows
Xiaoyi Dong
Jianmin Bao
Dongdong Chen
Weiming Zhang
Nenghai Yu
Lu Yuan
Dong Chen
B. Guo
ViT
121
969
0
01 Jul 2021
CoAtNet: Marrying Convolution and Attention for All Data Sizes
CoAtNet: Marrying Convolution and Attention for All Data Sizes
Zihang Dai
Hanxiao Liu
Quoc V. Le
Mingxing Tan
ViT
91
1,180
0
09 Jun 2021
Swin Transformer: Hierarchical Vision Transformer using Shifted Windows
Swin Transformer: Hierarchical Vision Transformer using Shifted Windows
Ze Liu
Yutong Lin
Yue Cao
Han Hu
Yixuan Wei
Zheng Zhang
Stephen Lin
B. Guo
ViT
288
21,051
0
25 Mar 2021
Scaling Local Self-Attention for Parameter Efficient Visual Backbones
Scaling Local Self-Attention for Parameter Efficient Visual Backbones
Ashish Vaswani
Prajit Ramachandran
A. Srinivas
Niki Parmar
Blake A. Hechtman
Jonathon Shlens
65
398
0
23 Mar 2021
Learning Transferable Visual Models From Natural Language Supervision
Learning Transferable Visual Models From Natural Language Supervision
Alec Radford
Jong Wook Kim
Chris Hallacy
Aditya A. Ramesh
Gabriel Goh
...
Amanda Askell
Pamela Mishkin
Jack Clark
Gretchen Krueger
Ilya Sutskever
CLIP
VLM
666
28,659
0
26 Feb 2021
Training data-efficient image transformers & distillation through
  attention
Training data-efficient image transformers & distillation through attention
Hugo Touvron
Matthieu Cord
Matthijs Douze
Francisco Massa
Alexandre Sablayrolles
Hervé Jégou
ViT
273
6,657
0
23 Dec 2020
An Image is Worth 16x16 Words: Transformers for Image Recognition at
  Scale
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
Alexey Dosovitskiy
Lucas Beyer
Alexander Kolesnikov
Dirk Weissenborn
Xiaohua Zhai
...
Matthias Minderer
G. Heigold
Sylvain Gelly
Jakob Uszkoreit
N. Houlsby
ViT
314
40,217
0
22 Oct 2020
Language Models are Few-Shot Learners
Language Models are Few-Shot Learners
Tom B. Brown
Benjamin Mann
Nick Ryder
Melanie Subbiah
Jared Kaplan
...
Christopher Berner
Sam McCandlish
Alec Radford
Ilya Sutskever
Dario Amodei
BDL
461
41,106
0
28 May 2020
Transformer to CNN: Label-scarce distillation for efficient text
  classification
Transformer to CNN: Label-scarce distillation for efficient text classification
Yew Ken Chia
Sam Witteveen
Martin Andrews
26
37
0
08 Sep 2019
Object Detection in 20 Years: A Survey
Object Detection in 20 Years: A Survey
Zhengxia Zou
Keyan Chen
Zhenwei Shi
Yuhong Guo
Jieping Ye
VLM
ObjD
AI4TS
79
2,333
0
13 May 2019
Towards VQA Models That Can Read
Towards VQA Models That Can Read
Amanpreet Singh
Vivek Natarajan
Meet Shah
Yu Jiang
Xinlei Chen
Dhruv Batra
Devi Parikh
Marcus Rohrbach
EgoV
50
1,174
0
18 Apr 2019
VizWiz Grand Challenge: Answering Visual Questions from Blind People
VizWiz Grand Challenge: Answering Visual Questions from Blind People
Danna Gurari
Qing Li
Abigale Stangl
Anhong Guo
Chi Lin
Kristen Grauman
Jiebo Luo
Jeffrey P. Bigham
CoGe
66
831
0
22 Feb 2018
Localizing Moments in Video with Natural Language
Localizing Moments in Video with Natural Language
Lisa Anne Hendricks
Oliver Wang
Eli Shechtman
Josef Sivic
Trevor Darrell
Bryan C. Russell
87
933
0
04 Aug 2017
Attention Is All You Need
Attention Is All You Need
Ashish Vaswani
Noam M. Shazeer
Niki Parmar
Jakob Uszkoreit
Llion Jones
Aidan Gomez
Lukasz Kaiser
Illia Polosukhin
3DV
427
129,831
0
12 Jun 2017
Deep Residual Learning for Image Recognition
Deep Residual Learning for Image Recognition
Kaiming He
Xinming Zhang
Shaoqing Ren
Jian Sun
MedIm
1.3K
192,638
0
10 Dec 2015
Very Deep Convolutional Networks for Large-Scale Image Recognition
Very Deep Convolutional Networks for Large-Scale Image Recognition
Karen Simonyan
Andrew Zisserman
FAtt
MDE
860
99,991
0
04 Sep 2014
ImageNet Large Scale Visual Recognition Challenge
ImageNet Large Scale Visual Recognition Challenge
Olga Russakovsky
Jia Deng
Hao Su
J. Krause
S. Satheesh
...
A. Karpathy
A. Khosla
Michael S. Bernstein
Alexander C. Berg
Li Fei-Fei
VLM
ObjD
1.0K
39,383
0
01 Sep 2014
1