arXiv:2403.19887
Jamba: A Hybrid Transformer-Mamba Language Model
28 March 2024
Opher Lieber
Barak Lenz
Hofit Bata
Gal Cohen
Jhonathan Osin
Itay Dalmedigos
Erez Safahi
S. Meirom
Yonatan Belinkov
Shai Shalev-Shwartz
Omri Abend
Raz Alon
Tomer Asida
Amir Bergman
Roman Glozman
Michael Gokhman
Avshalom Manevich
Nir Ratner
N. Rozen
Erez Shwartz
Mor Zusman
Y. Shoham
Papers citing
"Jamba: A Hybrid Transformer-Mamba Language Model"
49 / 49 papers shown
Title
Recall with Reasoning: Chain-of-Thought Distillation for Mamba's Long-Context Memory and Extrapolation
Junyu Ma
Tianqing Fang
Z. Zhang
Hongming Zhang
Haitao Mi
Dong Yu
ReLM
RALM
LRM
142
0
0
06 May 2025
Mixture of Sparse Attention: Content-Based Learnable Sparse Attention via Expert-Choice Routing
Piotr Piekos
Róbert Csordás
Jürgen Schmidhuber
MoE
VLM
99
1
0
01 May 2025
SSD-Poser: Avatar Pose Estimation with State Space Duality from Sparse Observations
Shuting Zhao
Linxin Bai
Liangjing Shao
Ye Zhang
Xinrong Chen
28
0
0
25 Apr 2025
Quamba2: A Robust and Scalable Post-training Quantization Framework for Selective State Space Models
Hung-Yueh Chiang
Chi-chih Chang
N. Frumkin
Kai-Chiang Wu
Mohamed S. Abdelfattah
Diana Marculescu
MQ
146
0
0
28 Mar 2025
From S4 to Mamba: A Comprehensive Survey on Structured State Space Models
Shriyank Somvanshi
Md Monzurul Islam
Mahmuda Sultana Mimi
Sazzad Bin Bashar Polock
Gaurab Chhetri
Subasish Das
Mamba
AI4TS
45
0
0
22 Mar 2025
Tiled Flash Linear Attention: More Efficient Linear RNN and xLSTM Kernels
M. Beck
Korbinian Poppel
Phillip Lippe
Sepp Hochreiter
63
1
0
18 Mar 2025
A Comprehensive Survey of Mixture-of-Experts: Algorithms, Theory, and Applications
Siyuan Mu
Sen Lin
MoE
135
2
0
10 Mar 2025
MoFE: Mixture of Frozen Experts Architecture
Jean Seo
Jaeyoon Kim
Hyopil Shin
MoE
167
0
0
09 Mar 2025
Linear-MoE: Linear Sequence Modeling Meets Mixture-of-Experts
Weigao Sun
Disen Lan
Tong Zhu
Xiaoye Qu
Yu-Xi Cheng
MoE
103
2
0
07 Mar 2025
CoServe: Efficient Collaboration-of-Experts (CoE) Model Inference with Limited Memory
Jiashun Suo
Xiaojian Liao
Limin Xiao
Li Ruan
Jinquan Wang
Xiao Su
Zhisheng Huo
67
0
0
04 Mar 2025
Liger: Linearizing Large Language Models to Gated Recurrent Structures
Disen Lan
Weigao Sun
Jiaxi Hu
Jusen Du
Yu-Xi Cheng
69
0
0
03 Mar 2025
PICASO: Permutation-Invariant Context Composition with State Space Models
Tian Yu Liu
Alessandro Achille
Matthew Trager
Aditya Golatkar
L. Zancato
Stefano Soatto
LRM
62
0
0
24 Feb 2025
RT-DEMT: A hybrid real-time acupoint detection model combining mamba and transformer
Shilong Yang
Qi Zang
Chulong Zhang
Lingfeng Huang
Yaoqin Xie
Mamba
74
1
0
16 Feb 2025
SS4Rec: Continuous-Time Sequential Recommendation with State Space Models
Wei Xiao
Huiying Wang
Qifeng Zhou
59
0
0
12 Feb 2025
Multilingual State Space Models for Structured Question Answering in Indic Languages
A. Vats
Rahul Raja
Mrinal Mathur
Vinija Jain
Aman Chadha
70
1
0
01 Feb 2025
Mamba-Based Graph Convolutional Networks: Tackling Over-smoothing with Selective State Space
Xin He
Yixuan Wang
Wenqi Fan
Xu Shen
Xin Juan
Rui Miao
Xin Wang
70
0
0
26 Jan 2025
Automatic selection of the best neural architecture for time series forecasting via multi-objective optimization and Pareto optimality conditions
Qianying Cao
Shanqing Liu
Alan John Varghese
Jérome Darbon
M. Triantafyllou
George Karniadakis
AI4TS
169
0
0
21 Jan 2025
SSD4Rec: A Structured State Space Duality Model for Efficient Sequential Recommendation
Haohao Qu
Yifeng Zhang
Liangbo Ning
Wenqi Fan
Qing Li
Mamba
102
7
0
17 Jan 2025
MegaSynth: Scaling Up 3D Scene Reconstruction with Synthesized Data
Hanwen Jiang
Zexiang Xu
Desai Xie
Z. Chen
Haian Jin
...
Xin Sun
Jiuxiang Gu
Qixing Huang
Georgios Pavlakos
Hao Tan
159
1
0
18 Dec 2024
Context Clues: Evaluating Long Context Models for Clinical Prediction Tasks on EHRs
Michael Wornow
Suhana Bedi
Miguel Angel Fuentes Hernandez
E. Steinberg
Jason Alan Fries
Christopher Ré
Sanmi Koyejo
N. Shah
95
4
0
09 Dec 2024
Marconi: Prefix Caching for the Era of Hybrid LLMs
Rui Pan
Zhuang Wang
Zhen Jia
Can Karakus
Luca Zancato
Tri Dao
Ravi Netravali
Yida Wang
95
4
0
28 Nov 2024
DiMSUM: Diffusion Mamba -- A Scalable and Unified Spatial-Frequency Method for Image Generation
Hao Phung
Quan Dao
T. Dao
Hoang Phan
Dimitris Metaxas
Anh Tran
Mamba
64
4
0
06 Nov 2024
ShadowMamba: State-Space Model with Boundary-Region Selective Scan for Shadow Removal
Xiujin Zhu
Chee-Onn Chow
Joon Huang Chuah
Mamba
45
0
0
05 Nov 2024
MiniPLM: Knowledge Distillation for Pre-Training Language Models
Yuxian Gu
Hao Zhou
Fandong Meng
Jie Zhou
Minlie Huang
67
5
0
22 Oct 2024
Towards Neural Scaling Laws for Time Series Foundation Models
Qingren Yao
Chao-Han Huck Yang
Renhe Jiang
Yuxuan Liang
Ming Jin
Shirui Pan
AI4TS
AI4CE
42
7
0
16 Oct 2024
ControlMM: Controllable Masked Motion Generation
Ekkasit Pinyoanuntapong
Muhammad Usama Saleem
Korrawe Karunratanakul
Pu Wang
Hongfei Xue
Cheng Chen
Chuan Guo
Junli Cao
J. Ren
Sergey Tulyakov
VGen
37
4
0
14 Oct 2024
GlobalMamba: Global Image Serialization for Vision Mamba
Chengkun Wang
Wenzhao Zheng
Jie Zhou
Jiwen Lu
Mamba
40
0
0
14 Oct 2024
TULIP: Token-length Upgraded CLIP
Ivona Najdenkoska
Mohammad Mahdi Derakhshani
Yuki M. Asano
Nanne van Noord
Marcel Worring
Cees G. M. Snoek
VLM
48
3
0
13 Oct 2024
Parameter-Efficient Fine-Tuning of State Space Models
Kevin Galim
Wonjun Kang
Yuchen Zeng
H. Koo
Kangwook Lee
31
4
0
11 Oct 2024
Rodimus*: Breaking the Accuracy-Efficiency Trade-Off with Efficient Attentions
Zhihao He
Hang Yu
Zi Gong
Shizhan Liu
J. Li
Weiyao Lin
VLM
38
1
0
09 Oct 2024
Large Language Model Inference Acceleration: A Comprehensive Hardware Perspective
Jinhao Li
Jiaming Xu
Shan Huang
Yonghua Chen
Wen Li
...
Jiayi Pan
Li Ding
Hao Zhou
Yu Wang
Guohao Dai
62
16
0
06 Oct 2024
How to Train Long-Context Language Models (Effectively)
Tianyu Gao
Alexander Wettig
Howard Yen
Danqi Chen
RALM
72
38
0
03 Oct 2024
MAP: Unleashing Hybrid Mamba-Transformer Vision Backbone's Potential with Masked Autoregressive Pretraining
Yunze Liu
Li Yi
Mamba
45
2
0
01 Oct 2024
DocMamba: Efficient Document Pre-training with State Space Model
Pengfei Hu
Zhenrong Zhang
Jiefeng Ma
Shuhang Liu
Jun Du
Jianshu Zhang
Mamba
42
1
0
18 Sep 2024
Flash STU: Fast Spectral Transform Units
Y. Isabel Liu
Windsor Nguyen
Yagiz Devre
Evan Dogariu
Anirudha Majumdar
Elad Hazan
AI4TS
72
1
0
16 Sep 2024
Transformers to SSMs: Distilling Quadratic Knowledge to Subquadratic Models
Aviv Bick
Kevin Y. Li
Eric P. Xing
J. Zico Kolter
Albert Gu
Mamba
53
24
0
19 Aug 2024
Mambular: A Sequential Model for Tabular Deep Learning
Anton Thielmann
Manish Kumar
Christoph Weisser
Arik Reuter
Benjamin Säfken
Soheila Samiee
Mamba
LMTD
73
6
0
12 Aug 2024
Hamba: Single-view 3D Hand Reconstruction with Graph-guided Bi-Scanning Mamba
Haoye Dong
Aviral Chharia
Wenbo Gou
Francisco Vicente Carrasco
Fernando de la Torre
Mamba
51
2
0
12 Jul 2024
KV Cache Compression, But What Must We Give in Return? A Comprehensive Benchmark of Long Context Capable Approaches
Jiayi Yuan
Hongyi Liu
Shaochen Zhong
Yu-Neng Chuang
...
Hongye Jin
V. Chaudhary
Zhaozhuo Xu
Zirui Liu
Xia Hu
43
17
0
01 Jul 2024
Pruning via Merging: Compressing LLMs via Manifold Alignment Based Layer Merging
Deyuan Liu
Zhanyue Qin
Hairu Wang
Zhao Yang
Zecheng Wang
...
Zhao Lv
Zhiying Tu
Dianhui Chu
Bo Li
Dianbo Sui
22
2
0
24 Jun 2024
DeciMamba: Exploring the Length Extrapolation Potential of Mamba
Assaf Ben-Kish
Itamar Zimerman
Shady Abu Hussein
Nadav Cohen
Amir Globerson
Lior Wolf
Raja Giryes
Mamba
77
13
0
20 Jun 2024
BABILong: Testing the Limits of LLMs with Long Context Reasoning-in-a-Haystack
Yuri Kuratov
Aydar Bulatov
Petr Anokhin
Ivan Rodkin
Dmitry Sorokin
Artyom Sorokin
Mikhail Burtsev
RALM
ALM
LRM
ReLM
ELM
49
59
0
14 Jun 2024
An Empirical Study of Mamba-based Language Models
R. Waleffe
Wonmin Byeon
Duncan Riach
Brandon Norick
V. Korthikanti
...
Vartika Singh
Jared Casper
Jan Kautz
M. Shoeybi
Bryan Catanzaro
61
65
0
12 Jun 2024
Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling
Liliang Ren
Yang Liu
Yadong Lu
Yelong Shen
Chen Liang
Weizhu Chen
Mamba
74
56
0
11 Jun 2024
SSUMamba: Spatial-Spectral Selective State Space Model for Hyperspectral Image Denoising
Guanyiman Fu
Fengchao Xiong
Jianfeng Lu
Jun Zhou
Mamba
37
19
0
02 May 2024
Linear Attention Sequence Parallelism
Weigao Sun
Zhen Qin
Dong Li
Xuyang Shen
Yu Qiao
Yiran Zhong
70
2
0
03 Apr 2024
Investigating Recurrent Transformers with Dynamic Halt
Jishnu Ray Chowdhury
Cornelia Caragea
39
1
0
01 Feb 2024
In-context Learning and Induction Heads
Catherine Olsson
Nelson Elhage
Neel Nanda
Nicholas Joseph
Nova DasSarma
...
Tom B. Brown
Jack Clark
Jared Kaplan
Sam McCandlish
C. Olah
250
460
0
24 Sep 2022
Efficient Intent Detection with Dual Sentence Encoders
I. Casanueva
Tadas Temčinas
D. Gerz
Matthew Henderson
Ivan Vulić
VLM
180
453
0
10 Mar 2020