ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2403.19887
  4. Cited By
Jamba: A Hybrid Transformer-Mamba Language Model

Jamba: A Hybrid Transformer-Mamba Language Model

28 March 2024
Opher Lieber
Barak Lenz
Hofit Bata
Gal Cohen
Jhonathan Osin
Itay Dalmedigos
Erez Safahi
S. Meirom
Yonatan Belinkov
Shai Shalev-Shwartz
Omri Abend
Raz Alon
Tomer Asida
Amir Bergman
Roman Glozman
Michael Gokhman
Avshalom Manevich
Nir Ratner
N. Rozen
Erez Shwartz
Mor Zusman
Y. Shoham
ArXivPDFHTML

Papers citing "Jamba: A Hybrid Transformer-Mamba Language Model"

49 / 49 papers shown
Title
Recall with Reasoning: Chain-of-Thought Distillation for Mamba's Long-Context Memory and Extrapolation
Recall with Reasoning: Chain-of-Thought Distillation for Mamba's Long-Context Memory and Extrapolation
Junyu Ma
Tianqing Fang
Z. Zhang
Hongming Zhang
Haitao Mi
Dong Yu
ReLM
RALM
LRM
142
0
0
06 May 2025
Mixture of Sparse Attention: Content-Based Learnable Sparse Attention via Expert-Choice Routing
Mixture of Sparse Attention: Content-Based Learnable Sparse Attention via Expert-Choice Routing
Piotr Piekos
Róbert Csordás
Jürgen Schmidhuber
MoE
VLM
99
1
0
01 May 2025
SSD-Poser: Avatar Pose Estimation with State Space Duality from Sparse Observations
SSD-Poser: Avatar Pose Estimation with State Space Duality from Sparse Observations
Shuting Zhao
Linxin Bai
Liangjing Shao
Ye Zhang
Xinrong Chen
28
0
0
25 Apr 2025
Quamba2: A Robust and Scalable Post-training Quantization Framework for Selective State Space Models
Quamba2: A Robust and Scalable Post-training Quantization Framework for Selective State Space Models
Hung-Yueh Chiang
Chi-chih Chang
N. Frumkin
Kai-Chiang Wu
Mohamed S. Abdelfattah
Diana Marculescu
MQ
146
0
0
28 Mar 2025
From S4 to Mamba: A Comprehensive Survey on Structured State Space Models
From S4 to Mamba: A Comprehensive Survey on Structured State Space Models
Shriyank Somvanshi
Md Monzurul Islam
Mahmuda Sultana Mimi
Sazzad Bin Bashar Polock
Gaurab Chhetri
Subasish Das
Mamba
AI4TS
45
0
0
22 Mar 2025
Tiled Flash Linear Attention: More Efficient Linear RNN and xLSTM Kernels
Tiled Flash Linear Attention: More Efficient Linear RNN and xLSTM Kernels
M. Beck
Korbinian Poppel
Phillip Lippe
Sepp Hochreiter
63
1
0
18 Mar 2025
A Comprehensive Survey of Mixture-of-Experts: Algorithms, Theory, and Applications
A Comprehensive Survey of Mixture-of-Experts: Algorithms, Theory, and Applications
Siyuan Mu
Sen Lin
MoE
135
2
0
10 Mar 2025
MoFE: Mixture of Frozen Experts Architecture
Jean Seo
Jaeyoon Kim
Hyopil Shin
MoE
167
0
0
09 Mar 2025
Linear-MoE: Linear Sequence Modeling Meets Mixture-of-Experts
Linear-MoE: Linear Sequence Modeling Meets Mixture-of-Experts
Weigao Sun
Disen Lan
Tong Zhu
Xiaoye Qu
Yu-Xi Cheng
MoE
103
2
0
07 Mar 2025
CoServe: Efficient Collaboration-of-Experts (CoE) Model Inference with Limited Memory
CoServe: Efficient Collaboration-of-Experts (CoE) Model Inference with Limited Memory
Jiashun Suo
Xiaojian Liao
Limin Xiao
Li Ruan
Jinquan Wang
Xiao Su
Zhisheng Huo
67
0
0
04 Mar 2025
Liger: Linearizing Large Language Models to Gated Recurrent Structures
Liger: Linearizing Large Language Models to Gated Recurrent Structures
Disen Lan
Weigao Sun
Jiaxi Hu
Jusen Du
Yu-Xi Cheng
69
0
0
03 Mar 2025
PICASO: Permutation-Invariant Context Composition with State Space Models
PICASO: Permutation-Invariant Context Composition with State Space Models
Tian Yu Liu
Alessandro Achille
Matthew Trager
Aditya Golatkar
L. Zancato
Stefano Soatto
LRM
62
0
0
24 Feb 2025
RT-DEMT: A hybrid real-time acupoint detection model combining mamba and transformer
RT-DEMT: A hybrid real-time acupoint detection model combining mamba and transformer
Shilong Yang
Qi Zang
Chulong Zhang
Lingfeng Huang
Yaoqin Xie
Mamba
74
1
0
16 Feb 2025
SS4Rec: Continuous-Time Sequential Recommendation with State Space Models
SS4Rec: Continuous-Time Sequential Recommendation with State Space Models
Wei Xiao
Huiying Wang
Qifeng Zhou
59
0
0
12 Feb 2025
Multilingual State Space Models for Structured Question Answering in Indic Languages
Multilingual State Space Models for Structured Question Answering in Indic Languages
A. Vats
Rahul Raja
Mrinal Mathur
Vinija Jain
Aman Chadha
70
1
0
01 Feb 2025
Mamba-Based Graph Convolutional Networks: Tackling Over-smoothing with Selective State Space
Mamba-Based Graph Convolutional Networks: Tackling Over-smoothing with Selective State Space
Xin He
Yixuan Wang
Wenqi Fan
Xu Shen
Xin Juan
Rui Miao
Xin Wang
70
0
0
26 Jan 2025
Automatic selection of the best neural architecture for time series forecasting via multi-objective optimization and Pareto optimality conditions
Automatic selection of the best neural architecture for time series forecasting via multi-objective optimization and Pareto optimality conditions
Qianying Cao
Shanqing Liu
Alan John Varghese
Jérome Darbon
M. Triantafyllou
George Karniadakis
AI4TS
169
0
0
21 Jan 2025
SSD4Rec: A Structured State Space Duality Model for Efficient Sequential Recommendation
SSD4Rec: A Structured State Space Duality Model for Efficient Sequential Recommendation
Haohao Qu
Yifeng Zhang
Liangbo Ning
Wenqi Fan
Qing Li
Mamba
102
7
0
17 Jan 2025
MegaSynth: Scaling Up 3D Scene Reconstruction with Synthesized Data
MegaSynth: Scaling Up 3D Scene Reconstruction with Synthesized Data
Hanwen Jiang
Zexiang Xu
Desai Xie
Z. Chen
Haian Jin
...
Xin Sun
Jiuxiang Gu
Qixing Huang
Georgios Pavlakos
Hao Tan
159
1
0
18 Dec 2024
Context Clues: Evaluating Long Context Models for Clinical Prediction Tasks on EHRs
Context Clues: Evaluating Long Context Models for Clinical Prediction Tasks on EHRs
Michael Wornow
Suhana Bedi
Miguel Angel Fuentes Hernandez
E. Steinberg
Jason Alan Fries
Christopher Ré
Sanmi Koyejo
N. Shah
95
4
0
09 Dec 2024
Marconi: Prefix Caching for the Era of Hybrid LLMs
Marconi: Prefix Caching for the Era of Hybrid LLMs
Rui Pan
Zhuang Wang
Zhen Jia
Can Karakus
Luca Zancato
Tri Dao
Ravi Netravali
Yida Wang
95
4
0
28 Nov 2024
DiMSUM: Diffusion Mamba -- A Scalable and Unified Spatial-Frequency Method for Image Generation
DiMSUM: Diffusion Mamba -- A Scalable and Unified Spatial-Frequency Method for Image Generation
Hao Phung
Quan Dao
T. Dao
Hoang Phan
Dimitris Metaxas
Anh Tran
Mamba
64
4
0
06 Nov 2024
ShadowMamba: State-Space Model with Boundary-Region Selective Scan for Shadow Removal
ShadowMamba: State-Space Model with Boundary-Region Selective Scan for Shadow Removal
Xiujin Zhu
Chee-Onn Chow
Joon Huang Chuah
Mamba
45
0
0
05 Nov 2024
MiniPLM: Knowledge Distillation for Pre-Training Language Models
MiniPLM: Knowledge Distillation for Pre-Training Language Models
Yuxian Gu
Hao Zhou
Fandong Meng
Jie Zhou
Minlie Huang
67
5
0
22 Oct 2024
Towards Neural Scaling Laws for Time Series Foundation Models
Towards Neural Scaling Laws for Time Series Foundation Models
Qingren Yao
Chao-Han Huck Yang
Renhe Jiang
Yuxuan Liang
Ming Jin
Shirui Pan
AI4TS
AI4CE
42
7
0
16 Oct 2024
ControlMM: Controllable Masked Motion Generation
ControlMM: Controllable Masked Motion Generation
Ekkasit Pinyoanuntapong
Muhammad Usama Saleem
Korrawe Karunratanakul
Pu Wang
Hongfei Xue
Cheng Chen
Chuan Guo
Junli Cao
J. Ren
Sergey Tulyakov
VGen
37
4
0
14 Oct 2024
GlobalMamba: Global Image Serialization for Vision Mamba
GlobalMamba: Global Image Serialization for Vision Mamba
Chengkun Wang
Wenzhao Zheng
Jie Zhou
Jiwen Lu
Mamba
40
0
0
14 Oct 2024
TULIP: Token-length Upgraded CLIP
TULIP: Token-length Upgraded CLIP
Ivona Najdenkoska
Mohammad Mahdi Derakhshani
Yuki M. Asano
Nanne van Noord
Marcel Worring
Cees G. M. Snoek
VLM
48
3
0
13 Oct 2024
Parameter-Efficient Fine-Tuning of State Space Models
Parameter-Efficient Fine-Tuning of State Space Models
Kevin Galim
Wonjun Kang
Yuchen Zeng
H. Koo
Kangwook Lee
31
4
0
11 Oct 2024
Rodimus*: Breaking the Accuracy-Efficiency Trade-Off with Efficient Attentions
Rodimus*: Breaking the Accuracy-Efficiency Trade-Off with Efficient Attentions
Zhihao He
Hang Yu
Zi Gong
Shizhan Liu
J. Li
Weiyao Lin
VLM
38
1
0
09 Oct 2024
Large Language Model Inference Acceleration: A Comprehensive Hardware Perspective
Large Language Model Inference Acceleration: A Comprehensive Hardware Perspective
Jinhao Li
Jiaming Xu
Shan Huang
Yonghua Chen
Wen Li
...
Jiayi Pan
Li Ding
Hao Zhou
Yu Wang
Guohao Dai
62
16
0
06 Oct 2024
How to Train Long-Context Language Models (Effectively)
How to Train Long-Context Language Models (Effectively)
Tianyu Gao
Alexander Wettig
Howard Yen
Danqi Chen
RALM
72
38
0
03 Oct 2024
MAP: Unleashing Hybrid Mamba-Transformer Vision Backbone's Potential with Masked Autoregressive Pretraining
MAP: Unleashing Hybrid Mamba-Transformer Vision Backbone's Potential with Masked Autoregressive Pretraining
Yunze Liu
Li Yi
Mamba
45
2
0
01 Oct 2024
DocMamba: Efficient Document Pre-training with State Space Model
DocMamba: Efficient Document Pre-training with State Space Model
Pengfei Hu
Zhenrong Zhang
Jiefeng Ma
Shuhang Liu
Jun Du
Jianshu Zhang
Mamba
42
1
0
18 Sep 2024
Flash STU: Fast Spectral Transform Units
Flash STU: Fast Spectral Transform Units
Y. Isabel Liu
Windsor Nguyen
Yagiz Devre
Evan Dogariu
Anirudha Majumdar
Elad Hazan
AI4TS
72
1
0
16 Sep 2024
Transformers to SSMs: Distilling Quadratic Knowledge to Subquadratic Models
Transformers to SSMs: Distilling Quadratic Knowledge to Subquadratic Models
Aviv Bick
Kevin Y. Li
Eric P. Xing
J. Zico Kolter
Albert Gu
Mamba
53
24
0
19 Aug 2024
Mambular: A Sequential Model for Tabular Deep Learning
Mambular: A Sequential Model for Tabular Deep Learning
Anton Thielmann
Manish Kumar
Christoph Weisser
Arik Reuter
Benjamin Säfken
Soheila Samiee
Mamba
LMTD
73
6
0
12 Aug 2024
Hamba: Single-view 3D Hand Reconstruction with Graph-guided Bi-Scanning
  Mamba
Hamba: Single-view 3D Hand Reconstruction with Graph-guided Bi-Scanning Mamba
Haoye Dong
Aviral Chharia
Wenbo Gou
Francisco Vicente Carrasco
Fernando de la Torre
Mamba
51
2
0
12 Jul 2024
KV Cache Compression, But What Must We Give in Return? A Comprehensive
  Benchmark of Long Context Capable Approaches
KV Cache Compression, But What Must We Give in Return? A Comprehensive Benchmark of Long Context Capable Approaches
Jiayi Yuan
Hongyi Liu
Shaochen
Zhong
Yu-Neng Chuang
...
Hongye Jin
V. Chaudhary
Zhaozhuo Xu
Zirui Liu
Xia Hu
43
17
0
01 Jul 2024
Pruning via Merging: Compressing LLMs via Manifold Alignment Based Layer Merging
Pruning via Merging: Compressing LLMs via Manifold Alignment Based Layer Merging
Deyuan Liu
Zhanyue Qin
Hairu Wang
Zhao Yang
Zecheng Wang
...
Zhao Lv
Zhiying Tu
Dianhui Chu
Bo Li
Dianbo Sui
22
2
0
24 Jun 2024
DeciMamba: Exploring the Length Extrapolation Potential of Mamba
DeciMamba: Exploring the Length Extrapolation Potential of Mamba
Assaf Ben-Kish
Itamar Zimerman
Shady Abu Hussein
Nadav Cohen
Amir Globerson
Lior Wolf
Raja Giryes
Mamba
77
13
0
20 Jun 2024
BABILong: Testing the Limits of LLMs with Long Context
  Reasoning-in-a-Haystack
BABILong: Testing the Limits of LLMs with Long Context Reasoning-in-a-Haystack
Yuri Kuratov
Aydar Bulatov
Petr Anokhin
Ivan Rodkin
Dmitry Sorokin
Artyom Sorokin
Mikhail Burtsev
RALM
ALM
LRM
ReLM
ELM
49
59
0
14 Jun 2024
An Empirical Study of Mamba-based Language Models
An Empirical Study of Mamba-based Language Models
R. Waleffe
Wonmin Byeon
Duncan Riach
Brandon Norick
V. Korthikanti
...
Vartika Singh
Jared Casper
Jan Kautz
M. Shoeybi
Bryan Catanzaro
61
65
0
12 Jun 2024
Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling
Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling
Liliang Ren
Yang Liu
Yadong Lu
Yelong Shen
Chen Liang
Weizhu Chen
Mamba
74
56
0
11 Jun 2024
SSUMamba: Spatial-Spectral Selective State Space Model for Hyperspectral
  Image Denoising
SSUMamba: Spatial-Spectral Selective State Space Model for Hyperspectral Image Denoising
Guanyiman Fu
Fengchao Xiong
Jianfeng Lu
Jun Zhou
Mamba
37
19
0
02 May 2024
Linear Attention Sequence Parallelism
Linear Attention Sequence Parallelism
Weigao Sun
Zhen Qin
Dong Li
Xuyang Shen
Yu Qiao
Yiran Zhong
70
2
0
03 Apr 2024
Investigating Recurrent Transformers with Dynamic Halt
Investigating Recurrent Transformers with Dynamic Halt
Jishnu Ray Chowdhury
Cornelia Caragea
39
1
0
01 Feb 2024
In-context Learning and Induction Heads
In-context Learning and Induction Heads
Catherine Olsson
Nelson Elhage
Neel Nanda
Nicholas Joseph
Nova Dassarma
...
Tom B. Brown
Jack Clark
Jared Kaplan
Sam McCandlish
C. Olah
250
460
0
24 Sep 2022
Efficient Intent Detection with Dual Sentence Encoders
Efficient Intent Detection with Dual Sentence Encoders
I. Casanueva
Tadas Temvcinas
D. Gerz
Matthew Henderson
Ivan Vulić
VLM
180
453
0
10 Mar 2020
1