Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2402.01771
Cited By
BlackMamba: Mixture of Experts for State-Space Models
1 February 2024
Quentin G. Anthony
Yury Tokpanov
Paolo Glorioso
Beren Millidge
Re-assign community
ArXiv
PDF
HTML
Papers citing
"BlackMamba: Mixture of Experts for State-Space Models"
17 / 17 papers shown
Title
Continual Pre-training of MoEs: How robust is your router?
Benjamin Thérien
Charles-Étienne Joseph
Zain Sarwar
Ashwinee Panda
Anirban Das
Shi-Xiong Zhang
Stephen Rawls
Shri Kiran Srinivasan
Eugene Belilovsky
Irina Rish
MoE
80
0
0
06 Mar 2025
Zyda-2: a 5 Trillion Token High-Quality Dataset
Yury Tokpanov
Paolo Glorioso
Quentin Anthony
Beren Millidge
41
3
0
09 Nov 2024
Learning Pattern-Specific Experts for Time Series Forecasting Under Patch-level Distribution Shift
Yanru Sun
Zongxia Xie
Emadeldeen Eldele
Dongyue Chen
Q. Hu
Min-man Wu
AI4TS
28
1
0
13 Oct 2024
Understanding the Performance and Estimating the Cost of LLM Fine-Tuning
Yuchen Xia
Jiho Kim
Yuhan Chen
Haojie Ye
Souvik Kundu
Cong
Hao
Nishil Talati
MoE
37
22
0
08 Aug 2024
Venturing into Uncharted Waters: The Navigation Compass from Transformer to Mamba
Yuchen Zou
Yineng Chen
Zuchao Li
Lefei Zhang
Hai Zhao
55
1
0
24 Jun 2024
MambaLRP: Explaining Selective State Space Sequence Models
F. Jafari
G. Montavon
Klaus-Robert Müller
Oliver Eberle
Mamba
62
9
0
11 Jun 2024
Mamba as Decision Maker: Exploring Multi-scale Sequence Modeling in Offline Reinforcement Learning
Jiahang Cao
Qiang Zhang
Ziqing Wang
Jiaxu Wang
Hao Cheng
Yecheng Shao
Wen Zhao
Gang Han
Yijie Guo
Renjing Xu
Mamba
59
2
0
04 Jun 2024
Zamba: A Compact 7B SSM Hybrid Model
Paolo Glorioso
Quentin G. Anthony
Yury Tokpanov
James Whittington
Jonathan Pilault
Adam Ibrahim
Beren Millidge
30
45
0
26 May 2024
PointRWKV: Efficient RWKV-Like Model for Hierarchical Point Cloud Learning
Qingdong He
Jiangning Zhang
Jinlong Peng
Haoyang He
Yabiao Wang
Chengjie Wang
3DPC
43
12
0
24 May 2024
Mamba-360: Survey of State Space Models as Transformer Alternative for Long Sequence Modelling: Methods, Applications, and Challenges
Badri N. Patro
Vijay Srinivas Agneeswaran
Mamba
46
38
0
24 Apr 2024
A Survey on Visual Mamba
Hanwei Zhang
Ying Zhu
Dan Wang
Lijun Zhang
Tianxiang Chen
Zi Ye
Mamba
40
56
0
24 Apr 2024
CMViM: Contrastive Masked Vim Autoencoder for 3D Multi-modal Representation Learning for AD classification
Guangqian Yang
Kangrui Du
Zhihan Yang
Ye Du
Yongping Zheng
Shujun Wang
42
16
0
25 Mar 2024
ZigMa: A DiT-style Zigzag Mamba Diffusion Model
Vincent Tao Hu
S. A. Baumann
Ming Gui
Olga Grebenkova
Pingchuan Ma
Johannes S. Fischer
Bjorn Ommer
42
42
0
20 Mar 2024
The Hidden Attention of Mamba Models
Ameen Ali
Itamar Zimerman
Lior Wolf
Mamba
39
58
0
03 Mar 2024
Zoology: Measuring and Improving Recall in Efficient Language Models
Simran Arora
Sabri Eyuboglu
Aman Timalsina
Isys Johnson
Michael Poli
James Zou
Atri Rudra
Christopher Ré
64
66
0
08 Dec 2023
The Pile: An 800GB Dataset of Diverse Text for Language Modeling
Leo Gao
Stella Biderman
Sid Black
Laurence Golding
Travis Hoppe
...
Horace He
Anish Thite
Noa Nabeshima
Shawn Presser
Connor Leahy
AIMat
282
1,996
0
31 Dec 2020
Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
M. Shoeybi
M. Patwary
Raul Puri
P. LeGresley
Jared Casper
Bryan Catanzaro
MoE
245
1,826
0
17 Sep 2019
1