arXiv:2402.04248
Can Mamba Learn How to Learn? A Comparative Study on In-Context Learning Tasks
6 February 2024
Jongho Park
Jaeseung Park
Zheyang Xiong
Nayoung Lee
Jaewoong Cho
Samet Oymak
Kangwook Lee
Dimitris Papailiopoulos
Papers citing
"Can Mamba Learn How to Learn? A Comparative Study on In-Context Learning Tasks"
50 / 52 papers shown
Title
Turbo-ICL: In-Context Learning-Based Turbo Equalization
Zihang Song
Matteo Zecchin
Bipin Rajendran
Osvaldo Simeone
39
0
0
09 May 2025
Mixture of Sparse Attention: Content-Based Learnable Sparse Attention via Expert-Choice Routing
Piotr Piekos
Róbert Csordás
Jürgen Schmidhuber
MoE
VLM
96
1
0
01 May 2025
Understanding the Skill Gap in Recurrent Language Models: The Role of the Gather-and-Aggregate Mechanism
Aviv Bick
Eric P. Xing
Albert Gu
RALM
88
0
0
22 Apr 2025
Gating is Weighting: Understanding Gated Linear Attention through In-context Learning
Yingcong Li
Davoud Ataee Tarzanagh
A. S. Rawat
Maryam Fazel
Samet Oymak
25
0
0
06 Apr 2025
Attention Mamba: Time Series Modeling with Adaptive Pooling Acceleration and Receptive Field Enhancements
Sijie Xiong
Shuqing Liu
Cheng Tang
Fumiya Okubo
Haoling Xiong
Atsushi Shimada
Mamba
AI4TS
60
0
0
02 Apr 2025
Resona: Improving Context Copying in Linear Recurrence Models with Retrieval
Xihuai Wang
Linrui Ma
Jerry Huang
Peng Lu
Prasanna Parthasarathi
Xiao-Wen Chang
Boxing Chen
Yufei Cui
KELM
45
1
0
28 Mar 2025
VideoMAP: Toward Scalable Mamba-based Video Autoregressive Pretraining
Yunze Liu
Peiran Wu
C. Liang
Junxiao Shen
Limin Wang
Li Yi
Mamba
53
0
0
16 Mar 2025
In-Context Learning with Hypothesis-Class Guidance
Ziqian Lin
Shubham Kumar Bharti
Kangwook Lee
76
0
0
27 Feb 2025
Merging Context Clustering with Visual State Space Models for Medical Image Segmentation
Yun Zhu
Dong Zhang
Yi-Mou Lin
Yifei Feng
Jinhui Tang
Mamba
27
1
0
03 Jan 2025
Marconi: Prefix Caching for the Era of Hybrid LLMs
Rui Pan
Zhuang Wang
Zhen Jia
Can Karakus
Luca Zancato
Tri Dao
Ravi Netravali
Yida Wang
95
4
0
28 Nov 2024
Hymba: A Hybrid-head Architecture for Small Language Models
Xin Dong
Y. Fu
Shizhe Diao
Wonmin Byeon
Zijia Chen
...
Min-Hung Chen
Yoshi Suhara
Y. Lin
Jan Kautz
Pavlo Molchanov
Mamba
100
21
0
20 Nov 2024
Can Custom Models Learn In-Context? An Exploration of Hybrid Architecture Performance on In-Context Learning Tasks
Ryan Campbell
Nelson Lojo
Kesava Viswanadha
Christoffer Grondal Tryggestad
Derrick Han Sun
Sriteja Vijapurapu
August Rolfsen
Anant Sahai
31
0
0
06 Nov 2024
DiMSUM: Diffusion Mamba -- A Scalable and Unified Spatial-Frequency Method for Image Generation
Hao Phung
Quan Dao
T. Dao
Hoang Phan
Dimitris Metaxas
Anh Tran
Mamba
64
4
0
06 Nov 2024
Revealing and Mitigating the Local Pattern Shortcuts of Mamba
Wangjie You
Zecheng Tang
Juntao Li
Lili Yao
Min Zhang
Mamba
24
0
0
21 Oct 2024
Mamba4Cast: Efficient Zero-Shot Time Series Forecasting with State Space Models
Sathya Kamesh Bhethanabhotla
Omar Swelam
Julien N. Siems
David Salinas
Frank Hutter
Mamba
AI4TS
AI4CE
43
3
0
12 Oct 2024
Parameter-Efficient Fine-Tuning of State Space Models
Kevin Galim
Wonjun Kang
Yuchen Zeng
H. Koo
Kangwook Lee
29
4
0
11 Oct 2024
A Survey: Collaborative Hardware and Software Design in the Era of Large Language Models
Cong Guo
Feng Cheng
Zhixu Du
James Kiessling
Jonathan Ku
...
Qilin Zheng
Guanglei Zhou
Hai
Li-Wei Li
Yiran Chen
31
7
0
08 Oct 2024
Everything Everywhere All at Once: LLMs can In-Context Learn Multiple Tasks in Superposition
Zheyang Xiong
Ziyang Cai
John Cooper
Albert Ge
Vasilis Papageorgiou
...
Saurabh Agarwal
Grigorios G Chrysos
Samet Oymak
Kangwook Lee
Dimitris Papailiopoulos
LRM
35
1
0
08 Oct 2024
Task Diversity Shortens the ICL Plateau
Jaeyeon Kim
Sehyun Kwon
Joo Young Choi
Jongho Park
Jaewoong Cho
Jason D. Lee
Ernest K. Ryu
MoMe
31
2
0
07 Oct 2024
Can Mamba Always Enjoy the "Free Lunch"?
Ruifeng Ren
Zhicong Li
Yong Liu
44
1
0
04 Oct 2024
Training Nonlinear Transformers for Chain-of-Thought Inference: A Theoretical Generalization Analysis
Hongkang Li
Meng Wang
Songtao Lu
Xiaodong Cui
Pin-Yu Chen
LRM
27
5
0
03 Oct 2024
A Little Goes a Long Way: Efficient Long Context Training and Inference with Partial Contexts
Suyu Ge
Xihui Lin
Yunan Zhang
Jiawei Han
Hao Peng
31
4
0
02 Oct 2024
Mitigating Copy Bias in In-Context Learning through Neuron Pruning
Ameen Ali
Lior Wolf
Ivan Titov
36
2
0
02 Oct 2024
Integration of Mamba and Transformer -- MAT for Long-Short Range Time Series Forecasting with Application to Weather Dynamics
Wenqing Zhang
Junming Huang
Ruotong Wang
Changsong Wei
Wenqian Huang
Yuxin Qiao
Mamba
32
10
0
13 Sep 2024
Shaking Up VLMs: Comparing Transformers and Structured State Space Models for Vision & Language Modeling
Georgios Pantazopoulos
Malvina Nikandrou
Alessandro Suglia
Oliver Lemon
Arash Eshghi
Mamba
45
1
0
09 Sep 2024
DualKanbaFormer: An Efficient Selective Sparse Framework for Multimodal Aspect-based Sentiment Analysis
A. Lawan
Juhua Pu
Haruna Yunusa
Muhammad Lawan
Aliyu Umar
Adamu Sani Yahya
Mahmoud Basi
Mamba
39
0
0
27 Aug 2024
Simplified Mamba with Disentangled Dependency Encoding for Long-Term Time Series Forecasting
Zixuan Weng
Jindong Han
Wenzhao Jiang
Hao Liu
Mamba
AI4TS
33
2
0
22 Aug 2024
DyG-Mamba: Continuous State Space Modeling on Dynamic Graphs
Dongyuan Li
Shiyin Tan
Ying Zhang
Ming Jin
Shirui Pan
Manabu Okumura
Renhe Jiang
Mamba
34
3
0
13 Aug 2024
A Survey of Mamba
Shuwei Shi
Shibing Chu
Rui An
Wenqi Fan
Yuee Xie
Hui Liu
Yuanping Chen
Qing Li
AI4CE
40
26
0
02 Aug 2024
Fine-grained Analysis of In-context Linear Estimation: Data, Architecture, and Beyond
Yingcong Li
A. S. Rawat
Samet Oymak
25
6
0
13 Jul 2024
How Well Can a Long Sequence Model Model Long Sequences? Comparing Architechtural Inductive Biases on Long-Context Abilities
Jerry Huang
52
7
0
11 Jul 2024
On the Power of Convolution Augmented Transformer
Mingchen Li
Xuechen Zhang
Yixiao Huang
Samet Oymak
32
0
0
08 Jul 2024
Venturing into Uncharted Waters: The Navigation Compass from Transformer to Mamba
Yuchen Zou
Yineng Chen
Zuchao Li
Lefei Zhang
Hai Zhao
52
1
0
24 Jun 2024
XLand-100B: A Large-Scale Multi-Task Dataset for In-Context Reinforcement Learning
Alexander Nikulin
Ilya Zisman
Alexey Zemtsov
Viacheslav Sinii
107
4
0
13 Jun 2024
An Empirical Study of Mamba-based Language Models
R. Waleffe
Wonmin Byeon
Duncan Riach
Brandon Norick
V. Korthikanti
...
Vartika Singh
Jared Casper
Jan Kautz
M. Shoeybi
Bryan Catanzaro
61
64
0
12 Jun 2024
Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling
Liliang Ren
Yang Liu
Yadong Lu
Yelong Shen
Chen Liang
Weizhu Chen
Mamba
74
56
0
11 Jun 2024
Dimba: Transformer-Mamba Diffusion Models
Zhengcong Fei
Mingyuan Fan
Changqian Yu
Debang Li
Youqiang Zhang
Junshi Huang
Mamba
62
16
0
03 Jun 2024
Zamba: A Compact 7B SSM Hybrid Model
Paolo Glorioso
Quentin G. Anthony
Yury Tokpanov
James Whittington
Jonathan Pilault
Adam Ibrahim
Beren Millidge
30
45
0
26 May 2024
PoinTramba: A Hybrid Transformer-Mamba Framework for Point Cloud Analysis
Zicheng Wang
Zhen Chen
Yiming Wu
Zhen Zhao
Luping Zhou
Dong Xu
Mamba
45
14
0
24 May 2024
PointRWKV: Efficient RWKV-Like Model for Hierarchical Point Cloud Learning
Qingdong He
Jiangning Zhang
Jinlong Peng
Haoyang He
Yabiao Wang
Chengjie Wang
3DPC
40
12
0
24 May 2024
Visual Mamba: A Survey and New Outlooks
Rui Xu
Shu Yang
Yihui Wang
Yu Cai
Bo Du
Hao Chen
Mamba
42
26
0
29 Apr 2024
Mamba-360: Survey of State Space Models as Transformer Alternative for Long Sequence Modelling: Methods, Applications, and Challenges
Badri N. Patro
Vijay Srinivas Agneeswaran
Mamba
46
38
0
24 Apr 2024
A Survey on Efficient Inference for Large Language Models
Zixuan Zhou
Xuefei Ning
Ke Hong
Tianyu Fu
Jiaming Xu
...
Shengen Yan
Guohao Dai
Xiao-Ping Zhang
Yuhan Dong
Yu-Xiang Wang
46
83
0
22 Apr 2024
State Space Model for New-Generation Network Alternative to Transformers: A Survey
Xiao Wang
Shiao Wang
Yuhe Ding
Yuehang Li
Wentao Wu
...
Bowei Jiang
Chenglong Li
Yaowei Wang
Yonghong Tian
Jin Tang
Mamba
33
49
0
15 Apr 2024
Jamba: A Hybrid Transformer-Mamba Language Model
Opher Lieber
Barak Lenz
Hofit Bata
Gal Cohen
Jhonathan Osin
...
Nir Ratner
N. Rozen
Erez Shwartz
Mor Zusman
Y. Shoham
26
208
0
28 Mar 2024
Heracles: A Hybrid SSM-Transformer Model for High-Resolution Image and Time-Series Analysis
Badri N. Patro
Suhas Ranganath
Vinay P. Namboodiri
Vijay Srinivas Agneeswaran
43
2
0
26 Mar 2024
The Hidden Attention of Mamba Models
Ameen Ali
Itamar Zimerman
Lior Wolf
Mamba
39
58
0
03 Mar 2024
Is Mamba Capable of In-Context Learning?
Riccardo Grazzi
Julien N. Siems
Simon Schrodi
Thomas Brox
Frank Hutter
24
40
0
05 Feb 2024
Repeat After Me: Transformers are Better than State Space Models at Copying
Samy Jelassi
David Brandfonbrener
Sham Kakade
Eran Malach
97
78
0
01 Feb 2024
Zoology: Measuring and Improving Recall in Efficient Language Models
Simran Arora
Sabri Eyuboglu
Aman Timalsina
Isys Johnson
Michael Poli
James Zou
Atri Rudra
Christopher Ré
64
66
0
08 Dec 2023