arXiv: 2410.11687 (v2, latest)
State-space models can learn in-context by gradient descent
15 October 2024
Neeraj Mohan Sushma, Yudou Tian, Harshvardhan Mestha, Nicolo Colombo, David Kappel, Anand Subramoney

Papers citing "State-space models can learn in-context by gradient descent" (25 / 25 papers shown)

From Markov to Laplace: How Mamba In-Context Learns Markov Chains
Marco Bondaschi, Nived Rajaraman, Xiuying Wei, Kannan Ramchandran, Razvan Pascanu, Çağlar Gülçehre, Michael C. Gastpar, Ashok Vardhan Makkuva
17 Feb 2025

Longhorn: State Space Models are Amortized Online Learners
Bo Liu, Rui Wang, Lemeng Wu, Yihao Feng, Peter Stone, Qian Liu
19 Jul 2024

Learning to (Learn at Test Time): RNNs with Expressive Hidden States
Yu Sun, Xinhao Li, Karan Dalal, Jiarui Xu, Arjun Vikram, ..., Xinlei Chen, Xiaolong Wang, Sanmi Koyejo, Tatsunori Hashimoto, Carlos Guestrin
05 Jul 2024

Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling
Liliang Ren, Yang Liu, Yadong Lu, Yelong Shen, Chen Liang, Weizhu Chen
11 Jun 2024

Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality
Tri Dao, Albert Gu
31 May 2024

Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models
Soham De, Samuel L. Smith, Anushan Fernando, Aleksandar Botev, George-Christian Muraru, ..., David Budden, Yee Whye Teh, Razvan Pascanu, Nando de Freitas, Çağlar Gülçehre
29 Feb 2024

Is Mamba Capable of In-Context Learning?
Riccardo Grazzi, Julien N. Siems, Simon Schrodi, Thomas Brox, Frank Hutter
05 Feb 2024

Mamba: Linear-Time Sequence Modeling with Selective State Spaces
Albert Gu, Tri Dao
01 Dec 2023

In-Context Learning Creates Task Vectors
Roee Hendel, Mor Geva, Amir Globerson
24 Oct 2023

How Do Transformers Learn In-Context Beyond Simple Functions? A Case Study on Learning with Representations
Tianyu Guo, Wei Hu, Song Mei, Huan Wang, Caiming Xiong, Silvio Savarese, Yu Bai
16 Oct 2023

Are Emergent Abilities in Large Language Models just In-Context Learning?
Sheng Lu, Irina Bigoulaeva, Rachneet Sachdeva, Harish Tayyar Madabushi, Iryna Gurevych
04 Sep 2023

What In-Context Learning "Learns" In-Context: Disentangling Task Recognition and Task Learning
Jane Pan, Tianyu Gao, Howard Chen, Danqi Chen
16 May 2023

The Learnability of In-Context Learning
Noam Wies, Yoav Levine, Amnon Shashua
14 Mar 2023

Resurrecting Recurrent Neural Networks for Long Sequences
Antonio Orvieto, Samuel L. Smith, Albert Gu, Anushan Fernando, Çağlar Gülçehre, Razvan Pascanu, Soham De
11 Mar 2023

Larger language models do in-context learning differently
Jerry W. Wei, Jason W. Wei, Yi Tay, Dustin Tran, Albert Webson, ..., Xinyun Chen, Hanxiao Liu, Da Huang, Denny Zhou, Tengyu Ma
07 Mar 2023

Hyena Hierarchy: Towards Larger Convolutional Language Models
Michael Poli, Stefano Massaroli, Eric Q. Nguyen, Daniel Y. Fu, Tri Dao, S. Baccus, Yoshua Bengio, Stefano Ermon, Christopher Ré
21 Feb 2023

Transformers learn in-context by gradient descent
J. Oswald, Eyvind Niklasson, E. Randazzo, João Sacramento, A. Mordvintsev, A. Zhmoginov, Max Vladymyrov
15 Dec 2022

What learning algorithm is in-context learning? Investigations with linear models
Ekin Akyürek, Dale Schuurmans, Jacob Andreas, Tengyu Ma, Denny Zhou
28 Nov 2022

What Can Transformers Learn In-Context? A Case Study of Simple Function Classes
Shivam Garg, Dimitris Tsipras, Percy Liang, Gregory Valiant
01 Aug 2022

Efficiently Modeling Long Sequences with Structured State Spaces
Albert Gu, Karan Goel, Christopher Ré
31 Oct 2021

Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention
Angelos Katharopoulos, Apoorv Vyas, Nikolaos Pappas, François Fleuret
29 Jun 2020

Longformer: The Long-Document Transformer
Iz Beltagy, Matthew E. Peters, Arman Cohan
10 Apr 2020

Weighted Sigmoid Gate Unit for an Activation Function of Deep Neural Network
Masayuki Tanaka
03 Oct 2018

Attention Is All You Need
Ashish Vaswani, Noam M. Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan Gomez, Lukasz Kaiser, Illia Polosukhin
12 Jun 2017

Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks
Chelsea Finn, Pieter Abbeel, Sergey Levine
09 Mar 2017