Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2406.07522
Cited By
v1
v2
v3 (latest)
Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling
11 June 2024
Liliang Ren
Yang Liu
Yadong Lu
Yelong Shen
Chen Liang
Weizhu Chen
Mamba
Re-assign community
ArXiv (abs)
PDF
HTML
Papers citing
"Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling"
32 / 132 papers shown
Title
Program Synthesis with Large Language Models
Jacob Austin
Augustus Odena
Maxwell Nye
Maarten Bosma
Henryk Michalewski
...
Ellen Jiang
Carrie J. Cai
Michael Terry
Quoc V. Le
Charles Sutton
ELM
AIMat
ReCod
ALM
216
2,004
0
16 Aug 2021
Evaluating Large Language Models Trained on Code
Mark Chen
Jerry Tworek
Heewoo Jun
Qiming Yuan
Henrique Pondé
...
Bob McGrew
Dario Amodei
Sam McCandlish
Ilya Sutskever
Wojciech Zaremba
ELM
ALM
236
5,647
0
07 Jul 2021
RoFormer: Enhanced Transformer with Rotary Position Embedding
Jianlin Su
Yu Lu
Shengfeng Pan
Ahmed Murtadha
Bo Wen
Yunfeng Liu
292
2,521
0
20 Apr 2021
Knowledge Neurons in Pretrained Transformers
Damai Dai
Li Dong
Y. Hao
Zhifang Sui
Baobao Chang
Furu Wei
KELM
MU
97
464
0
18 Apr 2021
Efficient Attentions for Long Document Summarization
L. Huang
Shuyang Cao
Nikolaus Nova Parulian
Heng Ji
Lu Wang
130
289
0
05 Apr 2021
Linear Transformers Are Secretly Fast Weight Programmers
Imanol Schlag
Kazuki Irie
Jürgen Schmidhuber
124
252
0
22 Feb 2021
Measuring Massive Multitask Language Understanding
Dan Hendrycks
Collin Burns
Steven Basart
Andy Zou
Mantas Mazeika
Basel Alomair
Jacob Steinhardt
ELM
RALM
184
4,572
0
07 Sep 2020
Big Bird: Transformers for Longer Sequences
Manzil Zaheer
Guru Guruganesh
Kumar Avinava Dubey
Joshua Ainslie
Chris Alberti
...
Philip Pham
Anirudh Ravula
Qifan Wang
Li Yang
Amr Ahmed
VLM
556
2,099
0
28 Jul 2020
Language Models are Few-Shot Learners
Tom B. Brown
Benjamin Mann
Nick Ryder
Melanie Subbiah
Jared Kaplan
...
Christopher Berner
Sam McCandlish
Alec Radford
Ilya Sutskever
Dario Amodei
BDL
880
42,379
0
28 May 2020
Longformer: The Long-Document Transformer
Iz Beltagy
Matthew E. Peters
Arman Cohan
RALM
VLM
179
4,092
0
10 Apr 2020
Efficient Content-Based Sparse Attention with Routing Transformers
Aurko Roy
M. Saffar
Ashish Vaswani
David Grangier
MoE
335
602
0
12 Mar 2020
GLU Variants Improve Transformer
Noam M. Shazeer
138
1,022
0
12 Feb 2020
On Layer Normalization in the Transformer Architecture
Ruibin Xiong
Yunchang Yang
Di He
Kai Zheng
Shuxin Zheng
Chen Xing
Huishuai Zhang
Yanyan Lan
Liwei Wang
Tie-Yan Liu
AI4CE
142
996
0
12 Feb 2020
Reformer: The Efficient Transformer
Nikita Kitaev
Lukasz Kaiser
Anselm Levskaya
VLM
201
2,332
0
13 Jan 2020
PIQA: Reasoning about Physical Commonsense in Natural Language
Yonatan Bisk
Rowan Zellers
Ronan Le Bras
Jianfeng Gao
Yejin Choi
OOD
LRM
175
1,839
0
26 Nov 2019
Fast Transformer Decoding: One Write-Head is All You Need
Noam M. Shazeer
161
475
0
06 Nov 2019
Depth-wise Decomposition for Accelerating Separable Convolutions in Efficient Convolutional Neural Networks
Yihui He
Jianing Qian
Jianren Wang
Cindy X. Le
Congrui Hetang
Qi Lyu
Wenping Wang
Tianwei Yue
82
11
0
21 Oct 2019
Root Mean Square Layer Normalization
Biao Zhang
Rico Sennrich
99
759
0
16 Oct 2019
BoolQ: Exploring the Surprising Difficulty of Natural Yes/No Questions
Christopher Clark
Kenton Lee
Ming-Wei Chang
Tom Kwiatkowski
Michael Collins
Kristina Toutanova
244
1,551
0
24 May 2019
HellaSwag: Can a Machine Really Finish Your Sentence?
Rowan Zellers
Ari Holtzman
Yonatan Bisk
Ali Farhadi
Yejin Choi
182
2,523
0
19 May 2019
Generating Long Sequences with Sparse Transformers
R. Child
Scott Gray
Alec Radford
Ilya Sutskever
129
1,916
0
23 Apr 2019
The Curious Case of Neural Text Degeneration
Ari Holtzman
Jan Buys
Li Du
Maxwell Forbes
Yejin Choi
199
3,210
0
22 Apr 2019
Can a Suit of Armor Conduct Electricity? A New Dataset for Open Book Question Answering
Todor Mihaylov
Peter Clark
Tushar Khot
Ashish Sabharwal
119
1,562
0
08 Sep 2018
Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge
Peter Clark
Isaac Cowhey
Oren Etzioni
Tushar Khot
Ashish Sabharwal
Carissa Schoenick
Oyvind Tafjord
ELM
RALM
LRM
170
2,660
0
14 Mar 2018
Parallelizing Linear Recurrent Neural Nets Over Sequence Length
Eric Martin
Chris Cundy
76
103
0
12 Sep 2017
Attention Is All You Need
Ashish Vaswani
Noam M. Shazeer
Niki Parmar
Jakob Uszkoreit
Llion Jones
Aidan Gomez
Lukasz Kaiser
Illia Polosukhin
3DV
786
132,363
0
12 Jun 2017
Sigmoid-Weighted Linear Units for Neural Network Function Approximation in Reinforcement Learning
Stefan Elfwing
E. Uchibe
Kenji Doya
136
1,739
0
10 Feb 2017
Language Modeling with Gated Convolutional Networks
Yann N. Dauphin
Angela Fan
Michael Auli
David Grangier
242
2,404
0
23 Dec 2016
The LAMBADA dataset: Word prediction requiring a broad discourse context
Denis Paperno
Germán Kruszewski
Angeliki Lazaridou
Q. N. Pham
Raffaella Bernardi
Sandro Pezzelle
Marco Baroni
Gemma Boleda
Raquel Fernández
142
727
0
20 Jun 2016
SQuAD: 100,000+ Questions for Machine Comprehension of Text
Pranav Rajpurkar
Jian Zhang
Konstantin Lopyrev
Percy Liang
RALM
316
8,174
0
16 Jun 2016
Deep Residual Learning for Image Recognition
Kaiming He
Xinming Zhang
Shaoqing Ren
Jian Sun
MedIm
2.2K
194,426
0
10 Dec 2015
Neural Machine Translation by Jointly Learning to Align and Translate
Dzmitry Bahdanau
Kyunghyun Cho
Yoshua Bengio
AIMat
578
27,327
0
01 Sep 2014
Previous
1
2
3