Language Modeling using LMUs: 10x Better Data Efficiency or Improved Scaling Compared to Transformers
Narsimha Chilkuri, Eric Hunsberger, Aaron R. Voelker, G. Malik, C. Eliasmith
arXiv:2110.02402, 5 October 2021
Papers citing "Language Modeling using LMUs: 10x Better Data Efficiency or Improved Scaling Compared to Transformers" (8 papers shown)
Parallelizing Legendre Memory Unit Training
Narsimha Chilkuri, C. Eliasmith (22 Feb 2021)
The Pile: An 800GB Dataset of Diverse Text for Language Modeling [AIMat]
Leo Gao, Stella Biderman, Sid Black, Laurence Golding, Travis Hoppe, ..., Horace He, Anish Thite, Noa Nabeshima, Shawn Presser, Connor Leahy (31 Dec 2020)
Big Bird: Transformers for Longer Sequences [VLM]
Manzil Zaheer, Guru Guruganesh, Kumar Avinava Dubey, Joshua Ainslie, Chris Alberti, ..., Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Ahmed (28 Jul 2020)
Conformer: Convolution-augmented Transformer for Speech Recognition
Anmol Gulati, James Qin, Chung-Cheng Chiu, Niki Parmar, Yu Zhang, ..., Wei Han, Shibo Wang, Zhengdong Zhang, Yonghui Wu, Ruoming Pang (16 May 2020)
Lite Transformer with Long-Short Range Attention
Zhanghao Wu, Zhijian Liu, Ji Lin, Chengyue Wu, Song Han (24 Apr 2020)
Longformer: The Long-Document Transformer [RALM, VLM]
Iz Beltagy, Matthew E. Peters, Arman Cohan (10 Apr 2020)
Scaling Laws for Neural Language Models
Jared Kaplan, Sam McCandlish, T. Henighan, Tom B. Brown, B. Chess, R. Child, Scott Gray, Alec Radford, Jeff Wu, Dario Amodei (23 Jan 2020)
Convolutional Self-Attention Networks
Baosong Yang, Longyue Wang, Derek F. Wong, Lidia S. Chao, Zhaopeng Tu (5 Apr 2019)