2104.03474
Revisiting Simple Neural Probabilistic Language Models
8 April 2021
Simeng Sun
Mohit Iyyer
Papers citing "Revisiting Simple Neural Probabilistic Language Models"
18 / 18 papers shown
Title
Shortformer: Better Language Modeling using Shorter Inputs
Ofir Press
Noah A. Smith
M. Lewis
271
90
0
31 Dec 2020
Scaling Hidden Markov Language Models
Justin T. Chiu
Alexander M. Rush
BDL
104
25
0
09 Nov 2020
Big Bird: Transformers for Longer Sequences
Manzil Zaheer
Guru Guruganesh
Kumar Avinava Dubey
Joshua Ainslie
Chris Alberti
...
Philip Pham
Anirudh Ravula
Qifan Wang
Li Yang
Amr Ahmed
VLM
546
2,086
0
28 Jul 2020
Longformer: The Long-Document Transformer
Iz Beltagy
Matthew E. Peters
Arman Cohan
RALM
VLM
174
4,071
0
10 Apr 2020
Efficient Content-Based Sparse Attention with Routing Transformers
Aurko Roy
M. Saffar
Ashish Vaswani
David Grangier
MoE
314
597
0
12 Mar 2020
Scaling Laws for Neural Language Models
Jared Kaplan
Sam McCandlish
T. Henighan
Tom B. Brown
B. Chess
R. Child
Scott Gray
Alec Radford
Jeff Wu
Dario Amodei
608
4,822
0
23 Jan 2020
Improving Transformer Models by Reordering their Sublayers
Ofir Press
Noah A. Smith
Omer Levy
62
87
0
10 Nov 2019
Generalization through Memorization: Nearest Neighbor Language Models
Urvashi Khandelwal
Omer Levy
Dan Jurafsky
Luke Zettlemoyer
M. Lewis
RALM
166
837
0
01 Nov 2019
Adaptive Attention Span in Transformers
Sainbayar Sukhbaatar
Edouard Grave
Piotr Bojanowski
Armand Joulin
76
285
0
19 May 2019
Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context
Zihang Dai
Zhilin Yang
Yiming Yang
J. Carbonell
Quoc V. Le
Ruslan Salakhutdinov
VLM
250
3,730
0
09 Jan 2019
Adaptive Input Representations for Neural Language Modeling
Alexei Baevski
Michael Auli
104
390
0
28 Sep 2018
Sharp Nearby, Fuzzy Far Away: How Neural Language Models Use Context
Urvashi Khandelwal
He He
Peng Qi
Dan Jurafsky
RALM
55
296
0
12 May 2018
An Analysis of Neural Language Modeling at Multiple Scales
Stephen Merity
N. Keskar
R. Socher
55
170
0
22 Mar 2018
Regularizing and Optimizing LSTM Language Models
Stephen Merity
N. Keskar
R. Socher
166
1,096
0
07 Aug 2017
Pointer Sentinel Mixture Models
Stephen Merity
Caiming Xiong
James Bradbury
R. Socher
RALM
328
2,876
0
26 Sep 2016
Using the Output Embedding to Improve Language Models
Ofir Press
Lior Wolf
77
734
0
20 Aug 2016
Layer Normalization
Jimmy Lei Ba
J. Kiros
Geoffrey E. Hinton
413
10,494
0
21 Jul 2016
Recurrent Neural Network Regularization
Wojciech Zaremba
Ilya Sutskever
Oriol Vinyals
ODL
146
2,776
0
08 Sep 2014