1708.02182
Cited By
Regularizing and Optimizing LSTM Language Models
7 August 2017
Stephen Merity
N. Keskar
R. Socher
Papers citing
"Regularizing and Optimizing LSTM Language Models"
50 / 509 papers shown
Title
Don't Use Large Mini-Batches, Use Local SGD
Tao R. Lin
Sebastian U. Stich
Kumar Kshitij Patel
Martin Jaggi
57
429
0
22 Aug 2018
Improved Language Modeling by Decoding the Past
Siddhartha Brahma
BDL
AI4TS
14
6
0
14 Aug 2018
REGMAPR - Text Matching Made Easy
Siddhartha Brahma
VLM
18
1
0
13 Aug 2018
Confidence penalty, annealing Gaussian noise and zoneout for biLSTM-CRF networks for named entity recognition
Antonio Jimeno Yepes
21
2
0
13 Aug 2018
Character-Level Language Modeling with Deeper Self-Attention
Rami Al-Rfou
Dokook Choe
Noah Constant
Mandy Guo
Llion Jones
24
386
0
09 Aug 2018
On Training Recurrent Networks with Truncated Backpropagation Through Time in Speech Recognition
Hao Tang
James R. Glass
30
19
0
09 Jul 2018
DARTS: Differentiable Architecture Search
Hanxiao Liu
Karen Simonyan
Yiming Yang
91
4,304
0
24 Jun 2018
Insights on representational similarity in neural networks with canonical correlation
Ari S. Morcos
M. Raghu
Samy Bengio
DRL
32
430
0
14 Jun 2018
Navigating with Graph Representations for Fast and Scalable Decoding of Neural Language Models
Minjia Zhang
Xiaodong Liu
Wenhan Wang
Jianfeng Gao
Yuxiong He
23
30
0
11 Jun 2018
Straight to the Tree: Constituency Parsing with Neural Syntactic Distance
Songlin Yang
Zhouhan Lin
Athul Paul Jacob
Alessandro Sordoni
Aaron Courville
Yoshua Bengio
25
91
0
11 Jun 2018
Towards Binary-Valued Gates for Robust LSTM Training
Zhuohan Li
Di He
Fei Tian
Wei-neng Chen
Tao Qin
Liwei Wang
Tie-Yan Liu
MQ
10
47
0
08 Jun 2018
Efficient Full-Matrix Adaptive Regularization
Naman Agarwal
Brian Bullins
Xinyi Chen
Elad Hazan
Karan Singh
Cyril Zhang
Yi Zhang
16
21
0
08 Jun 2018
GamePad: A Learning Environment for Theorem Proving
Daniel Huang
Prafulla Dhariwal
D. Song
Ilya Sutskever
31
109
0
02 Jun 2018
Incremental Natural Language Processing: Challenges, Strategies, and Evaluation
Arne Köhn
CLL
22
11
0
31 May 2018
Sigsoftmax: Reanalysis of the Softmax Bottleneck
Sekitoshi Kanai
Yasuhiro Fujiwara
Yuki Yamanaka
S. Adachi
11
68
0
28 May 2018
Stable Recurrent Models
John Miller
Moritz Hardt
16
116
0
25 May 2018
A Double-Deep Spatio-Angular Learning Framework for Light Field based Face Recognition
Alireza Sepas-Moghaddam
M. A. Haque
P. Correia
Kamal Nasrollahi
T. Moeslund
F. Pereira
CVBM
8
35
0
25 May 2018
Pushing the bounds of dropout
Gábor Melis
Charles Blundell
Tomás Kociský
Karl Moritz Hermann
Chris Dyer
Phil Blunsom
8
13
0
23 May 2018
Breaking the Activation Function Bottleneck through Adaptive Parameterization
Sebastian Flennerhag
Hujun Yin
J. Keane
Mark Elliot
19
12
0
22 May 2018
Improved Sentence Modeling using Suffix Bidirectional LSTM
Siddhartha Brahma
21
24
0
18 May 2018
Learning to Write with Cooperative Discriminators
Ari Holtzman
Jan Buys
Maxwell Forbes
Antoine Bosselut
David Golub
Yejin Choi
31
233
0
16 May 2018
Continuous Learning in a Hierarchical Multiscale Neural Network
Thomas Wolf
Julien Chaumond
Clement Delangue
CLL
AI4CE
NoLa
BDL
11
6
0
15 May 2018
Building Language Models for Text with Named Entities
Md. Rizwan Parvez
Saikat Chakraborty
Baishakhi Ray
Kai-Wei Chang
21
41
0
13 May 2018
Born Again Neural Networks
Tommaso Furlanello
Zachary Chase Lipton
Michael Tschannen
Laurent Itti
Anima Anandkumar
36
1,020
0
12 May 2018
Sharp Nearby, Fuzzy Far Away: How Neural Language Models Use Context
Urvashi Khandelwal
He He
Peng Qi
Dan Jurafsky
RALM
16
293
0
12 May 2018
State Gradients for RNN Memory Analysis
Lyan Verwimp
Hugo Van hamme
Vincent Renkens
P. Wambacq
11
6
0
11 May 2018
Noisin: Unbiased Regularization for Recurrent Neural Networks
Adji Bousso Dieng
Rajesh Ranganath
Jaan Altosaar
David M. Blei
22
22
0
03 May 2018
Assessing Language Models with Scaling Properties
Shuntaro Takahashi
Kumiko Tanaka-Ishii
ELM
LRM
14
2
0
24 Apr 2018
Dropping Networks for Transfer Learning
J. Ó. Neill
Danushka Bollegala
13
1
0
23 Apr 2018
Spell Once, Summon Anywhere: A Two-Level Open-Vocabulary Language Model
Sabrina J. Mielke
Jason Eisner
LRM
BDL
25
33
0
23 Apr 2018
Training DNNs with Hybrid Block Floating Point
M. Drumond
Tao R. Lin
Martin Jaggi
Babak Falsafi
25
94
0
04 Apr 2018
Aggregated Momentum: Stability Through Passive Damping
James Lucas
Shengyang Sun
R. Zemel
Roger C. Grosse
21
67
0
01 Apr 2018
Meta-Learning a Dynamical Language Model
Thomas Wolf
Julien Chaumond
Clement Delangue
24
4
0
28 Mar 2018
An Analysis of Neural Language Modeling at Multiple Scales
Stephen Merity
N. Keskar
R. Socher
24
170
0
22 Mar 2018
Flipout: Efficient Pseudo-Independent Weight Perturbations on Mini-Batches
Yeming Wen
Paul Vicol
Jimmy Ba
Dustin Tran
Roger C. Grosse
BDL
22
307
0
12 Mar 2018
An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modeling
Shaojie Bai
J. Zico Kolter
V. Koltun
DRL
42
4,724
0
04 Mar 2018
Learning Sparse Structured Ensembles with SG-MCMC and Network Pruning
Yichi Zhang
Zhijian Ou
22
0
0
01 Mar 2018
Memory-based Parameter Adaptation
Pablo Sprechmann
Siddhant M. Jayakumar
Jack W. Rae
Alexander Pritzel
Adria Puigdomenech Badia
Benigno Uria
Oriol Vinyals
Demis Hassabis
Razvan Pascanu
Charles Blundell
ODL
OOD
VLM
13
101
0
28 Feb 2018
Reusing Weights in Subword-aware Neural Language Models
Z. Assylbekov
Rustem Takhanov
25
4
0
23 Feb 2018
The Secret Sharer: Evaluating and Testing Unintended Memorization in Neural Networks
Nicholas Carlini
Chang-rui Liu
Ulfar Erlingsson
Jernej Kos
D. Song
86
1,114
0
22 Feb 2018
Deep contextualized word representations
Matthew E. Peters
Mark Neumann
Mohit Iyyer
Matt Gardner
Christopher Clark
Kenton Lee
Luke Zettlemoyer
NAI
14
11,481
0
15 Feb 2018
Neural Voice Cloning with a Few Samples
Sercan Ö. Arik
Jitong Chen
Kainan Peng
Ming-Yu Liu
Yanqi Zhou
19
382
0
14 Feb 2018
Efficient Neural Architecture Search via Parameter Sharing
Hieu H. Pham
M. Guan
Barret Zoph
Quoc V. Le
J. Dean
26
2,746
0
09 Feb 2018
Universal Language Model Fine-tuning for Text Classification
Jeremy Howard
Sebastian Ruder
VLM
26
274
0
18 Jan 2018
Fix your classifier: the marginal value of training the last weight layer
Elad Hoffer
Itay Hubara
Daniel Soudry
35
101
0
14 Jan 2018
Character-level Recurrent Neural Networks in Practice: Comparing Training and Sampling Schemes
Cedric De Boom
Thomas Demeester
Bart Dhoedt
10
8
0
02 Jan 2018
Improving Generalization Performance by Switching from Adam to SGD
N. Keskar
R. Socher
ODL
41
521
0
20 Dec 2017
A Flexible Approach to Automated RNN Architecture Generation
Martin Schrimpf
Stephen Merity
James Bradbury
R. Socher
21
15
0
20 Dec 2017
Characterizing the hyper-parameter space of LSTM language models for mixed context applications
Victor Akinwande
S. Remy
21
1
0
08 Dec 2017
Breaking the Softmax Bottleneck: A High-Rank RNN Language Model
Zhilin Yang
Zihang Dai
Ruslan Salakhutdinov
William W. Cohen
BDL
23
365
0
10 Nov 2017