1708.02182
Cited By
Regularizing and Optimizing LSTM Language Models
7 August 2017
Stephen Merity
N. Keskar
R. Socher
Papers citing
"Regularizing and Optimizing LSTM Language Models"
50 / 509 papers shown
Sophia: A Scalable Stochastic Second-order Optimizer for Language Model Pre-training
Hong Liu
Zhiyuan Li
David Leo Wright Hall
Percy Liang
Tengyu Ma
VLM
55
132
0
23 May 2023
Multi-Head State Space Model for Speech Recognition
Yassir Fathullah
Chunyang Wu
Yuan Shangguan
Junteng Jia
Wenhan Xiong
...
Chunxi Liu
Yangyang Shi
Ozlem Kalinli
M. Seltzer
Mark Gales
34
13
0
21 May 2023
Extending Memory for Language Modelling
A. Nugaliyadde
KELM
CLL
VLM
11
0
0
19 May 2023
Dropout Regularization in Extended Generalized Linear Models based on Double Exponential Families
Benedikt Lutke Schwienhorst
Lucas Kock
David J. Nott
Nadja Klein
24
1
0
11 May 2023
Shades of meaning: Uncovering the geometry of ambiguous word representations through contextualised language models
Benedetta Cevoli
C. Watkins
Yang Gao
K. Rastle
29
3
0
26 Apr 2023
Low-Variance Gradient Estimation in Unrolled Computation Graphs with ES-Single
Paul Vicol
Zico Kolter
Kevin Swersky
21
6
0
21 Apr 2023
Efficient Real Time Recurrent Learning through combined activity and parameter sparsity
Anand Subramoney
30
2
0
10 Mar 2023
Variance-reduced Clipping for Non-convex Optimization
Amirhossein Reisizadeh
Haochuan Li
Subhro Das
Ali Jadbabaie
25
26
0
02 Mar 2023
Sleep Model -- A Sequence Model for Predicting the Next Sleep Stage
Iksoo Choi
Wonyong Sung
MLAU
11
0
0
17 Feb 2023
EPISODE: Episodic Gradient Clipping with Periodic Resampled Corrections for Federated Learning with Heterogeneous Data
M. Crawshaw
Yajie Bao
Mingrui Liu
FedML
27
8
0
14 Feb 2023
Coordinating Distributed Example Orders for Provably Accelerated Training
A. Feder Cooper
Wentao Guo
Khiem Pham
Tiancheng Yuan
Charlie F. Ruan
Yucheng Lu
Chris De Sa
38
6
0
02 Feb 2023
LiteLSTM Architecture Based on Weights Sharing for Recurrent Neural Networks
Nelly Elsayed
Zag ElSayed
Anthony Maida
32
0
0
12 Jan 2023
Why do Nearest Neighbor Language Models Work?
Frank F. Xu
Uri Alon
Graham Neubig
RALM
30
21
0
07 Jan 2023
Preventing RNN from Using Sequence Length as a Feature
Jean-Thomas Baillargeon
Hélène Cossette
Luc Lamontagne
23
1
0
16 Dec 2022
State-Regularized Recurrent Neural Networks to Extract Automata and Explain Predictions
Cheng Wang
Carolin (Haas) Lawrence
Mathias Niepert
21
3
0
10 Dec 2022
Proceedings of the 4th International Workshop on Reading Music Systems
Jorge Calvo-Zaragoza
Alexander Pacha
Elona Shatri
27
0
0
23 Nov 2022
CoLI-Machine Learning Approaches for Code-mixed Language Identification at the Word Level in Kannada-English Texts
H. Shashirekha
F. Balouchzahi
M. D. Anusha
G. Sidorov
16
15
0
17 Nov 2022
Circling Back to Recurrent Models of Language
Gábor Melis
40
0
0
03 Nov 2022
Generative Adversarial Training Can Improve Neural Language Models
Sajad Movahedi
A. Shakery
GAN
AI4CE
34
2
0
02 Nov 2022
User-Entity Differential Privacy in Learning Natural Language Models
Phung Lai
Nhathai Phan
Tong Sun
R. Jain
Franck Dernoncourt
Jiuxiang Gu
Nikolaos Barmpalios
FedML
33
0
0
01 Nov 2022
Leveraging Pre-trained Models for Failure Analysis Triplets Generation
Kenneth Ezukwoke
Anis Hoayek
M. Batton-Hubert
Xavier Boucher
Pascal Gounet
Jerome Adrian
35
1
0
31 Oct 2022
Characterizing Verbatim Short-Term Memory in Neural Language Models
K. Armeni
C. Honey
Tal Linzen
KELM
RALM
33
3
0
24 Oct 2022
Machine Generated Text: A Comprehensive Survey of Threat Models and Detection Methods
Evan Crothers
Nathalie Japkowicz
H. Viktor
DeLMO
50
107
0
13 Oct 2022
GA-SAM: Gradient-Strength based Adaptive Sharpness-Aware Minimization for Improved Generalization
Zhiyuan Zhang
Ruixuan Luo
Qi Su
Xueting Sun
29
11
0
13 Oct 2022
Revisiting Syllables in Language Modelling and their Application on Low-Resource Machine Translation
Arturo Oncevay
Kervy Rivas Rojas
Liz Karen Chavez Sanchez
Roberto Zariquiey
27
0
0
05 Oct 2022
Stop Wasting My Time! Saving Days of ImageNet and BERT Training with Latest Weight Averaging
Jean Kaddour
MoMe
3DH
24
39
0
29 Sep 2022
Breaking Time Invariance: Assorted-Time Normalization for RNNs
Cole Pospisil
Vasily Zadorozhnyy
Qiang Ye
21
0
0
28 Sep 2022
Two-Tailed Averaging: Anytime, Adaptive, Once-in-a-While Optimal Weight Averaging for Better Generalization
Gábor Melis
MoMe
36
1
0
26 Sep 2022
Diffusion Models: A Comprehensive Survey of Methods and Applications
Ling Yang
Zhilong Zhang
Yingxia Shao
Shenda Hong
Runsheng Xu
Yue Zhao
Wentao Zhang
Bin Cui
Ming-Hsuan Yang
DiffM
MedIm
224
1,311
0
02 Sep 2022
Robustness to Unbounded Smoothness of Generalized SignSGD
M. Crawshaw
Mingrui Liu
Francesco Orabona
Wei Zhang
Zhenxun Zhuang
AAML
36
66
0
23 Aug 2022
A Syntax Aware BERT for Identifying Well-Formed Queries in a Curriculum Framework
Avinash Madasu
Anvesh Rao Vijjini
27
0
0
21 Aug 2022
Orthogonal Gated Recurrent Unit with Neumann-Cayley Transformation
Edison Mucllari
Vasily Zadorozhnyy
Cole Pospisil
D. Nguyen
Qiang Ye
41
3
0
12 Aug 2022
Model Blending for Text Classification
Ramit Pahwa
18
0
0
05 Aug 2022
Interacting with next-phrase suggestions: How suggestion systems aid and influence the cognitive processes of writing
Advait Bhat
Saaket Agashe
Niharika Mohile
Parth Oberoi
R. Jangir
Anirudha N. Joshi
30
37
0
01 Aug 2022
Explainable and High-Performance Hate and Offensive Speech Detection
M. Babaeianjelodar
Gurram Poorna Prudhvi
Stephen Lorenz
Keyu Chen
Sumona Mondal
Soumyabrata Dey
Navin Kumar
6
2
0
26 Jun 2022
Query-Efficient and Scalable Black-Box Adversarial Attacks on Discrete Sequential Data via Bayesian Optimization
Deokjae Lee
Seungyong Moon
Junhyeok Lee
Hyun Oh Song
AAML
28
38
0
17 Jun 2022
Unsupervised inter-frame motion correction for whole-body dynamic PET using convolutional long short-term memory in a convolutional neural network
Xue-yuan Guo
Bo Zhou
D. Pigg
Bruce Spottiswoode
M. Casey
Chi Liu
Nicha Dvornek
MedIm
32
16
0
13 Jun 2022
Efficient recurrent architectures through activity sparsity and sparse back-propagation through time
Anand Subramoney
Khaleelulla Khan Nazeer
Mark Schöne
Christian Mayr
David Kappel
35
16
0
13 Jun 2022
Latent Diffusion Energy-Based Model for Interpretable Text Modeling
Peiyu Yu
Sirui Xie
Xiaojian Ma
Baoxiong Jia
Bo Pang
Ruiqi Gao
Yixin Zhu
Song-Chun Zhu
Ying Nian Wu
DiffM
40
81
0
13 Jun 2022
ByteComp: Revisiting Gradient Compression in Distributed Training
Zhuang Wang
Yanghua Peng
Yibo Zhu
T. Ng
13
2
0
28 May 2022
History Compression via Language Models in Reinforcement Learning
Fabian Paischer
Thomas Adler
Vihang Patil
Angela Bitto-Nemling
Markus Holzleitner
Sebastian Lehner
Hamid Eghbalzadeh
Sepp Hochreiter
OffRL
AI4TS
28
42
0
24 May 2022
GraB: Finding Provably Better Data Permutations than Random Reshuffling
Yucheng Lu
Wentao Guo
Christopher De Sa
FedML
26
16
0
22 May 2022
A Communication-Efficient Distributed Gradient Clipping Algorithm for Training Deep Neural Networks
Mingrui Liu
Zhenxun Zhuang
Yunwei Lei
Chunyang Liao
38
16
0
10 May 2022
Dynamic Programming in Rank Space: Scaling Structured Inference with Low-Rank HMMs and PCFGs
Songlin Yang
Wei Liu
Kewei Tu
21
8
0
01 May 2022
A Call for Clarity in Beam Search: How It Works and When It Stops
Jungo Kasai
Keisuke Sakaguchi
Ronan Le Bras
Dragomir R. Radev
Yejin Choi
Noah A. Smith
26
6
0
11 Apr 2022
Elastic Model Aggregation with Parameter Service
Juncheng Gu
Mosharaf Chowdhury
Kang G. Shin
Aditya Akella
11
3
0
07 Apr 2022
A Survey on Dropout Methods and Experimental Verification in Recommendation
Yong Li
Weizhi Ma
C. L. Philip Chen
Hao Fei
Yiqun Liu
Shaoping Ma
Yue Yang
33
9
0
05 Apr 2022
Visualizing the Relationship Between Encoded Linguistic Information and Task Performance
Jiannan Xiang
Huayang Li
Defu Lian
Guoping Huang
Taro Watanabe
Lemao Liu
42
0
0
29 Mar 2022
Dependency-based Mixture Language Models
Zhixian Yang
Xiaojun Wan
49
2
0
19 Mar 2022
LDP: Learnable Dynamic Precision for Efficient Deep Neural Network Training and Inference
Zhongzhi Yu
Y. Fu
Shang Wu
Mengquan Li
Haoran You
Yingyan Lin
28
1
0
15 Mar 2022