ResearchTrend.AI
Regularizing and Optimizing LSTM Language Models

7 August 2017
Stephen Merity
N. Keskar
R. Socher

Papers citing "Regularizing and Optimizing LSTM Language Models"

50 / 509 papers shown
Sophia: A Scalable Stochastic Second-order Optimizer for Language Model Pre-training
Hong Liu
Zhiyuan Li
David Leo Wright Hall
Percy Liang
Tengyu Ma
VLM
55
132
0
23 May 2023
Multi-Head State Space Model for Speech Recognition
Yassir Fathullah
Chunyang Wu
Yuan Shangguan
Junteng Jia
Wenhan Xiong
...
Chunxi Liu
Yangyang Shi
Ozlem Kalinli
M. Seltzer
Mark Gales
34
13
0
21 May 2023
Extending Memory for Language Modelling
A. Nugaliyadde
KELM
CLL
VLM
11
0
0
19 May 2023
Dropout Regularization in Extended Generalized Linear Models based on Double Exponential Families
Benedikt Lutke Schwienhorst
Lucas Kock
David J. Nott
Nadja Klein
24
1
0
11 May 2023
Shades of meaning: Uncovering the geometry of ambiguous word representations through contextualised language models
Benedetta Cevoli
C. Watkins
Yang Gao
K. Rastle
29
3
0
26 Apr 2023
Low-Variance Gradient Estimation in Unrolled Computation Graphs with ES-Single
Paul Vicol
Zico Kolter
Kevin Swersky
21
6
0
21 Apr 2023
Efficient Real Time Recurrent Learning through combined activity and parameter sparsity
Anand Subramoney
30
2
0
10 Mar 2023
Variance-reduced Clipping for Non-convex Optimization
Amirhossein Reisizadeh
Haochuan Li
Subhro Das
Ali Jadbabaie
25
26
0
02 Mar 2023
Sleep Model -- A Sequence Model for Predicting the Next Sleep Stage
Iksoo Choi
Wonyong Sung
MLAU
11
0
0
17 Feb 2023
EPISODE: Episodic Gradient Clipping with Periodic Resampled Corrections for Federated Learning with Heterogeneous Data
M. Crawshaw
Yajie Bao
Mingrui Liu
FedML
27
8
0
14 Feb 2023
Coordinating Distributed Example Orders for Provably Accelerated Training
A. Feder Cooper
Wentao Guo
Khiem Pham
Tiancheng Yuan
Charlie F. Ruan
Yucheng Lu
Chris De Sa
38
6
0
02 Feb 2023
LiteLSTM Architecture Based on Weights Sharing for Recurrent Neural Networks
Nelly Elsayed
Zag ElSayed
Anthony Maida
32
0
0
12 Jan 2023
Why do Nearest Neighbor Language Models Work?
Frank F. Xu
Uri Alon
Graham Neubig
RALM
30
21
0
07 Jan 2023
Preventing RNN from Using Sequence Length as a Feature
Jean-Thomas Baillargeon
Hélène Cossette
Luc Lamontagne
23
1
0
16 Dec 2022
State-Regularized Recurrent Neural Networks to Extract Automata and Explain Predictions
Cheng Wang
Carolin (Haas) Lawrence
Mathias Niepert
21
3
0
10 Dec 2022
Proceedings of the 4th International Workshop on Reading Music Systems
Jorge Calvo-Zaragoza
Alexander Pacha
Elona Shatri
27
0
0
23 Nov 2022
CoLI-Machine Learning Approaches for Code-mixed Language Identification at the Word Level in Kannada-English Texts
H. Shashirekha
F. Balouchzahi
M. D. Anusha
G. Sidorov
16
15
0
17 Nov 2022
Circling Back to Recurrent Models of Language
Gábor Melis
40
0
0
03 Nov 2022
Generative Adversarial Training Can Improve Neural Language Models
Sajad Movahedi
A. Shakery
GAN
AI4CE
34
2
0
02 Nov 2022
User-Entity Differential Privacy in Learning Natural Language Models
Phung Lai
Nhathai Phan
Tong Sun
R. Jain
Franck Dernoncourt
Jiuxiang Gu
Nikolaos Barmpalios
FedML
33
0
0
01 Nov 2022
Leveraging Pre-trained Models for Failure Analysis Triplets Generation
Kenneth Ezukwoke
Anis Hoayek
M. Batton-Hubert
Xavier Boucher
Pascal Gounet
Jerome Adrian
35
1
0
31 Oct 2022
Characterizing Verbatim Short-Term Memory in Neural Language Models
K. Armeni
C. Honey
Tal Linzen
KELM
RALM
33
3
0
24 Oct 2022
Machine Generated Text: A Comprehensive Survey of Threat Models and Detection Methods
Evan Crothers
Nathalie Japkowicz
H. Viktor
DeLMO
50
107
0
13 Oct 2022
GA-SAM: Gradient-Strength based Adaptive Sharpness-Aware Minimization for Improved Generalization
Zhiyuan Zhang
Ruixuan Luo
Qi Su
Xueting Sun
29
11
0
13 Oct 2022
Revisiting Syllables in Language Modelling and their Application on Low-Resource Machine Translation
Arturo Oncevay
Kervy Rivas Rojas
Liz Karen Chavez Sanchez
Roberto Zariquiey
27
0
0
05 Oct 2022
Stop Wasting My Time! Saving Days of ImageNet and BERT Training with Latest Weight Averaging
Jean Kaddour
MoMe
3DH
24
39
0
29 Sep 2022
Breaking Time Invariance: Assorted-Time Normalization for RNNs
Cole Pospisil
Vasily Zadorozhnyy
Qiang Ye
21
0
0
28 Sep 2022
Two-Tailed Averaging: Anytime, Adaptive, Once-in-a-While Optimal Weight Averaging for Better Generalization
Gábor Melis
MoMe
36
1
0
26 Sep 2022
Diffusion Models: A Comprehensive Survey of Methods and Applications
Ling Yang
Zhilong Zhang
Yingxia Shao
Shenda Hong
Runsheng Xu
Yue Zhao
Wentao Zhang
Bin Cui
Ming-Hsuan Yang
DiffM
MedIm
224
1,311
0
02 Sep 2022
Robustness to Unbounded Smoothness of Generalized SignSGD
M. Crawshaw
Mingrui Liu
Francesco Orabona
Wei Zhang
Zhenxun Zhuang
AAML
36
66
0
23 Aug 2022
A Syntax Aware BERT for Identifying Well-Formed Queries in a Curriculum Framework
Avinash Madasu
Anvesh Rao Vijjini
27
0
0
21 Aug 2022
Orthogonal Gated Recurrent Unit with Neumann-Cayley Transformation
Edison Mucllari
Vasily Zadorozhnyy
Cole Pospisil
D. Nguyen
Qiang Ye
41
3
0
12 Aug 2022
Model Blending for Text Classification
Ramit Pahwa
18
0
0
05 Aug 2022
Interacting with next-phrase suggestions: How suggestion systems aid and influence the cognitive processes of writing
Advait Bhat
Saaket Agashe
Niharika Mohile
Parth Oberoi
R. Jangir
Anirudha N. Joshi
30
37
0
01 Aug 2022
Explainable and High-Performance Hate and Offensive Speech Detection
M. Babaeianjelodar
Gurram Poorna Prudhvi
Stephen Lorenz
Keyu Chen
Sumona Mondal
Soumyabrata Dey
Navin Kumar
6
2
0
26 Jun 2022
Query-Efficient and Scalable Black-Box Adversarial Attacks on Discrete Sequential Data via Bayesian Optimization
Deokjae Lee
Seungyong Moon
Junhyeok Lee
Hyun Oh Song
AAML
28
38
0
17 Jun 2022
Unsupervised inter-frame motion correction for whole-body dynamic PET using convolutional long short-term memory in a convolutional neural network
Xue-yuan Guo
Bo Zhou
D. Pigg
Bruce Spottiswoode
M. Casey
Chi Liu
Nicha Dvornek
MedIm
32
16
0
13 Jun 2022
Efficient recurrent architectures through activity sparsity and sparse back-propagation through time
Anand Subramoney
Khaleelulla Khan Nazeer
Mark Schöne
Christian Mayr
David Kappel
35
16
0
13 Jun 2022
Latent Diffusion Energy-Based Model for Interpretable Text Modeling
Peiyu Yu
Sirui Xie
Xiaojian Ma
Baoxiong Jia
Bo Pang
Ruiqi Gao
Yixin Zhu
Song-Chun Zhu
Ying Nian Wu
DiffM
40
81
0
13 Jun 2022
ByteComp: Revisiting Gradient Compression in Distributed Training
Zhuang Wang
Yanghua Peng
Yibo Zhu
T. Ng
13
2
0
28 May 2022
History Compression via Language Models in Reinforcement Learning
Fabian Paischer
Thomas Adler
Vihang Patil
Angela Bitto-Nemling
Markus Holzleitner
Sebastian Lehner
Hamid Eghbalzadeh
Sepp Hochreiter
OffRL
AI4TS
28
42
0
24 May 2022
GraB: Finding Provably Better Data Permutations than Random Reshuffling
Yucheng Lu
Wentao Guo
Christopher De Sa
FedML
26
16
0
22 May 2022
A Communication-Efficient Distributed Gradient Clipping Algorithm for Training Deep Neural Networks
Mingrui Liu
Zhenxun Zhuang
Yunwei Lei
Chunyang Liao
38
16
0
10 May 2022
Dynamic Programming in Rank Space: Scaling Structured Inference with Low-Rank HMMs and PCFGs
Songlin Yang
Wei Liu
Kewei Tu
21
8
0
01 May 2022
A Call for Clarity in Beam Search: How It Works and When It Stops
Jungo Kasai
Keisuke Sakaguchi
Ronan Le Bras
Dragomir R. Radev
Yejin Choi
Noah A. Smith
26
6
0
11 Apr 2022
Elastic Model Aggregation with Parameter Service
Juncheng Gu
Mosharaf Chowdhury
Kang G. Shin
Aditya Akella
11
3
0
07 Apr 2022
A Survey on Dropout Methods and Experimental Verification in Recommendation
Yong Li
Weizhi Ma
C. L. Philip Chen
Hao Fei
Yiqun Liu
Shaoping Ma
Yue Yang
33
9
0
05 Apr 2022
Visualizing the Relationship Between Encoded Linguistic Information and Task Performance
Jiannan Xiang
Huayang Li
Defu Lian
Guoping Huang
Taro Watanabe
Lemao Liu
42
0
0
29 Mar 2022
Dependency-based Mixture Language Models
Zhixian Yang
Xiaojun Wan
49
2
0
19 Mar 2022
LDP: Learnable Dynamic Precision for Efficient Deep Neural Network Training and Inference
Zhongzhi Yu
Y. Fu
Shang Wu
Mengquan Li
Haoran You
Yingyan Lin
28
1
0
15 Mar 2022