Regularizing and Optimizing LSTM Language Models

7 August 2017

Papers citing "Regularizing and Optimizing LSTM Language Models"

50 / 509 papers shown

Sophia: A Scalable Stochastic Second-order Optimizer for Language Model Pre-training Hong Liu Zhiyuan Li David Leo Wright Hall Percy Liang Tengyu Ma VLM 55 132 0 23 May 2023
Multi-Head State Space Model for Speech Recognition Yassir Fathullah Chunyang Wu Yuan Shangguan Junteng Jia Wenhan Xiong ... Chunxi Liu Yangyang Shi Ozlem Kalinli M. Seltzer Mark Gales 34 13 0 21 May 2023
Extending Memory for Language Modelling A. Nugaliyadde KELM CLL VLM 11 0 0 19 May 2023
Dropout Regularization in Extended Generalized Linear Models based on Double Exponential Families Benedikt Lutke Schwienhorst Lucas Kock David J. Nott Nadja Klein 24 1 0 11 May 2023
Shades of meaning: Uncovering the geometry of ambiguous word representations through contextualised language models Benedetta Cevoli C. Watkins Yang Gao K. Rastle 29 3 0 26 Apr 2023
Low-Variance Gradient Estimation in Unrolled Computation Graphs with ES-Single Paul Vicol Zico Kolter Kevin Swersky 21 6 0 21 Apr 2023
Efficient Real Time Recurrent Learning through combined activity and parameter sparsity Anand Subramoney 30 2 0 10 Mar 2023
Variance-reduced Clipping for Non-convex Optimization Amirhossein Reisizadeh Haochuan Li Subhro Das Ali Jadbabaie 25 26 0 02 Mar 2023
Sleep Model -- A Sequence Model for Predicting the Next Sleep Stage Iksoo Choi Wonyong Sung MLAU 11 0 0 17 Feb 2023
EPISODE: Episodic Gradient Clipping with Periodic Resampled Corrections for Federated Learning with Heterogeneous Data M. Crawshaw Yajie Bao Mingrui Liu FedML 27 8 0 14 Feb 2023
Coordinating Distributed Example Orders for Provably Accelerated Training A. Feder Cooper Wentao Guo Khiem Pham Tiancheng Yuan Charlie F. Ruan Yucheng Lu Chris De Sa 38 6 0 02 Feb 2023
LiteLSTM Architecture Based on Weights Sharing for Recurrent Neural Networks Nelly Elsayed Zag ElSayed Anthony Maida 32 0 0 12 Jan 2023
Why do Nearest Neighbor Language Models Work? Frank F. Xu Uri Alon Graham Neubig RALM 30 21 0 07 Jan 2023
Preventing RNN from Using Sequence Length as a Feature Jean-Thomas Baillargeon Hélène Cossette Luc Lamontagne 23 1 0 16 Dec 2022
State-Regularized Recurrent Neural Networks to Extract Automata and Explain Predictions Cheng Wang Carolin (Haas) Lawrence Mathias Niepert 21 3 0 10 Dec 2022
Proceedings of the 4th International Workshop on Reading Music Systems Jorge Calvo-Zaragoza Alexander Pacha Elona Shatri 27 0 0 23 Nov 2022
CoLI-Machine Learning Approaches for Code-mixed Language Identification at the Word Level in Kannada-English Texts H. Shashirekha F. Balouchzahi M. D. Anusha G. Sidorov 16 15 0 17 Nov 2022
Circling Back to Recurrent Models of Language Gábor Melis 40 0 0 03 Nov 2022
Generative Adversarial Training Can Improve Neural Language Models Sajad Movahedi A. Shakery GAN AI4CE 34 2 0 02 Nov 2022
User-Entity Differential Privacy in Learning Natural Language Models Phung Lai Nhathai Phan Tong Sun R. Jain Franck Dernoncourt Jiuxiang Gu Nikolaos Barmpalios FedML 33 0 0 01 Nov 2022
Leveraging Pre-trained Models for Failure Analysis Triplets Generation Kenneth Ezukwoke Anis Hoayek M. Batton-Hubert Xavier Boucher Pascal Gounet Jerome Adrian 35 1 0 31 Oct 2022
Characterizing Verbatim Short-Term Memory in Neural Language Models K. Armeni C. Honey Tal Linzen KELM RALM 33 3 0 24 Oct 2022
Machine Generated Text: A Comprehensive Survey of Threat Models and Detection Methods Evan Crothers Nathalie Japkowicz H. Viktor DeLMO 50 107 0 13 Oct 2022
GA-SAM: Gradient-Strength based Adaptive Sharpness-Aware Minimization for Improved Generalization Zhiyuan Zhang Ruixuan Luo Qi Su Xueting Sun 29 11 0 13 Oct 2022
Revisiting Syllables in Language Modelling and their Application on Low-Resource Machine Translation Arturo Oncevay Kervy Rivas Rojas Liz Karen Chavez Sanchez Roberto Zariquiey 27 0 0 05 Oct 2022
Stop Wasting My Time! Saving Days of ImageNet and BERT Training with Latest Weight Averaging Jean Kaddour MoMe 3DH 24 39 0 29 Sep 2022
Breaking Time Invariance: Assorted-Time Normalization for RNNs Cole Pospisil Vasily Zadorozhnyy Qiang Ye 21 0 0 28 Sep 2022
Two-Tailed Averaging: Anytime, Adaptive, Once-in-a-While Optimal Weight Averaging for Better Generalization Gábor Melis MoMe 36 1 0 26 Sep 2022
Diffusion Models: A Comprehensive Survey of Methods and Applications Ling Yang Zhilong Zhang Yingxia Shao Shenda Hong Runsheng Xu Yue Zhao Wentao Zhang Bin Cui Ming-Hsuan Yang DiffM MedIm 224 1,311 0 02 Sep 2022
Robustness to Unbounded Smoothness of Generalized SignSGD M. Crawshaw Mingrui Liu Francesco Orabona Wei Zhang Zhenxun Zhuang AAML 36 66 0 23 Aug 2022
A Syntax Aware BERT for Identifying Well-Formed Queries in a Curriculum Framework Avinash Madasu Anvesh Rao Vijjini 27 0 0 21 Aug 2022
Orthogonal Gated Recurrent Unit with Neumann-Cayley Transformation Edison Mucllari Vasily Zadorozhnyy Cole Pospisil D. Nguyen Qiang Ye 41 3 0 12 Aug 2022
Model Blending for Text Classification Ramit Pahwa 18 0 0 05 Aug 2022
Interacting with next-phrase suggestions: How suggestion systems aid and influence the cognitive processes of writing Advait Bhat Saaket Agashe Niharika Mohile Parth Oberoi R. Jangir Anirudha N. Joshi 30 37 0 01 Aug 2022
Explainable and High-Performance Hate and Offensive Speech Detection M. Babaeianjelodar Gurram Poorna Prudhvi Stephen Lorenz Keyu Chen Sumona Mondal Soumyabrata Dey Navin Kumar 6 2 0 26 Jun 2022
Query-Efficient and Scalable Black-Box Adversarial Attacks on Discrete Sequential Data via Bayesian Optimization Deokjae Lee Seungyong Moon Junhyeok Lee Hyun Oh Song AAML 28 38 0 17 Jun 2022
Unsupervised inter-frame motion correction for whole-body dynamic PET using convolutional long short-term memory in a convolutional neural network Xue-yuan Guo Bo Zhou D. Pigg Bruce Spottiswoode M. Casey Chi Liu Nicha Dvornek MedIm 32 16 0 13 Jun 2022
Efficient recurrent architectures through activity sparsity and sparse back-propagation through time Anand Subramoney Khaleelulla Khan Nazeer Mark Schöne Christian Mayr David Kappel 35 16 0 13 Jun 2022
Latent Diffusion Energy-Based Model for Interpretable Text Modeling Peiyu Yu Sirui Xie Xiaojian Ma Baoxiong Jia Bo Pang Ruiqi Gao Yixin Zhu Song-Chun Zhu Ying Nian Wu DiffM 40 81 0 13 Jun 2022
ByteComp: Revisiting Gradient Compression in Distributed Training Zhuang Wang Yanghua Peng Yibo Zhu T. Ng 13 2 0 28 May 2022
History Compression via Language Models in Reinforcement Learning Fabian Paischer Thomas Adler Vihang Patil Angela Bitto-Nemling Markus Holzleitner Sebastian Lehner Hamid Eghbalzadeh Sepp Hochreiter OffRL AI4TS 28 42 0 24 May 2022
GraB: Finding Provably Better Data Permutations than Random Reshuffling Yucheng Lu Wentao Guo Christopher De Sa FedML 26 16 0 22 May 2022
A Communication-Efficient Distributed Gradient Clipping Algorithm for Training Deep Neural Networks Mingrui Liu Zhenxun Zhuang Yunwei Lei Chunyang Liao 38 16 0 10 May 2022
Dynamic Programming in Rank Space: Scaling Structured Inference with Low-Rank HMMs and PCFGs Songlin Yang Wei Liu Kewei Tu 21 8 0 01 May 2022
A Call for Clarity in Beam Search: How It Works and When It Stops Jungo Kasai Keisuke Sakaguchi Ronan Le Bras Dragomir R. Radev Yejin Choi Noah A. Smith 26 6 0 11 Apr 2022
Elastic Model Aggregation with Parameter Service Juncheng Gu Mosharaf Chowdhury Kang G. Shin Aditya Akella 11 3 0 07 Apr 2022
A Survey on Dropout Methods and Experimental Verification in Recommendation Yong Li Weizhi Ma C. L. Philip Chen Hao Fei Yiqun Liu Shaoping Ma Yue Yang 33 9 0 05 Apr 2022
Visualizing the Relationship Between Encoded Linguistic Information and Task Performance Jiannan Xiang Huayang Li Defu Lian Guoping Huang Taro Watanabe Lemao Liu 42 0 0 29 Mar 2022
Dependency-based Mixture Language Models Zhixian Yang Xiaojun Wan 49 2 0 19 Mar 2022
LDP: Learnable Dynamic Precision for Efficient Deep Neural Network Training and Inference Zhongzhi Yu Y. Fu Shang Wu Mengquan Li Haoran You Yingyan Lin 28 1 0 15 Mar 2022