arXiv: 1708.02182
Regularizing and Optimizing LSTM Language Models


7 August 2017
Stephen Merity
N. Keskar
R. Socher

Papers citing "Regularizing and Optimizing LSTM Language Models"

50 / 509 papers shown
Syntax-driven Iterative Expansion Language Models for Controllable Text Generation
Noe Casas
José A. R. Fonollosa
Marta R. Costa-jussá
19
11
0
05 Apr 2020
Improving Perceptual Quality of Drum Transcription with the Expanded Groove MIDI Dataset
Lee F. Callender
Curtis Hawthorne
Jesse Engel
43
20
0
01 Apr 2020
A Survey of Deep Learning for Scientific Discovery
M. Raghu
Erica Schmidt
OOD
AI4CE
40
120
0
26 Mar 2020
Multi-Class classification of vulnerabilities in Smart Contracts using AWD-LSTM, with pre-trained encoder inspired from natural language processing
Ajay K. Gogineni
S. Swayamjyoti
Devadatta Sahoo
K. Sahu
R. Kishore
31
32
0
21 Mar 2020
Anchor & Transform: Learning Sparse Embeddings for Large Vocabularies
Paul Pu Liang
Manzil Zaheer
Yuan Wang
Amr Ahmed
BDL
21
1
0
18 Mar 2020
Iterative Averaging in the Quest for Best Test Error
Diego Granziol
Xingchen Wan
Samuel Albanie
Stephen J. Roberts
10
3
0
02 Mar 2020
Tensor Networks for Probabilistic Sequence Modeling
Jacob Miller
Guillaume Rabusseau
John Terilla
16
5
0
02 Mar 2020
The Implicit and Explicit Regularization Effects of Dropout
Colin Wei
Sham Kakade
Tengyu Ma
30
114
0
28 Feb 2020
Temporal Convolutional Attention-based Network For Sequence Modeling
Hongyan Hao
Yan Wang
Siqiao Xue
Yudi Xia
Jian Zhao
S. Furao
30
41
0
28 Feb 2020
Backpropamine: training self-modifying neural networks with differentiable neuromodulated plasticity
Thomas Miconi
Aditya Rawal
Jeff Clune
Kenneth O. Stanley
13
90
0
24 Feb 2020
Addressing Some Limitations of Transformers with Feedback Memory
Angela Fan
Thibaut Lavril
Edouard Grave
Armand Joulin
Sainbayar Sukhbaatar
26
11
0
21 Feb 2020
MaxUp: A Simple Way to Improve Generalization of Neural Network Training
Chengyue Gong
Tongzheng Ren
Mao Ye
Qiang Liu
AAML
27
56
0
20 Feb 2020
A Systematic Comparison of Architectures for Document-Level Sentiment Classification
Jeremy Barnes
Vinit Ravishankar
Lilja Øvrelid
Erik Velldal
8
0
0
19 Feb 2020
SentenceMIM: A Latent Variable Language Model
M. Livne
Kevin Swersky
David J. Fleet
VLM
49
6
0
18 Feb 2020
Transformer on a Diet
Chenguang Wang
Zihao Ye
Aston Zhang
Zheng-Wei Zhang
Alex Smola
32
8
0
14 Feb 2020
Deep Learning for Source Code Modeling and Generation: Models, Applications and Challenges
T. H. Le
Hao Chen
Muhammad Ali Babar
VLM
64
152
0
13 Feb 2020
fastai: A Layered API for Deep Learning
Jeremy Howard
Sylvain Gugger
AI4CE
20
857
0
11 Feb 2020
Localized Flood Detection With Minimal Labeled Social Media Data Using Transfer Learning
Neha Singh
Nirmalya Roy
A. Gangopadhyay
27
6
0
10 Feb 2020
Understanding and Improving Knowledge Distillation
Jiaxi Tang
Rakesh Shivanna
Zhe Zhao
Dong Lin
Anima Singh
Ed H. Chi
Sagar Jain
27
129
0
10 Feb 2020
Blank Language Models
T. Shen
Victor Quach
Regina Barzilay
Tommi Jaakkola
203
73
0
08 Feb 2020
Consistency of a Recurrent Language Model With Respect to Incomplete Decoding
Sean Welleck
Ilia Kulikov
Jaedeok Kim
Richard Yuanzhe Pang
Kyunghyun Cho
17
65
0
06 Feb 2020
SQWA: Stochastic Quantized Weight Averaging for Improving the Generalization Capability of Low-Precision Deep Neural Networks
Sungho Shin
Yoonho Boo
Wonyong Sung
MQ
27
3
0
02 Feb 2020
ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training
Weizhen Qi
Yu Yan
Yeyun Gong
Dayiheng Liu
Nan Duan
Jiusheng Chen
Ruofei Zhang
Ming Zhou
AI4TS
27
446
0
13 Jan 2020
A Continuous Space Neural Language Model for Bengali Language
Hemayet Ahmed Chowdhury
Md. Azizul Haque Imon
Anisur Rahman
Aisha Khatun
Md. Saiful Islam
19
2
0
11 Jan 2020
CProp: Adaptive Learning Rate Scaling from Past Gradient Conformity
Konpat Preechakul
B. Kijsirikul
ODL
30
3
0
24 Dec 2019
Hierarchical Character Embeddings: Learning Phonological and Semantic Representations in Languages of Logographic Origin using Recursive Neural Networks
Minh Nguyen
G. Ngo
Nancy F. Chen
19
19
0
20 Dec 2019
Just Add Functions: A Neural-Symbolic Language Model
David Demeter
Doug Downey
8
11
0
11 Dec 2019
Why are Adaptive Methods Good for Attention Models?
J.N. Zhang
Sai Praneeth Karimireddy
Andreas Veit
Seungyeon Kim
Sashank J. Reddi
Surinder Kumar
S. Sra
18
79
0
06 Dec 2019
Fantastic Generalization Measures and Where to Find Them
Yiding Jiang
Behnam Neyshabur
H. Mobahi
Dilip Krishnan
Samy Bengio
AI4CE
14
596
0
04 Dec 2019
Domain-independent Dominance of Adaptive Methods
Pedro H. P. Savarese
David A. McAllester
Sudarshan Babu
Michael Maire
ODL
18
22
0
04 Dec 2019
A Comparative Study of Pretrained Language Models on Thai Social Text Categorization
Thanapapas Horsuwan
Kasidis Kanwatchara
P. Vateekul
B. Kijsirikul
14
9
0
03 Dec 2019
TX-Ray: Quantifying and Explaining Model-Knowledge Transfer in (Un-)Supervised NLP
Nils Rethmeier
V. Saxena
Isabelle Augenstein
FAtt
25
2
0
02 Dec 2019
How Can We Know What Language Models Know?
Zhengbao Jiang
Frank F. Xu
Jun Araki
Graham Neubig
KELM
41
1,373
0
28 Nov 2019
SimpleBooks: Long-term dependency book dataset with simplified English vocabulary for word-level language modeling
Huyen Nguyen
9
2
0
27 Nov 2019
DeFINE: DEep Factorized INput Token Embeddings for Neural Sequence Modeling
Sachin Mehta
Rik Koncel-Kedziorski
Mohammad Rastegari
Hannaneh Hajishirzi
AI4TS
38
23
0
27 Nov 2019
Autoencoding Undirected Molecular Graphs With Neural Networks
Jeppe Johan Waarkjaer Olsen
Peter Ebert Christensen
Martin Hangaard Hansen
Alexander Rosenberg Johansen
AI4CE
19
0
0
26 Nov 2019
Relevance-Promoting Language Model for Short-Text Conversation
Xin Li
Piji Li
Wei Bi
Xiaojiang Liu
Wai Lam
16
11
0
26 Nov 2019
Single Headed Attention RNN: Stop Thinking With Your Head
Stephen Merity
27
68
0
26 Nov 2019
AutoShrink: A Topology-aware NAS for Discovering Efficient Neural Architecture
Tunhou Zhang
Hsin-Pai Cheng
Zhenwen Li
Feng Yan
Chengyu Huang
H. Li
Yiran Chen
10
9
0
21 Nov 2019
Thick-Net: Parallel Network Structure for Sequential Modeling
Yu-Xuan Li
Jin-Yuan Liu
Liang Li
Xiang Guan
19
0
0
19 Nov 2019
RotationOut as a Regularization Method for Neural Network
Kaiqin Hu
Barnabás Póczós
33
1
0
18 Nov 2019
Multi-Zone Unit for Recurrent Neural Networks
Fandong Meng
Jinchao Zhang
Yang Liu
Jie Zhou
AI4CE
19
2
0
17 Nov 2019
A Subword Level Language Model for Bangla Language
Aisha Khatun
Anisur Rahman
Hemayet Ahmed Chowdhury
Md. Saiful Islam
A. Tasnim
17
4
0
15 Nov 2019
Exploiting Token and Path-based Representations of Code for Identifying Security-Relevant Commits
Achyudh Ram
Ji Xin
M. Nagappan
Yaoliang Yu
Rocío Cabrera Lozoya
A. Sabetta
Jimmy J. Lin
27
3
0
15 Nov 2019
KEPLER: A Unified Model for Knowledge Embedding and Pre-trained Language Representation
Xiaozhi Wang
Tianyu Gao
Zhaocheng Zhu
Zhengyan Zhang
Zhiyuan Liu
Juan-Zi Li
Jian Tang
15
647
0
13 Nov 2019
Optimizing Millions of Hyperparameters by Implicit Differentiation
Jonathan Lorraine
Paul Vicol
David Duvenaud
DD
30
403
0
06 Nov 2019
On the Effectiveness of the Pooling Methods for Biomedical Relation Extraction with Deep Learning
Tuan Ngo Nguyen
Franck Dernoncourt
Thien Huu Nguyen
19
5
0
04 Nov 2019
An Adaptive and Momental Bound Method for Stochastic Learning
Jianbang Ding
Xuancheng Ren
Ruixuan Luo
Xu Sun
ODL
19
46
0
27 Oct 2019
FineText: Text Classification via Attention-based Language Model Fine-tuning
Yunzhe Tao
Saurabh Gupta
Satyapriya Krishna
Xiong Zhou
Orchid Majumder
Vineet Khare
21
3
0
25 Oct 2019
Efficient Decoupled Neural Architecture Search by Structure and Operation Sampling
Heung-Chang Lee
Do-Guk Kim
Bohyung Han
38
6
0
23 Oct 2019