ResearchTrend.AI
© 2025 ResearchTrend.AI, All rights reserved.

Regularizing and Optimizing LSTM Language Models

7 August 2017
Stephen Merity
N. Keskar
R. Socher

Papers citing "Regularizing and Optimizing LSTM Language Models"

50 / 509 papers shown
"BNN - BN = ?": Training Binary Neural Networks without Batch Normalization
Tianlong Chen
Zhenyu Zhang
Xu Ouyang
Zechun Liu
Zhiqiang Shen
Zhangyang Wang
MQ
43
36
0
16 Apr 2021
Broccoli: Sprinkling Lightweight Vocabulary Learning into Everyday Information Diets
Roland Aydin
Lars Klein
Arnaud Miribel
Robert West
16
1
0
16 Apr 2021
RIANN -- A Robust Neural Network Outperforms Attitude Estimation Filters
Daniel Weber
C. Gühmann
Thomas Seel
20
35
0
15 Apr 2021
Lessons on Parameter Sharing across Layers in Transformers
Sho Takase
Shun Kiyono
25
84
0
13 Apr 2021
Evaluating Saliency Methods for Neural Language Models
Shuoyang Ding
Philipp Koehn
FAtt
XAI
23
54
0
12 Apr 2021
Revisiting Simple Neural Probabilistic Language Models
Simeng Sun
Mohit Iyyer
24
14
0
08 Apr 2021
Rethinking Perturbations in Encoder-Decoders for Fast Training
Sho Takase
Shun Kiyono
33
45
0
05 Apr 2021
Low-Resource Language Modelling of South African Languages
Stuart Mesham
Luc Hayward
Jared Shapiro
Jan Buys
4
14
0
01 Apr 2021
Data Augmentation in a Hybrid Approach for Aspect-Based Sentiment Analysis
Tomas Liesting
Flavius Frasincar
Maria Mihaela Truşcǎ
18
30
0
29 Mar 2021
Data Augmentation in Natural Language Processing: A Novel Text Generation Approach for Long and Short Text Classifiers
Markus Bayer
M. Kaufhold
Björn Buchhold
Marcel Keller
J. Dallmeyer
Christian A. Reuter
31
113
0
26 Mar 2021
ESCORT: Ethereum Smart COntRacTs Vulnerability Detection using Deep Neural Network and Transfer Learning
O. Lutz
Huili Chen
Hossein Fereidooni
Christoph Sendner
Alexandra Dmitrienko
A. Sadeghi
F. Koushanfar
15
46
0
23 Mar 2021
Token-wise Curriculum Learning for Neural Machine Translation
Chen Liang
Haoming Jiang
Xiaodong Liu
Pengcheng He
Weizhu Chen
Jianfeng Gao
T. Zhao
21
4
0
20 Mar 2021
Improving Authorship Verification using Linguistic Divergence
Yifan Zhang
Dainis Boumber
Marjan Hosseinia
Fan Yang
Arjun Mukherjee
12
1
0
12 Mar 2021
Nondeterminism and Instability in Neural Network Optimization
Cecilia Summers
M. Dinneen
27
38
0
08 Mar 2021
Random Feature Attention
Hao Peng
Nikolaos Pappas
Dani Yogatama
Roy Schwartz
Noah A. Smith
Lingpeng Kong
36
348
0
03 Mar 2021
indicnlp@kgp at DravidianLangTech-EACL2021: Offensive Language Identification in Dravidian Languages
Kushal Kedia
Abhilash Nandy
24
23
0
14 Feb 2021
High-Performance Large-Scale Image Recognition Without Normalization
Andrew Brock
Soham De
Samuel L. Smith
Karen Simonyan
VLM
223
512
0
11 Feb 2021
Train your classifier first: Cascade Neural Networks Training from upper layers to lower layers
Shucong Zhang
Cong-Thanh Do
R. Doddipatla
Erfan Loweimi
P. Bell
Steve Renals
24
2
0
09 Feb 2021
Eliminating Sharp Minima from SGD with Truncated Heavy-tailed Noise
Xingyu Wang
Sewoong Oh
C. Rhee
13
13
0
08 Feb 2021
A Comprehensive Survey on Hardware-Aware Neural Architecture Search
Hadjer Benmeziane
Kaoutar El Maghraoui
Hamza Ouarnoughi
Smail Niar
Martin Wistuba
Naigang Wang
34
96
0
22 Jan 2021
Detecting Hostile Posts using Relational Graph Convolutional Network
Sarthak
Shikhar Shukla
K. V. Arya
GNN
11
2
0
10 Jan 2021
AutoDropout: Learning Dropout Patterns to Regularize Deep Networks
Hieu H. Pham
Quoc V. Le
76
56
0
05 Jan 2021
Leveraging Audio Gestalt to Predict Media Memorability
Lorin Sweeney
Graham Healy
Alan F. Smeaton
29
6
0
31 Dec 2020
Contextual Temperature for Language Modeling
Pei-Hsin Wang
Sheng-Iou Hsieh
Shih-Chieh Chang
Yu-Ting Chen
Jia-Yu Pan
Wei Wei
Da-Chang Juan
45
25
0
25 Dec 2020
Optimizing Deep Neural Networks through Neuroevolution with Stochastic Gradient Descent
Haichao Zhang
K. Hao
Lei Gao
Bing Wei
Xue-song Tang
19
12
0
21 Dec 2020
Recent advances in deep learning theory
Fengxiang He
Dacheng Tao
AI4CE
24
50
0
20 Dec 2020
Data-Efficient Methods for Dialogue Systems
Igor Shalyminov
14
0
0
05 Dec 2020
End to End ASR System with Automatic Punctuation Insertion
Yushi Guan
3DV
27
5
0
03 Dec 2020
Mutual Information Constraints for Monte-Carlo Objectives
Gábor Melis
András Gyorgy
Phil Blunsom
21
1
0
01 Dec 2020
Regularizing Recurrent Neural Networks via Sequence Mixup
Armin Karamzade
Amir Najafi
S. Motahari
16
0
0
27 Nov 2020
Learning Associative Inference Using Fast Weight Memory
Imanol Schlag
Tsendsuren Munkhdalai
Jürgen Schmidhuber
KELM
30
44
0
16 Nov 2020
DORB: Dynamically Optimizing Multiple Rewards with Bandits
Ramakanth Pasunuru
Han Guo
Joey Tianyi Zhou
OffRL
32
6
0
15 Nov 2020
Exploring the Value of Personalized Word Embeddings
Charles F Welch
Jonathan K. Kummerfeld
Verónica Pérez-Rosas
Rada Mihalcea
17
15
0
11 Nov 2020
Scaling Hidden Markov Language Models
Justin T. Chiu
Alexander M. Rush
BDL
22
25
0
09 Nov 2020
Fusion Models for Improved Visual Captioning
M. Kalimuthu
Aditya Mogadala
Marius Mosbach
Dietrich Klakow
VLM
26
0
0
28 Oct 2020
Delta-STN: Efficient Bilevel Optimization for Neural Networks using Structured Response Jacobians
Juhan Bae
Roger C. Grosse
27
24
0
26 Oct 2020
Revisiting Neural Language Modelling with Syllables
Arturo Oncevay
Kervy Rivas Rojas
18
2
0
24 Oct 2020
Large Scale Legal Text Classification Using Transformer Models
Zein Shaheen
G. Wohlgenannt
Erwin Filtz
AILaw
32
67
0
24 Oct 2020
On Convergence and Generalization of Dropout Training
Poorya Mianjy
R. Arora
37
30
0
23 Oct 2020
Exploiting News Article Structure for Automatic Corpus Generation of Entailment Datasets
Jan Christian Blaise Cruz
Jose Kristian Resabal
James Lin
Dan John Velasco
C. Cheng
6
11
0
22 Oct 2020
Cascaded Models With Cyclic Feedback For Direct Speech Translation
Tsz Kin Lam
Shigehiko Schamoni
Stefan Riezler
32
12
0
21 Oct 2020
Adaptive Gradient Method with Resilience and Momentum
Jie Liu
Chen Lin
Chuming Li
Lu Sheng
Ming Sun
Junjie Yan
Wanli Ouyang
ODL
14
0
0
21 Oct 2020
Complaint Identification in Social Media with Transformer Networks
Mali Jin
Nikolaos Aletras
12
16
0
21 Oct 2020
Where's the Question? A Multi-channel Deep Convolutional Neural Network for Question Identification in Textual Data
George Michalopoulos
Helen H. Chen
Alexander Wong
MedIm
17
1
0
15 Oct 2020
Pagsusuri ng RNN-based Transfer Learning Technique sa Low-Resource Language [Analysis of an RNN-based Transfer Learning Technique on a Low-Resource Language]
Dan John Velasco
9
3
0
13 Oct 2020
Towards Theoretically Understanding Why SGD Generalizes Better Than ADAM in Deep Learning
Pan Zhou
Jiashi Feng
Chao Ma
Caiming Xiong
Guosheng Lin
Weinan E
25
228
0
12 Oct 2020
Compositional Demographic Word Embeddings
Charles F Welch
Jonathan K. Kummerfeld
Verónica Pérez-Rosas
Rada Mihalcea
21
31
0
06 Oct 2020
On the Branching Bias of Syntax Extracted from Pre-trained Language Models
Huayang Li
Lemao Liu
Guoping Huang
Shuming Shi
23
6
0
06 Oct 2020
Gauravarora@HASOC-Dravidian-CodeMix-FIRE2020: Pre-training ULMFiT on Synthetically Generated Code-Mixed Data for Hate Speech Detection
Gaurav Arora
14
27
0
05 Oct 2020
Improved Analysis of Clipping Algorithms for Non-convex Optimization
Bohang Zhang
Jikai Jin
Cong Fang
Liwei Wang
38
87
0
05 Oct 2020