Regularizing and Optimizing LSTM Language Models


7 August 2017
Stephen Merity
N. Keskar
R. Socher

Papers citing "Regularizing and Optimizing LSTM Language Models"

50 / 509 papers shown
Title
Self-Tuning Networks: Bilevel Optimization of Hyperparameters using Structured Best-Response Functions
M. Mackay
Paul Vicol
Jonathan Lorraine
David Duvenaud
Roger C. Grosse
27
164
0
07 Mar 2019
Alternating Synthetic and Real Gradients for Neural Language Modeling
Fangxin Shang
Hao Zhang
16
1
0
27 Feb 2019
Evaluating the Search Phase of Neural Architecture Search
Kaicheng Yu
C. Sciuto
Martin Jaggi
C. Musat
Mathieu Salzmann
20
342
0
21 Feb 2019
Breaking the Softmax Bottleneck via Learnable Monotonic Pointwise Non-linearities
O. Ganea
Sylvain Gelly
Gary Bécigneul
Aliaksei Severyn
26
18
0
21 Feb 2019
Random Search and Reproducibility for Neural Architecture Search
Liam Li
Ameet Talwalkar
OOD
33
717
0
20 Feb 2019
A Simple Baseline for Bayesian Uncertainty in Deep Learning
Wesley J. Maddox
T. Garipov
Pavel Izmailov
Dmitry Vetrov
A. Wilson
BDL
UQCV
33
795
0
07 Feb 2019
Compression of Recurrent Neural Networks for Efficient Language Modeling
Artem M. Grachev
D. Ignatov
Andrey V. Savchenko
13
39
0
06 Feb 2019
Augment your batch: better training with larger batches
Elad Hoffer
Tal Ben-Nun
Itay Hubara
Niv Giladi
Torsten Hoefler
Daniel Soudry
ODL
30
72
0
27 Jan 2019
Variational Smoothing in Recurrent Neural Network Language Models
Lingpeng Kong
Gábor Melis
Wang Ling
Lei Yu
Dani Yogatama
21
3
0
27 Jan 2019
State-Regularized Recurrent Neural Networks
Cheng Wang
Mathias Niepert
18
39
0
25 Jan 2019
Towards Non-saturating Recurrent Units for Modelling Long-term Dependencies
A. Chandar
Chinnadhurai Sankar
Eugene Vorontsov
Samira Ebrahimi Kahou
Yoshua Bengio
26
56
0
22 Jan 2019
Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context
Zihang Dai
Zhilin Yang
Yiming Yang
J. Carbonell
Quoc V. Le
Ruslan Salakhutdinov
VLM
38
3,674
0
09 Jan 2019
FastGRNN: A Fast, Accurate, Stable and Tiny Kilobyte Sized Gated Recurrent Neural Network
Aditya Kusupati
Manish Singh
Kush S. Bhatia
A. Kumar
Prateek Jain
Manik Varma
24
189
0
08 Jan 2019
Learning a Generator Model from Terminal Bus Data
N. Stulov
D. Sobajic
Yury Maximov
Deepjyoti Deka
Michael Chertkov
22
4
0
03 Jan 2019
A Tutorial on Deep Latent Variable Models of Natural Language
Yoon Kim
Sam Wiseman
Alexander M. Rush
BDL
VLM
30
42
0
17 Dec 2018
Deep Anomaly Detection with Outlier Exposure
Dan Hendrycks
Mantas Mazeika
Thomas G. Dietterich
OODD
31
1,452
0
11 Dec 2018
Inflo: News Categorization and Keyphrase Extraction for Implementation in an Aggregation System
Pranav A
Nick Sukiennik
Pan Hui
32
2
0
10 Dec 2018
ESPNetv2: A Light-weight, Power Efficient, and General Purpose Convolutional Neural Network
Sachin Mehta
Mohammad Rastegari
Linda G. Shapiro
Hannaneh Hajishirzi
VLM
29
392
0
28 Nov 2018
Plan-And-Write: Towards Better Automatic Storytelling
Lili Yao
Nanyun Peng
R. Weischedel
Kevin Knight
Dongyan Zhao
Rui Yan
16
403
0
14 Nov 2018
Modeling Local Dependence in Natural Language with Multi-channel Recurrent Neural Networks
Chang Xu
Weiran Huang
Hongwei Wang
G. Wang
Tie-Yan Liu
16
13
0
13 Nov 2018
Fine-tuning of Language Models with Discriminator
Vadim Popov
Mikhail Kudinov
16
2
0
12 Nov 2018
Learning to Compose Topic-Aware Mixture of Experts for Zero-Shot Video Captioning
Yoonchang Sung
Jiawei Wu
Da Zhang
Yu-Chuan Su
Pratap Tokekar
26
39
0
07 Nov 2018
Analysing Dropout and Compounding Errors in Neural Language Models
James O'Neill
Danushka Bollegala
22
1
0
02 Nov 2018
Progress and Tradeoffs in Neural Language Models
Raphael Tang
Jimmy J. Lin
16
5
0
02 Nov 2018
You May Not Need Attention
Ofir Press
Noah A. Smith
14
27
0
31 Oct 2018
Language Modeling with Sparse Product of Sememe Experts
Yihong Gu
Jun Yan
Hao Zhu
Zhiyuan Liu
Ruobing Xie
Maosong Sun
Fen Lin
Leyu Lin
MoE
15
31
0
29 Oct 2018
Language Modeling for Code-Switching: Evaluation, Integration of Monolingual Data, and Discriminative Training
Hila Gonen
Yoav Goldberg
14
31
0
28 Oct 2018
Reversible Recurrent Neural Networks
M. Mackay
Paul Vicol
Jimmy Ba
Roger C. Grosse
6
52
0
25 Oct 2018
Universal Language Model Fine-Tuning with Subword Tokenization for Polish
Piotr Czapla
Jeremy Howard
Marcin Kardas
8
7
0
24 Oct 2018
Language Modeling at Scale
Md. Mostofa Ali Patwary
Milind Chabbi
Heewoo Jun
Jiaji Huang
G. Diamos
Kenneth Church
ALM
28
5
0
23 Oct 2018
Ordered Neurons: Integrating Tree Structures into Recurrent Neural Networks
Yikang Shen
Shawn Tan
Alessandro Sordoni
Aaron Courville
32
323
0
22 Oct 2018
PepCVAE: Semi-Supervised Targeted Design of Antimicrobial Peptide Sequences
Payel Das
Kahini Wadhawan
Oscar Chang
Tom Sercu
Cicero Nogueira dos Santos
Matthew D Riemer
Vijil Chenthamarakshan
Inkit Padhi
Aleksandra Mojsilović
DRL
34
0
0
17 Oct 2018
Evolutionary Stochastic Gradient Descent for Optimization of Deep Neural Networks
Xiaodong Cui
Wei Zhang
Zoltán Tüske
M. Picheny
ODL
16
89
0
16 Oct 2018
Trellis Networks for Sequence Modeling
Shaojie Bai
J. Zico Kolter
V. Koltun
25
145
0
15 Oct 2018
A System for Massively Parallel Hyperparameter Tuning
Liam Li
Kevin G. Jamieson
Afshin Rostamizadeh
Ekaterina Gonina
Moritz Hardt
Benjamin Recht
Ameet Talwalkar
24
372
0
13 Oct 2018
Dropout as a Structured Shrinkage Prior
Eric T. Nalisnick
José Miguel Hernández-Lobato
Padhraic Smyth
BDL
UQCV
6
1
0
09 Oct 2018
Understanding Recurrent Neural Architectures by Analyzing and Synthesizing Long Distance Dependencies in Benchmark Sequential Datasets
Abhijit Mahalunkar
John D. Kelleher
24
8
0
06 Oct 2018
Adaptive Pruning of Neural Language Models for Mobile Devices
Raphael Tang
Jimmy J. Lin
16
6
0
27 Sep 2018
Information-Weighted Neural Cache Language Models for ASR
Lyan Verwimp
J. Pelemans
Hugo Van hamme
P. Wambacq
KELM
RALM
11
2
0
24 Sep 2018
Multi-task Learning with Sample Re-weighting for Machine Reading Comprehension
Yichong Xu
Xiaodong Liu
Yelong Shen
Jingjing Liu
Jianfeng Gao
23
51
0
18 Sep 2018
FRAGE: Frequency-Agnostic Word Representation
Chengyue Gong
Di He
Xu Tan
Tao Qin
Liwei Wang
Tie-Yan Liu
OOD
28
144
0
18 Sep 2018
Towards JointUD: Part-of-speech Tagging and Lemmatization using Recurrent Neural Networks
G. Arakelyan
Karen Hambardzumyan
Hrant Khachatrian
26
9
0
10 Sep 2018
MTNT: A Testbed for Machine Translation of Noisy Text
Paul Michel
Graham Neubig
19
145
0
02 Sep 2018
Direct Output Connection for a High-Rank Language Model
Sho Takase
Jun Suzuki
Masaaki Nagata
18
36
0
30 Aug 2018
Pyramidal Recurrent Unit for Language Modeling
Sachin Mehta
Rik Koncel-Kedziorski
Mohammad Rastegari
Hannaneh Hajishirzi
21
10
0
27 Aug 2018
Dissecting Contextual Word Embeddings: Architecture and Representation
Matthew E. Peters
Mark Neumann
Luke Zettlemoyer
Wen-tau Yih
35
425
0
27 Aug 2018
Predefined Sparseness in Recurrent Sequence Models
T. Demeester
Johannes Deleu
Fréderic Godin
Chris Develder
16
3
0
27 Aug 2018
Financial Aspect-Based Sentiment Analysis using Deep Representations
Steven Yang
Jason Rosenfeld
Jacques Makutonin
21
13
0
23 Aug 2018
Improving Abstraction in Text Summarization
Wojciech Kryściński
Romain Paulus
Caiming Xiong
R. Socher
18
147
0
23 Aug 2018
Neural Architecture Optimization
Renqian Luo
Fei Tian
Tao Qin
Enhong Chen
Tie-Yan Liu
3DV
26
648
0
22 Aug 2018