Regularizing and Optimizing LSTM Language Models

7 August 2017
Stephen Merity
N. Keskar
R. Socher
arXiv:1708.02182

Papers citing "Regularizing and Optimizing LSTM Language Models"

50 / 509 papers shown
Modeling cognitive processes of natural reading with transformer-based Language Models
Bruno Bianchi
Fermín Travi
Juan E. Kamienkowski
17
0
0
16 May 2025
Graph Laplacian Wavelet Transformer via Learnable Spectral Decomposition
Andrew Kiruluta
Eric Lundy
Priscilla Burity
29
0
0
09 May 2025
Smoothed Normalization for Efficient Distributed Private Optimization
Egor Shulgin
Sarit Khirirat
Peter Richtárik
FedML
87
0
0
20 Feb 2025
When, Where and Why to Average Weights?
Niccolò Ajroldi
Antonio Orvieto
Jonas Geiping
MoMe
96
0
0
10 Feb 2025
Efficient Language Modeling for Low-Resource Settings with Hybrid RNN-Transformer Architectures
Gabriel Lindenmaier
Sean Papay
Sebastian Padó
65
0
0
02 Feb 2025
Optimizing Speech-Input Length for Speaker-Independent Depression Classification
Tomasz Rutowski
Amir Harati
Yang Lu
Elizabeth Shriberg
36
15
0
03 Jan 2025
Mask Factory: Towards High-quality Synthetic Data Generation for Dichotomous Image Segmentation
Haotian Qian
YD Chen
Shengtao Lou
Fahad Shahbaz Khan
Xiaogang Jin
Deng-Ping Fan
DiffM
47
6
0
26 Dec 2024
Robust Speech and Natural Language Processing Models for Depression Screening
Y. Lu
A. Harati
T. Rutowski
R. Oliveira
P. Chlebek
E. Shriberg
AI4MH
41
5
0
26 Dec 2024
Classification of residential and non-residential buildings based on satellite data using deep learning
Jai G Singla
20
0
0
11 Nov 2024
Don't Just Pay Attention, PLANT It: Transfer L2R Models to Fine-tune Attention in Extreme Multi-Label Text Classification
Debjyoti Saharoy
J. Aslam
Virgil Pavlu
VLM
39
0
0
30 Oct 2024
From Gradient Clipping to Normalization for Heavy Tailed SGD
Florian Hübler
Ilyas Fatkhullin
Niao He
40
5
0
17 Oct 2024
Financial Sentiment Analysis on News and Reports Using Large Language Models and FinBERT
Yanxin Shen
Pulin Kirin Zhang
AIFin
34
11
0
02 Oct 2024
Modelando procesos cognitivos de la lectura natural con GPT-2
Bruno Bianchi
Alfredo Umfurer
Juan E. Kamienkowski
33
0
0
30 Sep 2024
AsthmaBot: Multi-modal, Multi-Lingual Retrieval Augmented Generation For Asthma Patient Support
Adil Bahaj
Mounir Ghogho
46
2
0
24 Sep 2024
Explaining Datasets in Words: Statistical Models with Natural Language Parameters
Ruiqi Zhong
Heng Wang
Dan Klein
Jacob Steinhardt
37
6
0
13 Sep 2024
Forget to Flourish: Leveraging Machine-Unlearning on Pretrained Language Models for Privacy Leakage
Md. Rafi Ur Rashid
Jing Liu
T. Koike-Akino
Shagufta Mehnaz
Ye Wang
MU
SILM
46
3
0
30 Aug 2024
Interactive Topic Models with Optimal Transport
Garima Dhanania
Sheshera Mysore
Chau Minh Pham
Mohit Iyyer
Hamed Zamani
Andrew McCallum
OT
35
1
0
28 Jun 2024
Hidden Holes: topological aspects of language models
Stephen Fitz
P. Romero
Jiyan Jonas Schneider
43
0
0
09 Jun 2024
Thinking Tokens for Language Modeling
David Herel
Tomáš Mikolov
LRM
27
2
0
14 May 2024
Addressing Topic Granularity and Hallucination in Large Language Models for Topic Modelling
Yida Mu
Peizhen Bai
Kalina Bontcheva
Xingyi Song
33
6
0
01 May 2024
Weight Sparsity Complements Activity Sparsity in Neuromorphic Language Models
Rishav Mukherji
Mark Schöne
Khaleelulla Khan Nazeer
Christian Mayr
David Kappel
Anand Subramoney
37
2
0
01 May 2024
Concept Induction: Analyzing Unstructured Text with High-Level Concepts Using LLooM
Michelle S. Lam
Janice Teoh
James A. Landay
Jeffrey Heer
Michael S. Bernstein
35
43
0
18 Apr 2024
Regularized Gradient Clipping Provably Trains Wide and Deep Neural Networks
Matteo Tucat
Anirbit Mukherjee
Procheta Sen
Mingfei Sun
Omar Rivasplata
MLT
39
1
0
12 Apr 2024
Neural Optimizer Equation, Decay Function, and Learning Rate Schedule Joint Evolution
Brandon Morgan
Dean Frederick Hougen
ODL
43
0
0
10 Apr 2024
Privacy Backdoors: Enhancing Membership Inference through Poisoning Pre-trained Models
Yuxin Wen
Leo Marchyok
Sanghyun Hong
Jonas Geiping
Tom Goldstein
Nicholas Carlini
SILM
AAML
39
9
0
01 Apr 2024
Convergence Guarantees for RMSProp and Adam in Generalized-smooth Non-convex Optimization with Affine Noise Variance
Qi Zhang
Yi Zhou
Shaofeng Zou
42
4
0
01 Apr 2024
A Stochastic Quasi-Newton Method for Non-convex Optimization with Non-uniform Smoothness
Zhenyu Sun
Ermin Wei
44
0
0
22 Mar 2024
Multi-Objective Evolutionary Neural Architecture Search for Recurrent Neural Networks
Reinhard Booysen
Anna Sergeevna Bosman
40
1
0
17 Mar 2024
Authorship Attribution in Bangla Literature (AABL) via Transfer Learning using ULMFiT
Aisha Khatun
Anisur Rahman
Md. Saiful Islam
Hemayet Ahmed Chowdhury
A. Tasnim
31
2
0
08 Mar 2024
A&B BNN: Add&Bit-Operation-Only Hardware-Friendly Binary Neural Network
Ruichen Ma
G. Qiao
Yián Liu
L. Meng
N. Ning
Yang Liu
Shaogang Hu
AAML
MQ
42
3
0
06 Mar 2024
Arabic Text Sentiment Analysis: Reinforcing Human-Performed Surveys with Wider Topic Analysis
Latifah Almurqren
Ryan Hodgson
A Ioana Cristea
41
3
0
04 Mar 2024
Learning from Teaching Regularization: Generalizable Correlations Should be Easy to Imitate
Can Jin
Tong Che
Hongwu Peng
Yiyuan Li
Dimitris N. Metaxas
Marco Pavone
44
43
0
05 Feb 2024
Automatic channel selection and spatial feature integration for multi-channel speech recognition across various array topologies
Bingshen Mu
Pengcheng Guo
Dake Guo
Pan Zhou
Wei Chen
Lei Xie
38
2
0
15 Dec 2023
Language Modeling on a SpiNNaker 2 Neuromorphic Chip
Khaleelulla Khan Nazeer
Mark Schöne
Rishav Mukherji
Bernhard Vogginger
Christian Mayr
David Kappel
Anand Subramoney
37
5
0
14 Dec 2023
A Unified Sampling Framework for Solver Searching of Diffusion Probabilistic Models
En-hao Liu
Xuefei Ning
Huazhong Yang
Yu Wang
DiffM
39
11
0
12 Dec 2023
Advancing State of the Art in Language Modeling
David Herel
Tomáš Mikolov
34
1
0
28 Nov 2023
BEND: Benchmarking DNA Language Models on biologically meaningful tasks
Frederikke Isa Marin
Felix Teufel
Marc Horlacher
Dennis Madsen
Dennis Pultz
Ole Winther
Wouter Boomsma
22
34
0
21 Nov 2023
Activity Sparsity Complements Weight Sparsity for Efficient RNN Inference
Rishav Mukherji
Mark Schöne
Khaleelulla Khan Nazeer
Christian Mayr
Anand Subramoney
38
2
0
13 Nov 2023
Parameter-Agnostic Optimization under Relaxed Smoothness
Florian Hübler
Junchi Yang
Xiang Li
Niao He
34
12
0
06 Nov 2023
Longer Fixations, More Computation: Gaze-Guided Recurrent Neural Networks
Xinting Huang
Jiajing Wan
Ioannis Kritikos
Nora Hollenstein
9
3
0
31 Oct 2023
Out-of-distribution Object Detection through Bayesian Uncertainty Estimation
Tianhao Zhang
Shenglin Wang
N. Bouaynaya
R. Calinescu
Lyudmila Mihaylova
OODD
21
2
0
29 Oct 2023
Rethinking SIGN Training: Provable Nonconvex Acceleration without First- and Second-Order Gradient Lipschitz
Tao Sun
Congliang Chen
Peng Qiao
Li Shen
Xinwang Liu
Dongsheng Li
36
3
0
23 Oct 2023
Controlled Randomness Improves the Performance of Transformer Models
Tobias Deußer
Cong Zhao
Wolfgang Krämer
David Leonhard
Christian Bauckhage
R. Sifa
24
1
0
20 Oct 2023
Prototype of a robotic system to assist the learning process of English language with text-generation through DNN
Carlos Morales-Torres
Mario Campos Soberanis
Diego Campos-Sobrino
13
0
0
20 Sep 2023
Machine Learning Technique Based Fake News Detection
Biplob Kumar Sutradhar
Mohammad Zonaid
Nushrat Jahan Ria
S. R. H. Noori
35
2
0
18 Sep 2023
Differentiable Retrieval Augmentation via Generative Language Modeling for E-commerce Query Intent Classification
Chenyu Zhao
Yunjiang Jiang
Yiming Qiu
Han Zhang
Wen-Yun Yang
RALM
34
5
0
18 Aug 2023
Accurate Neural Network Pruning Requires Rethinking Sparse Optimization
Denis Kuznedelev
Eldar Kurtic
Eugenia Iofinova
Elias Frantar
Alexandra Peste
Dan Alistarh
VLM
35
11
0
03 Aug 2023
FedBIAD: Communication-Efficient and Accuracy-Guaranteed Federated Learning with Bayesian Inference-Based Adaptive Dropout
Jingjing Xue
Min Liu
Sheng Sun
Yuwei Wang
Hui Jiang
Xue Jiang
21
7
0
14 Jul 2023
Lookaround Optimizer: $k$ steps around, 1 step average
Jiangtao Zhang
Shunyu Liu
Mingli Song
Tongtian Zhu
Zhenxing Xu
Mingli Song
MoMe
37
6
0
13 Jun 2023
Revisiting Conversation Discourse for Dialogue Disentanglement
Bobo Li
Hao Fei
Fei Li
Shengqiong Wu
Lizi Liao
Yin-wei Wei
Tat-Seng Chua
Donghong Ji
43
1
0
06 Jun 2023