Language Modeling with Deep Transformers
Kazuki Irie, Albert Zeyer, Ralf Schlüter, Hermann Ney [KELM]
arXiv:1905.04226, 10 May 2019

Papers citing "Language Modeling with Deep Transformers"

43 papers shown
Language Models, Graph Searching, and Supervision Adulteration: When More Supervision is Less and How to Make More More
Arvid Frydenlund [LRM] (13 Mar 2025)

Scaling Stick-Breaking Attention: An Efficient Implementation and In-depth Study
Shawn Tan, Yikang Shen, Songlin Yang, Aaron C. Courville, Rameswar Panda (23 Oct 2024)

What Information Contributes to Log-based Anomaly Detection? Insights from a Configurable Transformer-Based Approach
Xingfang Wu, Heng Li, Foutse Khomh [AI4TS] (30 Sep 2024)

Lattice Rescoring Based on Large Ensemble of Complementary Neural Language Models
A. Ogawa, Naohiro Tawara, Marc Delcroix, S. Araki (20 Dec 2023)

Simul-LLM: A Framework for Exploring High-Quality Simultaneous Translation with Large Language Models
Victor Agostinelli, Max Wild, Matthew Raffel, Kazi Ahmed Asif Fuad, Lizhong Chen (07 Dec 2023)

Forgetting Private Textual Sequences in Language Models via Leave-One-Out Ensemble
Zhe Liu, Ozlem Kalinli [MU, KELM] (28 Sep 2023)

Recovering from Privacy-Preserving Masking with Large Language Models
A. Vats, Zhe Liu, Peng Su, Debjyoti Paul, Yingyi Ma, Yutong Pang, Zeeshan Ahmed, Ozlem Kalinli (12 Sep 2023)

Integration of Frame- and Label-synchronous Beam Search for Streaming Encoder-decoder Speech Recognition
E. Tsunoo, Hayato Futami, Yosuke Kashiwagi, Siddhant Arora, Shinji Watanabe (24 Jul 2023)

Massively Multilingual Shallow Fusion with Large Language Models
Ke Hu, Tara N. Sainath, Bo-wen Li, Nan Du, Yanping Huang, Andrew M. Dai, Yu Zhang, Rodrigo Cabrera, Z. Chen, Trevor Strohman (17 Feb 2023)

Memory Augmented Lookup Dictionary based Language Modeling for Automatic Speech Recognition
Yukun Feng, Ming Tu, Rui Xia, Chuanzeng Huang, Yuxuan Wang [RALM] (30 Dec 2022)

Adaptive Multi-Corpora Language Model Training for Speech Recognition
Yingyi Ma, Zhe Liu, Xuedong Zhang (09 Nov 2022)

Is Encoder-Decoder Redundant for Neural Machine Translation?
Yingbo Gao, Christian Herold, Zijian Yang, Hermann Ney (21 Oct 2022)

Mitigating Unintended Memorization in Language Models via Alternating Teaching
Zhe Liu, Xuedong Zhang, Fuchun Peng (13 Oct 2022)

Bayesian Neural Network Language Modeling for Speech Recognition
Boyang Xue, Shoukang Hu, Junhao Xu, Mengzhe Geng, Xunying Liu, Helen M. Meng [UQCV, BDL] (28 Aug 2022)

AFT-VO: Asynchronous Fusion Transformers for Multi-View Visual Odometry Estimation
Nimet Kaygusuz, Oscar Alejandro Mendez Maldonado, Richard Bowden (26 Jun 2022)

Neural Differential Equations for Learning to Program Neural Nets Through Continuous Learning Rules
Kazuki Irie, Francesco Faccio, Jürgen Schmidhuber [AI4TS] (03 Jun 2022)

Visual Speech Recognition for Multiple Languages in the Wild
Pingchuan Ma, Stavros Petridis, M. Pantic [VLM] (26 Feb 2022)

The Dual Form of Neural Networks Revisited: Connecting Test Time Predictions to Training Patterns via Spotlights of Attention
Kazuki Irie, Róbert Csordás, Jürgen Schmidhuber (11 Feb 2022)

Mixed Precision Quantization of Transformer Language Models for Speech Recognition
Junhao Xu, Shoukang Hu, Jianwei Yu, Xunying Liu, Helen M. Meng [MQ] (29 Nov 2021)

Self-Normalized Importance Sampling for Neural Language Modeling
Zijian Yang, Yingbo Gao, Alexander Gerstenberger, Jintao Jiang, Ralf Schlüter, Hermann Ney (11 Nov 2021)

The Neural Data Router: Adaptive Control Flow in Transformers Improves Systematic Generalization
Róbert Csordás, Kazuki Irie, Jürgen Schmidhuber [AI4CE] (14 Oct 2021)

On Language Model Integration for RNN Transducer based Speech Recognition
Wei Zhou, Zuoyun Zheng, Ralf Schlüter, Hermann Ney (13 Oct 2021)

Hierarchical Conditional End-to-End ASR with CTC and Multi-Granular Subword Units
Yosuke Higuchi, Keita Karube, Tetsuji Ogawa, Tetsunori Kobayashi (08 Oct 2021)

Private Language Model Adaptation for Speech Recognition
Zhe Liu, Ke Li, Shreyan Bakshi, Fuchun Peng (28 Sep 2021)

Cross-utterance Reranking Models with BERT and Graph Convolutional Networks for Conversational Speech Recognition
Shih-Hsuan Chiu, Tien-Hong Lo, Fu-An Chao, Berlin Chen [BDL] (13 Jun 2021)

Going Beyond Linear Transformers with Recurrent Fast Weight Programmers
Kazuki Irie, Imanol Schlag, Róbert Csordás, Jürgen Schmidhuber (11 Jun 2021)

A Survey of Transformers
Tianyang Lin, Yuxin Wang, Xiangyang Liu, Xipeng Qiu [ViT] (08 Jun 2021)

Intriguing Properties of Vision Transformers
Muzammal Naseer, Kanchana Ranasinghe, Salman Khan, Munawar Hayat, F. Khan, Ming-Hsuan Yang [ViT] (21 May 2021)

Relative Positional Encoding for Transformers with Linear Complexity
Antoine Liutkus, Ondřej Cífka, Shih-Lun Wu, Umut Simsekli, Yi-Hsuan Yang, Gaël Richard (18 May 2021)

Comparing the Benefit of Synthetic Training Data for Various Automatic Speech Recognition Architectures
Nick Rossenbach, Mohammad Zeineldeen, Benedikt Hilmes, Ralf Schlüter, Hermann Ney (12 Apr 2021)

A Parallelizable Lattice Rescoring Strategy with Neural Language Models
Ke Li, Daniel Povey, Sanjeev Khudanpur (08 Mar 2021)

Linear Transformers Are Secretly Fast Weight Programmers
Imanol Schlag, Kazuki Irie, Jürgen Schmidhuber (22 Feb 2021)

GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding
Dmitry Lepikhin, HyoukJoong Lee, Yuanzhong Xu, Dehao Chen, Orhan Firat, Yanping Huang, M. Krikun, Noam M. Shazeer, Z. Chen [MoE] (30 Jun 2020)

Early Stage LM Integration Using Local and Global Log-Linear Combination
Wilfried Michel, Ralf Schlüter, Hermann Ney (20 May 2020)

Contextualizing ASR Lattice Rescoring with Hybrid Pointer Network Language Model
Da-Rong Liu, Chunxi Liu, Frank Zhang, Gabriel Synnaeve, Yatharth Saraf, Geoffrey Zweig (15 May 2020)

Code Prediction by Feeding Trees to Transformers
Seohyun Kim, Jinman Zhao, Yuchi Tian, S. Chandra (30 Mar 2020)

Sign Language Transformers: Joint End-to-end Sign Language Recognition and Translation
Necati Cihan Camgöz, Oscar Koller, Simon Hadfield, Richard Bowden [SLR] (30 Mar 2020)

Generating Synthetic Audio Data for Attention-Based Speech Recognition Systems
Nick Rossenbach, Albert Zeyer, Ralf Schlüter, Hermann Ney (19 Dec 2019)

A Simplified Fully Quantized Transformer for End-to-end Speech Recognition
Alex Bie, Bharat Venkitesh, João Monteiro, Md. Akmal Haidar, Mehdi Rezagholizadeh [MQ] (09 Nov 2019)

Generative Pre-Training for Speech with Autoregressive Predictive Coding
Yu-An Chung, James R. Glass [SSL] (23 Oct 2019)

A Comparative Study on Transformer vs RNN in Speech Applications
Shigeki Karita, Nanxin Chen, Tomoki Hayashi, Takaaki Hori, Hirofumi Inaguma, ..., Ryuichi Yamamoto, Xiao-fei Wang, Shinji Watanabe, Takenori Yoshimura, Wangyou Zhang (13 Sep 2019)

RWTH ASR Systems for LibriSpeech: Hybrid vs Attention -- w/o Data Augmentation
Christoph Lüscher, Eugen Beck, Kazuki Irie, M. Kitza, Wilfried Michel, Albert Zeyer, Ralf Schlüter, Hermann Ney [VLM] (08 May 2019)

A Decomposable Attention Model for Natural Language Inference
Ankur P. Parikh, Oscar Täckström, Dipanjan Das, Jakob Uszkoreit (06 Jun 2016)