NeMo: a toolkit for building AI applications using Neural Modules


14 September 2019
Oleksii Kuchaiev
Jason Chun Lok Li
Huyen Nguyen
Oleksii Hrinchuk
Ryan Leary
Boris Ginsburg
Samuel Kriman
Stanislav Beliaev
Vitaly Lavrukhin
Jack Cook
P. Castonguay
Mariya Popova
Jocelyn Huang
Jonathan M. Cohen

Papers citing "NeMo: a toolkit for building AI applications using Neural Modules"

50 / 178 papers shown
Title
EfficientSpeech: An On-Device Text to Speech Model
Rowel Atienza
23
4
0
23 May 2023
Continual Learning for End-to-End ASR by Averaging Domain Experts
Peter William VanHarn Plantinga
Jaekwon Yoo
C. Dhir
CLL
MoMe
25
1
0
12 May 2023
Fast Conformer with Linearly Scalable Attention for Efficient Speech Recognition
Dima Rekesh
Nithin Rao Koluguri
Samuel Kriman
Somshubra Majumdar
Vahid Noroozi
...
Oleksii Hrinchuk
Krishna Puvvada
Ankur Kumar
Jagadeesh Balam
Boris Ginsburg
40
81
0
08 May 2023
Leveraging Synthetic Targets for Machine Translation
Sarthak Mittal
Oleksii Hrinchuk
Oleksii Kuchaiev
31
2
0
07 May 2023
Privacy-Preserving In-Context Learning for Large Language Models
Tong Wu
Ashwinee Panda
Jiachen T. Wang
Prateek Mittal
51
29
0
02 May 2023
OLISIA: a Cascade System for Spoken Dialogue State Tracking
Léo Jacqmin
Lucas Druart
Yannick Esteve
Benoit Favre
L. Rojas-Barahona
Valentin Vielzeuf
20
3
0
20 Apr 2023
Efficient Sequence Transduction by Jointly Predicting Tokens and Durations
Hainan Xu
Fei Jia
Somshubra Majumdar
Hengguan Huang
Shinji Watanabe
Boris Ginsburg
27
17
0
13 Apr 2023
ESPnet-ST-v2: Multipurpose Spoken Language Translation Toolkit
Brian Yan
Jiatong Shi
Yun Tang
H. Inaguma
Yifan Peng
...
Zhaoheng Ni
Moto Hira
Soumi Maiti
J. Pino
Shinji Watanabe
19
20
0
10 Apr 2023
When Good and Reproducible Results are a Giant with Feet of Clay: The Importance of Software Quality in NLP
Sara Papi
Marco Gaido
Andrea Pilzer
Matteo Negri
51
10
0
28 Mar 2023
Partially Adaptive Multichannel Joint Reduction of Ego-noise and Environmental Noise
Hu Fang
Niklas Wittmer
Johannes Twiefel
S. Wermter
Timo Gerkmann
25
3
0
27 Mar 2023
Auto-AVSR: Audio-Visual Speech Recognition with Automatic Labels
Pingchuan Ma
A. Haliassos
Adriana Fernandez-Lopez
Honglie Chen
Stavros Petridis
M. Pantic
27
106
0
25 Mar 2023
Powerful and Extensible WFST Framework for RNN-Transducer Losses
A. Laptev
Vladimir Bataev
Igor Gitman
Boris Ginsburg
16
3
0
18 Mar 2023
Trustera: A Live Conversation Redaction System
E. Gouvêa
Ali Dadgar
S. Jalalvand
R. Chengalvarayan
Badrinath Jayakumar
Ryan Price
Nicholas Ruiz
Jennifer McGovern
S. Bangalore
Benjamin Stern
12
1
0
16 Mar 2023
Text-only domain adaptation for end-to-end ASR using integrated text-to-mel-spectrogram generator
Vladimir Bataev
Roman Korostik
Evgeny Shabalin
Vitaly Lavrukhin
Boris Ginsburg
VLM
30
14
0
27 Feb 2023
E2E Spoken Entity Extraction for Virtual Agents
Karan Singla
Yeon-Jun Kim
S. Bangalore
21
1
0
16 Feb 2023
ASR Bundestag: A Large-Scale political debate dataset in German
Johannes Wirth
René Peinl
21
1
0
12 Feb 2023
PATCorrect: Non-autoregressive Phoneme-augmented Transformer for ASR Error Correction
Zi Xuan Zhang
Zhehui Wang
R. Kamma
S. Eswaran
Narayanan Sadagopan
KELM
23
4
0
10 Feb 2023
Phoneme-Level BERT for Enhanced Prosody of Text-to-Speech with Grapheme Predictions
Yinghao Aaron Li
Cong Han
Xilin Jiang
N. Mesgarani
17
22
0
20 Jan 2023
StoRM: A Diffusion-based Stochastic Regeneration Model for Speech Enhancement and Dereverberation
Jean-Marie Lemercier
Julius Richter
Simon Welker
Timo Gerkmann
DiffM
145
77
0
22 Dec 2022
SLUE Phase-2: A Benchmark Suite of Diverse Spoken Language Understanding Tasks
Suwon Shon
Siddhant Arora
Chyi-Jiunn Lin
Ankita Pasad
Felix Wu
Roshan S. Sharma
Wei Yu Wu
Hung-yi Lee
Karen Livescu
Shinji Watanabe
ELM
21
32
0
20 Dec 2022
Fast Entropy-Based Methods of Word-Level Confidence Estimation for End-To-End Automatic Speech Recognition
A. Laptev
Boris Ginsburg
41
7
0
16 Dec 2022
Robust Speech Recognition via Large-Scale Weak Supervision
Alec Radford
Jong Wook Kim
Tao Xu
Greg Brockman
C. McLeavey
Ilya Sutskever
OffRL
49
3,283
0
06 Dec 2022
ConvLab-3: A Flexible Dialogue System Toolkit Based on a Unified Data Format
Qi Zhu
Christian Geishauser
Hsien-Chin Lin
Carel van Niekerk
Baolin Peng
...
Dazhen Wan
Xiaochen Zhu
Jianfeng Gao
Milica Gašić
Minlie Huang
50
23
0
30 Nov 2022
Neural Transducer Training: Reduced Memory Consumption with Sample-wise Computation
Stefan Braun
Erik McDermott
Roger Hsiao
34
1
0
29 Nov 2022
SpeechNet: Weakly Supervised, End-to-End Speech Recognition at Industrial Scale
Raphael Tang
K. Kumar
Gefei Yang
Akshat Pandey
Yajie Mao
Vladislav Belyaev
Madhuri Emmadi
Craig Murray
Ferhan Ture
Jimmy J. Lin
19
4
0
21 Nov 2022
Accidental Learners: Spoken Language Identification in Multilingual Self-Supervised Models
Travis M. Bartley
Fei Jia
Krishna C. Puvvada
Samuel Kriman
Boris Ginsburg
SSL
23
6
0
09 Nov 2022
Multi-blank Transducers for Speech Recognition
Hainan Xu
Fei Jia
Somshubra Majumdar
Shinji Watanabe
Boris Ginsburg
12
10
0
04 Nov 2022
SG-VAD: Stochastic Gates Based Speech Activity Detection
Jonathan Svirsky
Ofir Lindenbaum
31
4
0
28 Oct 2022
A Compact End-to-End Model with Local and Global Context for Spoken Language Identification
Fei Jia
Nithin Rao Koluguri
Jagadeesh Balam
Boris Ginsburg
25
3
0
27 Oct 2022
Disentangled Speech Representation Learning for One-Shot Cross-lingual Voice Conversion Using β-VAE
Hui Lu
Disong Wang
Xixin Wu
Zhiyong Wu
Xunying Liu
Helen M. Meng
DRL
17
9
0
25 Oct 2022
ESB: A Benchmark For Multi-Domain End-to-End Speech Recognition
Sanchit Gandhi
Patrick von Platen
Alexander M. Rush
30
27
0
24 Oct 2022
Can we use Common Voice to train a Multi-Speaker TTS system?
Sewade Ogun
Vincent Colotte
Emmanuel Vincent
27
10
0
12 Oct 2022
Damage Control During Domain Adaptation for Transducer Based Automatic Speech Recognition
Somshubra Majumdar
Shantanu Acharya
Vitaly Lavrukhin
Boris Ginsburg
21
3
0
06 Oct 2022
Unsupervised domain adaptation for speech recognition with unsupervised error correction
Long Mai
Julie Carson-Berndsen
33
8
0
24 Sep 2022
Improving Small Molecule Generation using Mutual Information Machine
Daniel A. Reidenbach
M. Livne
Rajesh Ilango
M. Gill
Johnny Israeli
28
14
0
18 Aug 2022
A High-Quality and Large-Scale Dataset for English-Vietnamese Speech Translation
L. T. Nguyen
Nguyen Luong Tran
Long Doan
Manh Luong
Dat Quoc Nguyen
21
4
0
08 Aug 2022
Face-to-Face Contrastive Learning for Social Intelligence Question-Answering
Alex Wilf
Qianli Ma
Paul Pu Liang
Amir Zadeh
Louis-Philippe Morency
36
10
0
29 Jul 2022
Pronunciation-aware unique character encoding for RNN Transducer-based Mandarin speech recognition
Peng Shen
Xugang Lu
Hisashi Kawai
11
2
0
29 Jul 2022
Finstreder: Simple and fast Spoken Language Understanding with Finite State Transducers using modern Speech-to-Text models
Daniel Bermuth
Alexander Poeppel
W. Reif
19
7
0
29 Jun 2022
PoCaP Corpus: A Multimodal Dataset for Smart Operating Room Speech Assistant using Interventional Radiology Workflow Analysis
Kubilay Can Demir
M. May
A. Schmid
M. Uder
Katharina Breininger
Tobias Weise
Andreas Maier
Seung Hee Yang
6
4
0
24 Jun 2022
The Makerere Radio Speech Corpus: A Luganda Radio Corpus for Automatic Speech Recognition
Jonathan Mukiibi
Andrew Katumba
J. Nakatumba‐Nabende
Ali Hussein
Josh Meyer
19
7
0
20 Jun 2022
BigVGAN: A Universal Neural Vocoder with Large-Scale Training
Sang-gil Lee
Wei Ping
Boris Ginsburg
Bryan Catanzaro
Sungroh Yoon
17
225
0
09 Jun 2022
Guided-TTS 2: A Diffusion Model for High-quality Adaptive Text-to-Speech with Untranscribed Data
Sungwon Kim
Heeseung Kim
Sungroh Yoon
DiffM
196
52
0
30 May 2022
ASR in German: A Detailed Error Analysis
John M. Wirth
René Peinl
18
5
0
12 Apr 2022
Transducer-based language embedding for spoken language identification
Peng Shen
Xugang Lu
Hisashi Kawai
48
6
0
08 Apr 2022
WavThruVec: Latent speech representation as intermediate features for neural speech synthesis
Hubert Siuzdak
Piotr Dura
Pol van Rijn
Nori Jacoby
AI4TS
10
30
0
31 Mar 2022
Generative Spoken Dialogue Language Modeling
Tu Nguyen
Eugene Kharitonov
Jade Copet
Yossi Adi
Wei-Ning Hsu
...
Paden Tomasello
Robin Algayres
Benoît Sagot
Abdel-rahman Mohamed
Emmanuel Dupoux
AuLLM
27
80
0
30 Mar 2022
Seq-2-Seq based Refinement of ASR Output for Spoken Name Capture
Karan Singla
S. Jalalvand
Yeon-Jun Kim
Ryan Price
Daniel Pressel
S. Bangalore
10
2
0
29 Mar 2022
BEA-Base: A Benchmark for ASR of Spontaneous Hungarian
P. Mihajlik
A. Balog
T. E. Gráczi
A. Kohári
Balázs Tarján
K. Mády
17
8
0
01 Feb 2022
A Study of Transducer based End-to-End ASR with ESPnet: Architecture, Auxiliary Loss and Decoding Strategies
Florian Boyer
Yusuke Shinohara
Takaaki Ishii
H. Inaguma
Shinji Watanabe
29
34
0
14 Jan 2022