ResearchTrend.AI
NeMo: a toolkit for building AI applications using Neural Modules

14 September 2019
Oleksii Kuchaiev
Jason Chun Lok Li
Huyen Nguyen
Oleksii Hrinchuk
Ryan Leary
Boris Ginsburg
Samuel Kriman
Stanislav Beliaev
Vitaly Lavrukhin
Jack Cook
P. Castonguay
Mariya Popova
Jocelyn Huang
Jonathan M. Cohen

Papers citing "NeMo: a toolkit for building AI applications using Neural Modules"

50 / 178 papers shown
ONNXPruner: ONNX-Based General Model Pruning Adapter
Dongdong Ren
Wenbin Li
Tianyu Ding
Lei Wang
Qi Fan
Jing Huo
Hongbing Pan
Yang Gao
36
3
0
10 Apr 2024
CoVoMix: Advancing Zero-Shot Speech Generation for Human-like Multi-talker Conversations
Leying Zhang
Yao Qian
Long Zhou
Shujie Liu
Dongmei Wang
...
Yanmin Qian
Jinyu Li
Lei He
Sheng Zhao
Michael Zeng
34
1
0
10 Apr 2024
Transducers with Pronunciation-aware Embeddings for Automatic Speech Recognition
Hainan Xu
Zhehuai Chen
Fei Jia
Boris Ginsburg
33
0
0
04 Apr 2024
TDT-KWS: Fast And Accurate Keyword Spotting Using Token-and-duration Transducer
Yu Xi
Hao Li
Baochen Yang
Haoyu Li
Hai-kun Xu
Kai Yu
35
1
0
20 Mar 2024
Multilingual Audio-Visual Speech Recognition with Hybrid CTC/RNN-T Fast Conformer
Maxime Burchi
Krishna C. Puvvada
Jagadeesh Balam
Boris Ginsburg
Radu Timofte
42
8
0
14 Mar 2024
SpeechColab Leaderboard: An Open-Source Platform for Automatic Speech Recognition Evaluation
Jiayu Du
Jinpeng Li
Guoguo Chen
Wei-Qiang Zhang
ELM
37
3
0
13 Mar 2024
REWIND Dataset: Privacy-preserving Speaking Status Segmentation from Multimodal Body Movement Signals in the Wild
Jose Vargas-Quiros
Chirag Raman
Stephanie Tan
Ekin Gedik
Laura Cabrera-Quiros
Hayley Hung
26
3
0
02 Mar 2024
OpenMathInstruct-1: A 1.8 Million Math Instruction Tuning Dataset
Shubham Toshniwal
Ivan Moshkov
Sean Narenthiran
Daria Gitman
Fei Jia
Igor Gitman
28
76
0
15 Feb 2024
Resolving Transcription Ambiguity in Spanish: A Hybrid Acoustic-Lexical System for Punctuation Restoration
Xiliang Zhu
Chia-Tien Chang
Shayna Gardiner
David Rossouw
Jonas Robertson
27
1
0
05 Feb 2024
Exploring the limits of decoder-only models trained on public speech recognition corpora
Ankit Gupta
G. Saon
Brian Kingsbury
OffRL
23
5
0
31 Jan 2024
ESPnet-SPK: full pipeline speaker embedding toolkit with reproducible recipes, self-supervised front-ends, and off-the-shelf models
Jee-weon Jung
Wangyou Zhang
Jiatong Shi
Zakaria Aldeneh
Takuya Higuchi
B. Theobald
Ahmed Hussen Abdelaziz
Shinji Watanabe
76
21
0
30 Jan 2024
Benchmarking Large Multimodal Models against Common Corruptions
Jiawei Zhang
Tianyu Pang
Chao Du
Yi Ren
Bo-wen Li
Min-Bin Lin
MLLM
30
14
0
22 Jan 2024
Keep Decoding Parallel with Effective Knowledge Distillation from Language Models to End-to-end Speech Recognisers
Michael Hentschel
Yuta Nishikawa
Tatsuya Komatsu
Yusuke Fujita
11
4
0
22 Jan 2024
NOTSOFAR-1 Challenge: New Datasets, Baseline, and Tasks for Distant Meeting Transcription
Alon Vinnikov
Amir Ivry
Aviv Hurvitz
Igor Abramovski
S. Koubi
...
S. Sivasankaran
Yifan Gong
Min Tang
Huaming Wang
Eyal Krupka
31
20
0
16 Jan 2024
Task Oriented Dialogue as a Catalyst for Self-Supervised Automatic Speech Recognition
David M. Chan
Shalini Ghosh
Hitesh Tulsiani
Ariya Rastrow
Björn Hoffmeister
28
1
0
04 Jan 2024
Stateful Conformer with Cache-based Inference for Streaming Automatic Speech Recognition
Vahid Noroozi
Somshubra Majumdar
Ankur Kumar
Jagadeesh Balam
Boris Ginsburg
23
10
0
27 Dec 2023
LiteVSR: Efficient Visual Speech Recognition by Learning from Speech Representations of Unlabeled Data
Hendrik Laux
Emil Mededovic
Ahmed Hallawa
Lukas Martin
A. Peine
Anke Schmeink
VLM
18
4
0
15 Dec 2023
Uncertainty-aware Language Modeling for Selective Question Answering
Qi Yang
Shreya Ravikumar
F. Schmitt-Ulms
S. Lolla
Ege Demir
...
Sadhana Lolla
Elaheh Ahmadi
Daniela Rus
Alexander Amini
Alejandro Perez
16
7
0
26 Nov 2023
Summary of the DISPLACE Challenge 2023 - DIarization of SPeaker and LAnguage in Conversational Environments
Shikha Baghel
Shreyas Ramoji
Somil Jain
Pratik Roy Chowdhuri
Prachi Singh
Deepu Vijayasenan
Sriram Ganapathy
19
6
0
21 Nov 2023
nach0: Multimodal Natural and Chemical Languages Foundation Model
M. Livne
Z. Miftahutdinov
E. Tutubalina
Maksim Kuznetsov
Daniil Polykovskiy
...
Aastha Jhunjhunwala
Anthony Costa
Alex Aliper
Alán Aspuru-Guzik
Alex Zhavoronkov
AI4CE
24
12
0
21 Nov 2023
Secure Transformer Inference Protocol
Mu Yuan
Lan Zhang
Xiang-Yang Li
30
3
0
14 Nov 2023
Are cascade dialogue state tracking models speaking out of turn in spoken dialogues?
Lucas Druart
Léo Jacqmin
Benoit Favre
L. Rojas-Barahona
Valentin Vielzeuf
19
0
0
03 Nov 2023
ChipNeMo: Domain-Adapted LLMs for Chip Design
Mingjie Liu
Teodor-Dumitru Ene
Robert M. Kirby
Chris Cheng
N. Pinckney
...
Pratik P Suthar
Varun Tej
Walker J. Turner
Kaizhe Xu
Haoxin Ren
42
141
0
31 Oct 2023
Deep Audio Analyzer: a Framework to Industrialize the Research on Audio Forensics
Valerio Francesco Puglisi
O. Giudice
S. Battiato
17
1
0
29 Oct 2023
TorchAudio 2.1: Advancing speech recognition, self-supervised learning, and audio processing components for PyTorch
Jeff Hwang
Moto Hira
Caroline Chen
Xiaohui Zhang
Zhaoheng Ni
...
Yumeng Tao
Robin Scheibler
Samuele Cornell
Sean Kim
Stavros Petridis
46
22
0
27 Oct 2023
The CHiME-7 Challenge: System Description and Performance of NeMo Team's DASR System
T. Park
He Huang
Ante Jukić
Kunal Dhawan
Krishna C. Puvvada
Nithin Rao Koluguri
Nikolay Karpov
A. Laptev
Jagadeesh Balam
Boris Ginsburg
27
6
0
18 Oct 2023
Zipformer: A faster and better encoder for automatic speech recognition
Zengwei Yao
Liyong Guo
Xiaoyu Yang
Wei Kang
Fangjun Kuang
Yifan Yang
Zengrui Jin
Long Lin
Daniel Povey
VLM
25
65
0
17 Oct 2023
SALM: Speech-augmented Language Model with In-context Learning for Speech Recognition and Translation
Zhehuai Chen
He Huang
A. Andrusenko
Oleksii Hrinchuk
Krishna C. Puvvada
Jason Chun Lok Li
Subhankar Ghosh
Jagadeesh Balam
Boris Ginsburg
LRM
21
49
0
13 Oct 2023
End-to-end Online Speaker Diarization with Target Speaker Tracking
Weiqing Wang
Ming Li
31
5
0
12 Oct 2023
SteerLM: Attribute Conditioned SFT as an (User-Steerable) Alternative to RLHF
Yi Dong
Zhilin Wang
Makesh Narsimhan Sreedhar
Xianchao Wu
Oleksii Kuchaiev
ALM
LLMSV
34
64
0
09 Oct 2023
One model to rule them all ? Towards End-to-End Joint Speaker Diarization and Speech Recognition
Samuele Cornell
Jee-weon Jung
Shinji Watanabe
S. Squartini
VLM
28
16
0
02 Oct 2023
FedBPT: Efficient Federated Black-box Prompt Tuning for Large Language Models
Jingwei Sun
Ziyue Xu
Hongxu Yin
Dong Yang
Daguang Xu
Yiran Chen
Holger R. Roth
VLM
95
23
0
02 Oct 2023
Wiki-En-ASR-Adapt: Large-scale synthetic dataset for English ASR Customization
Alexandra Antonova
33
0
0
29 Sep 2023
Neural Machine Translation Models Can Learn to be Few-shot Learners
Raphael Reinauer
P. Simianer
Kaden Uhlig
Johannes E. M. Mosig
Joern Wuebker
LRM
21
8
0
15 Sep 2023
Enhancing Speaker Diarization with Large Language Models: A Contextual Beam Search Approach
T. Park
Kunal Dhawan
Nithin Rao Koluguri
Jagadeesh Balam
34
15
0
11 Sep 2023
KinSPEAK: Improving speech recognition for Kinyarwanda via semi-supervised learning methods
Antoine Nzeyimana
SSL
22
0
0
23 Aug 2023
N-gram Boosting: Improving Contextual Biasing with Normalized N-gram Targets
Wang Yau Li
Shreekantha Nadig
K. Chang
Zafarullah Mahmood
Riqiang Wang
Simon Vandieken
Jonas Robertson
Frederic Mailhot
22
0
0
04 Aug 2023
Enhancing conversational quality in language learning chatbots: An evaluation of GPT4 for ASR error correction
Long Mai
Julie Carson-Berndsen
11
4
0
19 Jul 2023
Pretraining Conformer with ASR or ASV for Anti-Spoofing Countermeasure
Yikang Wang
Hiromitsu Nishizaki
Ming Li
37
0
0
04 Jul 2023
X-RiSAWOZ: High-Quality End-to-End Multilingual Dialogue Datasets and Few-shot Agents
M. Moradshahi
Tianhao Shen
Kalika Bali
Monojit Choudhury
Gaël de Chalendar
...
Michael Sun
Aditya Yadavalli
Chaobin You
Deyi Xiong
M. Lam
16
8
0
30 Jun 2023
UnitSpeech: Speaker-adaptive Speech Synthesis with Untranscribed Data
Heeseung Kim
Sungwon Kim
Ji-Ran Yeom
Sung-Wan Yoon
DiffM
21
21
0
28 Jun 2023
Confidence-based Ensembles of End-to-End Speech Recognition Models
Igor Gitman
Vitaly Lavrukhin
A. Laptev
Boris Ginsburg
UQCV
25
7
0
27 Jun 2023
EM-Network: Oracle Guided Self-distillation for Sequence Learning
J. Yoon
Sunghwan Ahn
Hyeon Seung Lee
Minchan Kim
Seokhwan Kim
N. Kim
VLM
30
2
0
14 Jun 2023
Audio-Visual Speech Enhancement with Score-Based Generative Models
Julius Richter
Simone Frintrop
Timo Gerkmann
DiffM
18
10
0
02 Jun 2023
Spoken Language Identification System for English-Mandarin Code-Switching Child-Directed Speech
Shashi Kant Gupta
Sushant Hiray
Prashant Kukde
25
3
0
01 Jun 2023
XPhoneBERT: A Pre-trained Multilingual Model for Phoneme Representations for Text-to-Speech
L. T. Nguyen
Thinh-Le-Gia Pham
Dat Quoc Nguyen
24
13
0
31 May 2023
DC CoMix TTS: An End-to-End Expressive TTS with Discrete Code Collaborated with Mixer
Yerin Choi
M. Koo
25
0
0
31 May 2023
Stochastic Pitch Prediction Improves the Diversity and Naturalness of Speech in Glow-TTS
Sewade Ogun
Vincent Colotte
Emmanuel Vincent
DiffM
27
4
0
28 May 2023
Robustness of Multi-Source MT to Transcription Errors
Dominik Macháček
Peter Polák
Ondrej Bojar
Raj Dabre
28
4
0
26 May 2023
Vistaar: Diverse Benchmarks and Training Sets for Indian Language ASR
Kaushal Bhogale
Sairam Sundaresan
A. Raman
Tahir Javed
Mitesh M. Khapra
Pratyush Kumar
VLM
25
10
0
24 May 2023