Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
1909.09577
Cited By
NeMo: a toolkit for building AI applications using Neural Modules
14 September 2019
Oleksii Kuchaiev
Jason Chun Lok Li
Huyen Nguyen
Oleksii Hrinchuk
Ryan Leary
Boris Ginsburg
Samuel Kriman
Stanislav Beliaev
Vitaly Lavrukhin
Jack Cook
P. Castonguay
Mariya Popova
Jocelyn Huang
Jonathan M. Cohen
Re-assign community
ArXiv
PDF
HTML
Papers citing
"NeMo: a toolkit for building AI applications using Neural Modules"
50 / 177 papers shown
Title
NeMo-Inspector: A Visualization Tool for LLM Generation Analysis
Daria Gitman
Igor Gitman
Evelina Bakhturina
SyDa
49
0
0
01 May 2025
Voice Cloning: Comprehensive Survey
Hussam Azzuni
Abdulmotaleb El Saddik
VLM
39
0
0
01 May 2025
Dysarthria Normalization via Local Lie Group Transformations for Robust ASR
Mikhail Osipov
43
0
0
16 Apr 2025
LauraTSE: Target Speaker Extraction using Auto-Regressive Decoder-Only Language Models
Beilong Tang
Bang Zeng
Ming Li
AI4TS
34
0
0
10 Apr 2025
Visual-Aware Speech Recognition for Noisy Scenarios
Lakshmipathi Balaji
Karan Singla
26
0
0
09 Apr 2025
RNN-Transducer-based Losses for Speech Recognition on Noisy Targets
Vladimir Bataev
30
0
0
09 Apr 2025
OmniScience: A Domain-Specialized LLM for Scientific Reasoning and Discovery
Vignesh Prabhakar
Md Amirul Islam
Adam Atanas
Y. Wang
J. N. Han
...
Rucha Apte
Robert Clark
Kang Xu
Zihan Wang
Kai Liu
LRM
85
1
0
22 Mar 2025
An Exhaustive Evaluation of TTS- and VC-based Data Augmentation for ASR
Sewade Ogun
Vincent Colotte
Emmanuel Vincent
59
0
0
11 Mar 2025
Contextual Cues in Machine Translation: Investigating the Potential of Multi-Source Input Strategies in LLMs and NMT Systems
Lia Shahnazaryan
P. Simianer
Joern Wuebker
42
0
0
10 Mar 2025
InfiR : Crafting Effective Small Language Models and Multimodal Small Language Models in Reasoning
C. Xie
Shuo Cai
Wenjun Wang
Pengxiang Li
Zhijie Sang
...
Xiaotian Han
Jianbo Yuan
Shengyu Zhang
Fei Wu
Hongxia Yang
LRM
49
1
0
17 Feb 2025
DeSTA2: Developing Instruction-Following Speech Language Model Without Speech Instruction-Tuning Data
Ke-Han Lu
Zhehuai Chen
Szu-Wei Fu
Chao-Han Huck Yang
Jagadeesh Balam
Boris Ginsburg
Yu-Te Wang
Hung-yi Lee
AuLLM
SyDa
106
5
0
28 Jan 2025
TTS-Transducer: End-to-End Speech Synthesis with Neural Transducer
Vladimir Bataev
Subhankar Ghosh
Vitaly Lavrukhin
Jason Chun Lok Li
AI4TS
39
0
0
10 Jan 2025
Benchmarking Rotary Position Embeddings for Automatic Speech Recognition
Shucong Zhang
Titouan Parcollet
Rogier van Dalen
Sourav Bhattacharya
44
0
0
10 Jan 2025
Building a Taiwanese Mandarin Spoken Language Model: A First Attempt
Chih-Kai Yang
Yu-Kuan Fu
Chen An Li
Yi-Cheng Lin
Yu-Xiang Lin
...
Ulin Sanga
Xuanjun Chen
Po-Chun Hsu
Shu-Wen Yang
Hung-yi Lee
AuLLM
40
0
0
11 Nov 2024
SimpleFSDP: Simpler Fully Sharded Data Parallel with torch.compile
Ruisi Zhang
Tianyu Liu
Will Feng
Andrew Gu
Sanket Purandare
Wanchao Liang
Francisco Massa
24
1
0
01 Nov 2024
VoiceTextBlender: Augmenting Large Language Models with Speech Capabilities via Single-Stage Joint Speech-Text Supervised Fine-Tuning
Yifan Peng
Krishna C. Puvvada
Zhehuai Chen
Piotr .Zelasko
He Huang
Kunal Dhawan
Ke Hu
Shinji Watanabe
Jagadeesh Balam
Boris Ginsburg
54
2
0
23 Oct 2024
DENOASR: Debiasing ASRs through Selective Denoising
Anand Kumar Rai
S. Jaiswal
Shubham Prakash
Bendi Pragnya Sree
Animesh Mukherjee
34
0
0
22 Oct 2024
How much do contextualized representations encode long-range context?
Simeng Sun
Cheng-Ping Hsieh
39
0
0
16 Oct 2024
Upcycling Large Language Models into Mixture of Experts
Ethan He
Abhinav Khattar
R. Prenger
V. Korthikanti
Zijie Yan
Tong Liu
Shiqing Fan
Ashwath Aithal
M. Shoeybi
Bryan Catanzaro
MoE
35
9
0
10 Oct 2024
Transducer Consistency Regularization for Speech to Text Applications
Cindy Tseng
Yun Tang
Vijendra Raj Apsingekar
30
0
0
09 Oct 2024
HAINAN: Fast and Accurate Transducer for Hybrid-Autoregressive ASR
Hainan Xu
Travis M. Bartley
Vladimir Bataev
Boris Ginsburg
141
0
0
03 Oct 2024
Cross-lingual Human-Preference Alignment for Neural Machine Translation with Direct Quality Optimization
Kaden Uhlig
Joern Wuebker
Raphael Reinauer
John DeNero
34
0
0
26 Sep 2024
NanoVoice: Efficient Speaker-Adaptive Text-to-Speech for Multiple Speakers
Nohil Park
Heeseung Kim
Che Hyun Lee
Jooyoung Choi
Jiheum Yeom
Sungroh Yoon
23
2
0
24 Sep 2024
VoiceGuider: Enhancing Out-of-Domain Performance in Parameter-Efficient Speaker-Adaptive Text-to-Speech via Autoguidance
Jiheum Yeom
Heeseung Kim
Jooyoung Choi
Che Hyun Lee
Nohil Park
Sungroh Yoon
26
1
0
24 Sep 2024
EMMeTT: Efficient Multimodal Machine Translation Training
Piotr Żelasko
Zhehuai Chen
Mengru Wang
Daniel Galvez
Oleksii Hrinchuk
Shuoyang Ding
Ke Hu
Jagadeesh Balam
Vitaly Lavrukhin
Boris Ginsburg
28
1
0
20 Sep 2024
Enhancing Synthetic Training Data for Speech Commands: From ASR-Based Filtering to Domain Adaptation in SSL Latent Space
Sebastião Quintas
Isabelle Ferrané
Thomas Pellegrini
28
0
0
19 Sep 2024
Chain-of-Thought Prompting for Speech Translation
Ke Hu
Zhehuai Chen
Chao-Han Huck Yang
Piotr Żelasko
Oleksii Hrinchuk
Vitaly Lavrukhin
Jagadeesh Balam
Boris Ginsburg
LRM
37
2
0
17 Sep 2024
Large Language Model Based Generative Error Correction: A Challenge and Baselines for Speech Recognition, Speaker Tagging, and Emotion Recognition
Chao-Han Huck Yang
Taejin Park
Yuan Gong
Yuanchao Li
Zhehuai Chen
...
E. Chng
Peter Bell
Catherine Lai
Shinji Watanabe
A. Stolcke
AuLLM
ELM
35
4
0
15 Sep 2024
ESPnet-EZ: Python-only ESPnet for Easy Fine-tuning and Integration
Masao Someki
Kwanghee Choi
Siddhant Arora
William Chen
Samuele Cornell
Jionghao Han
Yifan Peng
Jiatong Shi
Vaibhav Srivastav
Shinji Watanabe
VLM
28
0
0
14 Sep 2024
Elucidating Optimal Reward-Diversity Tradeoffs in Text-to-Image Diffusion Models
Rohit Jena
Ali Taghibakhshi
Sahil Jain
Gerald Shen
Nima Tajbakhsh
Arash Vahdat
38
3
0
09 Sep 2024
Resource-Efficient Adaptation of Speech Foundation Models for Multi-Speaker ASR
Weiqing Wang
Kunal Dhawan
Taejin Park
Krishna C. Puvvada
Ivan Medennikov
Somshubra Majumdar
He Huang
Jagadeesh Balam
Boris Ginsburg
31
1
0
02 Sep 2024
The CHiME-8 DASR Challenge for Generalizable and Array Agnostic Distant Automatic Speech Recognition and Diarization
Samuele Cornell
Taejin Park
Steve Huang
Christoph Boeddeker
Xuankai Chang
Matthew Maciejewski
Matthew Wiesner
Paola García
Shinji Watanabe
31
9
0
23 Jul 2024
Speech-Copilot: Leveraging Large Language Models for Speech Processing via Task Decomposition, Modularization, and Program Generation
Chun-Yi Kuan
Chih-Kai Yang
Wei-Ping Huang
Ke-Han Lu
Hung-yi Lee
39
5
0
13 Jul 2024
Open-Source Conversational AI with SpeechBrain 1.0
Mirco Ravanelli
Titouan Parcollet
Adel Moumen
Sylvain de Langen
Cem Subakan
...
Salima Mdhaffar
G. Laperriere
Mickael Rouvier
Renato De Mori
Yannick Esteve
VLM
34
10
0
29 Jun 2024
BESTOW: Efficient and Streamable Speech Language Model with the Best of Two Worlds in GPT and T5
Zhehuai Chen
He Huang
Oleksii Hrinchuk
Krishna C. Puvvada
Nithin Rao Koluguri
Piotr Żelasko
Jagadeesh Balam
Boris Ginsburg
AuLLM
RALM
34
10
0
28 Jun 2024
DeSTA: Enhancing Speech Language Models through Descriptive Speech-Text Alignment
Ke-Han Lu
Zhehuai Chen
Szu-Wei Fu
He Huang
Boris Ginsburg
Yu-Chiang Frank Wang
Hung-yi Lee
VLM
AuLLM
33
9
0
27 Jun 2024
InterBiasing: Boost Unseen Word Recognition through Biasing Intermediate Predictions
Yu Nakagome
Michael Hentschel
37
0
0
21 Jun 2024
Instruction Data Generation and Unsupervised Adaptation for Speech Language Models
Vahid Noroozi
Zhehuai Chen
Somshubra Majumdar
Steve Huang
Jagadeesh Balam
Boris Ginsburg
SyDa
34
3
0
18 Jun 2024
Benchmarking Children's ASR with Supervised and Self-supervised Speech Foundation Models
Ruchao Fan
Natarajan Balaji Shankar
Abeer Alwan
34
7
0
15 Jun 2024
Exploring Spoken Language Identification Strategies for Automatic Transcription of Multilingual Broadcast and Institutional Speech
Martina Valente
Fabio Brugnara
Giovanni Morrone
Enrico Zovato
Leonardo Badino
25
0
0
13 Jun 2024
DisfluencySpeech -- Single-Speaker Conversational Speech Dataset with Paralanguage
Kyra Wang
Dorien Herremans
19
0
0
13 Jun 2024
Flextron: Many-in-One Flexible Large Language Model
Ruisi Cai
Saurav Muralidharan
Greg Heinrich
Hongxu Yin
Zhangyang Wang
Jan Kautz
Pavlo Molchanov
37
10
0
11 Jun 2024
Label-Looping: Highly Efficient Decoding for Transducers
Vladimir Bataev
Hainan Xu
Daniel Galvez
Vitaly Lavrukhin
Boris Ginsburg
30
4
0
10 Jun 2024
EARS: An Anechoic Fullband Speech Dataset Benchmarked for Speech Enhancement and Dereverberation
Julius Richter
Yi-Chiao Wu
Steven Krenn
Simon Welker
Bunlong Lay
Shinji Watanabe
Alexander Richard
Timo Gerkmann
34
16
0
10 Jun 2024
Sparse Binarization for Fast Keyword Spotting
Jonathan Svirsky
Uri Shaham
Ofir Lindenbaum
29
0
0
09 Jun 2024
Non-autoregressive real-time Accent Conversion model with voice cloning
Vladimir Nechaev
Sergey Kosyakov
32
1
0
21 May 2024
Collage: Light-Weight Low-Precision Strategy for LLM Training
Tao Yu
Gaurav Gupta
Karthick Gopalswamy
Amith R. Mamidala
Hao Zhou
Jeffrey Huynh
Youngsuk Park
Ron Diamant
Anoop Deoras
Jun Huan
MQ
49
3
0
06 May 2024
NeMo-Aligner: Scalable Toolkit for Efficient Model Alignment
Gerald Shen
Zhilin Wang
Olivier Delalleau
Jiaqi Zeng
Yi Dong
...
Sahil Jain
Ali Taghibakhshi
Markel Sanz Ausin
Ashwath Aithal
Oleksii Kuchaiev
33
13
0
02 May 2024
ATOMMIC: An Advanced Toolbox for Multitask Medical Imaging Consistency to facilitate Artificial Intelligence applications from acquisition to analysis in Magnetic Resonance Imaging
D. Karkalousos
Ivana Išgum
Henk A. Marquering
M. Caan
29
0
0
30 Apr 2024
ONNXPruner: ONNX-Based General Model Pruning Adapter
Dongdong Ren
Wenbin Li
Tianyu Ding
Lei Wang
Qi Fan
Jing Huo
Hongbing Pan
Yang Gao
29
3
0
10 Apr 2024
1
2
3
4
Next