Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2104.02014
Cited By
SPGISpeech: 5,000 hours of transcribed financial audio for fully formatted end-to-end speech recognition
5 April 2021
Patrick K. O’Neill
Vitaly Lavrukhin
Somshubra Majumdar
Vahid Noroozi
Yuekai Zhang
Oleksii Kuchaiev
Jagadeesh Balam
Yuliya Dovzhenko
Keenan Freyberg
Michael D. Shulman
Boris Ginsburg
Shinji Watanabe
Georg Kucsko
AI4TS
Re-assign community
ArXiv
PDF
HTML
Papers citing
"SPGISpeech: 5,000 hours of transcribed financial audio for fully formatted end-to-end speech recognition"
16 / 16 papers shown
Title
BERSting at the Screams: A Benchmark for Distanced, Emotional and Shouted Speech Recognition
Paige Tuttosi
Mantaj Dhillon
Luna Sang
Shane Eastwood
Poorvi Bhatia
Quang Minh Dinh
Avni Kapoor
Yewon Jin
Angelica Lim
26
0
0
30 Apr 2025
Nexus-O: An Omni-Perceptive And -Interactive Model for Language, Audio, And Vision
Che Liu
Yingji Zhang
D. Zhang
Weijie Zhang
Chenggong Gong
...
André Freitas
Qifan Wang
Z. Xu
Rongjuncheng Zhang
Yong Dai
AuLLM
76
0
0
26 Feb 2025
M2-omni: Advancing Omni-MLLM for Comprehensive Modality Support with Competitive Performance
Qingpei Guo
Kaiyou Song
Zipeng Feng
Ziping Ma
Qinglong Zhang
...
Yunxiao Sun
Tai-WeiChang
Jingdong Chen
Ming Yang
Jun Zhou
MLLM
VLM
87
3
0
26 Feb 2025
VoiceTextBlender: Augmenting Large Language Models with Speech Capabilities via Single-Stage Joint Speech-Text Supervised Fine-Tuning
Yifan Peng
Krishna C. Puvvada
Zhehuai Chen
Piotr .Zelasko
He Huang
Kunal Dhawan
Ke Hu
Shinji Watanabe
Jagadeesh Balam
Boris Ginsburg
58
2
0
23 Oct 2024
Longer is (Not Necessarily) Stronger: Punctuated Long-Sequence Training for Enhanced Speech Recognition and Translation
Nithin Rao Koluguri
Travis M. Bartley
Hainan Xu
Oleksii Hrinchuk
Jagadeesh Balam
Boris Ginsburg
Georg Kucsko
37
2
0
09 Sep 2024
Improving Neural Biasing for Contextual Speech Recognition by Early Context Injection and Text Perturbation
Ruizhe Huang
M. Yarmohammadi
Sanjeev Khudanpur
Dan Povey
35
2
0
14 Jul 2024
Discrete Multimodal Transformers with a Pretrained Large Language Model for Mixed-Supervision Speech Processing
V. Trinh
Rosy Southwell
Yiwen Guan
Xinlu He
Zhiyong Wang
Jacob Whitehill
OffRL
36
2
0
04 Jun 2024
OWSM-CTC: An Open Encoder-Only Speech Foundation Model for Speech Recognition, Translation, and Language Identification
Yifan Peng
Yui Sudo
Muhammad Shakeel
Shinji Watanabe
VLM
37
17
0
20 Feb 2024
Text-only Domain Adaptation using Unified Speech-Text Representation in Transducer
Lu Huang
Yangqiu Song
Jun Zhang
Lu Lu
Zejun Ma
29
2
0
07 Jun 2023
ASR in German: A Detailed Error Analysis
John M. Wirth
René Peinl
18
5
0
12 Apr 2022
Earnings-22: A Practical Benchmark for Accents in the Wild
Miguel Rio
Peter Ha
Quinten McNamara
Corey Miller
Shipra Chandra
14
23
0
29 Mar 2022
Domain Adaptation of low-resource Target-Domain models using well-trained ASR Conformer Models
Vrunda N. Sukhadia
S. Umesh
28
8
0
18 Feb 2022
Lhotse: a speech data representation library for the modern deep learning ecosystem
Willem Hagemann
Daniel Povey
Jan "Yenda" Trmal
Sanjeev Khudanpur
AuLLM
AI4TS
27
31
0
25 Oct 2021
Scalable Data Annotation Pipeline for High-Quality Large Speech Datasets Development
Mingkuan Liu
Chi Zhang
Hua Xing
C. Feng
Mon-Chu Chen
Judith Bishop
Grace Ngapo
22
3
0
01 Sep 2021
GigaSpeech: An Evolving, Multi-domain ASR Corpus with 10,000 Hours of Transcribed Audio
Guoguo Chen
Shuzhou Chai
Guan-Bo Wang
Jiayu Du
Weiqiang Zhang
...
Xuchen Yao
Yongqing Wang
Yujun Wang
Zhao You
Zhiyong Yan
48
349
0
13 Jun 2021
NeMo: a toolkit for building AI applications using Neural Modules
Oleksii Kuchaiev
Jason Chun Lok Li
Huyen Nguyen
Oleksii Hrinchuk
Ryan Leary
...
Jack Cook
P. Castonguay
Mariya Popova
Jocelyn Huang
Jonathan M. Cohen
211
292
0
14 Sep 2019
1