Deep Speech: Scaling up end-to-end speech recognition

17 December 2014
Awni Y. Hannun
Carl Case
Jared Casper
Bryan Catanzaro
G. Diamos
Erich Elsen
R. Prenger
S. Satheesh
Shubho Sengupta
Adam Coates
A. Ng

Papers citing "Deep Speech: Scaling up end-to-end speech recognition"

50 / 750 papers shown
Phoneme-Based Proactive Anti-Eavesdropping with Controlled Recording Privilege
Peng Huang
Yao Wei
Peng Cheng
Zhongjie Ba
Liwang Lu
Feng Lin
Yang Wang
Kui Ren
26
0
0
28 Jan 2024
NeRF-AD: Neural Radiance Field with Attention-based Disentanglement for Talking Face Synthesis
Chongke Bi
Xiaoxing Liu
Zhilei Liu
DiffM
CVBM
29
4
0
23 Jan 2024
A unified multichannel far-field speech recognition system: combining neural beamforming with attention based end-to-end model
Dongdi Zhao
Jianbo Ma
Lu Lu
Jinke Li
Xuan Ji
Lei Zhu
Fuming Fang
Ming-Yu Liu
Feijun Jiang
15
1
0
05 Jan 2024
PhasePerturbation: Speech Data Augmentation via Phase Perturbation for Automatic Speech Recognition
Chengxi Lei
Satwinder Singh
Feng Hou
Xiaoyun Jia
Ruili Wang
25
1
0
13 Dec 2023
USM-Lite: Quantization and Sparsity Aware Fine-tuning for Speech Recognition with Universal Speech Models
Shaojin Ding
David Qiu
David Rim
Yanzhang He
Oleg Rybakov
...
Tara N. Sainath
Zhonglin Han
Jian Li
Amir Yazdanbakhsh
Shivani Agrawal
MQ
31
9
0
13 Dec 2023
Relational Deep Learning: Graph Representation Learning on Relational Databases
Matthias Fey
Weihua Hu
Kexin Huang
J. E. Lenssen
Rishabh Ranjan
Joshua Robinson
Rex Ying
Jiaxuan You
J. Leskovec
GNN
42
30
0
07 Dec 2023
MyPortrait: Morphable Prior-Guided Personalized Portrait Generation
Bo Ding
Zhenfeng Fan
Shuang Yang
Shihong Xia
71
2
0
05 Dec 2023
3DiFACE: Diffusion-based Speech-driven 3D Facial Animation and Editing
Balamurugan Thambiraja
S. Aliakbarian
Darren Cosker
Justus Thies
DiffM
VGen
45
11
0
01 Dec 2023
MemoryCompanion: A Smart Healthcare Solution to Empower Efficient Alzheimer's Care Via Unleashing Generative AI
Lifei Zheng
Yeonie Heo
Yi Fang
AI4MH
22
0
0
20 Nov 2023
CP-EB: Talking Face Generation with Controllable Pose and Eye Blinking Embedding
Jianzong Wang
Yimin Deng
Ziqi Liang
Xulong Zhang
Ning Cheng
Jing Xiao
CVBM
18
2
0
15 Nov 2023
Automatic Disfluency Detection from Untranscribed Speech
Amrit Romana
K. Koishida
E. Provost
44
6
0
01 Nov 2023
Form follows Function: Text-to-Text Conditional Graph Generation based on Functional Requirements
Peter Zachares
Vahan Hovhannisyan
Alan Mosca
Yarin Gal
29
1
0
01 Nov 2023
Deep Audio Analyzer: a Framework to Industrialize the Research on Audio Forensics
Valerio Francesco Puglisi
O. Giudice
Sebastiano Battiato
25
1
0
29 Oct 2023
Personalized Speech-driven Expressive 3D Facial Animation Synthesis with Style Control
Elif Bozkurt
36
0
0
25 Oct 2023
LC-TTFS: Towards Lossless Network Conversion for Spiking Neural Networks with TTFS Coding
Qu Yang
Malu Zhang
Jibin Wu
Kay Chen Tan
Haizhou Li
29
9
0
23 Oct 2023
No Pitch Left Behind: Addressing Gender Unbalance in Automatic Speech Recognition through Pitch Manipulation
Dennis Fucci
Marco Gaido
Matteo Negri
Mauro Cettolo
L. Bentivogli
28
5
0
10 Oct 2023
DiffPoseTalk: Speech-Driven Stylistic 3D Facial Animation and Head Pose Generation via Diffusion Models
Zhiyao Sun
Tian Lv
Sheng Ye
Matthieu Lin
Jenny Sheng
Yuhui Wen
Minjing Yu
Yong-jin Liu
DiffM
36
45
0
30 Sep 2023
Emotional Listener Portrait: Neural Listener Head Generation with Emotion
Luchuan Song
Guojun Yin
Zhenchao Jin
Xiaoyi Dong
Chenliang Xu
27
10
0
29 Sep 2023
Developing automatic verbatim transcripts for international multilingual meetings: an end-to-end solution
Akshat Dewan
Michal Ziemski
Henri Meylan
Lorenzo Concina
Bruno Pouliquen
11
1
0
27 Sep 2023
Privacy-preserving and Privacy-attacking Approaches for Speech and Audio -- A Survey
Yuchen Liu
Apu Kapadia
Donald Williamson
AAML
38
0
0
26 Sep 2023
Deepfake audio as a data augmentation technique for training automatic speech to text transcription models
Alexandre R. Ferreira
Cláudio E. C. Campelo
8
1
0
22 Sep 2023
A Multiscale Autoencoder (MSAE) Framework for End-to-End Neural Network Speech Enhancement
Bengt J. Borgström
M. Brandstein
18
2
0
21 Sep 2023
AudioFool: Fast, Universal and synchronization-free Cross-Domain Attack on Speech Recognition
Mohamad Fakih
R. Kanj
Fadi J. Kurdahi
M. Fouda
AAML
16
0
0
20 Sep 2023
FaceDiffuser: Speech-Driven 3D Facial Animation Synthesis Using Diffusion
Stefan Stan
Kazi Injamamul Haque
Zerrin Yumak
DiffM
31
54
0
20 Sep 2023
Uncertainty Estimation in Instance Segmentation with Star-convex Shapes
Qasim M. K. Siddiqui
Sebastian Starke
Peter Steinbach
UQCV
22
0
0
19 Sep 2023
Decoder-only Architecture for Speech Recognition with CTC Prompts and Text Data Augmentation
E. Tsunoo
Hayato Futami
Yosuke Kashiwagi
Siddhant Arora
Shinji Watanabe
VLM
AuLLM
RALM
40
9
0
16 Sep 2023
Visual Speech Recognition for Languages with Limited Labeled Data using Automatic Labels from Whisper
Jeong Hun Yeo
Minsu Kim
Shinji Watanabe
Y. Ro
VLM
28
12
0
15 Sep 2023
PhantomSound: Black-Box, Query-Efficient Audio Adversarial Attack via Split-Second Phoneme Injection
Hanqing Guo
Guangjing Wang
Yuanda Wang
Bocheng Chen
Qiben Yan
Li Xiao
AAML
37
9
0
13 Sep 2023
DAD++: Improved Data-free Test Time Adversarial Defense
Gaurav Kumar Nayak
Inder Khatri
Shubham Randive
Ruchit Rawal
Anirban Chakraborty
AAML
23
1
0
10 Sep 2023
Audio-Driven Dubbing for User Generated Contents via Style-Aware Semi-Parametric Synthesis
Linsen Song
Wayne Wu
Chaoyou Fu
Chen Change Loy
Ran He
31
10
0
31 Aug 2023
ASTER: Automatic Speech Recognition System Accessibility Testing for Stutterers
Yi Liu
Yuekang Li
Gelei Deng
Felix Juefei Xu
Yao Du
Cen Zhang
Chengwei Liu
Yeting Li
L. Ma
Yang Liu
24
3
0
30 Aug 2023
Compensating Removed Frequency Components: Thwarting Voice Spectrum Reduction Attacks
Shu Wang
Kun Sun
Qi Li
AAML
28
0
0
18 Aug 2023
VAST: Vivify Your Talking Avatar via Zero-Shot Expressive Facial Style Transfer
Liyang Chen
Zhiyong Wu
Runnan Li
Weihong Bao
Jun Ling
Xuejiao Tan
Sheng Zhao
26
5
0
09 Aug 2023
Integration of Frame- and Label-synchronous Beam Search for Streaming Encoder-decoder Speech Recognition
E. Tsunoo
Hayato Futami
Yosuke Kashiwagi
Siddhant Arora
Shinji Watanabe
30
4
0
24 Jul 2023
A Deep Dive into the Disparity of Word Error Rates Across Thousands of NPTEL MOOC Videos
Anand Kumar Rai
Siddharth D. Jaiswal
Animesh Mukherjee
17
1
0
20 Jul 2023
Efficient Region-Aware Neural Radiance Fields for High-Fidelity Talking Portrait Synthesis
Jiahe Li
Jiawei Zhang
Xiao Bai
Jun Zhou
L. Gu
3DH
26
62
0
18 Jul 2023
SoK: Comparing Different Membership Inference Attacks with a Comprehensive Benchmark
Jun Niu
Xiaoyan Zhu
Moxuan Zeng
Ge Zhang
Qingyang Zhao
...
Peng Liu
Yulong Shen
Xiaohong Jiang
Jianfeng Ma
Yuqing Zhang
47
3
0
12 Jul 2023
Encoder-Decoder Networks for Self-Supervised Pretraining and Downstream Signal Bandwidth Regression on Digital Antenna Arrays
R. Bhattacharjea
Nathan E. West
SSL
15
1
0
06 Jul 2023
Align With Purpose: Optimize Desired Properties in CTC Models with a General Plug-and-Play Framework
Eliya Segev
Maya Alroy
Ronen Katsir
Noam Wies
Ayana Shenhav
...
D. Zar
Oren Tadmor
Jacob Bitterman
Amnon Shashua
Tal Rosenwein
32
2
0
04 Jul 2023
Robust Proxy: Improving Adversarial Robustness by Robust Proxy Learning
Hong Joo Lee
Yonghyun Ro
AAML
28
3
0
27 Jun 2023
Scaling and Resizing Symmetry in Feedforward Networks
Carlos Cardona
4
2
0
26 Jun 2023
MobileASR: A resource-aware on-device learning framework for user voice personalization applications on mobile phones
Zitha Sasindran
Harsha Yelchuri
Pooja S B. Rao
Prabhakar Venkata Tamma
15
1
0
15 Jun 2023
Learning Cross-lingual Mappings for Data Augmentation to Improve Low-Resource Speech Recognition
Muhammad Umar Farooq
Thomas Hain
14
2
0
14 Jun 2023
Get More for Less in Decentralized Learning Systems
Akash Dhasade
Anne-Marie Kermarrec
Rafael Pires
Rishi Sharma
Milos Vujasinovic
Jeffrey Wigger
26
7
0
07 Jun 2023
Text-only Domain Adaptation using Unified Speech-Text Representation in Transducer
Lu Huang
Yangqiu Song
Jun Zhang
Lu Lu
Zejun Ma
29
2
0
07 Jun 2023
Looking and Listening: Audio Guided Text Recognition
Wenwen Yu
Mingyu Liu
Biao Yang
Enming Zhang
Deqiang Jiang
Xing Sun
Yuliang Liu
Xiang Bai
DiffM
25
1
0
06 Jun 2023
Using Sequences of Life-events to Predict Human Lives
Germans Savcisens
Tina Eliassi-Rad
L. K. Hansen
L. Mortensen
Lau Lilleholt
Anna Rogers
Ingo Zettler
Sune Lehmann
AI4TS
39
36
0
05 Jun 2023
DVFO: Learning-Based DVFS for Energy-Efficient Edge-Cloud Collaborative Inference
Ziyang Zhang
Yang Zhao
Huan Li
Changyao Lin
Jie Liu
38
13
0
02 Jun 2023
Encoder-decoder multimodal speaker change detection
Jee-weon Jung
Soonshin Seo
Hee-Soo Heo
Geon-min Kim
You Jin Kim
Youngki Kwon
Min-Ji Lee
Bong-Jin Lee
37
2
0
01 Jun 2023
Adaptation and Optimization of Automatic Speech Recognition (ASR) for the Maritime Domain in the Field of VHF Communication
Emin Cagatay Nakilcioglu
M. Reimann
O. John
14
2
0
01 Jun 2023