Deep Speech: Scaling up end-to-end speech recognition

17 December 2014

Papers citing "Deep Speech: Scaling up end-to-end speech recognition"

Phoneme-Based Proactive Anti-Eavesdropping with Controlled Recording Privilege Peng Huang Yao Wei Peng Cheng Zhongjie Ba Liwang Lu Feng Lin Yang Wang Kui Ren 26 0 0 28 Jan 2024
NeRF-AD: Neural Radiance Field with Attention-based Disentanglement for Talking Face Synthesis Chongke Bi Xiaoxing Liu Zhilei Liu DiffM CVBM 29 4 0 23 Jan 2024
A unified multichannel far-field speech recognition system: combining neural beamforming with attention based end-to-end model Dongdi Zhao Jianbo Ma Lu Lu Jinke Li Xuan Ji Lei Zhu Fuming Fang Ming-Yu Liu Feijun Jiang 15 1 0 05 Jan 2024
PhasePerturbation: Speech Data Augmentation via Phase Perturbation for Automatic Speech Recognition Chengxi Lei Satwinder Singh Feng Hou Xiaoyun Jia Ruili Wang 25 1 0 13 Dec 2023
USM-Lite: Quantization and Sparsity Aware Fine-tuning for Speech Recognition with Universal Speech Models Shaojin Ding David Qiu David Rim Yanzhang He Oleg Rybakov ... Tara N. Sainath Zhonglin Han Jian Li Amir Yazdanbakhsh Shivani Agrawal MQ 31 9 0 13 Dec 2023
Relational Deep Learning: Graph Representation Learning on Relational Databases Matthias Fey Weihua Hu Kexin Huang J. E. Lenssen Rishabh Ranjan Joshua Robinson Rex Ying Jiaxuan You J. Leskovec GNN 42 30 0 07 Dec 2023
MyPortrait: Morphable Prior-Guided Personalized Portrait Generation Bo Ding Zhenfeng Fan Shuang Yang Shihong Xia 71 2 0 05 Dec 2023
3DiFACE: Diffusion-based Speech-driven 3D Facial Animation and Editing Balamurugan Thambiraja S. Aliakbarian Darren Cosker Justus Thies DiffM VGen 45 11 0 01 Dec 2023
MemoryCompanion: A Smart Healthcare Solution to Empower Efficient Alzheimer's Care Via Unleashing Generative AI Lifei Zheng Yeonie Heo Yi Fang AI4MH 22 0 0 20 Nov 2023
CP-EB: Talking Face Generation with Controllable Pose and Eye Blinking Embedding Jianzong Wang Yimin Deng Ziqi Liang Xulong Zhang Ning Cheng Jing Xiao CVBM 18 2 0 15 Nov 2023
Automatic Disfluency Detection from Untranscribed Speech Amrit Romana K. Koishida E. Provost 44 6 0 01 Nov 2023
Form follows Function: Text-to-Text Conditional Graph Generation based on Functional Requirements Peter Zachares Vahan Hovhannisyan Alan Mosca Yarin Gal 29 1 0 01 Nov 2023
Deep Audio Analyzer: a Framework to Industrialize the Research on Audio Forensics Valerio Francesco Puglisi O. Giudice Sebastiano Battiato 25 1 0 29 Oct 2023
Personalized Speech-driven Expressive 3D Facial Animation Synthesis with Style Control Elif Bozkurt 36 0 0 25 Oct 2023
LC-TTFS: Towards Lossless Network Conversion for Spiking Neural Networks with TTFS Coding Qu Yang Malu Zhang Jibin Wu Kay Chen Tan Haizhou Li 29 9 0 23 Oct 2023
No Pitch Left Behind: Addressing Gender Unbalance in Automatic Speech Recognition through Pitch Manipulation Dennis Fucci Marco Gaido Matteo Negri Mauro Cettolo L. Bentivogli 28 5 0 10 Oct 2023
DiffPoseTalk: Speech-Driven Stylistic 3D Facial Animation and Head Pose Generation via Diffusion Models Zhiyao Sun Tian Lv Sheng Ye Matthieu Lin Jenny Sheng Yuhui Wen Minjing Yu Yong-jin Liu DiffM 36 45 0 30 Sep 2023
Emotional Listener Portrait: Neural Listener Head Generation with Emotion Luchuan Song Guojun Yin Zhenchao Jin Xiaoyi Dong Chenliang Xu 27 10 0 29 Sep 2023
Developing automatic verbatim transcripts for international multilingual meetings: an end-to-end solution Akshat Dewan Michal Ziemski Henri Meylan Lorenzo Concina Bruno Pouliquen 11 1 0 27 Sep 2023
Privacy-preserving and Privacy-attacking Approaches for Speech and Audio -- A Survey Yuchen Liu Apu Kapadia Donald Williamson AAML 38 0 0 26 Sep 2023
Deepfake audio as a data augmentation technique for training automatic speech to text transcription models Alexandre R. Ferreira Cláudio E. C. Campelo 8 1 0 22 Sep 2023
A Multiscale Autoencoder (MSAE) Framework for End-to-End Neural Network Speech Enhancement Bengt J. Borgström M. Brandstein 18 2 0 21 Sep 2023
AudioFool: Fast, Universal and synchronization-free Cross-Domain Attack on Speech Recognition Mohamad Fakih R. Kanj Fadi J. Kurdahi M. Fouda AAML 16 0 0 20 Sep 2023
FaceDiffuser: Speech-Driven 3D Facial Animation Synthesis Using Diffusion Stefan Stan Kazi Injamamul Haque Zerrin Yumak DiffM 31 54 0 20 Sep 2023
Uncertainty Estimation in Instance Segmentation with Star-convex Shapes Qasim M. K. Siddiqui Sebastian Starke Peter Steinbach UQCV 22 0 0 19 Sep 2023
Decoder-only Architecture for Speech Recognition with CTC Prompts and Text Data Augmentation E. Tsunoo Hayato Futami Yosuke Kashiwagi Siddhant Arora Shinji Watanabe VLM AuLLM RALM 40 9 0 16 Sep 2023
Visual Speech Recognition for Languages with Limited Labeled Data using Automatic Labels from Whisper Jeong Hun Yeo Minsu Kim Shinji Watanabe Y. Ro VLM 28 12 0 15 Sep 2023
PhantomSound: Black-Box, Query-Efficient Audio Adversarial Attack via Split-Second Phoneme Injection Hanqing Guo Guangjing Wang Yuanda Wang Bocheng Chen Qiben Yan Li Xiao AAML 37 9 0 13 Sep 2023
DAD++: Improved Data-free Test Time Adversarial Defense Gaurav Kumar Nayak Inder Khatri Shubham Randive Ruchit Rawal Anirban Chakraborty AAML 23 1 0 10 Sep 2023
Audio-Driven Dubbing for User Generated Contents via Style-Aware Semi-Parametric Synthesis Linsen Song Wayne Wu Chaoyou Fu Chen Change Loy Ran He 31 10 0 31 Aug 2023
ASTER: Automatic Speech Recognition System Accessibility Testing for Stutterers Yi Liu Yuekang Li Gelei Deng Felix Juefei Xu Yao Du Cen Zhang Chengwei Liu Yeting Li L. Ma Yang Liu 24 3 0 30 Aug 2023
Compensating Removed Frequency Components: Thwarting Voice Spectrum Reduction Attacks Shu Wang Kun Sun Qi Li AAML 28 0 0 18 Aug 2023
VAST: Vivify Your Talking Avatar via Zero-Shot Expressive Facial Style Transfer Liyang Chen Zhiyong Wu Runnan Li Weihong Bao Jun Ling Xuejiao Tan Sheng Zhao 26 5 0 09 Aug 2023
Integration of Frame- and Label-synchronous Beam Search for Streaming Encoder-decoder Speech Recognition E. Tsunoo Hayato Futami Yosuke Kashiwagi Siddhant Arora Shinji Watanabe 30 4 0 24 Jul 2023
A Deep Dive into the Disparity of Word Error Rates Across Thousands of NPTEL MOOC Videos Anand Kumar Rai Siddharth D. Jaiswal Animesh Mukherjee 17 1 0 20 Jul 2023
Efficient Region-Aware Neural Radiance Fields for High-Fidelity Talking Portrait Synthesis Jiahe Li Jiawei Zhang Xiao Bai Jun Zhou L. Gu 3DH 26 62 0 18 Jul 2023
SoK: Comparing Different Membership Inference Attacks with a Comprehensive Benchmark Jun Niu Xiaoyan Zhu Moxuan Zeng Ge Zhang Qingyang Zhao ... Peng Liu Yulong Shen Xiaohong Jiang Jianfeng Ma Yuqing Zhang 47 3 0 12 Jul 2023
Encoder-Decoder Networks for Self-Supervised Pretraining and Downstream Signal Bandwidth Regression on Digital Antenna Arrays R. Bhattacharjea Nathan E. West SSL 15 1 0 06 Jul 2023
Align With Purpose: Optimize Desired Properties in CTC Models with a General Plug-and-Play Framework Eliya Segev Maya Alroy Ronen Katsir Noam Wies Ayana Shenhav ... D. Zar Oren Tadmor Jacob Bitterman Amnon Shashua Tal Rosenwein 32 2 0 04 Jul 2023
Robust Proxy: Improving Adversarial Robustness by Robust Proxy Learning Hong Joo Lee Yonghyun Ro AAML 28 3 0 27 Jun 2023
Scaling and Resizing Symmetry in Feedforward Networks Carlos Cardona 4 2 0 26 Jun 2023
MobileASR: A resource-aware on-device learning framework for user voice personalization applications on mobile phones Zitha Sasindran Harsha Yelchuri Pooja S B. Rao Prabhakar Venkata Tamma 15 1 0 15 Jun 2023
Learning Cross-lingual Mappings for Data Augmentation to Improve Low-Resource Speech Recognition Muhammad Umar Farooq Thomas Hain 14 2 0 14 Jun 2023
Get More for Less in Decentralized Learning Systems Akash Dhasade Anne-Marie Kermarrec Rafael Pires Rishi Sharma Milos Vujasinovic Jeffrey Wigger 26 7 0 07 Jun 2023
Text-only Domain Adaptation using Unified Speech-Text Representation in Transducer Lu Huang Yangqiu Song Jun Zhang Lu Lu Zejun Ma 29 2 0 07 Jun 2023
Looking and Listening: Audio Guided Text Recognition Wenwen Yu Mingyu Liu Biao Yang Enming Zhang Deqiang Jiang Xing Sun Yuliang Liu Xiang Bai DiffM 25 1 0 06 Jun 2023
Using Sequences of Life-events to Predict Human Lives Germans Savcisens Tina Eliassi-Rad L. K. Hansen L. Mortensen Lau Lilleholt Anna Rogers Ingo Zettler Sune Lehmann AI4TS 39 36 0 05 Jun 2023
DVFO: Learning-Based DVFS for Energy-Efficient Edge-Cloud Collaborative Inference Ziyang Zhang Yang Zhao Huan Li Changyao Lin Jie Liu 38 13 0 02 Jun 2023
Encoder-decoder multimodal speaker change detection Jee-weon Jung Soonshin Seo Hee-Soo Heo Geon-min Kim You Jin Kim Youngki Kwon Min-Ji Lee Bong-Jin Lee 37 2 0 01 Jun 2023
Adaptation and Optimization of Automatic Speech Recognition (ASR) for the Maritime Domain in the Field of VHF Communication Emin Cagatay Nakilcioglu M. Reimann O. John 14 2 0 01 Jun 2023