Structured State Space Decoder for Speech Recognition and Synthesis

31 October 2022

Papers citing "Structured State Space Decoder for Speech Recognition and Synthesis"

26 / 26 papers shown

Title
Branchformer: Parallel MLP-Attention Architectures to Capture Local and Global Context for Speech Recognition and Understanding Yifan Peng Siddharth Dalmia Ian Lane Shinji Watanabe 65 147 0 06 Jul 2022
It's Raw! Audio Generation with State-Space Models Karan Goel Albert Gu Chris Donahue Christopher Ré 55 191 0 20 Feb 2022
Efficiently Modeling Long Sequences with Structured State Spaces Albert Gu Karan Goel Christopher Ré 205 1,773 0 31 Oct 2021
Combining Recurrent, Convolutional, and Continuous-time Models with Linear State-Space Layers Albert Gu Isys Johnson Karan Goel Khaled Kamal Saab Tri Dao Atri Rudra Christopher Ré 112 595 0 26 Oct 2021
ESPnet2-TTS: Extending the Edge of TTS Research Tomoki Hayashi Ryuichi Yamamoto Takenori Yoshimura Peter Wu Jiatong Shi Takaaki Saeki Yooncheol Ju Yusuke Yasuda Shinnosuke Takamichi Shinji Watanabe VLM 65 62 0 15 Oct 2021
SRU++: Pioneering Fast Recurrence with Attention for Speech Recognition Jing Pan Tao Lei Kwangyoun Kim Kyu Jeong Han Shinji Watanabe VLM 36 9 0 11 Oct 2021
A Comparative Study on Neural Architectures and Training Methods for Japanese Speech Recognition Shigeki Karita Yotaro Kubo M. Bacchiani Llion Jones 39 13 0 09 Jun 2021
Recent Developments on ESPnet Toolkit Boosted by Conformer Pengcheng Guo Florian Boyer Xuankai Chang Tomoki Hayashi Yosuke Higuchi ... Jing Shi Shinji Watanabe Kun Wei Wangyou Zhang Yuekai Zhang 63 263 0 26 Oct 2020
HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis Jungil Kong Jaehyeon Kim Jaekyoung Bae 177 1,931 0 12 Oct 2020
Big Bird: Transformers for Longer Sequences Manzil Zaheer Guru Guruganesh Kumar Avinava Dubey Joshua Ainslie Chris Alberti ... Philip Pham Anirudh Ravula Qifan Wang Li Yang Amr Ahmed VLM 538 2,081 0 28 Jul 2020
Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention Angelos Katharopoulos Apoorv Vyas Nikolaos Pappas Franccois Fleuret 198 1,760 0 29 Jun 2020
Conformer: Convolution-augmented Transformer for Speech Recognition Anmol Gulati James Qin Chung-Cheng Chiu Niki Parmar Yu Zhang ... Wei Han Shibo Wang Zhengdong Zhang Yonghui Wu Ruoming Pang 218 3,130 0 16 May 2020
ContextNet: Improving Convolutional Neural Networks for Automatic Speech Recognition with Global Context Wei Han Zhengdong Zhang Yu Zhang Jiahui Yu Chung-Cheng Chiu James Qin Anmol Gulati Ruoming Pang Yonghui Wu 61 263 0 07 May 2020
Transformer Transducer: A Streamable Speech Recognition Model with Transformer Encoders and RNN-T Loss Qian Zhang Han Lu Hasim Sak Anshuman Tripathi Erik McDermott Stephen Koo Shankar Kumar 74 480 0 07 Feb 2020
A Comparative Study on Transformer vs RNN in Speech Applications Shigeki Karita Nanxin Chen Tomoki Hayashi Takaaki Hori Hirofumi Inaguma ... Ryuichi Yamamoto Xiao-fei Wang Shinji Watanabe Takenori Yoshimura Wangyou Zhang 68 720 0 13 Sep 2019
Robust Sequence-to-Sequence Acoustic Modeling with Stepwise Monotonic Attention for Neural TTS Mutian He Yan Deng Lei He 56 81 0 03 Jun 2019
SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition Daniel S. Park William Chan Yu Zhang Chung-Cheng Chiu Barret Zoph E. D. Cubuk Quoc V. Le VLM 174 3,456 0 18 Apr 2019
Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context Zihang Dai Zhilin Yang Yiming Yang J. Carbonell Quoc V. Le Ruslan Salakhutdinov VLM 216 3,726 0 09 Jan 2019
ESPnet: End-to-End Speech Processing Toolkit Shinji Watanabe Takaaki Hori Shigeki Karita Tomoki Hayashi Jiro Nishitoba ... Jahn Heymann Sanjeev Khudanpur Nanxin Chen Adithya Renduchintala Tsubasa Ochiai VLM 102 1,504 0 30 Mar 2018
Self-Attention with Relative Position Representations Peter Shaw Jakob Uszkoreit Ashish Vaswani 170 2,289 0 06 Mar 2018
Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions Jonathan Shen Ruoming Pang Ron J. Weiss M. Schuster Navdeep Jaitly ... Yuxuan Wang RJ Skerry-Ryan Rif A. Saurous Yannis Agiomyrgiannakis Yonghui Wu 77 2,697 0 16 Dec 2017
Attention Is All You Need Ashish Vaswani Noam M. Shazeer Niki Parmar Jakob Uszkoreit Llion Jones Aidan Gomez Lukasz Kaiser Illia Polosukhin 3DV 687 131,526 0 12 Jun 2017
Language Modeling with Gated Convolutional Networks Yann N. Dauphin Angela Fan Michael Auli David Grangier 237 2,396 0 23 Dec 2016
Layer Normalization Jimmy Lei Ba J. Kiros Geoffrey E. Hinton 407 10,481 0 21 Jul 2016
Deep Networks with Stochastic Depth Gao Huang Yu Sun Zhuang Liu Daniel Sedra Kilian Q. Weinberger 209 2,356 0 30 Mar 2016
Listen, Attend and Spell William Chan Navdeep Jaitly Quoc V. Le Oriol Vinyals RALM 153 2,266 0 05 Aug 2015