Bytes are All You Need: End-to-End Multilingual Speech Recognition and Synthesis with Bytes
arXiv:1811.09021

22 November 2018
Bo-wen Li
Yu Zhang
Tara N. Sainath
Yonghui Wu
William Chan
    AuLLM

Papers citing "Bytes are All You Need: End-to-End Multilingual Speech Recognition and Synthesis with Bytes"

50 of 72 citing papers shown
On the Cost and Benefits of Training Context with Utterance or Full Conversation Training: A Comparative Study
Hyouin Liu
Zhikuan Zhang
34
0
0
12 May 2025
CrossSpeech++: Cross-lingual Speech Synthesis with Decoupled Language and Speaker Generation
Ji-Hoon Kim
Hong-Sun Yang
Yoon-Cheol Ju
Il-Hwan Kim
Byeong-Yeol Kim
Joon Son Chung
BDL
54
0
0
31 Dec 2024
Fish-Speech: Leveraging Large Language Models for Advanced Multilingual Text-to-Speech Synthesis
Shijia Liao
Yanjie Wang
Tianyu Li
Yifan Cheng
Ruoyi Zhang
Rongzhi Zhou
Yijin Xing
AuLLM
43
10
0
02 Nov 2024
Towards scalable efficient on-device ASR with transfer learning
Laxmi Pandey
Ke Li
Jinxi Guo
Debjyoti Paul
Arthur Guo
Jay Mahadeokar
Xuedong Zhang
36
2
0
23 Jul 2024
E2 TTS: Embarrassingly Easy Fully Non-Autoregressive Zero-Shot TTS
Sefik Emre Eskimez
Xiaofei Wang
Manthan Thakker
Canrun Li
Chung-Hsien Tsai
...
Min Tang
Xu Tan
Yanqing Liu
Sheng Zhao
Naoyuki Kanda
VLM
35
51
0
26 Jun 2024
Optimizing Byte-level Representation for End-to-end ASR
Roger Hsiao
Liuhui Deng
Erik McDermott
R. Travadi
Xiaodan Zhuang
26
0
0
14 Jun 2024
Meta Learning Text-to-Speech Synthesis in over 7000 Languages
Florian Lux
Sarina Meyer
Lyonel Behringer
Frank Zalkow
P. Do
Matt Coler
Emanuel Habets
Ngoc Thang Vu
CLIP
51
3
0
10 Jun 2024
Whistle: Data-Efficient Multilingual and Crosslingual Speech Recognition via Weakly Phonetic Supervision
Saierdaer Yusuyin
Te Ma
Hao Huang
Wenbo Zhao
Zhijian Ou
52
2
0
04 Jun 2024
Maximizing Data Efficiency for Cross-Lingual TTS Adaptation by Self-Supervised Representation Mixing and Embedding Initialization
Wei-Ping Huang
Sung-Feng Huang
Hung-yi Lee
31
0
0
23 Jan 2024
Streaming Bilingual End-to-End ASR model using Attention over Multiple Softmax
Aditya Patil
Vikas Joshi
Purvi Agrawal
Rupeshkumar Mehta
11
1
0
22 Jan 2024
ZMM-TTS: Zero-shot Multilingual and Multispeaker Speech Synthesis Conditioned on Self-supervised Discrete Speech Representations
Cheng Gong
Xin Wang
Erica Cooper
Dan Wells
Longbiao Wang
Jianwu Dang
Korin Richmond
Junichi Yamagishi
31
21
0
22 Dec 2023
Multi-resolution HuBERT: Multi-resolution Speech Self-Supervised Learning with Masked Unit Prediction
Jiatong Shi
Hirofumi Inaguma
Xutai Ma
Ilia Kulikov
Anna Y. Sun
48
24
0
04 Oct 2023
BiSinger: Bilingual Singing Voice Synthesis
Huali Zhou
Yueqian Lin
Yao Shi
Peng Sun
Ming Li
25
5
0
25 Sep 2023
CrossSinger: A Cross-Lingual Multi-Singer High-Fidelity Singing Voice Synthesizer Trained on Monolingual Singers
Xintong Wang
Chang Zeng
Jun Chen
Chunhui Wang
24
6
0
22 Sep 2023
DiCLET-TTS: Diffusion Model based Cross-lingual Emotion Transfer for Text-to-Speech -- A Study between English and Mandarin
Tao Li
Chenxu Hu
Jian Cong
Xinfa Zhu
Jingbei Li
Qiao Tian
Yuping Wang
Linfu Xie
DiffM
41
8
0
02 Sep 2023
Improving Continuous Sign Language Recognition with Cross-Lingual Signs
Fangyun Wei
Yutong Chen
SLR
28
28
0
21 Aug 2023
Scaling Speech Technology to 1,000+ Languages
Vineel Pratap
Andros Tjandra
Bowen Shi
Paden Tomasello
Arun Babu
...
Yossi Adi
Xiaohui Zhang
Wei-Ning Hsu
Alexis Conneau
Michael Auli
VLM
77
301
0
22 May 2023
Language-universal phonetic encoder for low-resource speech recognition
Siyuan Feng
Ming Tu
Rui Xia
Chuanzeng Huang
Yuxuan Wang
39
2
0
19 May 2023
Building High-accuracy Multilingual ASR with Gated Language Experts and Curriculum Training
Eric Sun
Jinyu Li
Yuxuan Hu
Yilun Zhu
Long Zhou
...
Peidong Wang
Linquan Liu
Shujie Liu
Ed Lin
Yifan Gong
31
6
0
01 Mar 2023
CrossSpeech: Speaker-independent Acoustic Representation for Cross-lingual Speech Synthesis
Ji-Hoon Kim
Hongying Yang
Yooncheol Ju
Il-Hwan Kim
Byeong-Yeol Kim
30
8
0
28 Feb 2023
Learning to Speak from Text: Zero-Shot Multilingual Text-to-Speech with Unsupervised Text Pretraining
Takaaki Saeki
Soumi Maiti
Xinjian Li
Shinji Watanabe
Shinnosuke Takamichi
Hiroshi Saruwatari
32
17
0
30 Jan 2023
Investigation of Japanese PnG BERT language model in text-to-speech synthesis for pitch accent language
Yusuke Yasuda
T. Toda
33
8
0
16 Dec 2022
Improve Bilingual TTS Using Dynamic Language and Phonology Embedding
Fengyu Yang
Jian Luan
Yujun Wang
21
1
0
07 Dec 2022
Towards Zero-Shot Code-Switched Speech Recognition
Brian Yan
Matthew Wiesner
Ondřej Klejch
P. Jyothi
Shinji Watanabe
26
19
0
02 Nov 2022
Virtuoso: Massive Multilingual Speech-Text Joint Semi-Supervised Learning for Text-To-Speech
Takaaki Saeki
Heiga Zen
Zhehuai Chen
Nobuyuki Morioka
Gary Wang
Yu Zhang
Ankur Bapna
Andrew Rosenberg
Bhuvana Ramabhadran
69
19
0
27 Oct 2022
Maestro-U: Leveraging joint speech-text representation learning for zero supervised speech ASR
Zhehuai Chen
Ankur Bapna
Andrew Rosenberg
Yu Zhang
Bhuvana Ramabhadran
Pedro J. Moreno
Nanxin Chen
41
17
0
18 Oct 2022
Unify and Conquer: How Phonetic Feature Representation Affects Polyglot Text-To-Speech (TTS)
Ariadna Sánchez
Alessio Falai
Ziyao Zhang
Orazio Angelini
K. Yanagisawa
38
7
0
04 Jul 2022
Few-Shot Cross-Lingual TTS Using Transferable Phoneme Embedding
Wei-Ping Huang
Po-Chun Chen
Sung-Feng Huang
Hung-yi Lee
24
1
0
27 Jun 2022
LAE: Language-Aware Encoder for Monolingual and Multilingual ASR
Jinchuan Tian
Jianwei Yu
Chunlei Zhang
Chao Weng
Yuexian Zou
Dong Yu
AuLLM
22
25
0
05 Jun 2022
Bilingual End-to-End ASR with Byte-Level Subwords
Liuhui Deng
Roger Hsiao
Arnab Ghoshal
18
4
0
01 May 2022
vTTS: visual-text to speech
Yoshifumi Nakano
Takaaki Saeki
Shinnosuke Takamichi
Katsuhito Sudoh
Hiroshi Saruwatari
13
4
0
28 Mar 2022
Language-Agnostic Meta-Learning for Low-Resource Text-to-Speech with Articulatory Features
Florian Lux
Ngoc Thang Vu
25
29
0
07 Mar 2022
Improving Cross-lingual Speech Synthesis with Triplet Training Scheme
Jianhao Ye
Hongbin Zhou
Zhiba Su
Wendi He
Kaimeng Ren
Lin Li
Heng Lu
21
4
0
22 Feb 2022
Reducing language context confusion for end-to-end code-switching automatic speech recognition
Shuai Zhang
Jiangyan Yi
Zhengkun Tian
J. Tao
Y. Yeung
Liqun Deng
27
11
0
28 Jan 2022
Cross-Lingual Text-to-Speech Using Multi-Task Learning and Speaker Classifier Joint Training
J. Yang
Lei He
36
11
0
20 Jan 2022
Cross-lingual Low Resource Speaker Adaptation Using Phonological Features
Georgia Maniati
Nikolaos Ellinas
K. Markopoulos
G. Vamvoukakis
June Sig Sung
Hyoungmin Park
Aimilios Chalamandaris
Pirros Tsiakoulis
6
14
0
17 Nov 2021
Recent Advances in End-to-End Automatic Speech Recognition
Jinyu Li
VLM
35
363
0
02 Nov 2021
Pseudo-Labeling for Massively Multilingual Speech Recognition
Loren Lugosch
Tatiana Likhomanenko
Gabriel Synnaeve
R. Collobert
VLM
13
29
0
30 Oct 2021
Multilingual Speech Recognition using Knowledge Transfer across Learning Processes
Rimita Lahiri
K. Kumatani
Eric Sun
Yao Qian
55
6
0
15 Oct 2021
Improve Cross-lingual Voice Cloning Using Low-quality Code-switched Data
Haitong Zhang
Yue Lin
15
0
0
14 Oct 2021
Revisiting IPA-based Cross-lingual Text-to-speech
Haitong Zhang
Haoyue Zhan
Yang Zhang
Xinyuan Yu
Yue Lin
27
6
0
14 Oct 2021
Decoupling recognition and transcription in Mandarin ASR
Jiahong Yuan
Xingyu Cai
Dongji Gao
Renjie Zheng
Liang Huang
Kenneth Church
38
9
0
02 Aug 2021
Differentiable Allophone Graphs for Language-Universal Speech Recognition
Brian Yan
Siddharth Dalmia
David R. Mortensen
Florian Metze
Shinji Watanabe
19
11
0
24 Jul 2021
Multilingual and crosslingual speech recognition using phonological-vector based phone embeddings
Chengrui Zhu
Keyu An
Huahuan Zheng
Zhijian Ou
17
8
0
11 Jul 2021
Improved Language Identification Through Cross-Lingual Self-Supervised Learning
Andros Tjandra
Diptanu Gon Choudhury
Frank Zhang
Kritika Singh
Alexis Conneau
Alexei Baevski
Assaf Sela
Yatharth Saraf
Michael Auli
VLM
SSL
24
35
0
08 Jul 2021
Towards One Model to Rule All: Multilingual Strategy for Dialectal Code-Switching Arabic ASR
Shammur A. Chowdhury
A. Hussein
Ahmed Abdelali
Ahmed M. Ali
19
33
0
31 May 2021
ByT5: Towards a token-free future with pre-trained byte-to-byte models
Linting Xue
Aditya Barua
Noah Constant
Rami Al-Rfou
Sharan Narang
Mihir Kale
Adam Roberts
Colin Raffel
38
464
0
28 May 2021
Exploiting Adapters for Cross-lingual Low-resource Speech Recognition
Wenxin Hou
Hanlin Zhu
Yidong Wang
Jindong Wang
Tao Qin
Renjun Xu
T. Shinozaki
35
63
0
18 May 2021
Efficient Weight factorization for Multilingual Speech Recognition
Ngoc-Quan Pham
Tuan-Nam Nguyen
S. Stueker
A. Waibel
43
19
0
07 May 2021
Scaling End-to-End Models for Large-Scale Multilingual ASR
Bo-wen Li
Ruoming Pang
Tara N. Sainath
Anmol Gulati
Yu Zhang
James Qin
Parisa Haghani
Yifan Jiang
Min Ma
Junwen Bai
CLL
34
76
0
30 Apr 2021