1711.00354
JSUT corpus: free large-scale Japanese speech corpus for end-to-end speech synthesis
28 October 2017
Ryosuke Sonobe
Shinnosuke Takamichi
Hiroshi Saruwatari
3DV
Papers citing
"JSUT corpus: free large-scale Japanese speech corpus for end-to-end speech synthesis"
50 / 73 papers shown
Title
From Sharpness to Better Generalization for Speech Deepfake Detection
Wen-Chin Huang
Xuechen Liu
Xin Eric Wang
Junichi Yamagishi
Yanmin Qian
20
0
0
13 Jun 2025
Transcript-Prompted Whisper with Dictionary-Enhanced Decoding for Japanese Speech Annotation
Rui Hu
Xiaolong Lin
Jiawang Liu
Shixi Huang
Zhenpeng Zhan
14
0
0
09 Jun 2025
Comparative Analysis of Fast and High-Fidelity Neural Vocoders for Low-Latency Streaming Synthesis in Resource-Constrained Environments
Reo Yoneyama
Masaya Kawamura
Ryo Terashima
Ryuichi Yamamoto
Tomoki Toda
123
0
0
04 Jun 2025
WCTC-Biasing: Retraining-free Contextual Biasing ASR with Wildcard CTC-based Keyword Spotting and Inter-layer Biasing
Yu Nakagome
Michael Hentschel
49
0
0
02 Jun 2025
XMAD-Bench: Cross-Domain Multilingual Audio Deepfake Benchmark
Ioan-Paul Ciobanu
Andrei Iulian Hiji
Nicolae-Cătălin Ristea
Paul Irofti
Cristian Rusu
Radu Tudor Ionescu
32
0
0
31 May 2025
MOLE: Metadata Extraction and Validation in Scientific Papers Using LLMs
Zaid Alyafeai
Maged S. Al-Shaibani
Bernard Ghanem
18
0
0
26 May 2025
LLM-based Generative Error Correction for Rare Words with Synthetic Data and Phonetic Context
Natsuo Yamashita
Masaaki Yamamoto
Hiroaki Kokubo
Yohei Kawaguchi
31
0
0
23 May 2025
Prosodically Enhanced Foreign Accent Simulation by Discrete Token-based Resynthesis Only with Native Speech Corpora
Kentaro Onda
Keisuke Imoto
Satoru Fukayama
Daisuke Saito
Nobuaki Minematsu
24
0
0
22 May 2025
Voice Cloning: Comprehensive Survey
Hussam Azzuni
Abdulmotaleb El Saddik
VLM
114
0
0
01 May 2025
LiteASR: Efficient Automatic Speech Recognition with Low-Rank Approximation
Keisuke Kamahori
Jungo Kasai
Noriyuki Kojima
Baris Kasikci
79
1
0
27 Feb 2025
CrossSpeech++: Cross-lingual Speech Synthesis with Decoupled Language and Speaker Generation
Ji-Hoon Kim
Hong-Sun Yang
Yoon-Cheol Ju
Il-Hwan Kim
Byeong-Yeol Kim
Joon Son Chung
BDL
120
0
0
31 Dec 2024
Efficient Adaptation of Multilingual Models for Japanese ASR
Mark Bajo
Haruka Fukukawa
Ryuji Morita
Yuma Ogasawara
81
1
0
14 Dec 2024
Analytic Study of Text-Free Speech Synthesis for Raw Audio using a Self-Supervised Learning Model
Joonyong Park
Daisuke Saito
Nobuaki Minematsu
114
0
0
04 Dec 2024
Model Attribution and Detection of Synthetic Speech via Vocoder Fingerprints
Matías P. Pizarro
M. Laszkiewicz
Shawkat Hesso
D. Kolossa
Asja Fischer
150
1
0
21 Nov 2024
Robust AI-Synthesized Speech Detection Using Feature Decomposition Learning and Synthesizer Feature Augmentation
Kuiyuan Zhang
Zhongyun Hua
Yushu Zhang
Yifang Guo
Tao Xiang
59
3
0
14 Nov 2024
A Comprehensive Survey with Critical Analysis for Deepfake Speech Detection
Lam Pham
Phat Lam
Dat Tran
Hieu Tang
Tin Nguyen
Alexander Schindler
Alexander Polonsky
Canh Vu
129
5
0
23 Sep 2024
A quest through interconnected datasets: lessons from highly-cited ICASSP papers
Cynthia C. S. Liem
Doğa Taşcılar
Andrew M. Demetriou
54
0
0
19 Sep 2024
DFADD: The Diffusion and Flow-Matching Based Audio Deepfake Dataset
Jiawei Du
I-Ming Lin
I-Hsiang Chiu
Xuanjun Chen
Haibin Wu
Wenze Ren
Yu Tsao
Hung-yi Lee
Jyh-Shing Roger Jang
DiffM
66
2
0
13 Sep 2024
VoiceWukong: Benchmarking Deepfake Voice Detection
Ziwei Yan
Yanjie Zhao
Haoyu Wang
119
1
0
10 Sep 2024
A Preliminary Investigation on Flexible Singing Voice Synthesis Through Decomposed Framework with Inferrable Features
Lester Phillip Violeta
Taketo Akama
65
0
0
12 Jul 2024
Scaling Law in Neural Data: Non-Invasive Speech Decoding with 175 Hours of EEG Data
Motoshige Sato
Kenichi Tomeoka
Ilya Horiguchi
Kai Arulkumaran
Ryota Kanai
Shuntaro Sasai
121
6
0
10 Jul 2024
NAIST Simultaneous Speech Translation System for IWSLT 2024
Yuka Ko
Ryo Fukuda
Yuta Nishikawa
Yasumasa Kano
Tomoya Yanagita
...
Haotian Tan
Makoto Sakai
S. Sakti
Katsuhito Sudoh
Satoshi Nakamura
126
1
0
30 Jun 2024
InterBiasing: Boost Unseen Word Recognition through Biasing Intermediate Predictions
Yu Nakagome
Michael Hentschel
72
4
0
21 Jun 2024
Maximizing Data Efficiency for Cross-Lingual TTS Adaptation by Self-Supervised Representation Mixing and Embedding Initialization
Wei-Ping Huang
Sung-Feng Huang
Hung-yi Lee
108
0
0
23 Jan 2024
Integrating Pre-Trained Speech and Language Models for End-to-End Speech Recognition
Yukiya Hono
Koh Mitsuda
Tianyu Zhao
Kentaro Mitsui
Toshiaki Wakatsuki
Kei Sawada
AuLLM
84
8
0
06 Dec 2023
JVNV: A Corpus of Japanese Emotional Speech with Verbal Content and Nonverbal Expressions
Detai Xin
Junfeng Jiang
Shinnosuke Takamichi
Yuki Saito
Akiko Aizawa
Hiroshi Saruwatari
56
12
0
09 Oct 2023
A Comparative Study of Voice Conversion Models with Large-Scale Speech and Singing Data: The T13 Systems for the Singing Voice Conversion Challenge 2023
Ryuichi Yamamoto
Reo Yoneyama
Lester Phillip Violeta
Wen-Chin Huang
Tomoki Toda
71
7
0
08 Oct 2023
Segment-Level Vectorized Beam Search Based on Partially Autoregressive Inference
Masao Someki
N. Eng
Yosuke Higuchi
Shinji Watanabe
110
0
0
26 Sep 2023
Electrolaryngeal Speech Intelligibility Enhancement Through Robust Linguistic Encoders
Lester Phillip Violeta
Wen-Chin Huang
D. Ma
Ryuichi Yamamoto
Kazuhiro Kobayashi
Tomoki Toda
70
5
0
18 Sep 2023
An Analysis of Personalized Speech Recognition System Development for the Deaf and Hard-of-Hearing
Lester Phillip Violeta
Tomoki Toda
61
2
0
24 Jun 2023
Multi-Dataset Co-Training with Sharpness-Aware Optimization for Audio Anti-spoofing
Hye-jin Shim
Jee-weon Jung
Tomi Kinnunen
65
14
0
31 May 2023
Automatic Tuning of Loss Trade-offs without Hyper-parameter Search in End-to-End Zero-Shot Speech Synthesis
Seong-Hyun Park
Bohyung Kim
Tae-Hyun Oh
77
1
0
26 May 2023
A Comparative Study on E-Branchformer vs Conformer in Speech Recognition, Translation, and Understanding Tasks
Yifan Peng
Kwangyoun Kim
Felix Wu
Brian Yan
Siddhant Arora
William Chen
Jiyang Tang
Suwon Shon
Prashant Sridhar
Shinji Watanabe
97
18
0
18 May 2023
Exploration of Language Dependency for Japanese Self-Supervised Speech Representation Models
Takanori Ashihara
Takafumi Moriya
Kohei Matsuura
Tomohiro Tanaka
88
3
0
09 May 2023
Wave-U-Net Discriminator: Fast and Lightweight Discriminator for Generative Adversarial Network-Based Speech Synthesis
Takuhiro Kaneko
Hirokazu Kameoka
Kou Tanaka
Shogo Seki
50
9
0
24 Mar 2023
FoundationTTS: Text-to-Speech for ASR Customization with Generative Language Model
Rui Xue
Yanqing Liu
Lei He
Xuejiao Tan
Linquan Liu
Ed Lin
Sheng Zhao
118
7
0
06 Mar 2023
Unsupervised Data Selection for TTS: Using Arabic Broadcast News as a Case Study
Massa Baali
Tomoki Hayashi
Hamdy Mubarak
Soumi Maiti
Shinji Watanabe
W. El-Hajj
Ahmed M. Ali
49
11
0
22 Jan 2023
Adapting Multilingual Speech Representation Model for a New, Underresourced Language through Multilingual Fine-tuning and Continued Pretraining
Karol Nowakowski
M. Ptaszynski
Kyoko Murasaki
Jagna Nieuwazny
49
27
0
18 Jan 2023
Investigation of Japanese PnG BERT language model in text-to-speech synthesis for pitch accent language
Yusuke Yasuda
Tomoki Toda
121
10
0
16 Dec 2022
Text-to-speech synthesis from dark data with evaluation-in-the-loop data selection
Kentaro Seki
Shinnosuke Takamichi
Takaaki Saeki
Hiroshi Saruwatari
95
8
0
26 Oct 2022
Two-stage training method for Japanese electrolaryngeal speech enhancement based on sequence-to-sequence voice conversion
D. Ma
Lester Phillip Violeta
Kazuhiro Kobayashi
Tomoki Toda
62
8
0
19 Oct 2022
Empirical Study Incorporating Linguistic Knowledge on Filled Pauses for Personalized Spontaneous Speech Synthesis
Yuta Matsunaga
Takaaki Saeki
Shinnosuke Takamichi
Hiroshi Saruwatari
119
2
0
14 Oct 2022
SpecRNet: Towards Faster and More Accessible Audio DeepFake Detection
Piotr Kawa
Marcin Plata
P. Syga
84
16
0
12 Oct 2022
Controllable Data Generation by Deep Learning: A Review
Shiyu Wang
Yuanqi Du
Xiaojie Guo
Bo Pan
Zhaohui Qin
Liang Zhao
97
28
0
19 Jul 2022
CFAD: A Chinese Dataset for Fake Audio Detection
Haoxin Ma
Jiangyan Yi
Chenglong Wang
Xin Yan
J. Tao
Tao Wang
Shiming Wang
Ruibo Fu
95
30
0
12 Jul 2022
Attack Agnostic Dataset: Towards Generalization and Stabilization of Audio DeepFake Detection
Piotr Kawa
Marcin Plata
P. Syga
AAML
95
23
0
27 Jun 2022
Few-Shot Cross-Lingual TTS Using Transferable Phoneme Embedding
Wei-Ping Huang
Po-Chun Chen
Sung-Feng Huang
Hung-yi Lee
72
1
0
27 Jun 2022
Dict-TTS: Learning to Pronounce with Prior Dictionary Knowledge for Text-to-Speech
Ziyue Jiang
Zhe Su
Zhou Zhao
Qian Yang
Yi Ren
Jinglin Liu
Zhe Ye
73
5
0
05 Jun 2022
Talking Face Generation with Multilingual TTS
Hyoung-Kyu Song
Sanghyun Woo
Junhyeok Lee
S. Yang
Hyunjae Cho
Youseong Lee
Dongho Choi
Kang-Wook Kim
CVBM
80
22
0
13 May 2022
How does a spontaneously speaking conversational agent affect user behavior?
Takahisa Iizuka
H. Mori
17
3
0
02 May 2022