ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2308.05725
  4. Cited By
EXPRESSO: A Benchmark and Analysis of Discrete Expressive Speech
  Resynthesis

EXPRESSO: A Benchmark and Analysis of Discrete Expressive Speech Resynthesis

10 August 2023
Tu Nguyen
Wei-Ning Hsu
Antony DÁvirro
Bowen Shi
Itai Gat
Maryam Fazel-Zarani
Tal Remez
Jade Copet
Gabriel Synnaeve
Michael Hassid
Felix Kreuk
Yossi Adi
Emmanuel Dupoux
ArXivPDFHTML

Papers citing "EXPRESSO: A Benchmark and Analysis of Discrete Expressive Speech Resynthesis"

49 / 49 papers shown
Title
Segmentation-Variant Codebooks for Preservation of Paralinguistic and Prosodic Information
Segmentation-Variant Codebooks for Preservation of Paralinguistic and Prosodic Information
Nicholas Sanders
Yuanchao Li
Korin Richmond
Simon King
12
0
0
21 May 2025
Vox-Profile: A Speech Foundation Model Benchmark for Characterizing Diverse Speaker and Speech Traits
Vox-Profile: A Speech Foundation Model Benchmark for Characterizing Diverse Speaker and Speech Traits
Tiantian Feng
Jihwan Lee
Anfeng Xu
Yoonjeong Lee
Thanathai Lertpetchpun
...
Thomas Thebaud
Laureano Moro-Velazquez
D. Byrd
Najim Dehak
Shrikanth Narayanan
12
0
0
20 May 2025
AutoStyle-TTS: Retrieval-Augmented Generation based Automatic Style Matching Text-to-Speech Synthesis
AutoStyle-TTS: Retrieval-Augmented Generation based Automatic Style Matching Text-to-Speech Synthesis
Dan Luo
Chengyuan Ma
Weiqin Li
Jun Wang
Wei Chen
Zhiyong Wu
31
0
0
14 Apr 2025
Pitch Contour Exploration Across Audio Domains: A Vision-Based Transfer Learning Approach
Pitch Contour Exploration Across Audio Domains: A Vision-Based Transfer Learning Approach
J. Abeßer
Shri Kiran Srinivasan
Meinard Muller
46
0
0
24 Mar 2025
Scaling Rich Style-Prompted Text-to-Speech Datasets
Anuj Diwan
Zhisheng Zheng
David Harwath
Eunsol Choi
CLIP
VLM
80
1
0
06 Mar 2025
FlowDec: A flow-based full-band general audio codec with high perceptual quality
Simon Welker
Matthew Le
Ricky T. Q. Chen
Wei-Ning Hsu
Timo Gerkmann
Alexander Richard
Yi-Chiao Wu
63
0
0
03 Mar 2025
Spark-TTS: An Efficient LLM-Based Text-to-Speech Model with Single-Stream Decoupled Speech Tokens
Xinbing Wang
Mingqi Jiang
Zejun Ma
Ziyu Zhang
Shixuan Liu
...
Zhifei Li
Xie Chen
Lei Xie
Yu Guo
Wei Xue
84
13
0
03 Mar 2025
UniWav: Towards Unified Pre-training for Speech Representation Learning and Generation
Alexander H. Liu
Sang-gil Lee
Chao-Han Huck Yang
Yuan Gong
Yu-Chun Wang
James Glass
Rafael Valle
Bryan Catanzaro
SSL
60
0
0
02 Mar 2025
DMOSpeech: Direct Metric Optimization via Distilled Diffusion Model in Zero-Shot Speech Synthesis
DMOSpeech: Direct Metric Optimization via Distilled Diffusion Model in Zero-Shot Speech Synthesis
Yingahao Aaron Li
Rithesh Kumar
Zeyu Jin
DiffM
101
0
0
21 Feb 2025
Gender Bias in Instruction-Guided Speech Synthesis Models
Gender Bias in Instruction-Guided Speech Synthesis Models
Chun-Yi Kuan
Hung-yi Lee
68
0
0
08 Feb 2025
OmniChat: Enhancing Spoken Dialogue Systems with Scalable Synthetic Data for Diverse Scenarios
Xize Cheng
Dongjie Fu
Xiaoda Yang
Minghui Fang
Ruofan Hu
...
Rongjie Huang
Linjun Li
Yu Chen
Tao Jin
Zhou Zhao
53
1
0
03 Jan 2025
FaceSpeak: Expressive and High-Quality Speech Synthesis from Human Portraits of Different Styles
FaceSpeak: Expressive and High-Quality Speech Synthesis from Human Portraits of Different Styles
Tian-Hao Zhang
Jiawei Zhang
Jun Wang
Xinyuan Qian
Xu-cheng Yin
CVBM
54
0
0
02 Jan 2025
ESTVocoder: An Excitation-Spectral-Transformed Neural Vocoder Conditioned on Mel Spectrogram
Xiao-Hang Jiang
Hui-Peng Du
Yang Ai
Ye-Xin Lu
Zhen-Hua Ling
35
0
0
18 Nov 2024
DC-Spin: A Speaker-invariant Speech Tokenizer for Spoken Language Models
DC-Spin: A Speaker-invariant Speech Tokenizer for Spoken Language Models
Heng-Jui Chang
Hongyu Gong
Changhan Wang
James R. Glass
Yu-An Chung
30
0
0
31 Oct 2024
APCodec+: A Spectrum-Coding-Based High-Fidelity and
  High-Compression-Rate Neural Audio Codec with Staged Training Paradigm
APCodec+: A Spectrum-Coding-Based High-Fidelity and High-Compression-Rate Neural Audio Codec with Staged Training Paradigm
Hui-Peng Du
Yang Ai
Rui Zheng
Zhen-Hua Ling
40
0
0
30 Oct 2024
Lina-Speech: Gated Linear Attention is a Fast and Parameter-Efficient
  Learner for text-to-speech synthesis
Lina-Speech: Gated Linear Attention is a Fast and Parameter-Efficient Learner for text-to-speech synthesis
Théodor Lemerle
Harrison Vanderbyl
Vaibhav Srivastav
Nicolas Obin
Axel Roebel
39
1
0
30 Oct 2024
Enhancing TTS Stability in Hebrew using Discrete Semantic Units
Enhancing TTS Stability in Hebrew using Discrete Semantic Units
Ella Zeldes
Or Tal
Yossi Adi
34
0
0
28 Oct 2024
SF-Speech: Straightened Flow for Zero-Shot Voice Clone
SF-Speech: Straightened Flow for Zero-Shot Voice Clone
Xuyuan Li
Zengqiang Shang
Hua Hua
Peiyang Shi
Chen Yang
Li Wang
Pengyuan Zhang
60
2
0
16 Oct 2024
Code Drift: Towards Idempotent Neural Audio Codecs
Code Drift: Towards Idempotent Neural Audio Codecs
P. O'Reilly
Prem Seetharaman
Jiaqi Su
Zeyu Jin
Bryan Pardo
196
0
0
14 Oct 2024
Sylber: Syllabic Embedding Representation of Speech from Raw Audio
Sylber: Syllabic Embedding Representation of Speech from Raw Audio
Cheol Jun Cho
Nicholas Lee
Akshat Gupta
Dhruv Agarwal
Ethan Chen
Alan W Black
Gopala K. Anumanchipalli
34
0
0
09 Oct 2024
MultiVerse: Efficient and Expressive Zero-Shot Multi-Task Text-to-Speech
MultiVerse: Efficient and Expressive Zero-Shot Multi-Task Text-to-Speech
Taejun Bak
Youngsik Eom
SeungJae Choi
Young-Sun Joo
43
0
0
04 Oct 2024
Frozen Large Language Models Can Perceive Paralinguistic Aspects of Speech
Frozen Large Language Models Can Perceive Paralinguistic Aspects of Speech
Wonjune Kang
Junteng Jia
Chunyang Wu
Wei Zhou
Egor Lakomkin
...
Leda Sari
Suyoun Kim
Ke Li
Jay Mahadeokar
Ozlem Kalinli
AuLLM
46
3
0
02 Oct 2024
Recent Advances in Speech Language Models: A Survey
Recent Advances in Speech Language Models: A Survey
Wenqian Cui
Dianzhi Yu
Xiaoqi Jiao
Ziqiao Meng
Guangyan Zhang
Qichao Wang
Yiwen Guo
Irwin King
AuLLM
61
17
0
01 Oct 2024
SSR: Alignment-Aware Modality Connector for Speech Language Models
SSR: Alignment-Aware Modality Connector for Speech Language Models
Weiting Tan
Hirofumi Inaguma
Ning Dong
Paden Tomasello
Xutai Ma
32
3
0
30 Sep 2024
Improving Spoken Language Modeling with Phoneme Classification: A Simple
  Fine-tuning Approach
Improving Spoken Language Modeling with Phoneme Classification: A Simple Fine-tuning Approach
Maxime Poli
Emmanuel Chemla
Emmanuel Dupoux
42
2
0
16 Sep 2024
Salmon: A Suite for Acoustic Language Model Evaluation
Salmon: A Suite for Acoustic Language Model Evaluation
Gallil Maimon
Amit Roth
Yossi Adi
ELM
AuLLM
66
5
0
11 Sep 2024
LAST: Language Model Aware Speech Tokenization
LAST: Language Model Aware Speech Tokenization
A. Turetzky
Yossi Adi
37
3
0
05 Sep 2024
SER Evals: In-domain and Out-of-domain Benchmarking for Speech Emotion
  Recognition
SER Evals: In-domain and Out-of-domain Benchmarking for Speech Emotion Recognition
Mohamed Osman
Daniel Z. Kaplan
Tamer Nadeem
31
1
0
14 Aug 2024
Talk With Human-like Agents: Empathetic Dialogue Through Perceptible
  Acoustic Reception and Reaction
Talk With Human-like Agents: Empathetic Dialogue Through Perceptible Acoustic Reception and Reaction
Haoqiu Yan
Yongxin Zhu
Kai Zheng
Bing Liu
Haoyu Cao
Deqiang Jiang
Linli Xu
AuLLM
38
4
0
18 Jun 2024
MMM: Multi-Layer Multi-Residual Multi-Stream Discrete Speech
  Representation from Self-supervised Learning Model
MMM: Multi-Layer Multi-Residual Multi-Stream Discrete Speech Representation from Self-supervised Learning Model
Jiatong Shi
Xutai Ma
Hirofumi Inaguma
Anna Y. Sun
Shinji Watanabe
60
7
0
14 Jun 2024
Let's Go Real Talk: Spoken Dialogue Model for Face-to-Face Conversation
Let's Go Real Talk: Spoken Dialogue Model for Face-to-Face Conversation
Se Jin Park
Chae Won Kim
Hyeongseop Rha
Minsu Kim
Joanna Hong
Jeong Hun Yeo
Yong Man Ro
CVBM
AuLLM
48
8
0
12 Jun 2024
The Interspeech 2024 Challenge on Speech Processing Using Discrete Units
The Interspeech 2024 Challenge on Speech Processing Using Discrete Units
Xuankai Chang
Jiatong Shi
Jinchuan Tian
Yuning Wu
Yuxun Tang
Yihan Wu
Shinji Watanabe
Yossi Adi
Xie Chen
Qin Jin
49
16
0
11 Jun 2024
MELD-ST: An Emotion-aware Speech Translation Dataset
MELD-ST: An Emotion-aware Speech Translation Dataset
Sirou Chen
Sakiko Yahata
Shuichiro Shimizu
Zhengdong Yang
Yihang Li
Chenhui Chu
Sadao Kurohashi
24
1
0
21 May 2024
MAD Speech: Measures of Acoustic Diversity of Speech
MAD Speech: Measures of Acoustic Diversity of Speech
Matthieu Futeral
A. Agostinelli
Marco Tagliasacchi
Neil Zeghidour
Eugene Kharitonov
56
1
0
16 Apr 2024
Gull: A Generative Multifunctional Audio Codec
Gull: A Generative Multifunctional Audio Codec
Yi Luo
Jianwei Yu
Hangting Chen
Rongzhi Gu
Chao Weng
AuLLM
46
3
0
07 Apr 2024
Scaling Properties of Speech Language Models
Scaling Properties of Speech Language Models
Santiago Cuervo
R. Marxer
31
9
0
31 Mar 2024
UTDUSS: UTokyo-SaruLab System for Interspeech2024 Speech Processing
  Using Discrete Speech Unit Challenge
UTDUSS: UTokyo-SaruLab System for Interspeech2024 Speech Processing Using Discrete Speech Unit Challenge
Wataru Nakata
Kazuki Yamauchi
Dong Yang
Hiroaki Hyodo
Yuki Saito
30
0
0
20 Mar 2024
RFWave: Multi-band Rectified Flow for Audio Waveform Reconstruction
RFWave: Multi-band Rectified Flow for Audio Waveform Reconstruction
Peng Liu
Dongyang Dai
Zhiyong Wu
46
2
0
08 Mar 2024
Bespoke Non-Stationary Solvers for Fast Sampling of Diffusion and Flow
  Models
Bespoke Non-Stationary Solvers for Fast Sampling of Diffusion and Flow Models
Neta Shaul
Uriel Singer
Ricky T. Q. Chen
Matt Le
Ali K. Thabet
Albert Pumarola
Y. Lipman
DiffM
51
4
0
02 Mar 2024
Making Flow-Matching-Based Zero-Shot Text-to-Speech Laugh as You Like
Making Flow-Matching-Based Zero-Shot Text-to-Speech Laugh as You Like
Naoyuki Kanda
Xiaofei Wang
Sefik Emre Eskimez
Manthan Thakker
Hemin Yang
...
Yufei Xia
Jinzhu Li
Yanqing Liu
Sheng Zhao
Michael Zeng
35
8
0
12 Feb 2024
SpiRit-LM: Interleaved Spoken and Written Language Model
SpiRit-LM: Interleaved Spoken and Written Language Model
Tu Nguyen
Benjamin Muller
Bokai Yu
Marta R. Costa-jussá
Maha Elbayad
...
Itai Gat
Gabriel Synnaeve
Juan Pino
Benoît Sagot
Emmanuel Dupoux
AuLLM
VLM
56
34
0
08 Feb 2024
Proactive Detection of Voice Cloning with Localized Watermarking
Proactive Detection of Voice Cloning with Localized Watermarking
Robin San Roman
Pierre Fernandez
Alexandre Défossez
Teddy Furon
Tuan Tran
Hady ElSahar
59
41
0
30 Jan 2024
Audiobox: Unified Audio Generation with Natural Language Prompts
Audiobox: Unified Audio Generation with Natural Language Prompts
Apoorv Vyas
Bowen Shi
Matt Le
Andros Tjandra
Yi-Chiao Wu
...
Chris Summers
Carleigh Wood
Joshua Lane
Mary Williamson
Wei-Ning Hsu
60
77
0
25 Dec 2023
EmphAssess : a Prosodic Benchmark on Assessing Emphasis Transfer in
  Speech-to-Speech Models
EmphAssess : a Prosodic Benchmark on Assessing Emphasis Transfer in Speech-to-Speech Models
Maureen de Seyssel
Antony DÁvirro
Adina Williams
Emmanuel Dupoux
32
4
0
21 Dec 2023
HierSpeech++: Bridging the Gap between Semantic and Acoustic
  Representation of Speech by Hierarchical Variational Inference for Zero-shot
  Speech Synthesis
HierSpeech++: Bridging the Gap between Semantic and Acoustic Representation of Speech by Hierarchical Variational Inference for Zero-shot Speech Synthesis
Sang-Hoon Lee
Haram Choi
Seung-Bin Kim
Seong-Whan Lee
BDL
35
31
0
21 Nov 2023
Toward Joint Language Modeling for Speech Units and Text
Toward Joint Language Modeling for Speech Units and Text
Ju-Chieh Chou
Chung-Ming Chien
Wei-Ning Hsu
Karen Livescu
Arun Babu
Alexis Conneau
Alexei Baevski
Michael Auli
VLM
28
20
0
12 Oct 2023
Multi-resolution HuBERT: Multi-resolution Speech Self-Supervised
  Learning with Masked Unit Prediction
Multi-resolution HuBERT: Multi-resolution Speech Self-Supervised Learning with Masked Unit Prediction
Jiatong Shi
Hirofumi Inaguma
Xutai Ma
Ilia Kulikov
Anna Y. Sun
48
24
0
04 Oct 2023
Towards Universal Speech Discrete Tokens: A Case Study for ASR and TTS
Towards Universal Speech Discrete Tokens: A Case Study for ASR and TTS
Yifan Yang
Feiyu Shen
Chenpeng Du
Ziyang Ma
K. Yu
Daniel Povey
Xie Chen
40
25
0
14 Sep 2023
Generative Spoken Language Modeling from Raw Audio
Generative Spoken Language Modeling from Raw Audio
Kushal Lakhotia
Evgeny Kharitonov
Wei-Ning Hsu
Yossi Adi
Adam Polyak
...
Tu Nguyen
Jade Copet
Alexei Baevski
A. Mohamed
Emmanuel Dupoux
AuLLM
196
342
0
01 Feb 2021
1