ResearchTrend.AI

Dopamine Audiobook: A Training-free MLLM Agent for Emotional and Human-like Audiobook Generation

15 April 2025
Yan Rong
Shan Yang
Guangzhi Lei
Li Liu
arXiv: 2504.11002

Papers citing "Dopamine Audiobook: A Training-free MLLM Agent for Emotional and Human-like Audiobook Generation"

16 / 16 papers shown
AudioGenie: A Training-Free Multi-Agent Framework for Diverse Multimodality-to-Multiaudio Generation
Yan Rong
Jinting Wang
Shan Yang
Guangzhi Lei
Li Liu
DiffM VGen
53
0
0
28 May 2025
Have Multimodal Large Language Models (MLLMs) Really Learned to Tell the Time on Analog Clocks?
Tairan Fu
Miguel González
Javier Conde
Elena Merino-Gómez
Pedro Reviriego
45
0
0
16 May 2025
SPAgent: Adaptive Task Decomposition and Model Selection for General Video Generation and Editing
Rong-Cheng Tu
Wenhao Sun
Zhao Jin
Jingyi Liao
Jiaxing Huang
Dacheng Tao
VGen DiffM
147
6
0
28 Nov 2024
SCOREQ: Speech Quality Assessment with Contrastive Regression
Alessandro Ragano
Jan Skoglund
Andrew Hines
96
12
0
09 Oct 2024
F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching
Yushen Chen
Zhikang Niu
Ziyang Ma
Keqi Deng
Chunhui Wang
Jian Zhao
Kai Yu
Xie Chen
118
85
0
09 Oct 2024
FireRedTTS: A Foundation Text-To-Speech Framework for Industry-Level Generative Speech Applications
Hao-Han Guo
Kun Liu
Fei-Yu Shen
Yi-Chen Wu
Xu Tang
Kun Xie
Kai-Tuo Xu
88
27
0
05 Sep 2024
Toward accessible comics for blind and low vision readers
Christophe Rigaud
J. Burie
Samuel Petit
69
3
0
11 Jul 2024
GenArtist: Multimodal LLM as an Agent for Unified Image Generation and Editing
Zhenyu Wang
Aoxue Li
Zhenguo Li
Xihui Liu
MLLM DiffM
108
34
0
08 Jul 2024
CosyVoice: A Scalable Multilingual Zero-shot Text-to-speech Synthesizer based on Supervised Semantic Tokens
Zhihao Du
Qian Chen
Shiliang Zhang
Kai Hu
Heng Lu
...
Siqi Zheng
Yue Gu
Ziyang Ma
Zhifu Gao
Zhijie Yan
DiffM
74
136
0
07 Jul 2024
DreamBench++: A Human-Aligned Benchmark for Personalized Image Generation
Yuang Peng
Yuxin Cui
Haomiao Tang
Zekun Qi
Runpei Dong
Jing Bai
Chunrui Han
Zheng Ge
Xiangyu Zhang
Shu-Tao Xia
EGVM
136
35
0
24 Jun 2024
Personalized Audiobook Recommendations at Spotify Through Graph Neural Networks
Marco De Nadai
Francesco Fabbri
Paul Gigioli
Alice Wang
Ang Li
...
Sandeep Ghael
David Nyhan
Hugues Bouchard
M. Lalmas
Andreas Damianou
54
12
0
08 Mar 2024
Large Language Models Understand and Can be Enhanced by Emotional Stimuli
Cheng-rong Li
Jindong Wang
Yixuan Zhang
Kaijie Zhu
Wenxin Hou
Jianxun Lian
Fang Luo
Qiang Yang
Xingxu Xie
LRM
130
133
0
14 Jul 2023
UTMOS: UTokyo-SaruLab System for VoiceMOS Challenge 2022
Takaaki Saeki
Detai Xin
Wataru Nakata
Tomoki Koriyama
Shinnosuke Takamichi
Hiroshi Saruwatari
104
211
0
05 Apr 2022
DNSMOS P.835: A Non-Intrusive Perceptual Objective Speech Quality Metric to Evaluate Noise Suppressors
Chandan K. A. Reddy
Vishak Gopal
Ross Cutler
83
218
0
05 Oct 2021
Seen and Unseen emotional style transfer for voice conversion with a new emotional speech dataset
Kun Zhou
Berrak Sisman
Rui Liu
Haizhou Li
76
191
0
28 Oct 2020
LibriTTS: A Corpus Derived from LibriSpeech for Text-to-Speech
Heiga Zen
Viet Dang
R. Clark
Yu Zhang
Ron J. Weiss
Ye Jia
Zhiwen Chen
Yonghui Wu
104
954
0
05 Apr 2019