EmojiVoice: Towards long-term controllable expressivity in robot speech

18 June 2025
Paige Tuttösí
Shivam Mehta
Zachary Syvenky
Bermet Burkanova
Gustav Eje Henter
Angelica Lim
Main: 7 pages
13 figures
Bibliography: 1 page
Abstract

Humans vary their expressivity when speaking for extended periods to maintain engagement with their listener. Although social robots tend to be deployed with "expressive" joyful voices, they lack the long-term variation found in human speech. Foundation-model text-to-speech (TTS) systems are beginning to mimic the expressivity of human speech, but they are difficult to deploy offline on robots. We present EmojiVoice, a free, customizable TTS toolkit that allows social roboticists to build temporally variable, expressive speech on social robots. We introduce emoji-prompting to allow fine-grained control of expressivity at the phrase level and use the lightweight Matcha-TTS backbone to generate speech in real time. We explore three case studies: (1) a scripted conversation with a robot assistant, (2) a storytelling robot, and (3) an autonomous speech-to-speech interactive agent. We found that varied emoji prompting improved the perception and expressivity of speech over a long period in a storytelling task, but that an expressive voice was not preferred in the assistant use case.

@article{tuttösí2025_2506.15085,
  title={EmojiVoice: Towards long-term controllable expressivity in robot speech},
  author={Paige Tuttösí and Shivam Mehta and Zachary Syvenky and Bermet Burkanova and Gustav Eje Henter and Angelica Lim},
  journal={arXiv preprint arXiv:2506.15085},
  year={2025}
}