PESTalk: Speech-Driven 3D Facial Animation with Personalized Emotional Styles

ACM Multimedia (ACM MM), 2025

13 October 2025

Main: 8 Pages

7 Figures

Bibliography: 1 Page

4 Tables

Abstract

PESTalk is a novel method for generating 3D facial animations with personalized emotional styles directly from speech. It addresses key limitations of existing approaches with two components: a Dual-Stream Emotion Extractor (DSEE), which captures both time-domain and frequency-domain audio features for fine-grained emotion analysis, and an Emotional Style Modeling Module (ESMM), which models individual expression patterns based on voiceprint characteristics. To address data scarcity, the method leverages a newly constructed dataset, 3D-EmoStyle. Evaluations demonstrate that PESTalk outperforms state-of-the-art methods in producing realistic and personalized facial animations.
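
To make the "dual-stream" idea concrete, the sketch below computes one time-domain feature set and one frequency-domain feature set per audio frame and concatenates them. This is only an illustration of what the two views of a signal look like, not the paper's DSEE: the abstract does not specify the architecture, so the handcrafted features (RMS energy, zero-crossing rate, DFT magnitudes), the frame sizes, and every function name here are assumptions chosen for clarity. A learned extractor would presumably replace both feature functions and the concatenation step.

```python
import cmath
import math

def frame_signal(samples, frame_len, hop):
    """Split a 1-D sample list into overlapping frames (drops the ragged tail)."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]

def time_features(frame):
    """Time-domain stream (assumed features): RMS energy and zero-crossing rate."""
    rms = math.sqrt(sum(x * x for x in frame) / len(frame))
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return [rms, crossings / (len(frame) - 1)]

def freq_features(frame):
    """Frequency-domain stream (assumed features): naive DFT magnitudes, bins 0..N/2."""
    n = len(frame)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(frame))) / n
            for k in range(n // 2 + 1)]

def dual_stream_features(samples, frame_len=64, hop=32):
    """Concatenate both streams per frame; a learned model would fuse them instead."""
    return [time_features(f) + freq_features(f)
            for f in frame_signal(samples, frame_len, hop)]

# Toy input: a 1 kHz sine sampled at 16 kHz (hypothetical stand-in for speech).
sr = 16000
signal = [math.sin(2 * math.pi * 1000 * t / sr) for t in range(256)]
features = dual_stream_features(signal)
```

With these settings each frame yields 2 time-domain values followed by 33 spectral magnitudes, and the spectrum peaks at the bin corresponding to 1 kHz. The naive DFT is used only to keep the sketch dependency-free; real speech pipelines would typically use an FFT-based spectrogram.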
