This paper presents a human-centered, real-time, user-adaptive speech-to-sign language animation system that integrates Transformer-based motion generation with a transparent, user-editable JSON intermediate layer. The framework overcomes key limitations in prior sign language technologies by enabling direct user inspection and modification of sign segments, thus enhancing naturalness, expressiveness, and user agency. Leveraging a streaming Conformer encoder and autoregressive Transformer-MDN decoder, the system synchronizes spoken input into upper-body and facial motion for 3D avatar rendering. Edits and user ratings feed into a human-in-the-loop optimization loop for continuous improvement. Experiments with 20 deaf signers and 5 interpreters show that the editable interface and participatory feedback significantly improve comprehension, naturalness, usability, and trust, while lowering cognitive load. With sub-20 ms per-frame inference on standard hardware, the system is ready for real-time communication and education. This work illustrates how technical and participatory innovation together enable accessible, explainable, and user-adaptive AI for sign language technology.
View on arXiv@article{li2025_2506.14677, title={ Design an Editable Speech-to-Sign-Language Transformer System: A Human-Centered AI Approach }, author={ Yingchao Li }, journal={arXiv preprint arXiv:2506.14677}, year={ 2025 } }