Audio2Gestures: Generating Diverse Gestures from Speech Audio with Conditional Variational Autoencoders

15 August 2021
Jing Li, Di Kang, Wenjie Pei, Xuefei Zhe, Ying Zhang, Zhenyu He, Linchao Bao
SLR

Papers citing "Audio2Gestures: Generating Diverse Gestures from Speech Audio with Conditional Variational Autoencoders"

50 / 71 papers shown
M3G: Multi-Granular Gesture Generator for Audio-Driven Full-Body Human Motion Synthesis
Zhizhuo Yin, Yuk Hang Tsui, Pan Hui
SLR, VGen
21 · 0 · 0
13 May 2025

Co$^{3}$Gesture: Towards Coherent Concurrent Co-speech 3D Gesture Generation with Interactive Diffusion
Xingqun Qi, Yatian Wang, Hengyuan Zhang, J. Pan, Wei Xue, Shanghang Zhang, Wenhan Luo, Qifeng Liu, Yike Guo
SLR
66 · 0 · 0
03 May 2025

EchoMask: Speech-Queried Attention-based Mask Modeling for Holistic Co-Speech Motion Generation
Xiangyue Zhang, Jianfang Li, Jiaxu Zhang, Jianqiang Ren, Liefeng Bo, Zhigang Tu
30 · 0 · 0
12 Apr 2025

ReCoM: Realistic Co-Speech Motion Generation with Recurrent Embedded Transformer
Yong Xie, Yunlian Sun, Hongwen Zhang, Y. Liu, Jinhui Tang
VGen
98 · 0 · 0
27 Mar 2025

Cosh-DiT: Co-Speech Gesture Video Synthesis via Hybrid Audio-Visual Diffusion Transformers
Yasheng Sun, Zhiliang Xu, Hang Zhou, Jiazhi Guan, Quanwei Yang, ..., Yingying Li, Haocheng Feng, Jiadong Wang, Ziwei Liu, Koike Hideki
VGen
61 · 0 · 0
13 Mar 2025

ExGes: Expressive Human Motion Retrieval and Modulation for Audio-Driven Gesture Synthesis
Xukun Zhou, Fengxin Li, Ming Chen, Yan Zhou, Pengfei Wan, Di Zhang, Yeying Jin, Zhaoxin Fan, Hongyan Liu, Jun He
DiffM, VGen
51 · 0 · 0
09 Mar 2025

Two-in-One: Unified Multi-Person Interactive Motion Generation by Latent Diffusion Transformer
Yangqiu Song, Xihua Wang, Ruihua Song, Wenbing Huang
DiffM, VGen
80 · 1 · 0
21 Dec 2024

Joint Co-Speech Gesture and Expressive Talking Face Generation using Diffusion with Adapters
S. Hogue, Chenxu Zhang, Yapeng Tian, Xiaohu Guo
DiffM
76 · 0 · 0
18 Dec 2024

The Language of Motion: Unifying Verbal and Non-verbal Language of 3D Human Motion
Changan Chen, Juze Zhang, S. K. Lakshmikanth, Yusu Fang, Ruizhi Shao, Gordon Wetzstein, L. Fei-Fei, Ehsan Adeli
VGen
82 · 3 · 0
13 Dec 2024

Acoustic-based 3D Human Pose Estimation Robust to Human Position
Yusuke Oumi, Yuto Shibata, Go Irie, Akisato Kimura, Yoshimitsu Aoki, Mariko Isogawa
33 · 1 · 0
08 Nov 2024

Stereo-Talker: Audio-driven 3D Human Synthesis with Prior-Guided Mixture-of-Experts
Xiang Deng, Youxin Pang, Xiaochen Zhao, Chao Xu, Lizhen Wang, Hongjiang Xiao, Shi Yan, Hongwen Zhang, Yebin Liu
DiffM, VGen
40 · 1 · 0
31 Oct 2024

Towards a GENEA Leaderboard -- an Extended, Living Benchmark for Evaluating and Advancing Conversational Motion Synthesis
Rajmund Nagy, Hendric Voss, Youngwoo Yoon, Taras Kucherenko, Teodor Nikolov, Thanh Hoang-Minh, R. Mcdonnell, Stefan Kopp, Michael Neff, G. Henter
34 · 1 · 0
08 Oct 2024

LLM Gesticulator: Leveraging Large Language Models for Scalable and Controllable Co-Speech Gesture Synthesis
Haozhou Pang, Tianwei Ding, Lanshan He, Ming Tao, Lu Zhang, Qi Gan
26 · 1 · 0
06 Oct 2024

Enabling Synergistic Full-Body Control in Prompt-Based Co-Speech Motion Generation
Bohong Chen, Yumeng Li, Yao-Xiang Ding, Tianjia Shao, Kun Zhou
37 · 7 · 0
01 Oct 2024

FastTalker: Jointly Generating Speech and Conversational Gestures from Text
Zixin Guo, Jian Zhang
34 · 1 · 0
24 Sep 2024

2D or not 2D: How Does the Dimensionality of Gesture Representation Affect 3D Co-Speech Gesture Generation?
Teo Guichoux, Laure Soulier, Nicolas Obin, Catherine Pelachaud
SLR
32 · 0 · 0
16 Sep 2024

ProbTalk3D: Non-Deterministic Emotion Controllable Speech-Driven 3D Facial Animation Synthesis Using VQ-VAE
Sichun Wu, Kazi Injamamul Haque, Zerrin Yumak
VGen
30 · 2 · 0
12 Sep 2024

DiffTED: One-shot Audio-driven TED Talk Video Generation with Diffusion-based Co-speech Gestures
S. Hogue, Chenxu Zhang, Hamza Daruger, Yapeng Tian, Xiaohu Guo
VGen
43 · 10 · 0
11 Sep 2024

Combo: Co-speech holistic 3D human motion generation and efficient customizable adaptation in harmony
Chao Xu, Mingze Sun, Zhi-Qi Cheng, Fei Wang, Yang Liu, Baigui Sun, Ruqi Huang, Alexander G. Hauptmann
VGen
45 · 2 · 0
18 Aug 2024

DEEPTalk: Dynamic Emotion Embedding for Probabilistic Speech-Driven 3D Face Animation
Jisoo Kim, Jungbin Cho, Joonho Park, Soonmin Hwang, Da Eun Kim, Geon Kim, Youngjae Yu
57 · 1 · 0
12 Aug 2024

MDT-A2G: Exploring Masked Diffusion Transformers for Co-Speech Gesture Generation
Xiaofeng Mao, Zhengkai Jiang, Qilin Wang, Chencan Fu, Jiangning Zhang, Jiafu Wu, Yabiao Wang, Chengjie Wang, Wei Li, Mingmin Chi
80 · 4 · 0
06 Aug 2024

MambaGesture: Enhancing Co-Speech Gesture Generation with Mamba and Disentangled Multi-Modality Fusion
Chencan Fu, Yabiao Wang, Jiangning Zhang, Zhengkai Jiang, Xiaofeng Mao, Jiafu Wu, Weijian Cao, Chengjie Wang, Yanhao Ge, Yong Liu
Mamba
43 · 2 · 0
29 Jul 2024

Robust Facial Reactions Generation: An Emotion-Aware Framework with Modality Compensation
Guanyu Hu, Jie Wei, Siyang Song, Dimitrios Kollias, Xinyu Yang, Zhonglin Sun, Odysseus Kaloidas
40 · 0 · 0
22 Jul 2024

Speech2UnifiedExpressions: Synchronous Synthesis of Co-Speech Affective Face and Body Expressions from Affordable Inputs
Uttaran Bhattacharya, Aniket Bera, Dinesh Manocha
CVBM
46 · 2 · 0
26 Jun 2024

Investigating the impact of 2D gesture representation on co-speech gesture generation
Teo Guichoux, Laure Soulier, Nicolas Obin, Catherine Pelachaud
SLR
19 · 0 · 0
21 Jun 2024

Diffusion Gaussian Mixture Audio Denoise
Pu Wang, Junhui Li, Jialu Li, Liangdong Guo, Youshan Zhang
DiffM
34 · 0 · 0
13 Jun 2024

Programmable Motion Generation for Open-Set Motion Control Tasks
Hanchao Liu, Xiaohang Zhan, Shaoli Huang, Tai-Jiang Mu, Ying Shan
49 · 5 · 0
29 May 2024

A Unified Editing Method for Co-Speech Gesture Generation via Diffusion Inversion
Zeyu Zhao, Nan Gao, Zhi Zeng, Guixuan Zhang, Jie Liu, Shuwu Zhang
DiffM
44 · 0 · 0
03 Apr 2024

Large Motion Model for Unified Multi-Modal Motion Generation
Mingyuan Zhang, Daisheng Jin, Chenyang Gu, Fangzhou Hong, Zhongang Cai, ..., Chongzhi Zhang, Xinying Guo, Lei Yang, Ying He, Ziwei Liu
VGen
53 · 25 · 0
01 Apr 2024

Towards Variable and Coordinated Holistic Co-Speech Motion Generation
Yifei Liu, Qiong Cao, Yandong Wen, Huaiguang Jiang, Changxing Ding
SLR
63 · 13 · 0
30 Mar 2024

Move as You Say, Interact as You Can: Language-guided Human Motion Generation with Scene Affordance
Zan Wang, Yixin Chen, Baoxiong Jia, Puhao Li, Jinlu Zhang, Jingze Zhang, Tengyu Liu, Yixin Zhu, Wei Liang, Siyuan Huang
VGen, DiffM
49 · 36 · 0
26 Mar 2024

ConvoFusion: Multi-Modal Conversational Diffusion for Co-Speech Gesture Synthesis
Muhammad Hamza Mughal, Rishabh Dabral, I. Habibie, Lucia Donatelli, Marc Habermann, Christian Theobalt
SLR
38 · 15 · 0
26 Mar 2024

MambaTalk: Efficient Holistic Gesture Synthesis with Selective State Space Models
Zunnan Xu, Yukang Lin, Haonan Han, Sicheng Yang, Ronghui Li, Yachao Zhang, Xiu Li
Mamba
46 · 25 · 0
14 Mar 2024

Maia: A Real-time Non-Verbal Chat for Human-AI Interaction
Dragos Costea, Alina Marcu, Cristina Lazar, Marius Leordeanu
27 · 0 · 0
09 Feb 2024

EMAGE: Towards Unified Holistic Co-Speech Gesture Generation via Expressive Masked Audio Gesture Modeling
Haiyang Liu, Zihao Zhu, Giorgio Becherini, Yichen Peng, Mingyang Su, You Zhou, Xuefei Zhe, Naoya Iwamoto, Bo Zheng, Michael J. Black
SLR
37 · 29 · 0
31 Dec 2023

Realistic Human Motion Generation with Cross-Diffusion Models
Zeping Ren, Shaoli Huang, Xiu Li
VGen
27 · 4 · 0
18 Dec 2023

Emotional Speech-driven 3D Body Animation via Disentangled Latent Diffusion
Kiran Chhatre, Radek Daněček, Nikos Athanasiou, Giorgio Becherini, Christopher Peters, Michael J. Black, Timo Bolkart
DiffM
36 · 16 · 0
07 Dec 2023

Weakly-Supervised Emotion Transition Learning for Diverse 3D Co-speech Gesture Generation
Xingqun Qi, Jiahao Pan, Peng Li, Ruibin Yuan, Xiaowei Chi, ..., Wenhan Luo, Wei Xue, Shanghang Zhang, Qi-fei Liu, Yi-Ting Guo
SLR
34 · 11 · 0
29 Nov 2023

SpeechAct: Towards Generating Whole-body Motion from Speech
Jinsong Zhang, Minjie Zhu, Yuxiang Zhang, Yebin Liu, Kun Li
28 · 0 · 0
29 Nov 2023

Controllable Group Choreography using Contrastive Diffusion
Nhat Le, Tuong Khanh Long Do, Khoa Do, Hien Nguyen, Erman Tjiputra, Quang-Dieu Tran, Anh Nguyen
45 · 10 · 0
29 Oct 2023

ACT2G: Attention-based Contrastive Learning for Text-to-Gesture Generation
Hitoshi Teshima, Naoki Wake, Diego Thomas, Yuta Nakashima, Hiroshi Kawasaki, Katsushi Ikeuchi
29 · 0 · 0
28 Sep 2023

Towards the generation of synchronized and believable non-verbal facial behaviors of a talking virtual agent
Alice Delbosc, M. Ochs, Nicolas Sabouret, Brian Ravenet, Stéphane Ayache
40 · 7 · 0
15 Sep 2023

UnifiedGesture: A Unified Gesture Synthesis Model for Multiple Skeletons
Sicheng Yang, Zehao Wang, Zhiyong Wu, Minglei Li, Zhensong Zhang, ..., Lei Hao, Songcen Xu, Xiaofei Wu, Changpeng Yang, Zonghong Dai
DiffM
47 · 14 · 0
13 Sep 2023

Diffusion-Based Co-Speech Gesture Generation Using Joint Text and Audio Representation
Anna Deichler, Shivam Mehta, Simon Alexanderson, Jonas Beskow
DiffM
17 · 23 · 0
11 Sep 2023

BodyFormer: Semantics-guided 3D Body Gesture Synthesis with Transformer
Kunkun Pang, Dafei Qin, Yingruo Fan, Julian Habekost, Takaaki Shiratori, Junichi Yamagishi, Taku Komura
SLR, ViT
21 · 19 · 0
07 Sep 2023

C2G2: Controllable Co-speech Gesture Generation with Latent Diffusion Model
Longbin Ji, Pengfei Wei, Yi Ren, Jinglin Liu, Chen Zhang, Xiang Yin
DiffM
34 · 3 · 0
29 Aug 2023

The DiffuseStyleGesture+ entry to the GENEA Challenge 2023
Sicheng Yang, Haiwei Xue, Zhensong Zhang, Minglei Li, Zhiyong Wu, Xiaofei Wu, Songcen Xu, Zonghong Dai
DiffM
37 · 15 · 0
26 Aug 2023

A Survey on Deep Multi-modal Learning for Body Language Recognition and Generation
Li Liu, Lufei Gao, Wen-Ling Lei, Fengji Ma, Xiaotian Lin, Jin-Tao Wang
CVBM
27 · 5 · 0
17 Aug 2023

Audio is all in one: speech-driven gesture synthetics using WavLM pre-trained model
Fan Zhang, Naye Ji, Fuxing Gao, Siyuan Zhao, Zhaohan Wang, Shunman Li
29 · 0 · 0
11 Aug 2023

Human Motion Generation: A Survey
Wentao Zhu, Xiaoxuan Ma, Dongwoo Ro, Hai Ci, Jinlu Zhang, Jiaxin Shi, Feng Gao, Qi Tian, Yizhou Wang
VGen
47 · 53 · 0
20 Jul 2023