ResearchTrend.AI
arXiv: 2405.16874
CoCoGesture: Toward Coherent Co-speech 3D Gesture Generation in the Wild

27 May 2024
Xingqun Qi
Hengyuan Zhang
Yatian Wang
J. Pan
Chen Liu
Peng Li
Xiaowei Chi
Mengfei Li
Qixun Zhang
Shanghang Zhang
Wenhan Luo
Qifeng Liu
Qi-fei Liu
DiffM, SLR

Papers citing "CoCoGesture: Toward Coherent Co-speech 3D Gesture Generation in the Wild"

50 / 54 papers shown
PhysiInter: Integrating Physical Mapping for High-Fidelity Human Interaction Generation
Wei Yao
Yunlian Sun
Chang Liu
Hongwen Zhang
Jinhui Tang
26
0
0
09 Jun 2025
Co$^{3}$Gesture: Towards Coherent Concurrent Co-speech 3D Gesture Generation with Interactive Diffusion
Xingqun Qi
Yatian Wang
Hengyuan Zhang
J. Pan
Wei Xue
Shanghang Zhang
Wenhan Luo
Qifeng Liu
Yike Guo
SLR
131
0
0
03 May 2025
VividListener: Expressive and Controllable Listener Dynamics Modeling for Multi-Modal Responsive Interaction
Shiying Li
Xingqun Qi
Bingkun Yang
Chen Weile
Zezhao Tian
Muyi Sun
Qifeng Liu
Man Zhang
Zhenan Sun
122
0
0
30 Apr 2025
ReCoM: Realistic Co-Speech Motion Generation with Recurrent Embedded Transformer
Yong Xie
Yunlian Sun
Hongwen Zhang
Yebin Liu
Jinhui Tang
VGen
149
0
0
27 Mar 2025
SIGGesture: Generalized Co-Speech Gesture Synthesis via Semantic Injection with Large-Scale Pre-Training Diffusion Models
Qingrong Cheng
Xu Li
Xinghui Fu
DiffM
85
2
0
22 May 2024
Era3D: High-Resolution Multiview Diffusion using Efficient Row-wise Attention
Peng Li
Yuan Liu
Xiaoxiao Long
Feihu Zhang
Cheng Lin
...
Wenhan Luo
Ping Tan
Wenping Wang
Qi-fei Liu
Yi-Ting Guo
VGen
150
51
0
19 May 2024
Towards Variable and Coordinated Holistic Co-Speech Motion Generation
Yifei Liu
Qiong Cao
Yandong Wen
Huaiguang Jiang
Changxing Ding
SLR
122
17
0
30 Mar 2024
EMO: Emote Portrait Alive -- Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions
Linrui Tian
Qi Wang
Bang Zhang
Liefeng Bo
DiffM
127
126
0
27 Feb 2024
DiffSHEG: A Diffusion-Based Approach for Real-Time Speech-driven Holistic 3D Expression and Gesture Generation
Junming Chen
Yunfei Liu
Jianan Wang
Ailing Zeng
Yu Li
Qifeng Chen
VGen
105
32
0
09 Jan 2024
OMG: Towards Open-vocabulary Motion Generation via Mixture of Controllers
Hanming Liang
Jiacheng Bao
Ruichi Zhang
Sihan Ren
Yuecheng Xu
Sibei Yang
Xin Chen
Jingyi Yu
Lan Xu
102
26
0
14 Dec 2023
Weakly-Supervised Emotion Transition Learning for Diverse 3D Co-speech Gesture Generation
Xingqun Qi
Jiahao Pan
Peng Li
Ruibin Yuan
Xiaowei Chi
...
Wenhan Luo
Wei Xue
Shanghang Zhang
Qi-fei Liu
Yi-Ting Guo
SLR
103
13
0
29 Nov 2023
From Sparse to Soft Mixtures of Experts
J. Puigcerver
C. Riquelme
Basil Mustafa
N. Houlsby
MoE
201
130
0
02 Aug 2023
Audio-Visual Segmentation by Exploring Cross-Modal Mutual Semantics
Chen Liu
Peike Li
Xingqun Qi
Hu Zhang
Lincheng Li
Dadong Wang
Xin Yu
VOS
91
34
0
31 Jul 2023
EmotionGesture: Audio-Driven Diverse Emotional Co-Speech 3D Gesture Generation
Xingqun Qi
Chen Liu
Lincheng Li
Jie Hou
Haoran Xin
Xin Yu
SLR
93
30
0
30 May 2023
Mixture-of-Experts Meets Instruction Tuning: A Winning Combination for Large Language Models
Sheng Shen
Le Hou
Yan-Quan Zhou
Nan Du
Shayne Longpre
...
Vincent Zhao
Hongkun Yu
Kurt Keutzer
Trevor Darrell
Denny Zhou
ALM, MoE
107
60
0
24 May 2023
FunASR: A Fundamental End-to-End Speech Recognition Toolkit
Zhifu Gao
Zerui Li
Jiaming Wang
Haoneng Luo
Xian Shi
...
Yabin Li
Lingyun Zuo
Zhihao Du
Zhangyu Xiao
Shiliang Zhang
91
67
0
18 May 2023
DiffuseStyleGesture: Stylized Audio-Driven Co-Speech Gesture Generation with Diffusion Models
Sicheng Yang
Zhiyong Wu
Minglei Li
Zhensong Zhang
Lei Hao
Weihong Bao
Ming Cheng
Long Xiao
76
71
0
08 May 2023
GestureDiffuCLIP: Gesture Diffusion Model with CLIP Latents
Tenglong Ao
Zeyi Zhang
Libin Liu
DiffM, VGen
144
152
0
26 Mar 2023
Taming Diffusion Models for Audio-Driven Co-Speech Gesture Generation
Lingting Zhu
Xian Liu
Xuanyu Liu
Rui Qian
Ziwei Liu
Lequan Yu
85
120
0
16 Mar 2023
Scaling Vision-Language Models with Sparse Mixture of Experts
Sheng Shen
Z. Yao
Chunyuan Li
Trevor Darrell
Kurt Keutzer
Yuxiong He
VLM, MoE
77
68
0
13 Mar 2023
Diverse 3D Hand Gesture Prediction from Body Dynamics by Bilateral Hand Disentanglement
Xingqun Qi
Chen Liu
Muyi Sun
Lincheng Li
Changjie Fan
Xin Yu
SLR
111
15
0
03 Mar 2023
WhisperX: Time-Accurate Speech Transcription of Long-Form Audio
Max Bain
Jaesung Huh
Tengda Han
Andrew Zisserman
151
243
0
01 Mar 2023
Adding Conditional Control to Text-to-Image Diffusion Models
Lvmin Zhang
Anyi Rao
Maneesh Agrawala
AI4CE
384
4,198
1
10 Feb 2023
Scalable Diffusion Models with Transformers
William S. Peebles
Saining Xie
GNN
178
2,441
0
19 Dec 2022
MoFusion: A Framework for Denoising-Diffusion-based Motion Synthesis
Rishabh Dabral
Muhammad Hamza Mughal
Vladislav Golyanik
Christian Theobalt
DiffM, VGen
111
183
0
08 Dec 2022
Generating Holistic 3D Human Motion from Speech
Hongwei Yi
Hualin Liang
Yifei Liu
Qiong Cao
Yandong Wen
Timo Bolkart
Dacheng Tao
Michael J. Black
SLR
105
151
0
08 Dec 2022
Executing your Commands via Motion Diffusion in Latent Space
Xin Chen
Biao Jiang
Wen Liu
Zilong Huang
Bin-Bin Fu
Tao Chen
Jingyi Yu
Gang Yu
VGen, DiffM
205
366
0
08 Dec 2022
Robust Speech Recognition via Large-Scale Weak Supervision
Alec Radford
Jong Wook Kim
Tao Xu
Greg Brockman
C. McLeavey
Ilya Sutskever
OffRL
230
3,780
0
06 Dec 2022
MegaBlocks: Efficient Sparse Training with Mixture-of-Experts
Trevor Gale
Deepak Narayanan
C. Young
Matei A. Zaharia
MoE
81
109
0
29 Nov 2022
Safe Real-World Autonomous Driving by Learning to Predict and Plan with a Mixture of Experts
S. Pini
C. Perone
Aayush Ahuja
Ana Ferreira
Moritz Niendorf
Sergey Zagoruyko
89
38
0
03 Nov 2022
Human Motion Diffusion Model
Guy Tevet
Sigal Raab
Brian Gordon
Yonatan Shafir
Daniel Cohen-Or
Amit H. Bermano
DiffM, VGen
287
771
0
29 Sep 2022
ZeroEGGS: Zero-shot Example-based Gesture Generation from Speech
Saeed Ghorbani
Ylva Ferstl
Daniel Holden
N. Troje
M. Carbonneau
123
83
0
15 Sep 2022
CLIFF: Carrying Location Information in Full Frames into Human Pose and Shape Estimation
Zhihao Li
Jianzhuang Liu
Zhensong Zhang
Songcen Xu
Youliang Yan
3DH
143
225
0
01 Aug 2022
PyMAF-X: Towards Well-aligned Full-body Model Regression from Monocular Images
Hongwen Zhang
Yating Tian
Yuxiang Zhang
Mengcheng Li
Liang An
Zhenan Sun
Yebin Liu
3DH
126
147
0
13 Jul 2022
Learning Hierarchical Cross-Modal Association for Co-Speech Gesture Generation
Xian Liu
Qianyi Wu
Hang Zhou
Yinghao Xu
Rui Qian
Xinyi Lin
Xiaowei Zhou
Wayne Wu
Bo Dai
Bolei Zhou
SLR
112
105
0
24 Mar 2022
BEAT: A Large-Scale Semantic and Emotional Multi-Modal Dataset for Conversational Gestures Synthesis
Haiyang Liu
Zihao Zhu
Naoya Iwamoto
Yichen Peng
Zhengqing Li
You Zhou
E. Bozkurt
Bo Zheng
SLR, CVBM
110
144
0
10 Mar 2022
SmoothNet: A Plug-and-Play Network for Refining Human Poses in Videos
Ailing Zeng
Lei Yang
Xu Ju
Jiefeng Li
Jianyi Wang
Qiang Xu
3DH
92
72
0
27 Dec 2021
High-Resolution Image Synthesis with Latent Diffusion Models
Robin Rombach
A. Blattmann
Dominik Lorenz
Patrick Esser
Bjorn Ommer
3DV
617
15,859
0
20 Dec 2021
Speech2AffectiveGestures: Synthesizing Co-Speech Gestures with Generative Adversarial Affective Expression Learning
Uttaran Bhattacharya
Elizabeth Childs
Nicholas Rewkowski
Tianyi Zhou
SLR, GAN
143
83
0
31 Jul 2021
Learning Transferable Visual Models From Natural Language Supervision
Alec Radford
Jong Wook Kim
Chris Hallacy
Aditya A. Ramesh
Gabriel Goh
...
Amanda Askell
Pamela Mishkin
Jack Clark
Gretchen Krueger
Ilya Sutskever
CLIP, VLM
1.1K
30,111
0
26 Feb 2021
Improved Denoising Diffusion Probabilistic Models
Alex Nichol
Prafulla Dhariwal
DiffM
359
3,747
0
18 Feb 2021
Learning Speech-driven 3D Conversational Gestures from Video
I. Habibie
Weipeng Xu
Dushyant Mehta
Lingjie Liu
Hans-Peter Seidel
Gerard Pons-Moll
Mohamed A. Elgharib
Christian Theobalt
SLR, CVBM, 3DH
94
111
0
13 Feb 2021
Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity
W. Fedus
Barret Zoph
Noam M. Shazeer
MoE
160
2,248
0
11 Jan 2021
Denoising Diffusion Implicit Models
Jiaming Song
Chenlin Meng
Stefano Ermon
VLM, DiffM
334
7,539
0
06 Oct 2020
Speech Gesture Generation from the Trimodal Context of Text, Audio, and Speaker Identity
Youngwoo Yoon
Bok Cha
Joo-Haeng Lee
Minsu Jang
Jaeyeon Lee
Jaehong Kim
Geehyuk Lee
79
284
0
04 Sep 2020
Style Transfer for Co-Speech Gesture Animation: A Multi-Speaker Conditional-Mixture Approach
Chaitanya Ahuja
Dong Won Lee
Y. Nakano
Louis-Philippe Morency
51
106
0
24 Jul 2020
In defence of metric learning for speaker recognition
Joon Son Chung
Jaesung Huh
Seongkyu Mun
Minjae Lee
Hee-Soo Heo
Soyeon Choe
Chiheon Ham
Sung-Ye Jung
Bong-Jin Lee
Icksang Han
77
438
0
26 Mar 2020
Expressive Body Capture: 3D Hands, Face, and Body from a Single Image
Georgios Pavlakos
Vasileios Choutas
N. Ghorbani
Timo Bolkart
Ahmed A. A. Osman
Dimitrios Tzionas
Michael J. Black
3DH
104
1,730
0
11 Apr 2019
3D Hand Shape and Pose from Images in the Wild
A. Boukhayma
Rodrigo de Bem
Philip Torr
3DH
97
356
0
09 Feb 2019
On the Continuity of Rotation Representations in Neural Networks
Yi Zhou
Connelly Barnes
Jingwan Lu
Jimei Yang
Hao Li
3DH
99
1,298
0
17 Dec 2018