1806.05622
VoxCeleb2: Deep Speaker Recognition
14 June 2018
Joon Son Chung
Arsha Nagrani
Andrew Zisserman
Papers citing "VoxCeleb2: Deep Speaker Recognition"
50 / 730 papers shown
Quantized Approximate Signal Processing (QASP): Towards Homomorphic Encryption for audio
Tu Duyen Nguyen
Adrien Lesage
Clotilde Cantini
Rachid Riad
21
0
0
15 May 2025
Test-Time Augmentation for Pose-invariant Face Recognition
Jaemin Jung
Youngjoon Jang
Joon Son Chung
CVBM
21
0
0
14 May 2025
The Voice Timbre Attribute Detection 2025 Challenge Evaluation Plan
Zhengyan Sheng
Jinghao He
Liping Chen
Kong AiK Lee
Zhen-Hua Ling
19
0
0
14 May 2025
Introducing voice timbre attribute detection
Jinghao He
Zhengyan Sheng
Liping Chen
Kong AiK Lee
Zhen-Hua Ling
22
1
0
14 May 2025
Inference Attacks for X-Vector Speaker Anonymization
L. A. Bauer
Wenxuan Bao
Malvika Jadhav
Vincent Bindschaedler
20
0
0
13 May 2025
MGFF-TDNN: A Multi-Granularity Feature Fusion TDNN Model with Depth-Wise Separable Module for Speaker Verification
Ya Li
Bin Zhou
Bo Hu
137
0
0
06 May 2025
VAEmo: Efficient Representation Learning for Visual-Audio Emotion with Knowledge Injection
Hao Cheng
Zhiwei Zhao
Yichao He
Zhenzhen Hu
Jia Li
M. Wang
Richang Hong
43
0
0
05 May 2025
Voice Cloning: Comprehensive Survey
Hussam Azzuni
Abdulmotaleb El Saddik
VLM
39
0
0
01 May 2025
Speaker Retrieval in the Wild: Challenges, Effectiveness and Robustness
Erfan Loweimi
Mengjie Qian
Kate Knill
Mark J. F. Gales
46
0
0
26 Apr 2025
Disentangle Identity, Cooperate Emotion: Correlation-Aware Emotional Talking Portrait Generation
Weipeng Tan
Chuming Lin
Chengming Xu
F. Xu
Xiaobin Hu
Xiaozhong Ji
Junwei Zhu
Chengjie Wang
Yanwei Fu
44
0
0
25 Apr 2025
Quantifying Source Speaker Leakage in One-to-One Voice Conversion
Scott Wellington
Xuechen Liu
Junichi Yamagishi
35
0
0
22 Apr 2025
Speaker Fuzzy Fingerprints: Benchmarking Text-Based Identification in Multiparty Dialogues
Rui Ribeiro
Luísa Coheur
Joao Paulo Carvalho
28
0
0
21 Apr 2025
Pose and Facial Expression Transfer by using StyleGAN
Petr Jahoda
Jan Cech
CVBM
GAN
63
0
0
17 Apr 2025
SIFT-50M: A Large-Scale Multilingual Dataset for Speech Instruction Fine-Tuning
Prabhat Pandey
R. Swaminathan
K V Vijay Girish
Arunasish Sen
Jian Xie
Grant P. Strimel
Andreas Schwarz
126
0
0
12 Apr 2025
Exploiting Temporal Audio-Visual Correlation Embedding for Audio-Driven One-Shot Talking Head Animation
Zhihua Xu
Tianshui Chen
Zhijing Yang
Siyuan Peng
Keze Wang
Liang Lin
26
0
0
08 Apr 2025
Meta-Continual Learning of Neural Fields
Seungyoon Woo
Junhyeog Yun
Gunhee Kim
CLL
AI4CE
26
1
0
08 Apr 2025
VoiceCraft-Dub: Automated Video Dubbing with Neural Codec Language Models
Kim Sung-Bin
Jeongsoo Choi
Puyuan Peng
Joon Son Chung
Tae-Hyun Oh
David F. Harwath
VGen
45
1
0
03 Apr 2025
Causal Self-supervised Pretrained Frontend with Predictive Code for Speech Separation
Wupeng Wang
Zexu Pan
X. Li
Shuai Wang
Haizhou Li
AI4TS
34
0
0
03 Apr 2025
Audio-visual Controlled Video Diffusion with Masked Selective State Spaces Modeling for Natural Talking Head Generation
Fa-Ting Hong
Zunnan Xu
Zixiang Zhou
Jun Zhou
Xiu Li
Qin Lin
Qinglin Lu
D. Xu
DiffM
VGen
57
2
0
03 Apr 2025
Refined Geometry-guided Head Avatar Reconstruction from Monocular RGB Video
Pilseo Park
Ze Zhang
M. Sarkis
N. Bi
Xiaoming Liu
Yiying Tong
3DH
50
0
0
27 Mar 2025
DisentTalk: Cross-lingual Talking Face Generation via Semantic Disentangled Diffusion Model
Kangwei Liu
Junwu Liu
Yun Cao
Jinlin Guo
Xiaowei Yi
DiffM
41
0
0
24 Mar 2025
HunyuanPortrait: Implicit Condition Control for Enhanced Portrait Animation
Zunnan Xu
Zhentao Yu
Zixiang Zhou
Jun Zhou
Xiaoyu Jin
...
Chengfei Cai
Shiyu Tang
Qin Lin
Xiu Li
Qinglin Lu
DiffM
VGen
91
7
0
24 Mar 2025
CAARMA: Class Augmentation with Adversarial Mixup Regularization
Massa Baali
X. Li
H. Chen
Rita Singh
Bhiksha Raj
VLM
37
0
0
20 Mar 2025
ViSpeak: Visual Instruction Feedback in Streaming Videos
Shenghao Fu
Q. Yang
Yuan-Ming Li
Yi-Xing Peng
Kun-Yu Lin
Xihan Wei
Jian-Fang Hu
Xiaohua Xie
Wei-Shi Zheng
VLM
65
1
0
17 Mar 2025
Multi-modal Time Series Analysis: A Tutorial and Survey
Yushan Jiang
Kanghui Ning
Zijie Pan
Xuyang Shen
Jingchao Ni
Wenchao Yu
Anderson Schneider
Haifeng Chen
Yuriy Nevmyvaka
Dongjin Song
AI4TS
152
0
0
17 Mar 2025
Context-Aware Two-Step Training Scheme for Domain Invariant Speech Separation
Wupeng Wang
Zexu Pan
Jingru Lin
Shuai Wang
Haizhou Li
53
0
0
16 Mar 2025
MMS-LLaMA: Efficient LLM-based Audio-Visual Speech Recognition with Minimal Multimodal Speech Tokens
Jeong Hun Yeo
Hyeongseop Rha
Se Jin Park
Y. Ro
51
0
0
14 Mar 2025
MAVFlow: Preserving Paralinguistic Elements with Conditional Flow Matching for Zero-Shot AV2AV Multilingual Translation
Sungwoo Cho
J. Choi
Sungnyun Kim
Se-Young Yun
63
0
0
14 Mar 2025
DiVISe: Direct Visual-Input Speech Synthesis Preserving Speaker Characteristics And Intelligibility
Yifan Liu
Yu Fang
Zhouhan Lin
38
0
0
07 Mar 2025
Personalized Generation In Large Model Era: A Survey
Yiyan Xu
Jinghao Zhang
Alireza Salemi
Xinting Hu
W. Wang
Fuli Feng
Hamed Zamani
Xiangnan He
Tat-Seng Chua
3DV
79
2
0
04 Mar 2025
GHOST 2.0: generative high-fidelity one shot transfer of heads
A. Groshev
Anastasiia Iashchenko
Pavel Paramonov
Denis Dimitrov
Andrey Kuznetsov
65
0
0
25 Feb 2025
Dimitra: Audio-driven Diffusion model for Expressive Talking Head Generation
Baptiste Chopin
Tashvik Dhamija
P. Balaji
Yaohui Wang
A. Dantcheva
DiffM
VGen
46
0
0
24 Feb 2025
NaturalL2S: End-to-End High-quality Multispeaker Lip-to-Speech Synthesis with Differential Digital Signal Processing
Yifan Liang
Fangkun Liu
Andong Li
Xiaodong Li
C. Zheng
47
1
0
17 Feb 2025
Less is More for Synthetic Speech Detection in the Wild
Ashi Garg
Zexin Cai
Henry Li Xinyuan
Leibny Paola García-Perera
Kevin Duh
Sanjeev Khudanpur
Matthew Wiesner
Nicholas Andrews
74
0
0
17 Feb 2025
Exploring Active Data Selection Strategies for Continuous Training in Deepfake Detection
Yoshihiko Furuhashi
Junichi Yamagishi
Xin Eric Wang
H. Nguyen
Isao Echizen
40
0
0
11 Feb 2025
Audio-Visual Representation Learning via Knowledge Distillation from Speech Foundation Models
Jing-Xuan Zhang
Genshun Wan
Jianqing Gao
Zhen-Hua Ling
47
0
0
09 Feb 2025
Gender Bias in Instruction-Guided Speech Synthesis Models
Chun-Yi Kuan
Hung-yi Lee
63
0
0
08 Feb 2025
Adapter-Based Multi-Agent AVSR Extension for Pre-Trained ASR Models
Christopher Simic
K. Riedhammer
Tobias Bocklet
91
0
0
03 Feb 2025
EDSep: An Effective Diffusion-Based Method for Speech Source Separation
Jinwei Dong
Xinsheng Wang
Qirong Mao
63
0
0
28 Jan 2025
Safe Gradient Flow for Bilevel Optimization
Sina Sharifi
Nazanin Abolfazli
E. Y. Hamedani
Mahyar Fazlyab
36
0
0
27 Jan 2025
Multi-Task Corrupted Prediction for Learning Robust Audio-Visual Speech Representation
Sungnyun Kim
Sungwoo Cho
Sangmin Bae
Kangwook Jang
Se-Young Yun
SSL
68
1
0
23 Jan 2025
Uncovering the Visual Contribution in Audio-Visual Speech Recognition
Zhaofeng Lin
Naomi Harte
78
1
0
20 Jan 2025
USED: Universal Speaker Extraction and Diarization
Junyi Ao
Mehmet Sinan Yildirim
Ruijie Tao
Mengyao Ge
Shuai Wang
Yan-min Qian
Haizhou Li
33
5
0
17 Jan 2025
Personalized Lip Reading: Adapting to Your Unique Lip Movements with Vision and Language
Jeong Hun Yeo
Chae Won Kim
Hyunjun Kim
Hyeongseop Rha
Seunghee Han
Wen-Huang Cheng
Y. Ro
52
3
0
03 Jan 2025
GLCF: A Global-Local Multimodal Coherence Analysis Framework for Talking Face Generation Detection
Xiaocan Chen
Qilin Yin
Jiarui Liu
Wei Lu
Xiangyang Luo
Jiantao Zhou
CVBM
84
0
0
18 Dec 2024
autrainer: A Modular and Extensible Deep Learning Toolkit for Computer Audition Tasks
Simon Rampp
Andreas Triantafyllopoulos
M. Milling
Björn Schuller
85
0
0
16 Dec 2024
Virtual Trial Room with Computer Vision and Machine Learning
Tulashi Prasad Joshi
Amrendra Kumar Yadav
Arjun Chhetri
Suraj Agrahari
Umesh Kanta Ghimire
66
0
0
14 Dec 2024
Learning to Decouple the Lights for 3D Face Texture Modeling
Tianxin Huang
Zhenyu Zhang
Ying Tai
Gim Hee Lee
CVBM
3DH
68
0
0
11 Dec 2024
Memory-Efficient Training for Deep Speaker Embedding Learning in Speaker Verification
Bei Liu
Yanmin Qian
69
0
0
02 Dec 2024
Circumventing shortcuts in audio-visual deepfake detection datasets with unsupervised learning
Dragos-Alexandru Boldisor
Stefan Smeu
Dan Oneaţă
Elisabeta Oneata
98
1
0
29 Nov 2024