1706.08612
Cited By
VoxCeleb: a large-scale speaker identification dataset
26 June 2017
Arsha Nagrani
Joon Son Chung
Andrew Zisserman
Papers citing
"VoxCeleb: a large-scale speaker identification dataset"
50 / 1,098 papers shown
Title
The Voice Timbre Attribute Detection 2025 Challenge Evaluation Plan
Zhengyan Sheng
Jinghao He
Liping Chen
Kong AiK Lee
Zhen-Hua Ling
29
0
0
14 May 2025
Introducing voice timbre attribute detection
Jinghao He
Zhengyan Sheng
Liping Chen
Kong AiK Lee
Zhen-Hua Ling
29
1
0
14 May 2025
Multi-band Frequency Reconstruction for Neural Psychoacoustic Coding
Dianwen Ng
Kun Zhou
Yi-Wen Chao
Zhiwei Xiong
B. Ma
Eng Siong Chng
45
0
0
12 May 2025
MGFF-TDNN: A Multi-Granularity Feature Fusion TDNN Model with Depth-Wise Separable Module for Speaker Verification
Ya Li
Bin Zhou
Bo Hu
193
0
0
06 May 2025
Voice Cloning: Comprehensive Survey
Hussam Azzuni
Abdulmotaleb El Saddik
VLM
44
0
0
01 May 2025
Speaker Retrieval in the Wild: Challenges, Effectiveness and Robustness
Erfan Loweimi
Mengjie Qian
Kate Knill
Mark Gales
46
0
0
26 Apr 2025
SoCov: Semi-Orthogonal Parametric Pooling of Covariance Matrix for Speaker Recognition
Rongjin Li
Weibin Zhang
Dongpeng Chen
Jintao Kang
Xiaofen Xing
39
0
0
23 Apr 2025
Spatial Audio Processing with Large Language Model on Wearable Devices
Ayushi Mishra
Yang Bai
Priyadarshan Narayanasamy
Nakul Garg
Nirupam Roy
30
0
0
11 Apr 2025
Exploiting Temporal Audio-Visual Correlation Embedding for Audio-Driven One-Shot Talking Head Animation
Zhihua Xu
Tianshui Chen
Zhijing Yang
Siyuan Peng
Keze Wang
Liang Lin
33
0
0
08 Apr 2025
CAARMA: Class Augmentation with Adversarial Mixup Regularization
Massa Baali
Xuelong Li
Hongyu Chen
Rita Singh
Bhiksha Raj
VLM
46
0
0
20 Mar 2025
SeniorTalk: A Chinese Conversation Dataset with Rich Annotations for Super-Aged Seniors
Yang Chen
Hui Wang
Shiyao Wang
Jianfei Chen
Jiabei He
Jiaming Zhou
Xi Yang
Yansen Wang
Yonghua Lin
Yong Qin
40
0
0
20 Mar 2025
Bidirectional Learned Facial Animation Codec for Low Bitrate Talking Head Videos
Riku Takahashi
Ryugo Morita
Fuma Kimishima
Kosuke Iwama
Jinjia Zhou
VGen
3DH
60
0
0
12 Mar 2025
UniArray: Unified Spectral-Spatial Modeling for Array-Geometry-Agnostic Speech Separation
Weiguang Chen
Junjie Zhang
Jielong Yang
Eng Siong Chng
Xionghu Zhong
68
0
0
07 Mar 2025
Pleno-Generation: A Scalable Generative Face Video Compression Framework with Bandwidth Intelligence
Bolin Chen
Hanwei Zhu
Shanzhi Yin
Lingyu Zhu
Jie Chen
Ru-Ling Liao
Shiqi Wang
Yan Ye
62
1
0
24 Feb 2025
Soundwave: Less is More for Speech-Text Alignment in LLMs
Yunke Zhang
Zhiheng Liu
Fan Bu
Ruiyu Zhang
Benyou Wang
Yiming Li
AuLLM
SyDa
VLM
107
0
0
18 Feb 2025
Leveraging Broadcast Media Subtitle Transcripts for Automatic Speech Recognition and Subtitling
Jakob Poncelet
Hugo Van hamme
83
0
0
05 Feb 2025
OmniHuman-1: Rethinking the Scaling-Up of One-Stage Conditioned Human Animation Models
Gaojie Lin
Jianwen Jiang
Jiaqi Yang
Zerong Zheng
Chao Liang
DiffM
VGen
185
12
0
03 Feb 2025
DeSTA2: Developing Instruction-Following Speech Language Model Without Speech Instruction-Tuning Data
Ke-Han Lu
Zhehuai Chen
Szu-Wei Fu
Chao-Han Huck Yang
Jagadeesh Balam
Boris Ginsburg
Yu-Te Wang
Hung-yi Lee
AuLLM
SyDa
118
5
0
28 Jan 2025
Safe Gradient Flow for Bilevel Optimization
Sina Sharifi
Nazanin Abolfazli
Erfan Yazdandoost Hamedani
Mahyar Fazlyab
36
0
0
27 Jan 2025
Face-StyleSpeech: Enhancing Zero-shot Speech Synthesis from Face Images with Improved Face-to-Speech Mapping
Minki Kang
Wooseok Han
Eunho Yang
CVBM
39
0
0
31 Dec 2024
Stable-TTS: Stable Speaker-Adaptive Text-to-Speech Synthesis via Prosody Prompting
Wooseok Han
Minki Kang
Changhun Kim
Eunho Yang
43
0
0
31 Dec 2024
GLCF: A Global-Local Multimodal Coherence Analysis Framework for Talking Face Generation Detection
Xiaocan Chen
Qilin Yin
Jiarui Liu
Wei Lu
Xiangyang Luo
Jiantao Zhou
CVBM
86
0
0
18 Dec 2024
CAP4D: Creating Animatable 4D Portrait Avatars with Morphable Multi-View Diffusion Models
Felix Taubner
Ruihang Zhang
Mathieu Tuli
David B. Lindell
80
2
0
16 Dec 2024
Virtual Trial Room with Computer Vision and Machine Learning
Tulashi Prasad Joshi
Amrendra Kumar Yadav
Arjun Chhetri
Suraj Agrahari
Umesh Kanta Ghimire
70
0
0
14 Dec 2024
SQ-Whisper: Speaker-Querying based Whisper Model for Target-Speaker ASR
Pengcheng Guo
Xuankai Chang
Hang Lv
Shinji Watanabe
Lei Xie
66
0
0
07 Dec 2024
Memory-Efficient Training for Deep Speaker Embedding Learning in Speaker Verification
Bei Liu
Yanmin Qian
82
0
0
02 Dec 2024
An Investigation of Reprogramming for Cross-Language Adaptation in Speaker Verification Systems
Jingyu Li
Aemon Yat Fei Chiu
Tan Lee
64
0
0
18 Nov 2024
BackdoorMBTI: A Backdoor Learning Multimodal Benchmark Tool Kit for Backdoor Defense Evaluation
Haiyang Yu
Tian Xie
Jiaping Gui
Pengyang Wang
P. Yi
Yue Wu
56
1
0
17 Nov 2024
Building a Taiwanese Mandarin Spoken Language Model: A First Attempt
Chih-Kai Yang
Yu-Kuan Fu
Chen An Li
Yi-Cheng Lin
Yu-Xiang Lin
...
Ulin Sanga
Xuanjun Chen
Po-Chun Hsu
Shu-Wen Yang
Hung-yi Lee
AuLLM
46
1
0
11 Nov 2024
Speech Separation with Pretrained Frontend to Minimize Domain Mismatch
Wupeng Wang
Zexu Pan
Xianrui Li
Shuai Wang
Yiming Li
34
4
0
05 Nov 2024
Multi-modal biometric authentication: Leveraging shared layer architectures for enhanced security
Vatchala S
Yogesh C
Yeshwanth Govindarajan
Krithik Raja M
Vishal Pramav Amirtha Ganesan
Aashish Vinod A
Dharun Ramesh
37
1
0
04 Nov 2024
Towards High-fidelity Head Blending with Chroma Keying for Industrial Applications
Hah Min Lew
Sahng-Min Yoo
Hyunwoo Kang
Gyeong-Moon Park
38
0
0
01 Nov 2024
I Can Hear You: Selective Robust Training for Deepfake Audio Detection
Zirui Zhang
Wei Hao
Aroon Sankoh
William Lin
Emanuel Mendiola-Ortiz
Junfeng Yang
Chengzhi Mao
AAML
28
3
0
31 Oct 2024
MMAU: A Massive Multi-Task Audio Understanding and Reasoning Benchmark
S. Sakshi
Utkarsh Tyagi
Sonal Kumar
Ashish Seth
Ramaneswaran Selvakumar
Oriol Nieto
R. Duraiswami
Sreyan Ghosh
Dinesh Manocha
AuLLM
ELM
77
23
0
24 Oct 2024
Prototype and Instance Contrastive Learning for Unsupervised Domain Adaptation in Speaker Verification
Wen Huang
Bing Han
Zhengyang Chen
Shuai Wang
Yanmin Qian
VLM
SSL
29
0
0
22 Oct 2024
Neural Scoring, Not Embedding: A Novel Framework for Robust Speaker Verification
Wan Lin
Junhui Chen
Tianhao Wang
Zhenyu Zhou
Lantian Li
D. Wang
29
0
0
21 Oct 2024
Takin-ADA: Emotion Controllable Audio-Driven Animation with Canonical and Landmark Loss Optimization
Bin Lin
Yanzhen Yu
Jianhao Ye
Ruitao Lv
Yuqing Yang
Ruoye Xie
Pan Yu
Hongbin Zhou
VGen
35
1
0
18 Oct 2024
Accelerating Codec-based Speech Synthesis with Multi-Token Prediction and Speculative Decoding
Tan Dat Nguyen
Ji-Hoon Kim
Jeongsoo Choi
Shukjae Choi
Jinseok Park
Younglo Lee
Joon Son Chung
37
0
0
17 Oct 2024
DART: Disentanglement of Accent and Speaker Representation in Multispeaker Text-to-Speech
J. Melechovský
Ambuj Mehrish
Berrak Sisman
Dorien Herremans
18
1
0
17 Oct 2024
DAWN: Dynamic Frame Avatar with Non-autoregressive Diffusion Framework for Talking Head Video Generation
Hanbo Cheng
Limin Lin
Chenyu Liu
Pengcheng Xia
Pengfei Hu
Jiefeng Ma
Jun Du
Jia Pan
DiffM
VGen
186
0
0
17 Oct 2024
HeightCeleb - an enrichment of VoxCeleb dataset with speaker height information
Stanisław Kacprzak
K. Kowalczyk
MDE
29
0
0
16 Oct 2024
Generative Human Video Compression with Multi-granularity Temporal Trajectory Factorization
Shanzhi Yin
Bolin Chen
Shiqi Wang
Yan Ye
VGen
DiffM
34
1
0
14 Oct 2024
Beyond GFVC: A Progressive Face Video Compression Framework with Adaptive Visual Tokens
Bolin Chen
Shanzhi Yin
Zihan Zhang
Jie Chen
Ru-Ling Liao
Lingyu Zhu
Shiqi Wang
Yan Ye
31
3
0
11 Oct 2024
SAKA: An Intelligent Platform for Semi-automated Knowledge Graph Construction and Application
Hanrong Zhang
Xuben Wang
Jiabao Pan
Hongwei Wang
108
7
0
10 Oct 2024
Mamba-based Segmentation Model for Speaker Diarization
Alexis Plaquet
Naohiro Tawara
Marc Delcroix
Shota Horiguchi
Atsushi Ando
Shoko Araki
Mamba
37
0
0
09 Oct 2024
FINALLY: fast and universal speech enhancement with studio-like quality
Nicholas Babaev
Kirill Tamogashev
Azat Saginbaev
Ivan Shchekotov
Hanbin Bae
Hosang Sung
WonJun Lee
Hoon-Young Cho
Pavel Andreev
29
2
0
08 Oct 2024
Improving Speaker Representations Using Contrastive Losses on Multi-scale Features
Satvik Dixit
Massa Baali
Rita Singh
Bhiksha Raj
29
0
0
07 Oct 2024
Recent Advances in Speech Language Models: A Survey
Wenqian Cui
Dianzhi Yu
Xiaoqi Jiao
Ziqiao Meng
Guangyan Zhang
Qichao Wang
Yiwen Guo
Irwin King
AuLLM
61
14
0
01 Oct 2024
Incorporating Spatial Cues in Modular Speaker Diarization for Multi-channel Multi-party Meetings
Ruoyu Wang
Shutong Niu
Gaobin Yang
Jun Du
Shuangqing Qian
Tian Gao
Jia Pan
42
1
0
25 Sep 2024
MT2KD: Towards A General-Purpose Encoder for Speech, Speaker, and Audio Events
Xiaoyu Yang
Qiujia Li
Chao Zhang
P. Woodland
31
0
0
25 Sep 2024