1706.08612
Cited By
VoxCeleb: a large-scale speaker identification dataset
26 June 2017
Arsha Nagrani
Joon Son Chung
Andrew Zisserman
Papers citing "VoxCeleb: a large-scale speaker identification dataset"
50 / 1,111 papers shown
OmniHuman-1: Rethinking the Scaling-Up of One-Stage Conditioned Human Animation Models
Gaojie Lin
Jianwen Jiang
Jiaqi Yang
Zerong Zheng
Chao Liang
DiffM
VGen
385
29
0
01 Jul 2025
Controllable and Expressive One-Shot Video Head Swapping
Chaonan Ji
Jinwei Qi
Peng Zhang
Bang Zhang
Liefeng Bo
DiffM
VGen
19
0
0
20 Jun 2025
Dissecting the Segmentation Model of End-to-End Diarization with Vector Clustering
Alexis Plaquet
Naohiro Tawara
Marc Delcroix
Shota Horiguchi
Atsushi Ando
S. Araki
H. Bredin
49
0
0
13 Jun 2025
SPBA: Utilizing Speech Large Language Model for Backdoor Attacks on Speech Classification Models
Wenhan Yao
Fen Xiao
Xiarun Chen
Jia Liu
Yongqiang He
Weiping Wen
AAML
SILM
22
0
0
10 Jun 2025
On the influence of language similarity in non-target speaker verification trials
Paul M. Reuter
Michael Jessen
40
0
0
03 Jun 2025
LASPA: Language Agnostic Speaker Disentanglement with Prefix-Tuned Cross-Attention
Aditya Srinivas Menon
Raj Prakash Gohil
Kumud Tripathi
Pankaj Wasnik
31
0
0
02 Jun 2025
Few-Shot Speech Deepfake Detection Adaptation with Gaussian Processes
Neta Glazer
David Chernin
Idan Achituve
Sharon Gannot
Ethan Fetaya
45
0
0
29 May 2025
SpeechVerifier: Robust Acoustic Fingerprint against Tampering Attacks via Watermarking
Lingfeng Yao
Chenpei Huang
Shengyao Wang
Junpei Xue
Hanqing Guo
Jiang Liu
Xun Chen
Miao Pan
24
0
0
28 May 2025
VoxAging: Continuously Tracking Speaker Aging with a Large-Scale Longitudinal Dataset in English and Mandarin
Zhiqi Ai
Meixuan Bao
Zhiyong Chen
Zhi Yang
Xinnuo Li
Shugong Xu
22
0
0
27 May 2025
LiSTEN: Learning Soft Token Embeddings for Neural Audio LLMs
Pooneh Mousavi
Shubham Gupta
Cem Subakan
Mirco Ravanelli
51
0
0
24 May 2025
TEDI: Trustworthy and Ethical Dataset Indicators to Analyze and Compare Dataset Documentation
Wiebke Hutiri
Mircea Cimpoi
M. Scheuerman
Victoria Matthews
Alice Xiang
167
0
0
23 May 2025
PAEFF: Precise Alignment and Enhanced Gated Feature Fusion for Face-Voice Association
Abdul Hannan
Muhammad Arslan Manzoor
Shah Nawaz
Muhammad Irzam Liaqat
Markus Schedl
Mubashir Noman
CVBM
80
0
0
22 May 2025
Vox-Profile: A Speech Foundation Model Benchmark for Characterizing Diverse Speaker and Speech Traits
Tiantian Feng
Jihwan Lee
Anfeng Xu
Yoonjeong Lee
Thanathai Lertpetchpun
...
Thomas Thebaud
Laureano Moro-Velazquez
D. Byrd
Najim Dehak
Shrikanth Narayanan
93
1
0
20 May 2025
SSPS: Self-Supervised Positive Sampling for Robust Self-Supervised Speaker Verification
Theo Lepage
Reda Dehak
76
1
0
20 May 2025
SounDiT: Geo-Contextual Soundscape-to-Landscape Generation
Junbo Wang
Haofeng Tan
Bowen Liao
Albert Jiang
Teng Fei
Qixing Huang
Zhengzhong Tu
Shan Ye
Yuhao Kang
118
0
0
19 May 2025
Introducing voice timbre attribute detection
Jinghao He
Zhengyan Sheng
Liping Chen
Kong Aik Lee
Zhen-Hua Ling
54
1
0
14 May 2025
The Voice Timbre Attribute Detection 2025 Challenge Evaluation Plan
Zhengyan Sheng
Jinghao He
Liping Chen
Kong Aik Lee
Zhen-Hua Ling
55
0
0
14 May 2025
Multi-band Frequency Reconstruction for Neural Psychoacoustic Coding
Dianwen Ng
Kun Zhou
Yi-Wen Chao
Zhiwei Xiong
B. Ma
Eng Siong Chng
87
0
0
12 May 2025
MGFF-TDNN: A Multi-Granularity Feature Fusion TDNN Model with Depth-Wise Separable Module for Speaker Verification
Ya Li
Bin Zhou
Bo Hu
443
0
0
06 May 2025
Voice Cloning: Comprehensive Survey
Hussam Azzuni
Abdulmotaleb El Saddik
VLM
112
0
0
01 May 2025
Speaker Retrieval in the Wild: Challenges, Effectiveness and Robustness
Erfan Loweimi
Mengjie Qian
Kate Knill
Mark Gales
110
0
0
26 Apr 2025
SoCov: Semi-Orthogonal Parametric Pooling of Covariance Matrix for Speaker Recognition
Rongjin Li
Weibin Zhang
Dongpeng Chen
Jintao Kang
Xiaofen Xing
115
0
0
23 Apr 2025
Spatial Audio Processing with Large Language Model on Wearable Devices
Ayushi Mishra
Yang Bai
Priyadarshan Narayanasamy
Nakul Garg
Nirupam Roy
107
1
0
11 Apr 2025
Exploiting Temporal Audio-Visual Correlation Embedding for Audio-Driven One-Shot Talking Head Animation
Zhihua Xu
Tianshui Chen
Zhijing Yang
Siyuan Peng
Keze Wang
Liang Lin
89
1
0
08 Apr 2025
SeniorTalk: A Chinese Conversation Dataset with Rich Annotations for Super-Aged Seniors
Yang Chen
Hui Wang
Shiyao Wang
Jianfei Chen
Jiabei He
Jiaming Zhou
Xi Yang
Yansen Wang
Yonghua Lin
Yong Qin
68
0
0
20 Mar 2025
CAARMA: Class Augmentation with Adversarial Mixup Regularization
Massa Baali
Xuelong Li
Hong Chen
Rita Singh
Bhiksha Raj
VLM
132
0
0
20 Mar 2025
Bidirectional Learned Facial Animation Codec for Low Bitrate Talking Head Videos
Riku Takahashi
Ryugo Morita
Fuma Kimishima
Kosuke Iwama
Jinjia Zhou
VGen
3DH
84
2
0
12 Mar 2025
UniArray: Unified Spectral-Spatial Modeling for Array-Geometry-Agnostic Speech Separation
Weiguang Chen
Junjie Zhang
Jielong Yang
Eng Siong Chng
Xionghu Zhong
136
0
0
07 Mar 2025
Pleno-Generation: A Scalable Generative Face Video Compression Framework with Bandwidth Intelligence
Bolin Chen
Hanwei Zhu
Shanzhi Yin
Lingyu Zhu
Jie Chen
Ru-Ling Liao
Shiqi Wang
Yan Ye
96
2
0
24 Feb 2025
Soundwave: Less is More for Speech-Text Alignment in LLMs
Yunke Zhang
Zhiheng Liu
Fan Bu
Ruiyu Zhang
Benyou Wang
Haoyang Li
AuLLM
SyDa
VLM
170
1
0
18 Feb 2025
Leveraging Broadcast Media Subtitle Transcripts for Automatic Speech Recognition and Subtitling
Jakob Poncelet
Hugo Van hamme
149
0
0
05 Feb 2025
DeSTA2: Developing Instruction-Following Speech Language Model Without Speech Instruction-Tuning Data
Ke-Han Lu
Zhehuai Chen
Szu-Wei Fu
Chao-Han Huck Yang
Jagadeesh Balam
Boris Ginsburg
Yu-Te Wang
Hung-yi Lee
AuLLM
SyDa
170
16
0
28 Jan 2025
Safe Gradient Flow for Bilevel Optimization
Sina Sharifi
Nazanin Abolfazli
Erfan Yazdandoost Hamedani
Mahyar Fazlyab
91
3
0
27 Jan 2025
Stable-TTS: Stable Speaker-Adaptive Text-to-Speech Synthesis via Prosody Prompting
Wooseok Han
Minki Kang
Changhun Kim
Eunho Yang
68
1
0
31 Dec 2024
Face-StyleSpeech: Enhancing Zero-shot Speech Synthesis from Face Images with Improved Face-to-Speech Mapping
Minki Kang
Wooseok Han
Eunho Yang
CVBM
63
0
0
31 Dec 2024
GLCF: A Global-Local Multimodal Coherence Analysis Framework for Talking Face Generation Detection
Xiaocan Chen
Qilin Yin
Jiarui Liu
Wei Lu
Xiangyang Luo
Jiantao Zhou
CVBM
168
1
0
18 Dec 2024
CAP4D: Creating Animatable 4D Portrait Avatars with Morphable Multi-View Diffusion Models
Felix Taubner
Ruihang Zhang
Mathieu Tuli
David B. Lindell
126
3
0
16 Dec 2024
Virtual Trial Room with Computer Vision and Machine Learning
Tulashi Prasad Joshi
Amrendra Kumar Yadav
Arjun Chhetri
Suraj Agrahari
Umesh Kanta Ghimire
90
0
0
14 Dec 2024
SQ-Whisper: Speaker-Querying based Whisper Model for Target-Speaker ASR
Pengcheng Guo
Xuankai Chang
Hang Lv
Shinji Watanabe
Lei Xie
111
1
0
07 Dec 2024
Memory-Efficient Training for Deep Speaker Embedding Learning in Speaker Verification
Bei Liu
Yanmin Qian
147
0
0
02 Dec 2024
An Investigation of Reprogramming for Cross-Language Adaptation in Speaker Verification Systems
Jingyu Li
Aemon Yat Fei Chiu
Tan Lee
125
0
0
18 Nov 2024
BackdoorMBTI: A Backdoor Learning Multimodal Benchmark Tool Kit for Backdoor Defense Evaluation
Haiyang Yu
Tian Xie
Jiaping Gui
Pengyang Wang
P. Yi
Yue Wu
126
1
0
17 Nov 2024
Building a Taiwanese Mandarin Spoken Language Model: A First Attempt
Chih-Kai Yang
Yu-Kuan Fu
Chen-An Li
Yi-Cheng Lin
Yu-Xiang Lin
...
Ulin Sanga
Xuanjun Chen
Po-Chun Hsu
Shu-Wen Yang
Hung-yi Lee
AuLLM
99
5
0
11 Nov 2024
Speech Separation with Pretrained Frontend to Minimize Domain Mismatch
Wupeng Wang
Zexu Pan
Xianrui Li
Shuai Wang
Haoyang Li
78
4
0
05 Nov 2024
Multi-modal biometric authentication: Leveraging shared layer architectures for enhanced security
Vatchala S
Yogesh C
Yeshwanth Govindarajan
Krithik Raja M
Vishal Pramav Amirtha Ganesan
Aashish Vinod A
Dharun Ramesh
65
1
0
04 Nov 2024
Towards High-fidelity Head Blending with Chroma Keying for Industrial Applications
Hah Min Lew
Sahng-Min Yoo
Hyunwoo Kang
Gyeong-Moon Park
52
0
0
01 Nov 2024
I Can Hear You: Selective Robust Training for Deepfake Audio Detection
Zirui Zhang
Wei Hao
Aroon Sankoh
William Lin
Emanuel Mendiola-Ortiz
Junfeng Yang
Chengzhi Mao
AAML
68
3
0
31 Oct 2024
MMAU: A Massive Multi-Task Audio Understanding and Reasoning Benchmark
S. Sakshi
Utkarsh Tyagi
Sonal Kumar
Ashish Seth
Ramaneswaran Selvakumar
Oriol Nieto
R. Duraiswami
Sreyan Ghosh
Dinesh Manocha
AuLLM
ELM
131
46
0
24 Oct 2024
Prototype and Instance Contrastive Learning for Unsupervised Domain Adaptation in Speaker Verification
Wen Huang
Bing Han
Zhengyang Chen
Shuai Wang
Yanmin Qian
VLM
SSL
52
0
0
22 Oct 2024
Takin-ADA: Emotion Controllable Audio-Driven Animation with Canonical and Landmark Loss Optimization
Bin Lin
Yanzhen Yu
Jianhao Ye
Ruitao Lv
Yue Yang
Ruoye Xie
Pan Yu
Hongbin Zhou
VGen
79
1
0
18 Oct 2024