ResearchTrend.AI
VoxCeleb: a large-scale speaker identification dataset

26 June 2017
Arsha Nagrani
Joon Son Chung
Andrew Zisserman

Papers citing "VoxCeleb: a large-scale speaker identification dataset"

50 / 1,111 papers shown
OmniHuman-1: Rethinking the Scaling-Up of One-Stage Conditioned Human Animation Models
Gaojie Lin
Jianwen Jiang
Jiaqi Yang
Zerong Zheng
Chao Liang
DiffM VGen
385
29
0
01 Jul 2025
Controllable and Expressive One-Shot Video Head Swapping
Chaonan Ji
Jinwei Qi
Peng Zhang
Bang Zhang
Liefeng Bo
DiffM VGen
19
0
0
20 Jun 2025
Dissecting the Segmentation Model of End-to-End Diarization with Vector Clustering
Alexis Plaquet
Naohiro Tawara
Marc Delcroix
Shota Horiguchi
Atsushi Ando
S. Araki
H. Bredin
49
0
0
13 Jun 2025
SPBA: Utilizing Speech Large Language Model for Backdoor Attacks on Speech Classification Models
Wenhan Yao
Fen Xiao
Xiarun Chen
Jia Liu
Yongqiang He
Weiping Wen
AAML SILM
22
0
0
10 Jun 2025
On the influence of language similarity in non-target speaker verification trials
Paul M. Reuter
Michael Jessen
40
0
0
03 Jun 2025
LASPA: Language Agnostic Speaker Disentanglement with Prefix-Tuned Cross-Attention
Aditya Srinivas Menon
Raj Prakash Gohil
Kumud Tripathi
Pankaj Wasnik
31
0
0
02 Jun 2025
Few-Shot Speech Deepfake Detection Adaptation with Gaussian Processes
Neta Glazer
David Chernin
Idan Achituve
Sharon Gannot
Ethan Fetaya
45
0
0
29 May 2025
SpeechVerifier: Robust Acoustic Fingerprint against Tampering Attacks via Watermarking
Lingfeng Yao
Chenpei Huang
Shengyao Wang
Junpei Xue
Hanqing Guo
Jiang Liu
Xun Chen
Miao Pan
24
0
0
28 May 2025
VoxAging: Continuously Tracking Speaker Aging with a Large-Scale Longitudinal Dataset in English and Mandarin
Zhiqi Ai
Meixuan Bao
Zhiyong Chen
Zhi Yang
Xinnuo Li
Shugong Xu
22
0
0
27 May 2025
LiSTEN: Learning Soft Token Embeddings for Neural Audio LLMs
Pooneh Mousavi
Shubham Gupta
Cem Subakan
Mirco Ravanelli
51
0
0
24 May 2025
TEDI: Trustworthy and Ethical Dataset Indicators to Analyze and Compare Dataset Documentation
Wiebke Hutiri
Mircea Cimpoi
M. Scheuerman
Victoria Matthews
Alice Xiang
167
0
0
23 May 2025
PAEFF: Precise Alignment and Enhanced Gated Feature Fusion for Face-Voice Association
Abdul Hannan
Muhammad Arslan Manzoor
Shah Nawaz
Muhammad Irzam Liaqat
Markus Schedl
Mubashir Noman
CVBM
80
0
0
22 May 2025
Vox-Profile: A Speech Foundation Model Benchmark for Characterizing Diverse Speaker and Speech Traits
Tiantian Feng
Jihwan Lee
Anfeng Xu
Yoonjeong Lee
Thanathai Lertpetchpun
...
Thomas Thebaud
Laureano Moro-Velazquez
D. Byrd
Najim Dehak
Shrikanth Narayanan
93
1
0
20 May 2025
SSPS: Self-Supervised Positive Sampling for Robust Self-Supervised Speaker Verification
Theo Lepage
Reda Dehak
76
1
0
20 May 2025
SounDiT: Geo-Contextual Soundscape-to-Landscape Generation
Junbo Wang
Haofeng Tan
Bowen Liao
Albert Jiang
Teng Fei
Qixing Huang
Zhengzhong Tu
Shan Ye
Yuhao Kang
118
0
0
19 May 2025
Introducing voice timbre attribute detection
Jinghao He
Zhengyan Sheng
Liping Chen
Kong Aik Lee
Zhen-Hua Ling
54
1
0
14 May 2025
The Voice Timbre Attribute Detection 2025 Challenge Evaluation Plan
Zhengyan Sheng
Jinghao He
Liping Chen
Kong Aik Lee
Zhen-Hua Ling
55
0
0
14 May 2025
Multi-band Frequency Reconstruction for Neural Psychoacoustic Coding
Dianwen Ng
Kun Zhou
Yi-Wen Chao
Zhiwei Xiong
B. Ma
Eng Siong Chng
87
0
0
12 May 2025
MGFF-TDNN: A Multi-Granularity Feature Fusion TDNN Model with Depth-Wise Separable Module for Speaker Verification
Ya Li
Bin Zhou
Bo Hu
443
0
0
06 May 2025
Voice Cloning: Comprehensive Survey
Hussam Azzuni
Abdulmotaleb El Saddik
VLM
112
0
0
01 May 2025
Speaker Retrieval in the Wild: Challenges, Effectiveness and Robustness
Erfan Loweimi
Mengjie Qian
Kate Knill
Mark Gales
110
0
0
26 Apr 2025
SoCov: Semi-Orthogonal Parametric Pooling of Covariance Matrix for Speaker Recognition
Rongjin Li
Weibin Zhang
Dongpeng Chen
Jintao Kang
Xiaofen Xing
115
0
0
23 Apr 2025
Spatial Audio Processing with Large Language Model on Wearable Devices
Ayushi Mishra
Yang Bai
Priyadarshan Narayanasamy
Nakul Garg
Nirupam Roy
107
1
0
11 Apr 2025
Exploiting Temporal Audio-Visual Correlation Embedding for Audio-Driven One-Shot Talking Head Animation
Zhihua Xu
Tianshui Chen
Zhijing Yang
Siyuan Peng
Keze Wang
Liang Lin
89
1
0
08 Apr 2025
SeniorTalk: A Chinese Conversation Dataset with Rich Annotations for Super-Aged Seniors
Yang Chen
Hui Wang
Shiyao Wang
Jianfei Chen
Jiabei He
Jiaming Zhou
Xi Yang
Yansen Wang
Yonghua Lin
Yong Qin
68
0
0
20 Mar 2025
CAARMA: Class Augmentation with Adversarial Mixup Regularization
Massa Baali
Xuelong Li
Hong Chen
Rita Singh
Bhiksha Raj
VLM
132
0
0
20 Mar 2025
Bidirectional Learned Facial Animation Codec for Low Bitrate Talking Head Videos
Riku Takahashi
Ryugo Morita
Fuma Kimishima
Kosuke Iwama
Jinjia Zhou
VGen 3DH
84
2
0
12 Mar 2025
UniArray: Unified Spectral-Spatial Modeling for Array-Geometry-Agnostic Speech Separation
Weiguang Chen
Junjie Zhang
Jielong Yang
Eng Siong Chng
Xionghu Zhong
136
0
0
07 Mar 2025
Pleno-Generation: A Scalable Generative Face Video Compression Framework with Bandwidth Intelligence
Bolin Chen
Hanwei Zhu
Shanzhi Yin
Lingyu Zhu
Jie Chen
Ru-Ling Liao
Shiqi Wang
Yan Ye
96
2
0
24 Feb 2025
Soundwave: Less is More for Speech-Text Alignment in LLMs
Yunke Zhang
Zhiheng Liu
Fan Bu
Ruiyu Zhang
Benyou Wang
Haoyang Li
AuLLM SyDa VLM
170
1
0
18 Feb 2025
Leveraging Broadcast Media Subtitle Transcripts for Automatic Speech Recognition and Subtitling
Jakob Poncelet
Hugo Van hamme
149
0
0
05 Feb 2025
DeSTA2: Developing Instruction-Following Speech Language Model Without Speech Instruction-Tuning Data
Ke-Han Lu
Zhehuai Chen
Szu-Wei Fu
Chao-Han Huck Yang
Jagadeesh Balam
Boris Ginsburg
Yu-Te Wang
Hung-yi Lee
AuLLM SyDa
170
16
0
28 Jan 2025
Safe Gradient Flow for Bilevel Optimization
Sina Sharifi
Nazanin Abolfazli
Erfan Yazdandoost Hamedani
Mahyar Fazlyab
91
3
0
27 Jan 2025
Stable-TTS: Stable Speaker-Adaptive Text-to-Speech Synthesis via Prosody Prompting
Wooseok Han
Minki Kang
Changhun Kim
Eunho Yang
68
1
0
31 Dec 2024
Face-StyleSpeech: Enhancing Zero-shot Speech Synthesis from Face Images with Improved Face-to-Speech Mapping
Minki Kang
Wooseok Han
Eunho Yang
CVBM
63
0
0
31 Dec 2024
GLCF: A Global-Local Multimodal Coherence Analysis Framework for Talking Face Generation Detection
Xiaocan Chen
Qilin Yin
Jiarui Liu
Wei Lu
Xiangyang Luo
Jiantao Zhou
CVBM
168
1
0
18 Dec 2024
CAP4D: Creating Animatable 4D Portrait Avatars with Morphable Multi-View Diffusion Models
Felix Taubner
Ruihang Zhang
Mathieu Tuli
David B. Lindell
126
3
0
16 Dec 2024
Virtual Trial Room with Computer Vision and Machine Learning
Tulashi Prasad Joshi
Amrendra Kumar Yadav
Arjun Chhetri
Suraj Agrahari
Umesh Kanta Ghimire
90
0
0
14 Dec 2024
SQ-Whisper: Speaker-Querying based Whisper Model for Target-Speaker ASR
Pengcheng Guo
Xuankai Chang
Hang Lv
Shinji Watanabe
Lei Xie
111
1
0
07 Dec 2024
Memory-Efficient Training for Deep Speaker Embedding Learning in Speaker Verification
Bei Liu
Yanmin Qian
147
0
0
02 Dec 2024
An Investigation of Reprogramming for Cross-Language Adaptation in Speaker Verification Systems
Jingyu Li
Aemon Yat Fei Chiu
Tan Lee
125
0
0
18 Nov 2024
BackdoorMBTI: A Backdoor Learning Multimodal Benchmark Tool Kit for Backdoor Defense Evaluation
Haiyang Yu
Tian Xie
Jiaping Gui
Pengyang Wang
P. Yi
Yue Wu
126
1
0
17 Nov 2024
Building a Taiwanese Mandarin Spoken Language Model: A First Attempt
Chih-Kai Yang
Yu-Kuan Fu
Chen-An Li
Yi-Cheng Lin
Yu-Xiang Lin
...
Ulin Sanga
Xuanjun Chen
Po-Chun Hsu
Shu-Wen Yang
Hung-yi Lee
AuLLM
99
5
0
11 Nov 2024
Speech Separation with Pretrained Frontend to Minimize Domain Mismatch
Wupeng Wang
Zexu Pan
Xianrui Li
Shuai Wang
Haoyang Li
78
4
0
05 Nov 2024
Multi-modal biometric authentication: Leveraging shared layer architectures for enhanced security
Vatchala S
Yogesh C
Yeshwanth Govindarajan
Krithik Raja M
Vishal Pramav Amirtha Ganesan
Aashish Vinod A
Dharun Ramesh
65
1
0
04 Nov 2024
Towards High-fidelity Head Blending with Chroma Keying for Industrial Applications
Hah Min Lew
Sahng-Min Yoo
Hyunwoo Kang
Gyeong-Moon Park
52
0
0
01 Nov 2024
I Can Hear You: Selective Robust Training for Deepfake Audio Detection
Zirui Zhang
Wei Hao
Aroon Sankoh
William Lin
Emanuel Mendiola-Ortiz
Junfeng Yang
Chengzhi Mao
AAML
68
3
0
31 Oct 2024
MMAU: A Massive Multi-Task Audio Understanding and Reasoning Benchmark
S. Sakshi
Utkarsh Tyagi
Sonal Kumar
Ashish Seth
Ramaneswaran Selvakumar
Oriol Nieto
R. Duraiswami
Sreyan Ghosh
Dinesh Manocha
AuLLM ELM
131
46
0
24 Oct 2024
Prototype and Instance Contrastive Learning for Unsupervised Domain Adaptation in Speaker Verification
Wen Huang
Bing Han
Zhengyang Chen
Shuai Wang
Yanmin Qian
VLM SSL
52
0
0
22 Oct 2024
Takin-ADA: Emotion Controllable Audio-Driven Animation with Canonical and Landmark Loss Optimization
Bin Lin
Yanzhen Yu
Jianhao Ye
Ruitao Lv
Yue Yang
Ruoye Xie
Pan Yu
Hongbin Zhou
VGen
79
1
0
18 Oct 2024