ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 1706.08612
  4. Cited By
VoxCeleb: a large-scale speaker identification dataset

VoxCeleb: a large-scale speaker identification dataset

26 June 2017
Arsha Nagrani
Joon Son Chung
Andrew Zisserman
ArXivPDFHTML

Papers citing "VoxCeleb: a large-scale speaker identification dataset"

50 / 1,098 papers shown
Title
The Voice Timbre Attribute Detection 2025 Challenge Evaluation Plan
The Voice Timbre Attribute Detection 2025 Challenge Evaluation Plan
Zhengyan Sheng
Jinghao He
Liping Chen
Kong AiK Lee
Zhen-Hua Ling
29
0
0
14 May 2025
Introducing voice timbre attribute detection
Introducing voice timbre attribute detection
Jinghao He
Zhengyan Sheng
Liping Chen
Kong AiK Lee
Zhen-Hua Ling
29
1
0
14 May 2025
Multi-band Frequency Reconstruction for Neural Psychoacoustic Coding
Multi-band Frequency Reconstruction for Neural Psychoacoustic Coding
Dianwen Ng
Kun Zhou
Yi-Wen Chao
Zhiwei Xiong
B. Ma
Eng Siong Chng
45
0
0
12 May 2025
MGFF-TDNN: A Multi-Granularity Feature Fusion TDNN Model with Depth-Wise Separable Module for Speaker Verification
MGFF-TDNN: A Multi-Granularity Feature Fusion TDNN Model with Depth-Wise Separable Module for Speaker Verification
Ya Li
Bin Zhou
Bo Hu
193
0
0
06 May 2025
Voice Cloning: Comprehensive Survey
Voice Cloning: Comprehensive Survey
Hussam Azzuni
Abdulmotaleb El Saddik
VLM
44
0
0
01 May 2025
Speaker Retrieval in the Wild: Challenges, Effectiveness and Robustness
Speaker Retrieval in the Wild: Challenges, Effectiveness and Robustness
Erfan Loweimi
Mengjie Qian
Kate Knill
Mark Gales
46
0
0
26 Apr 2025
SoCov: Semi-Orthogonal Parametric Pooling of Covariance Matrix for Speaker Recognition
SoCov: Semi-Orthogonal Parametric Pooling of Covariance Matrix for Speaker Recognition
Rongjin Li
Weibin Zhang
Dongpeng Chen
Jintao Kang
Xiaofen Xing
39
0
0
23 Apr 2025
Spatial Audio Processing with Large Language Model on Wearable Devices
Spatial Audio Processing with Large Language Model on Wearable Devices
Ayushi Mishra
Yang Bai
Priyadarshan Narayanasamy
Nakul Garg
Nirupam Roy
30
0
0
11 Apr 2025
Exploiting Temporal Audio-Visual Correlation Embedding for Audio-Driven One-Shot Talking Head Animation
Exploiting Temporal Audio-Visual Correlation Embedding for Audio-Driven One-Shot Talking Head Animation
Zhihua Xu
Tianshui Chen
Zhijing Yang
Siyuan Peng
Keze Wang
Liang Lin
33
0
0
08 Apr 2025
CAARMA: Class Augmentation with Adversarial Mixup Regularization
CAARMA: Class Augmentation with Adversarial Mixup Regularization
Massa Baali
Xuelong Li
Hongyu Chen
Rita Singh
Bhiksha Raj
VLM
46
0
0
20 Mar 2025
SeniorTalk: A Chinese Conversation Dataset with Rich Annotations for Super-Aged Seniors
SeniorTalk: A Chinese Conversation Dataset with Rich Annotations for Super-Aged Seniors
Yang Chen
Hui Wang
Shiyao Wang
Jianfei Chen
Jiabei He
Jiaming Zhou
Xi Yang
Yansen Wang
Yonghua Lin
Yong Qin
40
0
0
20 Mar 2025
Bidirectional Learned Facial Animation Codec for Low Bitrate Talking Head Videos
Riku Takahashi
Ryugo Morita
Fuma Kimishima
Kosuke Iwama
Jinjia Zhou
VGen
3DH
60
0
0
12 Mar 2025
UniArray: Unified Spectral-Spatial Modeling for Array-Geometry-Agnostic Speech Separation
Weiguang Chen
Junjie Zhang
Jielong Yang
Eng Siong Chng
Xionghu Zhong
68
0
0
07 Mar 2025
Pleno-Generation: A Scalable Generative Face Video Compression Framework with Bandwidth Intelligence
Pleno-Generation: A Scalable Generative Face Video Compression Framework with Bandwidth Intelligence
Bolin Chen
Hanwei Zhu
Shanzhi Yin
Lingyu Zhu
Jie Chen
Ru-Ling Liao
Shiqi Wang
Yan Ye
62
1
0
24 Feb 2025
Soundwave: Less is More for Speech-Text Alignment in LLMs
Soundwave: Less is More for Speech-Text Alignment in LLMs
Yunke Zhang
Zhiheng Liu
Fan Bu
Ruiyu Zhang
Benyou Wang
Yiming Li
AuLLM
SyDa
VLM
107
0
0
18 Feb 2025
Leveraging Broadcast Media Subtitle Transcripts for Automatic Speech Recognition and Subtitling
Leveraging Broadcast Media Subtitle Transcripts for Automatic Speech Recognition and Subtitling
Jakob Poncelet
Hugo Van hamme
83
0
0
05 Feb 2025
OmniHuman-1: Rethinking the Scaling-Up of One-Stage Conditioned Human Animation Models
OmniHuman-1: Rethinking the Scaling-Up of One-Stage Conditioned Human Animation Models
Gaojie Lin
Jianwen Jiang
Jiaqi Yang
Zerong Zheng
Chao Liang
DiffM
VGen
185
12
0
03 Feb 2025
DeSTA2: Developing Instruction-Following Speech Language Model Without Speech Instruction-Tuning Data
DeSTA2: Developing Instruction-Following Speech Language Model Without Speech Instruction-Tuning Data
Ke-Han Lu
Zhehuai Chen
Szu-Wei Fu
Chao-Han Huck Yang
Jagadeesh Balam
Boris Ginsburg
Yu-Te Wang
Hung-yi Lee
AuLLM
SyDa
118
5
0
28 Jan 2025
Safe Gradient Flow for Bilevel Optimization
Safe Gradient Flow for Bilevel Optimization
Sina Sharifi
Nazanin Abolfazli
Erfan Yazdandoost Hamedani
Mahyar Fazlyab
36
0
0
27 Jan 2025
Face-StyleSpeech: Enhancing Zero-shot Speech Synthesis from Face Images with Improved Face-to-Speech Mapping
Face-StyleSpeech: Enhancing Zero-shot Speech Synthesis from Face Images with Improved Face-to-Speech Mapping
Minki Kang
Wooseok Han
Eunho Yang
CVBM
39
0
0
31 Dec 2024
Stable-TTS: Stable Speaker-Adaptive Text-to-Speech Synthesis via Prosody Prompting
Stable-TTS: Stable Speaker-Adaptive Text-to-Speech Synthesis via Prosody Prompting
Wooseok Han
Minki Kang
Changhun Kim
Eunho Yang
43
0
0
31 Dec 2024
GLCF: A Global-Local Multimodal Coherence Analysis Framework for Talking Face Generation Detection
GLCF: A Global-Local Multimodal Coherence Analysis Framework for Talking Face Generation Detection
Xiaocan Chen
Qilin Yin
Jiarui Liu
Wei Lu
Xiangyang Luo
Jiantao Zhou
CVBM
86
0
0
18 Dec 2024
CAP4D: Creating Animatable 4D Portrait Avatars with Morphable Multi-View
  Diffusion Models
CAP4D: Creating Animatable 4D Portrait Avatars with Morphable Multi-View Diffusion Models
Felix Taubner
Ruihang Zhang
Mathieu Tuli
David B. Lindell
80
2
0
16 Dec 2024
Virtual Trial Room with Computer Vision and Machine Learning
Virtual Trial Room with Computer Vision and Machine Learning
Tulashi Prasad Joshi
Amrendra Kumar Yadav
Arjun Chhetri
Suraj Agrahari
Umesh Kanta Ghimire
70
0
0
14 Dec 2024
SQ-Whisper: Speaker-Querying based Whisper Model for Target-Speaker ASR
SQ-Whisper: Speaker-Querying based Whisper Model for Target-Speaker ASR
Pengcheng Guo
Xuankai Chang
Hang Lv
Shinji Watanabe
Lei Xie
66
0
0
07 Dec 2024
Memory-Efficient Training for Deep Speaker Embedding Learning in Speaker
  Verification
Memory-Efficient Training for Deep Speaker Embedding Learning in Speaker Verification
Bei Liu
Yanmin Qian
82
0
0
02 Dec 2024
An Investigation of Reprogramming for Cross-Language Adaptation in Speaker Verification Systems
Jingyu Li
Aemon Yat Fei Chiu
Tan Lee
64
0
0
18 Nov 2024
BackdoorMBTI: A Backdoor Learning Multimodal Benchmark Tool Kit for Backdoor Defense Evaluation
Haiyang Yu
Tian Xie
Jiaping Gui
Pengyang Wang
P. Yi
Yue Wu
56
1
0
17 Nov 2024
Building a Taiwanese Mandarin Spoken Language Model: A First Attempt
Building a Taiwanese Mandarin Spoken Language Model: A First Attempt
Chih-Kai Yang
Yu-Kuan Fu
Chen An Li
Yi-Cheng Lin
Yu-Xiang Lin
...
Ulin Sanga
Xuanjun Chen
Po-Chun Hsu
Shu-Wen Yang
Hung-yi Lee
AuLLM
46
1
0
11 Nov 2024
Speech Separation with Pretrained Frontend to Minimize Domain Mismatch
Speech Separation with Pretrained Frontend to Minimize Domain Mismatch
Wupeng Wang
Zexu Pan
Xianrui Li
Shuai Wang
Yiming Li
34
4
0
05 Nov 2024
Multi-modal biometric authentication: Leveraging shared layer
  architectures for enhanced security
Multi-modal biometric authentication: Leveraging shared layer architectures for enhanced security
Vatchala S
Yogesh C
Yeshwanth Govindarajan
Krithik Raja M
Vishal Pramav Amirtha Ganesan
Aashish Vinod A
Dharun Ramesh
37
1
0
04 Nov 2024
Towards High-fidelity Head Blending with Chroma Keying for Industrial
  Applications
Towards High-fidelity Head Blending with Chroma Keying for Industrial Applications
Hah Min Lew
Sahng-Min Yoo
Hyunwoo Kang
Gyeong-Moon Park
38
0
0
01 Nov 2024
I Can Hear You: Selective Robust Training for Deepfake Audio Detection
I Can Hear You: Selective Robust Training for Deepfake Audio Detection
Zirui Zhang
Wei Hao
Aroon Sankoh
William Lin
Emanuel Mendiola-Ortiz
Junfeng Yang
Chengzhi Mao
AAML
28
3
0
31 Oct 2024
MMAU: A Massive Multi-Task Audio Understanding and Reasoning Benchmark
MMAU: A Massive Multi-Task Audio Understanding and Reasoning Benchmark
S. Sakshi
Utkarsh Tyagi
Sonal Kumar
Ashish Seth
Ramaneswaran Selvakumar
Oriol Nieto
R. Duraiswami
Sreyan Ghosh
Dinesh Manocha
AuLLM
ELM
77
23
0
24 Oct 2024
Prototype and Instance Contrastive Learning for Unsupervised Domain
  Adaptation in Speaker Verification
Prototype and Instance Contrastive Learning for Unsupervised Domain Adaptation in Speaker Verification
Wen Huang
Bing Han
Zhengyang Chen
Shuai Wang
Yanmin Qian
VLM
SSL
29
0
0
22 Oct 2024
Neural Scoring, Not Embedding: A Novel Framework for Robust Speaker
  Verification
Neural Scoring, Not Embedding: A Novel Framework for Robust Speaker Verification
Wan Lin
Junhui Chen
Tianhao Wang
Zhenyu Zhou
Lantian Li
D. Wang
29
0
0
21 Oct 2024
Takin-ADA: Emotion Controllable Audio-Driven Animation with Canonical
  and Landmark Loss Optimization
Takin-ADA: Emotion Controllable Audio-Driven Animation with Canonical and Landmark Loss Optimization
Bin Lin
Yanzhen Yu
Jianhao Ye
Ruitao Lv
Yuqing Yang
Ruoye Xie
Pan Yu
Hongbin Zhou
VGen
35
1
0
18 Oct 2024
Accelerating Codec-based Speech Synthesis with Multi-Token Prediction
  and Speculative Decoding
Accelerating Codec-based Speech Synthesis with Multi-Token Prediction and Speculative Decoding
Tan Dat Nguyen
Ji-Hoon Kim
Jeongsoo Choi
Shukjae Choi
Jinseok Park
Younglo Lee
Joon Son Chung
37
0
0
17 Oct 2024
DART: Disentanglement of Accent and Speaker Representation in
  Multispeaker Text-to-Speech
DART: Disentanglement of Accent and Speaker Representation in Multispeaker Text-to-Speech
J. Melechovský
Ambuj Mehrish
Berrak Sisman
Dorien Herremans
18
1
0
17 Oct 2024
DAWN: Dynamic Frame Avatar with Non-autoregressive Diffusion Framework for Talking Head Video Generation
DAWN: Dynamic Frame Avatar with Non-autoregressive Diffusion Framework for Talking Head Video Generation
Hanbo Cheng
Limin Lin
Chenyu Liu
Pengcheng Xia
Pengfei Hu
Jiefeng Ma
Jun Du
Jia Pan
DiffM
VGen
186
0
0
17 Oct 2024
HeightCeleb - an enrichment of VoxCeleb dataset with speaker height
  information
HeightCeleb - an enrichment of VoxCeleb dataset with speaker height information
Stanisław Kacprzak
K. Kowalczyk
MDE
29
0
0
16 Oct 2024
Generative Human Video Compression with Multi-granularity Temporal
  Trajectory Factorization
Generative Human Video Compression with Multi-granularity Temporal Trajectory Factorization
Shanzhi Yin
Bolin Chen
Shiqi Wang
Yan Ye
VGen
DiffM
34
1
0
14 Oct 2024
Beyond GFVC: A Progressive Face Video Compression Framework with
  Adaptive Visual Tokens
Beyond GFVC: A Progressive Face Video Compression Framework with Adaptive Visual Tokens
Bolin Chen
Shanzhi Yin
Zihan Zhang
Jie Chen
Ru-Ling Liao
Lingyu Zhu
Shiqi Wang
Yan Ye
31
3
0
11 Oct 2024
SAKA: An Intelligent Platform for Semi-automated Knowledge Graph
  Construction and Application
SAKA: An Intelligent Platform for Semi-automated Knowledge Graph Construction and Application
Hanrong Zhang
Xuben Wang
Jiabao Pan
Hongwei Wang
108
7
0
10 Oct 2024
Mamba-based Segmentation Model for Speaker Diarization
Mamba-based Segmentation Model for Speaker Diarization
Alexis Plaquet
Naohiro Tawara
Marc Delcroix
Shota Horiguchi
Atsushi Ando
Shoko Araki
Mamba
37
0
0
09 Oct 2024
FINALLY: fast and universal speech enhancement with studio-like quality
FINALLY: fast and universal speech enhancement with studio-like quality
Nicholas Babaev
Kirill Tamogashev
Azat Saginbaev
Ivan Shchekotov
Hanbin Bae
Hosang Sung
WonJun Lee
Hoon-Young Cho
Pavel Andreev
29
2
0
08 Oct 2024
Improving Speaker Representations Using Contrastive Losses on
  Multi-scale Features
Improving Speaker Representations Using Contrastive Losses on Multi-scale Features
Satvik Dixit
Massa Baali
Rita Singh
Bhiksha Raj
29
0
0
07 Oct 2024
Recent Advances in Speech Language Models: A Survey
Recent Advances in Speech Language Models: A Survey
Wenqian Cui
Dianzhi Yu
Xiaoqi Jiao
Ziqiao Meng
Guangyan Zhang
Qichao Wang
Yiwen Guo
Irwin King
AuLLM
61
14
0
01 Oct 2024
Incorporating Spatial Cues in Modular Speaker Diarization for
  Multi-channel Multi-party Meetings
Incorporating Spatial Cues in Modular Speaker Diarization for Multi-channel Multi-party Meetings
Ruoyu Wang
Shutong Niu
Gaobin Yang
Jun Du
Shuangqing Qian
Tian Gao
Jia Pan
42
1
0
25 Sep 2024
MT2KD: Towards A General-Purpose Encoder for Speech, Speaker, and Audio Events
MT2KD: Towards A General-Purpose Encoder for Speech, Speaker, and Audio Events
Xiaoyu Yang
Qiujia Li
Chao Zhang
P. Woodland
31
0
0
25 Sep 2024
1234...202122
Next