VoxCeleb: a large-scale speaker identification dataset

26 June 2017
Arsha Nagrani
Joon Son Chung
Andrew Zisserman

Papers citing "VoxCeleb: a large-scale speaker identification dataset"

50 / 1,098 papers shown
Disentangling Age and Identity with a Mutual Information Minimization Approach for Cross-Age Speaker Verification
Fengrun Zhang
Wangjin Zhou
Yiming Liu
Wang Geng
Yahui Shan
Chen Zhang
28
0
0
24 Sep 2024
WeSep: A Scalable and Flexible Toolkit Towards Generalizable Target Speaker Extraction
Shuai Wang
Ke Zhang
Shaoxiong Lin
Junjie Li
Xuefei Wang
Meng Ge
Jianwei Yu
Yanmin Qian
Haizhou Li
42
8
0
24 Sep 2024
MIMAFace: Face Animation via Motion-Identity Modulated Appearance Feature Learning
Yue Han
Junwei Zhu
Yuxiang Feng
Xiaozhong Ji
Keke He
Xiangtai Li
Zhucun Xue
Yong Liu
26
0
0
23 Sep 2024
Codec-SUPERB @ SLT 2024: A lightweight benchmark for neural audio codec models
Haibin Wu
Xuanjun Chen
Yi-Cheng Lin
Kaiwei Chang
Jiawei Du
...
Yi-Chiao Wu
Xu Tan
James Glass
Shinji Watanabe
Hung-yi Lee
34
6
0
21 Sep 2024
FreeAvatar: Robust 3D Facial Animation Transfer by Learning an Expression Foundation Model
Feng Qiu
Wei Zhang
Chen Liu
Rudong An
Lincheng Li
Yu Ding
Changjie Fan
Zhipeng Hu
Xin Yu
SLR
3DH
47
0
0
20 Sep 2024
SpoofCeleb: Speech Deepfake Detection and SASV In The Wild
Jee-weon Jung
Yihan Wu
Xin Wang
Ji-Hoon Kim
Soumi Maiti
...
Joon Son Chung
Wangyou Zhang
Seyun Um
Shinnosuke Takamichi
Shinji Watanabe
68
1
0
18 Sep 2024
Towards Automatic Assessment of Self-Supervised Speech Models using Rank
Zakaria Aldeneh
Vimal Thilak
Takuya Higuchi
B. Theobald
Tatiana Likhomanenko
SSL
75
0
0
16 Sep 2024
Self-Supervised Syllable Discovery Based on Speaker-Disentangled HuBERT
Ryota Komatsu
Takahiro Shinozaki
SSL
39
1
0
16 Sep 2024
Speaker Contrastive Learning for Source Speaker Tracing
Qing Wang
Hongmei Guo
Jian Kang
Mengjie Du
Jie Li
Xiao-Lei Zhang
Lei Xie
27
0
0
16 Sep 2024
TBDM-Net: Bidirectional Dense Networks with Gender Information for Speech Emotion Recognition
Vlad Striletchi
Cosmin Striletchi
Adriana Stan
46
0
0
16 Sep 2024
Self-Tuning Spectral Clustering for Speaker Diarization
Nikhil Raghav
Avisek Gupta
Md Sahidullah
Swagatam Das
29
0
0
16 Sep 2024
ESPnet-EZ: Python-only ESPnet for Easy Fine-tuning and Integration
Masao Someki
Kwanghee Choi
Siddhant Arora
William Chen
Samuele Cornell
Jionghao Han
Yifan Peng
Jiatong Shi
Vaibhav Srivastav
Shinji Watanabe
VLM
32
0
0
14 Sep 2024
Channel Adaptation for Speaker Verification Using Optimal Transport with Pseudo Label
Wenhao Yang
Jianguo Wei
Wenhuan Lu
Lei Li
Xugang Lu
OT
28
0
0
14 Sep 2024
Integrated Multi-Level Knowledge Distillation for Enhanced Speaker Verification
Wenhao Yang
Jianguo Wei
Wenhuan Lu
Xugang Lu
Lei Li
33
0
0
14 Sep 2024
LawDNet: Enhanced Audio-Driven Lip Synthesis via Local Affine Warping Deformation
Deng Junli
Luo Yihao
Yang Xueting
Li Siyou
Wang Wei
Guo Jinyang
Shi Ping
26
0
0
14 Sep 2024
StyleTalk++: A Unified Framework for Controlling the Speaking Styles of Talking Heads
Suzhen Wang
Yifeng Ma
Yu Ding
Zhipeng Hu
Changjie Fan
Tangjie Lv
Zhidong Deng
Xin Yu
46
9
0
14 Sep 2024
HLTCOE JHU Submission to the Voice Privacy Challenge 2024
Henry Li Xinyuan
Zexin Cai
Ashi Garg
Kevin Duh
Leibny Paola García-Perera
Sanjeev Khudanpur
Nicholas Andrews
Matthew Wiesner
37
3
0
13 Sep 2024
Text-To-Speech Synthesis In The Wild
Jee-weon Jung
Wangyou Zhang
Soumi Maiti
Yihan Wu
Xin Wang
...
Hye-jin Shim
Nicholas W. D. Evans
Joon Son Chung
Shinnosuke Takamichi
Shinji Watanabe
41
1
0
13 Sep 2024
FedHide: Federated Learning by Hiding in the Neighbors
Hyunsin Park
Sungrack Yun
FedML
34
0
0
12 Sep 2024
Universal Pooling Method of Multi-layer Features from Pretrained Models for Speaker Verification
Jin Sob Kim
Hyun Joon Park
Wooseok Shin
Sung Won Han
SLR
50
0
0
12 Sep 2024
EMOdiffhead: Continuously Emotional Control in Talking Head Generation via Diffusion
Jian Zhang
Weijian Mai
Zhijun Zhang
VGen
40
0
0
11 Sep 2024
Spoofing-Aware Speaker Verification Robust Against Domain and Channel Mismatches
Chang Zeng
Xiaoxiao Miao
Xin Wang
Erica Cooper
Junichi Yamagishi
AAML
43
0
0
10 Sep 2024
Estimating the Completeness of Discrete Speech Units
Sung-Lin Yeh
Hao Tang
36
1
0
09 Sep 2024
Privacy versus Emotion Preservation Trade-offs in Emotion-Preserving Speaker Anonymization
Zexin Cai
Henry Li Xinyuan
Ashi Garg
Leibny Paola García-Perera
Kevin Duh
Sanjeev Khudanpur
Nicholas Andrews
Matthew Wiesner
34
2
0
05 Sep 2024
An Analysis of Linear Complexity Attention Substitutes with BEST-RQ
Ryan Whetten
Titouan Parcollet
Adel Moumen
Marco Dinarelli
Yannick Esteve
41
0
0
04 Sep 2024
STAB: Speech Tokenizer Assessment Benchmark
Shikhar Vashishth
Harman Singh
Shikhar Bharadwaj
Sriram Ganapathy
Chulayuth Asawaroengchai
Kartik Audhkhasi
Andrew Rosenberg
Ankur Bapna
Bhuvana Ramabhadran
57
1
0
04 Sep 2024
Progressive Residual Extraction based Pre-training for Speech Representation Learning
Tianrui Wang
Jin Li
Ziyang Ma
Rui Cao
Xie Chen
...
Meng Ge
Xiaobao Wang
Yuguang Wang
Jianwu Dang
Nyima Tashi
SSL
43
0
0
31 Aug 2024
EmoAttack: Utilizing Emotional Voice Conversion for Speech Backdoor Attacks on Deep Speech Classification Models
Wenhan Yao
Zedong Xing
Xiarun Chen
Jia Liu
Yongqiang He
Weiping Wen
AAML
36
0
0
28 Aug 2024
MegActor-Σ: Unlocking Flexible Mixed-Modal Control in Portrait Animation with Diffusion Transformer
Shurong Yang
Huadong Li
Juhao Wu
Minhao Jing
Linze Li
Renhe Ji
Jiajun Liang
Haoqiang Fan
Jin Wang
VGen
DiffM
46
9
0
27 Aug 2024
The VoxCeleb Speaker Recognition Challenge: A Retrospective
Jaesung Huh
Joon Son Chung
Arsha Nagrani
A. Brown
Jee-weon Jung
Daniel Garcia-Romero
Andrew Zisserman
38
3
0
27 Aug 2024
NEST: Self-supervised Fast Conformer as All-purpose Seasoning to Speech Processing Tasks
He Huang
Taejin Park
Kunal Dhawan
Ivan Medennikov
Krishna C. Puvvada
Nithin Rao Koluguri
Weiqing Wang
Jagadeesh Balam
Boris Ginsburg
SSL
AI4TS
30
1
0
23 Aug 2024
Meta-Learning in Audio and Speech Processing: An End to End Comprehensive Review
Athul Raimon
Shubha Masti
Shyam K Sateesh
Siyani Vengatagiri
Bhaskarjyoti Das
VLM
AI4TS
38
1
0
19 Aug 2024
FD2Talk: Towards Generalized Talking Head Generation with Facial Decoupled Diffusion Model
Ziyu Yao
Xuxin Cheng
Zhiqi Huang
DiffM
34
3
0
18 Aug 2024
ASVspoof 5: Crowdsourced Speech Data, Deepfakes, and Adversarial Attacks at Scale
Xin Wang
Héctor Delgado
Hemlata Tak
Jee-weon Jung
Hye-jin Shim
...
Md. Sahidullah
Tomi Kinnunen
Nicholas W. D. Evans
K. Lee
Junichi Yamagishi
AAML
45
39
0
16 Aug 2024
Supervised and Unsupervised Alignments for Spoofing Behavioral Biometrics
Thomas Thebaud
Gaël Le Lan
Anthony Larcher
AAML
37
0
0
14 Aug 2024
Modality Invariant Multimodal Learning to Handle Missing Modalities: A Single-Branch Approach
Muhammad Saad Saeed
Shah Nawaz
Muhammad Zaigham Zaheer
Muhammad Haris Khan
Karthik Nandakumar
Muhammad Haroon Yousaf
Hassan Sajjad
Tom De Schepper
Markus Schedl
32
0
0
14 Aug 2024
High-fidelity and Lip-synced Talking Face Synthesis via Landmark-based Diffusion Model
Weizhi Zhong
Junfan Lin
Peixin Chen
Liang Lin
Guanbin Li
42
1
0
10 Aug 2024
Style-Preserving Lip Sync via Audio-Aware Style Reference
Weizhi Zhong
Jichang Li
Yinqi Cai
Liang Lin
Guanbin Li
35
2
0
10 Aug 2024
Synchronous Multi-modal Semantic Communication System with Packet-level Coding
Yun Tian
Jingkai Ying
Zhijin Qin
Ye Jin
Xiaoming Tao
46
3
0
08 Aug 2024
Automatic Voice Identification after Speech Resynthesis using PPG
Thibault Gaudier
Marie Tahon
Anthony Larcher
Yannick Esteve
48
0
0
05 Aug 2024
Contrastive Learning-based Chaining-Cluster for Multilingual Voice-Face Association
Wuyang Chen
Yanjie Sun
Kele Xu
Yong Dou
CVBM
39
0
0
04 Aug 2024
Resilience and Security of Deep Neural Networks Against Intentional and Unintentional Perturbations: Survey and Research Challenges
Sazzad Sayyed
Milin Zhang
Shahriar Rifat
A. Swami
Michael De Lucia
Francesco Restuccia
40
1
0
31 Jul 2024
ELP-Adapters: Parameter Efficient Adapter Tuning for Various Speech Processing Tasks
Nakamasa Inoue
Shinta Otake
Takumi Hirose
Masanari Ohi
Rei Kawakami
45
1
0
28 Jul 2024
The CHiME-8 DASR Challenge for Generalizable and Array Agnostic Distant Automatic Speech Recognition and Diarization
Samuele Cornell
Taejin Park
Steve Huang
Christoph Boeddeker
Xuankai Chang
Matthew Maciejewski
Matthew Wiesner
Paola García
Shinji Watanabe
39
9
0
23 Jul 2024
Chameleon: Images Are What You Need For Multimodal Learning Robust To Missing Modalities
Muhammad Irzam Liaqat
Shah Nawaz
Muhammad Zaigham Zaheer
M. S. Saeed
Hassan Sajjad
Tom De Schepper
Karthik Nandakumar
Muhammad Haris Khan
30
1
0
23 Jul 2024
Overview of Speaker Modeling and Its Applications: From the Lens of Deep Speaker Representation Learning
Shuai Wang
Zheng-Shou Chen
Kong Aik Lee
Yan-min Qian
Haizhou Li
39
4
0
21 Jul 2024
Linear-Complexity Self-Supervised Learning for Speech Processing
Shucong Zhang
Titouan Parcollet
Rogier van Dalen
Sourav Bhattacharya
46
1
0
18 Jul 2024
Learning Online Scale Transformation for Talking Head Video Generation
Fa-Ting Hong
Dan Xu
60
1
0
13 Jul 2024
Phonetic Richness for Improved Automatic Speaker Verification
Nicholas Klein
Ganesh Sivaraman
Elie Khoury
34
0
0
10 Jul 2024
Analyzing Speech Unit Selection for Textless Speech-to-Speech Translation
J. Duret
Yannick Esteve
Titouan Parcollet
41
0
0
08 Jul 2024