ResearchTrend.AI

© 2025 ResearchTrend.AI, All rights reserved.

VoxCeleb: a large-scale speaker identification dataset
arXiv: 1706.08612

26 June 2017
Arsha Nagrani
Joon Son Chung
Andrew Zisserman

Papers citing "VoxCeleb: a large-scale speaker identification dataset"

50 / 1,111 papers shown
Style-Preserving Lip Sync via Audio-Aware Style Reference
Weizhi Zhong
Jichang Li
Yinqi Cai
Ming Li
Guanbin Li
Liang Lin
G. Li
87
2
0
10 Aug 2024
Synchronous Multi-modal Semantic Communication System with Packet-level Coding
Yun Tian
Jingkai Ying
Zhijin Qin
Ye Jin
Xiaoming Tao
72
6
0
08 Aug 2024
Automatic Voice Identification after Speech Resynthesis using PPG
Thibault Gaudier
Marie Tahon
Anthony Larcher
Yannick Esteve
65
0
0
05 Aug 2024
Contrastive Learning-based Chaining-Cluster for Multilingual Voice-Face Association
Wuyang Chen
Yanjie Sun
Kele Xu
Yong Dou
CVBM
77
0
0
04 Aug 2024
Resilience and Security of Deep Neural Networks Against Intentional and Unintentional Perturbations: Survey and Research Challenges
Sazzad Sayyed
Milin Zhang
Shahriar Rifat
A. Swami
Michael De Lucia
Francesco Restuccia
106
1
0
31 Jul 2024
ELP-Adapters: Parameter Efficient Adapter Tuning for Various Speech Processing Tasks
Nakamasa Inoue
Shinta Otake
Takumi Hirose
Masanari Ohi
Rei Kawakami
76
2
0
28 Jul 2024
The CHiME-8 DASR Challenge for Generalizable and Array Agnostic Distant Automatic Speech Recognition and Diarization
Samuele Cornell
Taejin Park
Steve Huang
Christoph Boeddeker
Xuankai Chang
Matthew Maciejewski
Sanjeev Khudanpur
Paola García
Shinji Watanabe
84
13
0
23 Jul 2024
Chameleon: Images Are What You Need For Multimodal Learning Robust To Missing Modalities
Muhammad Irzam Liaqat
Shah Nawaz
Muhammad Zaigham Zaheer
M. S. Saeed
Hassan Sajjad
Tom De Schepper
Karthik Nandakumar
Muhammad Haris Khan
96
1
0
23 Jul 2024
Overview of Speaker Modeling and Its Applications: From the Lens of Deep Speaker Representation Learning
Shuai Wang
Zheng-Shou Chen
Kong Aik Lee
Yan-min Qian
Haizhou Li
112
6
0
21 Jul 2024
Linear-Complexity Self-Supervised Learning for Speech Processing
Shucong Zhang
Titouan Parcollet
Rogier van Dalen
Sourav Bhattacharya
122
1
0
18 Jul 2024
Learning Online Scale Transformation for Talking Head Video Generation
Fa-Ting Hong
Dan Xu
96
1
0
13 Jul 2024
Phonetic Richness for Improved Automatic Speaker Verification
Nicholas Klein
Ganesh Sivaraman
Elie Khoury
70
0
0
10 Jul 2024
Analyzing Speech Unit Selection for Textless Speech-to-Speech Translation
J. Duret
Yannick Esteve
Titouan Parcollet
109
0
0
08 Jul 2024
A Benchmark for Multi-speaker Anonymization
Xiaoxiao Miao
Ruijie Tao
Chang Zeng
Xin Wang
99
1
0
08 Jul 2024
GMM-ResNext: Combining Generative and Discriminative Models for Speaker Verification
Hui Yan
Zhenchun Lei
Changhong Liu
Yong Zhou
55
2
0
03 Jul 2024
Probing the Feasibility of Multilingual Speaker Anonymization
Sarina Meyer
Florian Lux
Ngoc Thang Vu
116
4
0
03 Jul 2024
LivePortrait: Efficient Portrait Animation with Stitching and Retargeting Control
Jianzhu Guo
Dingyun Zhang
Xiaoqiang Liu
Zhizhou Zhong
Yuan Zhang
Pengfei Wan
Di Zhang
VGen
132
71
0
03 Jul 2024
SpeakerBeam-SS: Real-time Target Speaker Extraction with Lightweight Conv-TasNet and State Space Modeling
Hiroshi Sato
Takafumi Moriya
Masato Mimura
Shota Horiguchi
Tsubasa Ochiai
Takanori Ashihara
Atsushi Ando
Kentaro Shinayama
Marc Delcroix
72
2
0
01 Jul 2024
Leveraging Speaker Embeddings in End-to-End Neural Diarization for Two-Speaker Scenarios
Juan Ignacio Alvarez-Trejos
Beltrán Labrador
Alicia Lozano-Diez
95
2
0
01 Jul 2024
SecureSpectra: Safeguarding Digital Identity from Deep Fake Threats via Intelligent Signatures
Oguzhan Baser
Kaan Kale
Sandeep Chinchali
57
0
0
01 Jul 2024
An Attribute Interpolation Method in Speech Synthesis by Model Merging
Masato Murata
Koichi Miyazaki
Tomoki Koriyama
MoMe
115
6
0
30 Jun 2024
Application of ASV for Voice Identification after VC and Duration Predictor Improvement in TTS Models
Borodin Kirill Nikolayevich
Kudryavtsev Vasiliy Dmitrievich
Mkrtchian Grach Maratovich
Gorodnichev Mikhail Genadievich
Korzh Dmitrii Sergeevich
64
0
0
27 Jun 2024
WavRx: a Disease-Agnostic, Generalizable, and Privacy-Preserving Speech Health Diagnostic Model
Yi Zhu
Tiago H. Falk
MedIm
82
1
0
26 Jun 2024
RealTalk: Real-time and Realistic Audio-driven Face Generation with 3D Facial Prior-guided Identity Alignment Network
Xiaozhong Ji
Chuming Lin
Zhonggan Ding
Ying Tai
Junwei Zhu
Xiaobin Hu
Donghao Luo
Yanhao Ge
Chengjie Wang
CVBM
53
2
0
26 Jun 2024
AudioBench: A Universal Benchmark for Audio Large Language Models
Bin Wang
Xunlong Zou
Geyu Lin
Siyang Song
Zhuohan Liu
Wenyu Zhang
Zhengyuan Liu
AiTi Aw
Nancy F. Chen
AuLLMELMLM&MA
169
35
0
23 Jun 2024
GLOBE: A High-quality English Corpus with Global Accents for Zero-shot Speaker Adaptive Text-to-Speech
Wenbin Wang
Yang Song
Sanjay Jha
109
10
0
21 Jun 2024
DASB -- Discrete Audio and Speech Benchmark
Pooneh Mousavi
Luca Della Libera
J. Duret
Artem Ploujnikov
Cem Subakan
Mirco Ravanelli
96
21
0
20 Jun 2024
MultiTalk: Enhancing 3D Talking Head Generation Across Languages with Multilingual Video Dataset
Kim Sung-Bin
Lee Chae-Yeon
Gihun Son
Oh Hyun-Bin
Janghoon Ju
Suekyeong Nam
Tae-Hyun Oh
92
12
0
20 Jun 2024
AniFaceDiff: High-Fidelity Face Reenactment via Facial Parametric Conditioned Diffusion Models
Ken Chen
Sachith Seneviratne
Wei Wang
Dongting Hu
Sanjay Saha
Md. Tarek Hasan
Sanka Rasnayaka
T. Malepathirana
Mingming Gong
Saman K. Halgamuge
DiffM
32
2
0
19 Jun 2024
CEC: A Noisy Label Detection Method for Speaker Recognition
Yao Shen
Yingying Gao
Yaqian Hao
Chenguang Hu
Fulin Zhang
Junlan Feng
Shilei Zhang
NoLa
52
0
0
19 Jun 2024
Articulatory Encodec: Coding Speech through Vocal Tract Kinematics
Cheol Jun Cho
Peter Wu
Tejas S. Prabhune
Dhruv Agarwal
Gopala K. Anumanchipalli
110
8
0
18 Jun 2024
Self-Distillation Prototypes Network: Learning Robust Speaker Representations without Supervision
Yafeng Chen
Siqi Zheng
Hui Wang
Luyao Cheng
Qian Chen
Shiliang Zhang
Wen Wang
SSL
61
4
0
17 Jun 2024
How Should We Extract Discrete Audio Tokens from Self-Supervised Models?
Pooneh Mousavi
J. Duret
Salah Zaiem
Luca Della Libera
Artem Ploujnikov
Cem Subakan
Mirco Ravanelli
99
15
0
15 Jun 2024
CNVSRC 2023: The First Chinese Continuous Visual Speech Recognition Challenge
Chen Chen
Zehua Liu
Xiaolou Li
Lantian Li
D. Wang
66
4
0
14 Jun 2024
End-to-end Streaming model for Low-Latency Speech Anonymization
Waris Quamer
Ricardo Gutierrez-Osuna
96
0
0
13 Jun 2024
Exploring Speech Foundation Models for Speaker Diarization in Child-Adult Dyadic Interactions
Anfeng Xu
Kevin Huang
Tiantian Feng
Lue Shen
Helen Tager-Flusberg
Shrikanth Narayanan
57
4
0
12 Jun 2024
A Comprehensive Investigation on Speaker Augmentation for Speaker Recognition
Zhenyu Zhou
Shibiao Xu
Shi Yin
Lantian Li
D. Wang
62
2
0
11 Jun 2024
MR-RawNet: Speaker verification system with multiple temporal resolutions for variable duration utterances using raw waveforms
Seung-bin Kim
Chan-yeong Lim
Jungwoo Heo
Ju-ho Kim
Hyun-Seo Shin
Kyo-Won Koo
Ha-Jin Yu
85
0
0
11 Jun 2024
Scaling up masked audio encoder learning for general audio classification
Heinrich Dinkel
Zhiyong Yan
Yongqing Wang
Junbo Zhang
Yujun Wang
Bin Wang
96
7
0
11 Jun 2024
Source-Free Domain Adaptation for Speaker Verification in Data-Scarce Languages and Noisy Channels
Shlomo Salo Elia
Aviad Malachi
V. Aharonson
Gadi Pinkas
86
0
0
09 Jun 2024
Towards Lightweight Speaker Verification via Adaptive Neural Network Quantization
Bei Liu
Haoyu Wang
Yanmin Qian
MQ
76
1
0
08 Jun 2024
To what extent can ASV systems naturally defend against spoofing attacks?
Jee-weon Jung
Xin Eric Wang
Nicholas W. D. Evans
Shinji Watanabe
Hye-jin Shim
Hemlata Tak
Sidhhant Arora
Junichi Yamagishi
Joon Son Chung
AAML
89
5
0
08 Jun 2024
Neural Codec-based Adversarial Sample Detection for Speaker Verification
Xuanjun Chen
Jiawei Du
Haibin Wu
Jyh-Shing Roger Jang
Hung-yi Lee
73
3
0
07 Jun 2024
InaGVAD : a Challenging French TV and Radio Corpus Annotated for Speech Activity Detection and Speaker Gender Segmentation
D. Doukhan
Christine Maertens
William Le Personnic
Ludovic Speroni
Reda Dehak
113
2
0
06 Jun 2024
Hypernetworks for Personalizing ASR to Atypical Speech
Max Müller-Eberstein
Dianna Yee
Karren D. Yang
G. Mantena
Colin S. Lea
92
2
0
06 Jun 2024
Towards Supervised Performance on Speaker Verification with Self-Supervised Learning by Leveraging Large-Scale ASR Models
Victor Miara
Theo Lepage
Reda Dehak
80
1
0
04 Jun 2024
M2D-CLAP: Masked Modeling Duo Meets CLAP for Learning General-purpose Audio-Language Representation
Daisuke Niizumi
Daiki Takeuchi
Yasunori Ohishi
Noboru Harada
Masahiro Yasuda
Shunsuke Tsubaki
Keisuke Imoto
VLM
95
7
0
04 Jun 2024
ComFace: Facial Representation Learning with Synthetic Data for Comparing Faces
Yusuke Akamatsu
Terumi Umematsu
Hitoshi Imaoka
Shizuko Gomi
Hideo Tsurushima
156
0
0
25 May 2024
Non-autoregressive real-time Accent Conversion model with voice cloning
Vladimir Nechaev
Sergey Kosyakov
81
1
0
21 May 2024
Neighborhood Attention Transformer with Progressive Channel Fusion for Speaker Verification
Nian Li
Jianguo Wei
ViT
71
0
0
20 May 2024