VoxCeleb: a large-scale speaker identification dataset

26 June 2017
Arsha Nagrani
Joon Son Chung
Andrew Zisserman

Papers citing "VoxCeleb: a large-scale speaker identification dataset"

50 / 1,098 papers shown
Title
AV-SUPERB: A Multi-Task Evaluation Benchmark for Audio-Visual Representation Models
Yuan Tseng
Layne Berry
Yi-Ting Chen
I-Hsiang Chiu
Hsuan-Hao Lin
...
Yu Tsao
Shinji Watanabe
Abdel-rahman Mohamed
Chi-Luen Feng
Hung-yi Lee
VLM
SSL
66
14
0
19 Sep 2023
Spiking-LEAF: A Learnable Auditory front-end for Spiking Neural Networks
Zeyang Song
Jibin Wu
Malu Zhang
Mike Zheng Shou
Haizhou Li
45
4
0
18 Sep 2023
A Generative Framework for Self-Supervised Facial Representation Learning
Ruian He
Zhen Xing
Weimin Tan
Bo Yan
DiffM
26
0
0
15 Sep 2023
DiariST: Streaming Speech Translation with Speaker Diarization
Muqiao Yang
Naoyuki Kanda
Xiaofei Wang
Junkun Chen
Peidong Wang
Jian Xue
Jinyu Li
Takuya Yoshioka
32
6
0
14 Sep 2023
SLMIA-SR: Speaker-Level Membership Inference Attacks against Speaker Recognition Systems
Guangke Chen
Yedi Zhang
Fu Song
43
8
0
14 Sep 2023
Diff-SV: A Unified Hierarchical Framework for Noise-Robust Speaker Verification Using Score-Based Diffusion Probabilistic Models
Ju-ho Kim
Ju-Sung Heo
Hyun-Seo Shin
Chanmann Lim
Ha-Jin Yu
DiffM
24
2
0
14 Sep 2023
Codec Data Augmentation for Time-domain Heart Sound Classification
Ansh Mishra
J. Yip
Chng Eng Siong
15
1
0
14 Sep 2023
Getting More for Less: Using Weak Labels and AV-Mixup for Robust Audio-Visual Speaker Verification
Anith Selvakumar
H. Fashandi
VLM
29
0
0
13 Sep 2023
SynVox2: Towards a privacy-friendly VoxCeleb2 dataset
Xiaoxiao Miao
Xin Eric Wang
Erica Cooper
Junichi Yamagishi
Nicholas W. D. Evans
Massimiliano Todisco
J. Bonastre
Mickael Rouvier
22
5
0
12 Sep 2023
MaskRenderer: 3D-Infused Multi-Mask Realistic Face Reenactment
Tina Behrouzi
Atefeh Shahroudnejad
Payam Mousavi
CVBM
17
0
0
10 Sep 2023
Voice Morphing: Two Identities in One Voice
Sushant Pani
Anurag Chowdhury
Morgan Sandler
Arun Ross
32
1
0
05 Sep 2023
Acoustic-to-articulatory inversion for dysarthric speech: Are pre-trained self-supervised representations favorable?
Sarthak Kumar Maharana
Krishna Kamal Adidam
Shoumik Nandi
Ajitesh Srivastava
35
2
0
03 Sep 2023
From Pixels to Portraits: A Comprehensive Survey of Talking Head Generation Techniques and Applications
Shreyank N. Gowda
Dheeraj Pandey
Shashank Narayana Gowda
54
3
0
30 Aug 2023
AGS: An Dataset and Taxonomy for Domestic Scene Sound Event Recognition
Nan Che
Chenrui Liu
Fei Yu
33
0
0
30 Aug 2023
The USTC-NERCSLIP Systems for the CHiME-7 DASR Challenge
Ruoyu Wang
Maokui He
Jun Du
Hengshun Zhou
Shutong Niu
...
Mengzhi Wang
Genshun Wan
Jia Pan
Jianqing Gao
Chin-Hui Lee
35
12
0
28 Aug 2023
Speech Self-Supervised Representations Benchmarking: a Case for Larger Probing Heads
Salah Zaiem
Youcef Kemiche
Titouan Parcollet
S. Essid
Mirco Ravanelli
SSL
29
11
0
28 Aug 2023
Unified and Dynamic Graph for Temporal Character Grouping in Long Videos
Xiujun Shu
Wei Wen
Liangsheng Xu
Ruizhi Qiao
Taian Guo
Hanjun Li
Bei Gan
Tianlin Li
Xing Sun
42
0
0
27 Aug 2023
Fairness and Privacy in Voice Biometrics: A Study of Gender Influences Using wav2vec 2.0
Oubaïda Chouchane
Michele Panariello
Chiara Galdi
Massimiliano Todisco
Nicholas W. D. Evans
32
2
0
27 Aug 2023
ToonTalker: Cross-Domain Face Reenactment
Yuan Gong
Yong Zhang
Xiaodong Cun
Fei Yin
Yanbo Fan
Xuanxia Wang
Baoyuan Wu
Yujiu Yang
CVBM
39
7
0
24 Aug 2023
DF-3DFace: One-to-Many Speech Synchronized 3D Face Animation with Diffusion
Se Jin Park
Joanna Hong
Minsu Kim
Y. Ro
37
4
0
23 Aug 2023
An Effective Transformer-based Contextual Model and Temporal Gate Pooling for Speaker Identification
Harunori Kawano
Sota Shimizu
30
1
0
22 Aug 2023
A Survey on Deep Multi-modal Learning for Body Language Recognition and Generation
Li Liu
Lufei Gao
Wen-Ling Lei
Fengji Ma
Xiaotian Lin
Jin-Tao Wang
CVBM
27
5
0
17 Aug 2023
DiffV2S: Diffusion-based Video-to-Speech Synthesis with Vision-guided Speaker Embedding
J. Choi
Joanna Hong
Y. Ro
DiffM
29
19
0
15 Aug 2023
Integrating Emotion Recognition with Speech Recognition and Speaker Diarisation for Conversations
Wen Wu
C. Zhang
P. Woodland
31
3
0
14 Aug 2023
VoxBlink: A Large Scale Speaker Verification Dataset on Camera
Yuke Lin
Xiaoyi Qin
Guoqing Zhao
Ming Cheng
Ning Jiang
Haiying Wu
Ming Li
49
15
0
14 Aug 2023
Versatile Face Animator: Driving Arbitrary 3D Facial Avatar in RGBD Space
Haoyu Wang
Haozhe Wu
Junliang Xing
Jia Jia
3DH
25
4
0
11 Aug 2023
Speaker Recognition Using Isomorphic Graph Attention Network Based Pooling on Self-Supervised Representation
Zirui Ge
Xinzhou Xu
Haiyan Guo
Tingting Wang
Zhen Yang
SSL
26
1
0
09 Aug 2023
Breaking Speaker Recognition with PaddingBack
Zhe Ye
Diqun Yan
Li Dong
Kailai Shen
AAML
39
2
0
08 Aug 2023
Audio-visual video-to-speech synthesis with synthesized input audio
Triantafyllos Kefalas
Yannis Panagakis
Maja Pantic
VGen
DiffM
38
1
0
31 Jul 2023
On-Device Speaker Anonymization of Acoustic Embeddings for ASR based on Flexible Location Gradient Reversal Layer
Md. Asif Jalal
Pablo Peso Parada
Jisi Zhang
Karthikeyan P. Saravanan
Mete Ozay
Myoungji Han
Jung In Lee
Seokyeong Jung
28
1
0
25 Jul 2023
HyperReenact: One-Shot Reenactment via Jointly Learning to Refine and Retarget Faces
Stella Bounareli
Christos Tzelepis
Vasileios Argyriou
Ioannis Patras
Georgios Tzimiropoulos
CVBM
36
35
0
20 Jul 2023
PAS: Partial Additive Speech Data Augmentation Method for Noise Robust Speaker Verification
Wonbin Kim
Hyun-Seo Shin
Ju-ho Kim
Ju-Sung Heo
Chanmann Lim
Ha-Jin Yu
26
0
0
20 Jul 2023
Implicit Identity Representation Conditioned Memory Compensation Network for Talking Head video Generation
Fa-Ting Hong
Dan Xu
CVBM
25
31
0
19 Jul 2023
Exploring Binary Classification Loss For Speaker Verification
Bing Han
Zhengyang Chen
Y. Qian
CVBM
27
10
0
17 Jul 2023
Representation Learning With Hidden Unit Clustering For Low Resource Speech Applications
Varun Krishna
T. Sai
Sriram Ganapathy
SSL
32
2
0
14 Jul 2023
Retrieving Continuous Time Event Sequences using Neural Temporal Point Processes with Learnable Hashing
Vinayak Gupta
Srikanta J. Bedathur
A. De
AI4TS
27
1
0
13 Jul 2023
LEAT: Towards Robust Deepfake Disruption in Real-World Scenarios via Latent Ensemble Attack
Joonkyo Shim
H. Yoon
DiffM
AAML
19
2
0
04 Jul 2023
Enrollment-stage Backdoor Attacks on Speaker Recognition Systems via Adversarial Ultrasound
Xinfeng Li
Junning Ze
Chen Yan
Yushi Cheng
Xiaoyu Ji
Wenyuan Xu
AAML
31
11
0
28 Jun 2023
Fake the Real: Backdoor Attack on Deep Speech Classification via Voice Conversion
Zhe Ye
Terui Mao
Li Dong
Diqun Yan
AAML
30
7
0
28 Jun 2023
Large-scale unsupervised audio pre-training for video-to-speech synthesis
Triantafyllos Kefalas
Yannis Panagakis
Maja Pantic
VGen
40
3
0
27 Jun 2023
3D-Speaker: A Large-Scale Multi-Device, Multi-Distance, and Multi-Dialect Corpus for Speech Representation Disentanglement
Siqi Zheng
Luyao Cheng
Yafeng Chen
Haibo Wang
Qian Chen
27
18
0
27 Jun 2023
Factors Affecting the Performance of Automated Speaker Verification in Alzheimer's Disease Clinical Trials
Malikeh Ehghaghi
Marija Stanojevic
Ali Akram
Jekaterina Novikova
21
1
0
20 Jun 2023
Emotional Speech-Driven Animation with Content-Emotion Disentanglement
Radek Daněček
Kiran Chhatre
Shashank Tripathi
Yandong Wen
Michael J. Black
Timo Bolkart
21
67
0
15 Jun 2023
When to Use Efficient Self Attention? Profiling Text, Speech and Image Transformer Variants
Anuj Diwan
Eunsol Choi
David Harwath
54
0
0
14 Jun 2023
Parametric Implicit Face Representation for Audio-Driven Facial Reenactment
Ricong Huang
Puxiang Lai
Yipeng Qin
Guanbin Li
CVBM
DiffM
35
14
0
13 Jun 2023
Speaker Verification Across Ages: Investigating Deep Speaker Embedding Sensitivity to Age Mismatch in Enrollment and Test Speech
Vishwanath Pratap Singh
Md. Sahidullah
Tomi Kinnunen
26
3
0
13 Jun 2023
IFaceUV: Intuitive Motion Facial Image Generation by Identity Preservation via UV map
Han-Lim Lee
Yu-Te Ku
Eunseok Kim
Seungryul Baek
3DH
36
0
0
08 Jun 2023
Self-supervised Audio Teacher-Student Transformer for Both Clip-level and Frame-level Tasks
Xian Li
Nian Shao
Xiaofei Li
ViT
CLIP
31
26
0
07 Jun 2023
Experimenting with Additive Margins for Contrastive Self-Supervised Speaker Verification
Theo Lepage
Reda Dehak
SSL
21
3
0
06 Jun 2023
Emotional Talking Head Generation based on Memory-Sharing and Attention-Augmented Networks
Jianrong Wang
Yaxin Zhao
Li Liu
Tian-Shun Xu
Qi Li
Sen Li
24
9
0
06 Jun 2023