VoxCeleb: a large-scale speaker identification dataset

26 June 2017
Arsha Nagrani
Joon Son Chung
Andrew Zisserman
arXiv:1706.08612

Papers citing "VoxCeleb: a large-scale speaker identification dataset"

50 / 1,111 papers shown
One-shot Neural Face Reenactment via Finding Directions in GAN's Latent Space
Stella Bounareli
Christos Tzelepis
Vasileios Argyriou
Ioannis Patras
Georgios Tzimiropoulos
CVBM, 3DH
88
8
0
05 Feb 2024
ESPnet-SPK: full pipeline speaker embedding toolkit with reproducible recipes, self-supervised front-ends, and off-the-shelf models
Jee-weon Jung
Wangyou Zhang
Jiatong Shi
Zakaria Aldeneh
Takuya Higuchi
B. Theobald
Ahmed Hussen Abdelaziz
Shinji Watanabe
165
24
0
30 Jan 2024
Generalizing Speaker Verification for Spoof Awareness in the Embedding Space
Xuechen Liu
Md. Sahidullah
K. Lee
Tomi Kinnunen
AAML
128
9
0
20 Jan 2024
Revealing Emotional Clusters in Speaker Embeddings: A Contrastive Learning Strategy for Speech Emotion Recognition
Ismail Rasim Ulgen
Zongyang Du
Carlos Busso
Berrak Sisman
55
4
0
19 Jan 2024
Continuous Piecewise-Affine Based Motion Model for Image Animation
Hexiang Wang
Fengqi Liu
Qianyu Zhou
Ran Yi
Xin Tan
Lizhuang Ma
VGen
68
10
0
17 Jan 2024
Deep Learning in Physical Layer: Review on Data Driven End-to-End Communication Systems and their Enabling Semantic Applications
Nazmul Islam
Seokjoo Shin
AI4CE
106
6
0
08 Jan 2024
Self-supervised Reflective Learning through Self-distillation and Online Clustering for Speaker Representation Learning
Danwei Cai
Zexin Cai
Ze Li
Ming Li
64
0
0
03 Jan 2024
A Generalist FaceX via Learning Unified Facial Representation
Yue Han
Jiangning Zhang
Junwei Zhu
Xiangtai Li
Yanhao Ge
Wei Li
Chengjie Wang
Yong Liu
Xiaoming Liu
Ying Tai
DiffM
104
13
0
31 Dec 2023
EFHQ: Multi-purpose ExtremePose-Face-HQ dataset
T. Dao
D. Vu
Cuong Pham
Anh Tran
87
1
0
28 Dec 2023
Jeffreys divergence-based regularization of neural network output distribution applied to speaker recognition
Pierre-Michel Bousquet
Mickael Rouvier
UQCV
30
2
0
28 Dec 2023
SAIC: Integration of Speech Anonymization and Identity Classification
Ming Cheng
Xingjian Diao
Shitong Cheng
Wenjun Liu
101
6
0
23 Dec 2023
Voxceleb-ESP: preliminary experiments detecting Spanish celebrities from their voices
Beltrán Labrador
Manuel Otero-Gonzalez
Alicia Lozano-Diez
D. Ramos-Castro
Doroteo T. Toledano
Joaquín González-Rodríguez
73
0
0
20 Dec 2023
Learning Dense Correspondence for NeRF-Based Face Reenactment
Songlin Yang
Wei Wang
Yushi Lan
Xiangyu Fan
Bo Peng
Lei Yang
Jing Dong
CVBM, 3DH
80
6
0
16 Dec 2023
Efficient speech detection in environmental audio using acoustic recognition and knowledge distillation
Drew Priebe
Burooj Ghani
Dan Stowell
40
5
0
14 Dec 2023
Scalable Ensemble-based Detection Method against Adversarial Attacks for speaker verification
Haibin Wu
Heng-Cheng Kuo
Yu Tsao
Hung-yi Lee
AAML
62
2
0
14 Dec 2023
NeXt-TDNN: Modernizing Multi-Scale Temporal Convolution Backbone for Speaker Verification
Hyunjun Heo
U.H Shin
Ran Lee
YoungJu Cheon
Hyung-Min Park
55
12
0
14 Dec 2023
GSmoothFace: Generalized Smooth Talking Face Generation via Fine Grained 3D Face Guidance
Haiming Zhang
Zhihao Yuan
Chaoda Zheng
Xu Yan
Baoyuan Wang
Guanbin Li
Song Wu
Shuguang Cui
Zhen Li
CVBM
84
1
0
12 Dec 2023
EEND-DEMUX: End-to-End Neural Speaker Diarization via Demultiplexed Speaker Embeddings
Sung Hwan Mun
Mingrui Han
Canyeong Moon
Nam Soo Kim
80
1
0
11 Dec 2023
Speaker-Text Retrieval via Contrastive Learning
Xuechen Liu
Xin Wang
Erica Cooper
Xiaoxiao Miao
Junichi Yamagishi
VLM
45
1
0
11 Dec 2023
Neural Concatenative Singing Voice Conversion: Rethinking Concatenation-Based Approach for One-Shot Singing Voice Conversion
Binzhu Sha
Xu Li
Zhiyong Wu
Yin Shan
Helen M. Meng
54
7
0
08 Dec 2023
SingingHead: A Large-scale 4D Dataset for Singing Head Animation
Sijing Wu
Yunhao Li
Weitian Zhang
Jun Jia
Yucheng Zhu
Yichao Yan
Guangtao Zhai
Xiaokang Yang
66
2
0
07 Dec 2023
Joint Training or Not: An Exploration of Pre-trained Speech Models in Audio-Visual Speaker Diarization
Huan Zhao
Li Zhang
Yuehong Li
Yannan Wang
Hongji Wang
Wei Rao
Qing Wang
Lei Xie
60
0
0
07 Dec 2023
Golden Gemini is All You Need: Finding the Sweet Spots for Speaker Verification
Tianchi Liu
Kong Aik Lee
Qiongqiong Wang
Haizhou Li
VLM
146
15
0
06 Dec 2023
VividTalk: One-Shot Audio-Driven Talking Head Generation Based on 3D Hybrid Prior
Xusen Sun
Longhao Zhang
Hao Zhu
Peng Zhang
Bang Zhang
Xinya Ji
Kangneng Zhou
Daiheng Gao
Liefeng Bo
Xun Cao
VGen
101
29
0
04 Dec 2023
DiffSLVA: Harnessing Diffusion Models for Sign Language Video Anonymization
Zhaoyang Xia
C. Neidle
Dimitris N. Metaxas
DiffM
70
4
0
27 Nov 2023
Phonetic-aware speaker embedding for far-field speaker verification
Zezhong Jin
Youzhi Tu
Man-Wai Mak
97
1
0
27 Nov 2023
Summary of the DISPLACE Challenge 2023 - DIarization of SPeaker and LAnguage in Conversational Environments
Shikha Baghel
Shreyas Ramoji
Somil Jain
Pratik Roy Chowdhuri
Prachi Singh
Deepu Vijayasenan
Sriram Ganapathy
78
7
0
21 Nov 2023
Video Face Re-Aging: Toward Temporally Consistent Face Re-Aging
Abdul Muqeet
Kyuchul Lee
Bumsoo Kim
Yohan Hong
Hyungrae Lee
Woonggon Kim
Kwang Hee Lee
95
3
0
20 Nov 2023
SponTTS: modeling and transferring spontaneous style for TTS
Hanzhao Li
Xinfa Zhu
Liumeng Xue
Yang Song
Yunlin Chen
Lei Xie
89
7
0
13 Nov 2023
CVTHead: One-shot Controllable Head Avatar with Vertex-feature Transformer
Haoyu Ma
Tong Zhang
Shanlin Sun
Xiangyi Yan
Kun Han
Xiaohui Xie
81
7
0
11 Nov 2023
On the Behavior of Audio-Visual Fusion Architectures in Identity Verification Tasks
Daniel Claborne
Eric Slyman
Karl Pazdernik
49
0
0
09 Nov 2023
CapST: Leveraging Capsule Networks and Temporal Attention for Accurate Model Attribution in Deep-fake Videos
Wasim Ahmad
Yan-Tsung Peng
Yuan-Hao Chang
Gaddisa Olani Ganfure
Sarwar Khan
60
0
0
07 Nov 2023
AV-Lip-Sync+: Leveraging AV-HuBERT to Exploit Multimodal Inconsistency for Video Deepfake Detection
Sahibzada Adil Shahzad
Ammarah Hashmi
Yan-Tsung Peng
Yu Tsao
Hsin-Min Wang
96
6
0
05 Nov 2023
Generative Face Video Coding Techniques and Standardization Efforts: A Review
Bo Chen
Jie Chen
Shiqi Wang
Yan Ye
66
15
0
05 Nov 2023
3D-Aware Talking-Head Video Motion Transfer
Haomiao Ni
Jiachen Liu
Yuan Xue
S. X. Huang
DiffM
136
5
0
05 Nov 2023
LaughTalk: Expressive 3D Talking Head Generation with Laughter
Kim Sung-Bin
Lee Hyun
Da Hye Hong
Suekyeong Nam
Janghoon Ju
Tae-Hyun Oh
126
23
0
02 Nov 2023
Deep Neural Networks for Automatic Speaker Recognition Do Not Learn Supra-Segmental Temporal Features
Daniel Neururer
Volker Dellwo
Thilo Stadelmann
59
2
0
01 Nov 2023
MM-VID: Advancing Video Understanding with GPT-4V(ision)
Kevin Qinghong Lin
Faisal Ahmed
Linjie Li
Chung-Ching Lin
E. Azarnasab
...
Lin Liang
Zicheng Liu
Yumao Lu
Ce Liu
Lijuan Wang
MLLM
86
65
0
30 Oct 2023
Learning Repeatable Speech Embeddings Using An Intra-class Correlation Regularizer
Jianwei Zhang
Suren Jayasuriya
Visar Berisha
SSL
86
2
0
25 Oct 2023
HANSEN: Human and AI Spoken Text Benchmark for Authorship Analysis
Nafis Irtiza Tripto
Adaku Uchendu
Thai V. Le
Mattia Setzu
F. Giannotti
Dongwon Lee
DeLMO
58
7
0
25 Oct 2023
Modality-Agnostic Self-Supervised Learning with Meta-Learned Masked Auto-Encoder
Huiwon Jang
Jihoon Tack
Daewon Choi
Jongheon Jeong
Jinwoo Shin
74
3
0
25 Oct 2023
Diffusion-Based Adversarial Purification for Speaker Verification
Yibo Bai
Ju Liu
Xuelong Li
DiffM
72
3
0
22 Oct 2023
Learning Motion Refinement for Unsupervised Face Animation
Jiale Tao
Shuhang Gu
Wen Li
Lixin Duan
CVBM, 3DH
62
4
0
21 Oct 2023
The CHiME-7 Challenge: System Description and Performance of NeMo Team's DASR System
T. Park
He Huang
Ante Jukić
Kunal Dhawan
Krishna C. Puvvada
Nithin Rao Koluguri
Nikolay Karpov
A. Laptev
Jagadeesh Balam
Boris Ginsburg
60
7
0
18 Oct 2023
Property-Aware Multi-Speaker Data Simulation: A Probabilistic Modelling Technique for Synthetic Data Generation
T. Park
He Huang
Coleman Hooper
Nithin Rao Koluguri
Kunal Dhawan
Ante Jukić
Jagadeesh Balam
Boris Ginsburg
69
7
0
18 Oct 2023
DASA: Difficulty-Aware Semantic Augmentation for Speaker Verification
Yuanyuan Wang
Yang Zhang
Zhiyong Wu
Zhihan Yang
Tao Wei
Kun Zou
Helen M. Meng
85
1
0
18 Oct 2023
LocSelect: Target Speaker Localization with an Auditory Selective Hearing Mechanism
Yu Chen
Xinyuan Qian
Zexu Pan
Kainan Chen
Haizhou Li
55
3
0
16 Oct 2023
Expression Domain Translation Network for Cross-domain Head Reenactment
Taewoong Kang
Jeongsik Oh
Jaeseong Lee
S. Park
Jaegul Choo
50
0
0
16 Oct 2023
End-to-end Online Speaker Diarization with Target Speaker Tracking
Weiqing Wang
Ming Li
74
5
0
12 Oct 2023
Cost-Driven Hardware-Software Co-Optimization of Machine Learning Pipelines
Ravit Sharma
W. Romaszkan
Feiqian Zhu
Puneet Gupta
Ankur Mehta
67
0
0
11 Oct 2023