VoxCeleb: a large-scale speaker identification dataset

arXiv:1706.08612 (v1, v2 latest)
26 June 2017
Arsha Nagrani
Joon Son Chung
Andrew Zisserman

Papers citing "VoxCeleb: a large-scale speaker identification dataset"

50 / 1,111 papers shown
AVA-ActiveSpeaker: An Audio-Visual Dataset for Active Speaker Detection
Joseph Roth
Sourish Chaudhuri
Ondřej Klejch
Radhika Marvin
Andrew C. Gallagher
...
S. Ramaswamy
Arkadiusz Stopczynski
Cordelia Schmid
Zhonghua Xi
C. Pantofaru
84
145
0
05 Jan 2019
Speech and Speaker Recognition from Raw Waveform with SincNet
Mirco Ravanelli
Yoshua Bengio
56
30
0
13 Dec 2018
Theoretical Guarantees of Deep Embedding Losses Under Label Noise
Nam Le
J. Odobez
NoLa
23
1
0
06 Dec 2018
TwoStreamVAN: Improving Motion Modeling in Video Generation
Ximeng Sun
Huijuan Xu
Kate Saenko
DiffM, VGen
61
17
0
03 Dec 2018
Learning Speaker Representations with Mutual Information
Mirco Ravanelli
Yoshua Bengio
SSL, DRL
102
91
0
01 Dec 2018
Noise-tolerant Audio-visual Online Person Verification using an Attention-based Neural Network Fusion
Suwon Shon
Tae-Hyun Oh
James R. Glass
59
50
0
27 Nov 2018
Interpretable Convolutional Filters with SincNet
Mirco Ravanelli
Yoshua Bengio
93
107
0
23 Nov 2018
iQIYI-VID: A Large Dataset for Multi-modal Person Identification
Yuanliu Liu
Bo Peng
Peipei Shi
He Yan
Yong Zhou
...
Tingwei Gao
G. Wang
Jian Liu
Xiangju Lu
Danming Xie
77
35
0
19 Nov 2018
Can We Use Speaker Recognition Technology to Attack Itself? Enhancing Mimicry Attacks Using Automatic Target Speaker Selection
Tomi Kinnunen
Rosa González Hautamäki
Ville Vestman
Md. Sahidullah
70
5
0
09 Nov 2018
Who Do I Sound Like? Showcasing Speaker Recognition Technology by YouTube Voice Search
R. Krishnan
Bilal Soomro
Mahesh Subedar
Ville Hautamaki
Tomi Kinnunen
103
5
0
08 Nov 2018
Gaussian-Constrained training for speaker verification
Lantian Li
Zhiyuan Tang
Ying Shi
Dong Wang
58
26
0
08 Nov 2018
Adapting End-to-End Neural Speaker Verification to New Languages and Recording Conditions with Adversarial Training
Christoph Dann
Lihong Li
Wei Wei
86
39
0
07 Nov 2018
Building Corpora for Single-Channel Speech Separation Across Multiple Domains
Aman Rana
Gregory Sell
Leibny Paola García Perera
A. Lowe
Pratik Shah
64
10
0
06 Nov 2018
How to Improve Your Speaker Embeddings Extractor in Generic Toolkits
Christopher Snyder
Lukáš Burget
S. Vishwanath
Themos Stafylakis
Jan Černocký
80
51
0
05 Nov 2018
Deep Segment Attentive Embedding for Duration Robust Speaker Verification
Bin Liu
Shuai Nie
Yaping Zhang
Shan Liang
Wenju Liu
52
4
0
01 Nov 2018
Deep Net Features for Complex Emotion Recognition
Bhalaji Nagarajan
V. R. M. Oruganti
23
3
0
31 Oct 2018
Deep Learning as Feature Encoding for Emotion Recognition
Bhalaji Nagarajan
V. R. M. Oruganti
26
1
0
30 Oct 2018
Short utterance compensation in speaker verification via cosine-based teacher-student learning of speaker embeddings
Jee-weon Jung
Hee-Soo Heo
Hye-jin Shim
Ha-Jin Yu
78
37
0
25 Oct 2018
VoiceFilter: Targeted Voice Separation by Speaker-Conditioned Spectrogram Masking
Quan Wang
Hannah Muckenhirn
K. Wilson
Prashant Sridhar
Zelin Wu
J. Hershey
Rif A. Saurous
Ron J. Weiss
Ye Jia
Ignacio López Moreno
127
370
0
11 Oct 2018
Fully Supervised Speaker Diarization
Aonan Zhang
Quan Wang
Zhenyao Zhu
John Paisley
Chong-Jun Wang
BDL
142
218
0
10 Oct 2018
Attention Mechanism in Speaker Recognition: What Does It Learn in Deep Speaker Embedding?
Qiongqiong Wang
K. Okabe
Kong Aik Lee
Hitoshi Yamamoto
Takafumi Koshinaka
60
31
0
25 Sep 2018
Unsupervised Representation Learning of Speech for Dialect Identification
Suwon Shon
Wei-Ning Hsu
James R. Glass
43
13
0
12 Sep 2018
Frame-level speaker embeddings for text-independent speaker recognition and analysis of end-to-end model
Suwon Shon
Hao Tang
James R. Glass
62
88
0
12 Sep 2018
One-Shot Speaker Identification for a Service Robot using a CNN-based Generic Verifier
I. Vélez
C. Rascón
Gibran Fuentes Pineda
30
7
0
11 Sep 2018
Self-Supervised Generation of Spatial Audio for 360 Video
Pedro Morgado
Nuno Vasconcelos
Timothy R. Langlois
Oliver Wang
MDE
66
174
0
07 Sep 2018
Self-supervised learning of a facial attribute embedding from video
Olivia Wiles
A. Sophia Koepke
Andrew Zisserman
CVBM, SSL
86
133
0
21 Aug 2018
Emotion Recognition in Speech using Cross-Modal Transfer in the Wild
Samuel Albanie
Arsha Nagrani
Andrea Vedaldi
Andrew Zisserman
CVBM
81
272
0
16 Aug 2018
Prosodic-Enhanced Siamese Convolutional Neural Networks for Cross-Device Text-Independent Speaker Verification
Sobhan Soleymani
Ali Dabouei
Seyed Mehdi Iranmanesh
Hadi Kazemi
J. Dawson
Nasser M. Nasrabadi
57
18
0
31 Jul 2018
Speaker Recognition from Raw Waveform with SincNet
Mirco Ravanelli
Yoshua Bengio
203
724
0
29 Jul 2018
X2Face: A network for controlling face generation by using images, audio, and pose codes
Olivia Wiles
A. Sophia Koepke
Andrew Zisserman
CVBM
96
416
0
27 Jul 2018
Unified Hypersphere Embedding for Speaker Recognition
Mahdi Hajibabaei
Dengxin Dai
73
86
0
22 Jul 2018
Talking Face Generation by Adversarially Disentangled Audio-Visual Representation
Hang Zhou
Yu Liu
Ziwei Liu
Ping Luo
Xiaogang Wang
CVBM
94
443
0
20 Jul 2018
Disjoint Mapping Network for Cross-modal Matching of Voices and Faces
Yandong Wen
Mahmoud Al Ismail
Weiyang Liu
Bhiksha Raj
Rita Singh
FedML
59
71
0
12 Jul 2018
Detection and Analysis of Content Creator Collaborations in YouTube Videos using Face- and Speaker-Recognition
Moritz Lode
Michael Örtl
Christian Koch
Amr Rizk
R. Steinmetz
CVBM
21
1
0
05 Jul 2018
Weakly Supervised Training of Speaker Identification Models
Mart Karu
Tanel Alumäe
42
10
0
22 Jun 2018
Unsupervised Learning of Object Landmarks through Conditional Image Generation
Tomas Jakab
Ankush Gupta
Hakan Bilen
Andrea Vedaldi
SSL
105
253
0
20 Jun 2018
VoxCeleb2: Deep Speaker Recognition
Joon Son Chung
Arsha Nagrani
Andrew Zisserman
368
2,289
0
14 Jun 2018
Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis
Ye Jia
Yu Zhang
Ron J. Weiss
Quan Wang
Jonathan Shen
...
Zhiwen Chen
Patrick Nguyen
Ruoming Pang
Ignacio López Moreno
Yonghui Wu
270
838
0
12 Jun 2018
Analysis of Length Normalization in End-to-End Speaker Verification System
Weicheng Cai
Jinkun Chen
Ming Li
VLM
61
39
0
08 Jun 2018
Speaker Clustering Using Dominant Sets
Feliks Hibraj
Sebastiano Vascon
Thilo Stadelmann
Marcello Pelillo
22
4
0
21 May 2018
Sparse Architectures for Text-Independent Speaker Verification Using Deep Neural Networks
Sara Sedighi
Shayan Ramhormozi
16
0
0
19 May 2018
On Learning Associations of Faces and Voices
Changil Kim
Hijung Valentina Shin
Tae-Hyun Oh
Alexandre Kaspar
Mohamed A. Elgharib
Wojciech Matusik
CVBM
90
84
0
15 May 2018
Supervector Compression Strategies to Speed up I-Vector System Development
Ville Vestman
Tomi Kinnunen
61
3
0
03 May 2018
Learnable PINs: Cross-Modal Embeddings for Person Identity
Arsha Nagrani
Samuel Albanie
Andrew Zisserman
SSL
138
141
0
02 May 2018
End-to-End Residual CNN with L-GM Loss Speaker Verification System
Xuan Shi
Xingjian Du
Mengyao Zhu
32
5
0
02 May 2018
A Deep Network for Arousal-Valence Emotion Prediction with Acoustic-Visual Cues
Songyou Peng
Le Zhang
Yutong Ban
Mengsha Fang
Stefan Winkler
94
25
0
02 May 2018
Text-Independent Speaker Verification Using Long Short-Term Memory Networks
Aryan Mobiny
Mohammad Najarian
69
16
0
02 May 2018
Collaborations on YouTube: From Unsupervised Detection to the Impact on Video and Channel Popularity
Christian Koch
Moritz Lode
Denny Stohr
Amr Rizk
R. Steinmetz
11
4
0
01 May 2018
Exploring the Encoding Layer and Loss Function in End-to-End Speaker and Language Recognition System
Weicheng Cai
Jinkun Chen
Ming Li
68
332
0
14 Apr 2018
Talking Face Generation by Conditional Recurrent Adversarial Network
Yang Song
Jingwen Zhu
Dawei Li
Xiaolong Wang
Hairong Qi
CVBM
177
196
0
13 Apr 2018