VoxCeleb2: Deep Speaker Recognition

14 June 2018
Joon Son Chung
Arsha Nagrani
Andrew Zisserman

Papers citing "VoxCeleb2: Deep Speaker Recognition"

50 / 773 papers shown
Audio-Driven Talking Face Generation with Diverse yet Realistic Facial Animations
Rongliang Wu
Yingchen Yu
Fangneng Zhan
Jiahui Zhang
Xiaoqin Zhang
Shijian Lu
CVBM
24
9
0
18 Apr 2023
Self-Supervised Learning with Cluster-Aware-DINO for High-Performance Robust Speaker Verification
Bing Han
Zhengyang Chen
Y. Qian
11
18
0
12 Apr 2023
Looking Similar, Sounding Different: Leveraging Counterfactual Cross-Modal Pairs for Audiovisual Representation Learning
Nikhil Singh
Chih-Wei Wu
Iroro Orife
Mahdi M. Kalayeh
25
2
0
12 Apr 2023
One-Shot High-Fidelity Talking-Head Synthesis with Deformable Neural Radiance Field
Weichuang Li
Longhao Zhang
Dong Wang
Bingyan Zhao
Zhigang Wang
Mulin Chen
Bangze Zhang
Zhongjian Wang
Liefeng Bo
Xuelong Li
3DH
CVBM
19
53
0
11 Apr 2023
Margin-Mixup: A Method for Robust Speaker Verification in Multi-Speaker Audio
Jenthe Thienpondt
N. Madhu
Kris Demuynck
32
4
0
07 Apr 2023
That's What I Said: Fully-Controllable Talking Face Generation
Youngjoon Jang
Kyeongha Rho
Jong-Bin Woo
Hyeongkeun Lee
Jihwan Park
Youshin Lim
Byeong-Yeol Kim
Joon Son Chung
CVBM
19
9
0
06 Apr 2023
Face Animation with an Attribute-Guided Diffusion Model
Bo-Wen Zeng
Xuhui Liu
Sicheng Gao
Boyu Liu
Hong Li
Jianzhuang Liu
Baochang Zhang
42
31
0
06 Apr 2023
TalkCLIP: Talking Head Generation with Text-Guided Expressive Speaking Styles
Yifeng Ma
Suzhe Wang
Yu-qiong Ding
Lincheng Li
Bowen Ma
Tangjie Lv
Changjie Fan
Zhipeng Hu
Zhidong Deng
Xin Yu
CLIP
37
21
0
01 Apr 2023
Diff-ID: An Explainable Identity Difference Quantification Framework for DeepFake Detection
Chuer Yu
Xuhong Zhang
Yuxuan Duan
Senbo Yan
Zonghui Wang
Yang Xiang
S. Ji
Wenzhi Chen
AAML
29
5
0
30 Mar 2023
SynthVSR: Scaling Up Visual Speech Recognition With Synthetic Supervision
Xubo Liu
Egor Lakomkin
Konstantinos Vougioukas
Pingchuan Ma
Honglie Chen
...
Niko Moritz
J. Kolár
Stavros Petridis
M. Pantic
Christian Fuegen
52
19
0
30 Mar 2023
RobustSwap: A Simple yet Robust Face Swapping Model against Attribute Leakage
Jaeseong Lee
Taewoo Kim
S. Park
Younggun Lee
Jaegul Choo
CVBM
48
2
0
28 Mar 2023
Joint Person Identity, Gender and Age Estimation from Hand Images using Deep Multi-Task Representation Learning
N. L. Baisa
CVBM
40
4
0
27 Mar 2023
CelebV-Text: A Large-Scale Facial Text-Video Dataset
Jianhui Yu
Hao Zhu
Liming Jiang
Chen Change Loy
Weidong (Tom) Cai
Wayne Wu
30
56
0
26 Mar 2023
Auto-AVSR: Audio-Visual Speech Recognition with Automatic Labels
Pingchuan Ma
A. Haliassos
Adriana Fernandez-Lopez
Honglie Chen
Stavros Petridis
M. Pantic
27
106
0
25 Mar 2023
MusicFace: Music-driven Expressive Singing Face Synthesis
Peng Liu
W. Deng
Hengda Li
Jintai Wang
Yinglin Zheng
Yiwei Ding
Xiaohu Guo
Ming Zeng
CVBM
35
10
0
24 Mar 2023
Learning a 3D Morphable Face Reflectance Model from Low-cost Data
Yuxuan Han
Zhibo Wang
Feng Xu
3DH
30
8
0
21 Mar 2023
ModEFormer: Modality-Preserving Embedding for Audio-Video Synchronization using Transformers
Akash Gupta
Rohun Tripathi
Won-Kap Jang
29
6
0
21 Mar 2023
DS-TDNN: Dual-stream Time-delay Neural Network with Global-aware Filter for Speaker Verification
Yangfu Li
Jiapan Gan
Xiaodan Lin
24
6
0
20 Mar 2023
Right the docs: Characterising voice dataset documentation practices used in machine learning
Kathy Reid
Elizabeth T. Williams
19
2
0
19 Mar 2023
The Graph feature fusion technique for speaker recognition based on wav2vec2.0 framework
Zirui Ge
Haiyan Guo
Zhen Yang
32
1
0
19 Mar 2023
Style Transfer for 2D Talking Head Animation
Trong-Thang Pham
Nhat Le
Tuong Khanh Long Do
Hung Nguyen
Erman Tjiputra
Quang-Dieu Tran
A. Nguyen
22
3
0
17 Mar 2023
MMFace4D: A Large-Scale Multi-Modal 4D Face Dataset for Audio-Driven 3D Face Animation
Haozhe Wu
Jia Jia
Junliang Xing
Hongwei Xu
Xiangyuan Wang
Jelo Wang
CVBM
32
7
0
17 Mar 2023
MMCosine: Multi-Modal Cosine Loss Towards Balanced Audio-Visual Fine-Grained Learning
Ruize Xu
Ruoxuan Feng
Shi-Xiong Zhang
Di Hu
36
21
0
09 Mar 2023
WASD: A Wilder Active Speaker Detection Dataset
Tiago Roxo
Joana Cabral Costa
Pedro R. M. Inácio
Hugo Manuel Proença
21
3
0
09 Mar 2023
MixSpeech: Cross-Modality Self-Learning with Audio-Visual Stream Mixup for Visual Speech Translation and Recognition
Xize Cheng
Lin Li
Tao Jin
Rongjie Huang
Wang Lin
Zehan Wang
Huangdai Liu
Yejin Wang
Aoxiong Yin
Zhou Zhao
26
24
0
09 Mar 2023
Multi-Dimensional and Multi-Scale Modeling for Speech Separation Optimized by Discriminative Learning
Zhaoxi Mu
Xinyu Yang
Wenjing Zhu
28
5
0
07 Mar 2023
Distilling Multi-Level X-vector Knowledge for Small-footprint Speaker Verification
Xuechen Liu
Md. Sahidullah
Tomi Kinnunen
37
4
0
02 Mar 2023
DISPLACE Challenge: DIarization of SPeaker and LAnguage in Conversational Environments
Shikha Baghel
Shreyas Ramoji
Sidharth Sidharth
Ranjana H
Prachi Singh
...
Pratik Roy Chowdhuri
Kaustubh Kulkarni
Swapnil Padhi
Deepu Vijayasenan
Sriram Ganapathy
43
8
0
01 Mar 2023
ParrotTTS: Text-to-Speech synthesis by exploiting self-supervised representations
N. Shah
Saiteja Kosgi
Vishal Tambrahalli
Neha Sahipjohn
Anil Nelakanti
Vineet Gandhi
25
8
0
01 Mar 2023
MuAViC: A Multilingual Audio-Visual Corpus for Robust Speech Recognition and Robust Speech-to-Text Translation
Mohamed Anwar
Bowen Shi
Vedanuj Goswami
Wei-Ning Hsu
J. Pino
Changhan Wang
47
37
0
01 Mar 2023
PCF: ECAPA-TDNN with Progressive Channel Fusion for Speaker Verification
Z. Zhao
Zhuo Li
Wenchao Wang
Pengyuan Zhang
25
23
0
01 Mar 2023
UniFLG: Unified Facial Landmark Generator from Text or Speech
Kentaro Mitsui
Yukiya Hono
Kei Sawada
CVBM
16
6
0
28 Feb 2023
Practice of the conformer enhanced AUDIO-VISUAL HUBERT on Mandarin and English
Xiaoming Ren
Chao Li
Shenjian Wang
Biao Li
38
0
0
28 Feb 2023
Imaginary Voice: Face-styled Diffusion Model for Text-to-Speech
Jiyoung Lee
Joon Son Chung
Soo-Whan Chung
DiffM
38
27
0
27 Feb 2023
Speaker Recognition in Realistic Scenario Using Multimodal Data
Saqlain Hussain Shah
M. S. Saeed
Shah Nawaz
Muhammad Haroon Yousaf
CVBM
26
8
0
25 Feb 2023
Towards multi-task learning of speech and speaker recognition
Nik Vaessen
David A. van Leeuwen
CVBM
22
0
0
24 Feb 2023
Supervised Hierarchical Clustering using Graph Neural Networks for Speaker Diarization
Prachi Singh
Amrit Kaul
Sriram Ganapathy
BDL
38
8
0
24 Feb 2023
Catch You and I Can: Revealing Source Voiceprint Against Voice Conversion
Jiangyi Deng
Yanjiao Chen
Yinan Zhong
Qianhao Miao
Xueluan Gong
Wenyuan Xu
32
8
0
24 Feb 2023
A Framework for Unified Real-time Personalized and Non-Personalized Speech Enhancement
Zhepei Wang
Ritwik Giri
Devansh P. Shah
J. Valin
Mike Goodwin
Paris Smaragdis
27
8
0
23 Feb 2023
Incorporating Uncertainty from Speaker Embedding Estimation to Speaker Verification
Qiongqiong Wang
Kong Aik Lee
Tianchi Liu
UQCV
22
7
0
23 Feb 2023
Cross-modal Audio-visual Co-learning for Text-independent Speaker Verification
Meng Liu
Kong Aik Lee
Longbiao Wang
Hanyi Zhang
Chang Zeng
J. Dang
23
10
0
22 Feb 2023
Interpretable Spectrum Transformation Attacks to Speaker Recognition
Jiadi Yao
H. Luo
Xiao-Lei Zhang
AAML
32
1
0
21 Feb 2023
VoxSRC 2022: The Fourth VoxCeleb Speaker Recognition Challenge
Jaesung Huh
A. Brown
Jee-weon Jung
Joon Son Chung
Arsha Nagrani
D. Garcia-Romero
Andrew Zisserman
23
26
0
20 Feb 2023
Interactive Face Video Coding: A Generative Compression Framework
Bo Chen
Zhao Wang
Binzhe Li
Shurun Wang
Shiqi Wang
Yan Ye
VGen
16
16
0
20 Feb 2023
Improving Transformer-based Networks With Locality For Automatic Speaker Verification
Mufan Sang
Yong Zhao
Gang Liu
John H. L. Hansen
Jian Wu
ViT
25
14
0
17 Feb 2023
AV-data2vec: Self-supervised Learning of Audio-Visual Speech Representations with Contextualized Target Representations
Jiachen Lian
Alexei Baevski
Wei-Ning Hsu
Michael Auli
SSL
40
34
0
10 Feb 2023
A Multi-Purpose Audio-Visual Corpus for Multi-Modal Persian Speech Recognition: the Arman-AV Dataset
J. Peymanfard
Samin Heydarian
Ali Lashini
Hossein Zeinali
Mohammad Reza Mohammadi
N. Mozayani
29
10
0
21 Jan 2023
OLKAVS: An Open Large-Scale Korean Audio-Visual Speech Dataset
J. Park
Jung-Wook Hwang
Kwanghee Choi
Seung-Hyun Lee
Jun-Hwan Ahn
R.-H. Park
Hyung-Min Park
29
3
0
16 Jan 2023
Automated speech- and text-based classification of neuropsychiatric conditions in a multidiagnostic setting
L. Hansen
R. Rocca
A. Simonsen
A. Parola
V. Bliksted
...
Dan Bang
Kristian Tylén
Ethan Weed
S. Ostergaard
Riccardo Fusaroli
43
3
0
13 Jan 2023
Self-Supervised Video Forensics by Audio-Visual Anomaly Detection
Chao Feng
Ziyang Chen
Andrew Owens
31
71
0
04 Jan 2023