ResearchTrend.AI

VoxCeleb2: Deep Speaker Recognition

14 June 2018
Joon Son Chung
Arsha Nagrani
Andrew Zisserman
arXiv:1806.05622

Papers citing "VoxCeleb2: Deep Speaker Recognition"

50 / 759 papers shown
EEND-DEMUX: End-to-End Neural Speaker Diarization via Demultiplexed Speaker Embeddings
Sung Hwan Mun
Mingrui Han
Canyeong Moon
Nam Soo Kim
36
1
0
11 Dec 2023
Neural Concatenative Singing Voice Conversion: Rethinking Concatenation-Based Approach for One-Shot Singing Voice Conversion
Binzhu Sha
Xu Li
Zhiyong Wu
Yin Shan
Helen M. Meng
23
7
0
08 Dec 2023
Joint Training or Not: An Exploration of Pre-trained Speech Models in Audio-Visual Speaker Diarization
Huan Zhao
Li Lyna Zhang
Yuehong Li
Yannan Wang
Hongji Wang
Wei Rao
Qing Wang
Lei Xie
8
0
0
07 Dec 2023
Golden Gemini is All You Need: Finding the Sweet Spots for Speaker Verification
Tianchi Liu
Kong Aik Lee
Qiongqiong Wang
Haizhou Li
VLM
68
13
0
06 Dec 2023
AV2AV: Direct Audio-Visual Speech to Audio-Visual Speech Translation with Unified Audio-Visual Speech Representation
J. Choi
Se Jin Park
Minsu Kim
Y. Ro
30
12
0
05 Dec 2023
Probabilistic Speech-Driven 3D Facial Motion Synthesis: New Benchmarks, Methods, and Applications
Karren D. Yang
Anurag Ranjan
Jen-Hao Rick Chang
Raviteja Vemulapalli
Oncel Tuzel
25
8
0
30 Nov 2023
Unsupervised Multimodal Deepfake Detection Using Intra- and Cross-Modal Inconsistencies
Mulin Tian
Mahyar Khayatkhoei
Joe Mathai
Wael AbdAlmageed
33
6
0
28 Nov 2023
Cross Entropy in Deep Learning of Classifiers Is Unnecessary -- ISBE Error is All You Need
W. Skarbek
10
1
0
27 Nov 2023
Phonetic-aware speaker embedding for far-field speaker verification
Zezhong Jin
Youzhi Tu
Man-Wai Mak
23
1
0
27 Nov 2023
AV-Deepfake1M: A Large-Scale LLM-Driven Audio-Visual Deepfake Dataset
Zhixi Cai
Shreya Ghosh
Aman Pankaj Adatia
Munawar Hayat
Abhinav Dhall
Kalin Stefanov
21
27
0
26 Nov 2023
GAIA: Zero-shot Talking Avatar Generation
Tianyu He
Junliang Guo
Runyi Yu
Yuchi Wang
Jialiang Zhu
...
Chunyu Wang
Han Hu
HsiangTao Wu
Sheng Zhao
Jiang Bian
31
25
0
26 Nov 2023
Do VSR Models Generalize Beyond LRS3?
Y. A. D. Djilali
Sanath Narayan
Eustache Le Bihan
Haithem Boussaid
Ebtesam Almazrouei
Merouane Debbah
35
4
0
23 Nov 2023
Summary of the DISPLACE Challenge 2023 - DIarization of SPeaker and LAnguage in Conversational Environments
Shikha Baghel
Shreyas Ramoji
Somil Jain
Pratik Roy Chowdhuri
Prachi Singh
Deepu Vijayasenan
Sriram Ganapathy
30
6
0
21 Nov 2023
HierSpeech++: Bridging the Gap between Semantic and Acoustic Representation of Speech by Hierarchical Variational Inference for Zero-shot Speech Synthesis
Sang-Hoon Lee
Haram Choi
Seung-Bin Kim
Seong-Whan Lee
BDL
29
31
0
21 Nov 2023
Talent-Interview: Web-Client Cheating Detection for Online Exams
Mert Ege
Mustafa Ceyhan
17
0
0
17 Nov 2023
CP-EB: Talking Face Generation with Controllable Pose and Eye Blinking Embedding
Jianzong Wang
Yimin Deng
Ziqi Liang
Xulong Zhang
Ning Cheng
Jing Xiao
CVBM
18
2
0
15 Nov 2023
Cross-modal Generative Model for Visual-Guided Binaural Stereo Generation
Zhaojian Li
Bin Zhao
Yuan Yuan
25
3
0
13 Nov 2023
CVTHead: One-shot Controllable Head Avatar with Vertex-feature Transformer
Haoyu Ma
Tong Zhang
Shanlin Sun
Xiangyi Yan
Kun Han
Xiaohui Xie
26
5
0
11 Nov 2023
LaughTalk: Expressive 3D Talking Head Generation with Laughter
Kim Sung-Bin
Lee Hyun
Da Hye Hong
Suekyeong Nam
Janghoon Ju
Tae-Hyun Oh
28
21
0
02 Nov 2023
Seeing Through the Conversation: Audio-Visual Speech Separation based on Diffusion Model
Suyeon Lee
Chaeyoung Jung
Youngjoon Jang
Jaehun Kim
Joon Son Chung
33
7
0
30 Oct 2023
TorchAudio 2.1: Advancing speech recognition, self-supervised learning, and audio processing components for PyTorch
Jeff Hwang
Moto Hira
Caroline Chen
Xiaohui Zhang
Zhaoheng Ni
...
Yumeng Tao
Robin Scheibler
Samuele Cornell
Sean Kim
Stavros Petridis
46
22
0
27 Oct 2023
Learning Repeatable Speech Embeddings Using An Intra-class Correlation Regularizer
Jianwei Zhang
Suren Jayasuriya
Visar Berisha
SSL
25
2
0
25 Oct 2023
Intuitive Multilingual Audio-Visual Speech Recognition with a Single-Trained Model
Joanna Hong
Se Jin Park
Y. Ro
VLM
11
6
0
23 Oct 2023
Diffusion-Based Adversarial Purification for Speaker Verification
Yibo Bai
Xiao-Lei Zhang
Xuelong Li
DiffM
36
2
0
22 Oct 2023
AVTENet: Audio-Visual Transformer-based Ensemble Network Exploiting Multiple Experts for Video Deepfake Detection
Ammarah Hashmi
Sahibzada Adil Shahzad
Chia-Wen Lin
Yu Tsao
Hsin-Min Wang
ViT
53
6
0
19 Oct 2023
The CHiME-7 Challenge: System Description and Performance of NeMo Team's DASR System
T. Park
He Huang
Ante Jukić
Kunal Dhawan
Krishna C. Puvvada
Nithin Rao Koluguri
Nikolay Karpov
A. Laptev
Jagadeesh Balam
Boris Ginsburg
29
6
0
18 Oct 2023
DASA: Difficulty-Aware Semantic Augmentation for Speaker Verification
Yuanyuan Wang
Yang Zhang
Zhiyong Wu
Zhihan Yang
Tao Wei
Kun Zou
Helen M. Meng
17
1
0
18 Oct 2023
Pairwise Similarity Learning is SimPLE
Yandong Wen
Weiyang Liu
Yao Feng
Bhiksha Raj
Rita Singh
Adrian Weller
Michael J. Black
Bernhard Schölkopf
33
6
0
13 Oct 2023
Cost-Driven Hardware-Software Co-Optimization of Machine Learning Pipelines
Ravit Sharma
W. Romaszkan
Feiqian Zhu
Puneet Gupta
Ankur Mehta
27
0
0
11 Oct 2023
AdaMesh: Personalized Facial Expressions and Head Poses for Adaptive Speech-Driven 3D Facial Animation
Liyang Chen
Weihong Bao
Shunwei Lei
Boshi Tang
Zhiyong Wu
Shiyin Kang
Haozhi Huang
Helen M. Meng
42
1
0
11 Oct 2023
An Initial Investigation of Neural Replay Simulator for Over-the-Air Adversarial Perturbations to Automatic Speaker Verification
Jiaqi Li
Li Wang
Liumeng Xue
Lei Wang
Zhizheng Wu
AAML
25
3
0
09 Oct 2023
Multi-objective Progressive Clustering for Semi-supervised Domain Adaptation in Speaker Verification
Ze Li
Yuke Lin
Ning Jiang
Xiaoyi Qin
Guoqing Zhao
Haiying Wu
Ming Li
VLM
41
1
0
07 Oct 2023
Integrating Audio-Visual Features for Multimodal Deepfake Detection
Sneha Muppalla
Shan Jia
Siwei Lyu
25
19
0
05 Oct 2023
A Large-Scale 3D Face Mesh Video Dataset via Neural Re-parameterized Optimization
Youwang Kim
Lee Hyun
Kim Sung-Bin
Suekyeong Nam
Janghoon Ju
Tae-Hyun Oh
CVBM
3DH
24
3
0
04 Oct 2023
MIS-AVoiDD: Modality Invariant and Specific Representation for Audio-Visual Deepfake Detection
Vinaya Sree Katamneni
A. Rattani
30
12
0
03 Oct 2023
Disentangling Voice and Content with Self-Supervision for Speaker Recognition
Tianchi Liu
Kong Aik Lee
Qiongqiong Wang
Haizhou Li
BDL
DRL
32
30
0
02 Oct 2023
How Close are Other Computer Vision Tasks to Deepfake Detection?
H. Nguyen
Junichi Yamagishi
Isao Echizen
CVBM
19
2
0
02 Oct 2023
AV-CPL: Continuous Pseudo-Labeling for Audio-Visual Speech Recognition
Andrew Rouditchenko
R. Collobert
Tatiana Likhomanenko
VLM
27
3
0
29 Sep 2023
RTFS-Net: Recurrent Time-Frequency Modelling for Efficient Audio-Visual Speech Separation
Samuel Pegg
Kai Li
Xiaolin Hu
26
4
0
29 Sep 2023
OSM-Net: One-to-Many One-shot Talking Head Generation with Spontaneous Head Motions
Jin Liu
Xi Wang
Xiaomeng Fu
Yesheng Chai
Cai Yu
Jiao Dai
Jizhong Han
21
3
0
28 Sep 2023
Rethinking Session Variability: Leveraging Session Embeddings for Session Robustness in Speaker Verification
Hee-Soo Heo
Ki-hyun Nam
Bong-Jin Lee
Youngki Kwon
Min-Ji Lee
You Jin Kim
Joon Son Chung
26
1
0
26 Sep 2023
Joint Audio and Speech Understanding
Yuan Gong
Alexander H. Liu
Hongyin Luo
Leonid Karlinsky
James R. Glass
AuLLM
28
66
0
25 Sep 2023
Haha-Pod: An Attempt for Laughter-based Non-Verbal Speaker Verification
Yuke Lin
Xiaoyi Qin
Ning Jiang
Guoqing Zhao
Ming Li
36
3
0
25 Sep 2023
Efficient Black-Box Speaker Verification Model Adaptation with Reprogramming and Backend Learning
Jingyu Li
Tan Lee
AAML
27
1
0
24 Sep 2023
Profile-Error-Tolerant Target-Speaker Voice Activity Detection
Dongmei Wang
Xiong Xiao
Naoyuki Kanda
Midia Yousefi
Takuya Yoshioka
Jian Wu
21
3
0
21 Sep 2023
Leveraging In-the-Wild Data for Effective Self-Supervised Pretraining in Speaker Recognition
Shuai Wang
Qibing Bai
Qi Liu
Jianwei Yu
Zhengyang Chen
Bing Han
Yan-min Qian
Haizhou Li
19
1
0
21 Sep 2023
Test-Time Training for Speech
Sri Harsha Dumpala
Chandramouli Shama Sastry
Sageev Oore
39
1
0
19 Sep 2023
Discrete Audio Representation as an Alternative to Mel-Spectrograms for Speaker and Speech Recognition
Krishna C. Puvvada
Nithin Rao Koluguri
Kunal Dhawan
Jagadeesh Balam
Boris Ginsburg
29
12
0
19 Sep 2023
AV-SUPERB: A Multi-Task Evaluation Benchmark for Audio-Visual Representation Models
Yuan Tseng
Layne Berry
Yi-Ting Chen
I-Hsiang Chiu
Hsuan-Hao Lin
...
Yu Tsao
Shinji Watanabe
Abdel-rahman Mohamed
Chi-Luen Feng
Hung-yi Lee
VLM
SSL
55
14
0
19 Sep 2023
Visual Speech Recognition for Languages with Limited Labeled Data using Automatic Labels from Whisper
Jeong Hun Yeo
Minsu Kim
Shinji Watanabe
Y. Ro
VLM
26
12
0
15 Sep 2023