arXiv:1806.05622
VoxCeleb2: Deep Speaker Recognition


14 June 2018
Joon Son Chung
Arsha Nagrani
Andrew Zisserman

Papers citing "VoxCeleb2: Deep Speaker Recognition"

50 / 759 papers shown
DiffusionAct: Controllable Diffusion Autoencoder for One-shot Face Reenactment
Stella Bounareli
Christos Tzelepis
Vasileios Argyriou
Ioannis Patras
Georgios Tzimiropoulos
DiffM
43
7
0
25 Mar 2024
Target Speech Extraction with Pre-trained AV-HuBERT and Mask-And-Recover Strategy
Wenxuan Wu
Xueyuan Chen
Xixin Wu
Haizhou Li
Helen M. Meng
34
1
0
24 Mar 2024
Adaptive Super Resolution For One-Shot Talking-Head Generation
Luchuan Song
Pinxin Liu
Guojun Yin
Chenliang Xu
30
7
0
23 Mar 2024
AVT2-DWF: Improving Deepfake Detection with Audio-Visual Fusion and Dynamic Weighting Strategies
Rui Wang
Dengpan Ye
Long Tang
Yunming Zhang
Jiacheng Deng
ViT
27
9
0
22 Mar 2024
Assessing the Robustness of Spectral Clustering for Deep Speaker Diarization
Nikhil Raghav
Md Sahidullah
28
2
0
21 Mar 2024
Recursive Joint Cross-Modal Attention for Multimodal Fusion in Dimensional Emotion Recognition
R Gnana Praveen
Jahangir Alam
39
17
0
20 Mar 2024
Portrait4D-v2: Pseudo Multi-View Data Creates Better 4D Head Synthesizer
Yu Deng
Duomin Wang
Baoyuan Wang
48
21
0
20 Mar 2024
Towards the Development of a Real-Time Deepfake Audio Detection System in Communication Platforms
J. J. Mathew
Rakin Ahsan
Sae Furukawa
Jagdish Gautham Krishna Kumar
Huzaifa Pallan
Agamjeet Singh Padda
Sara Adamski
Madhu Reddiboina
Arjun Pankajakshan
23
2
0
18 Mar 2024
Multilingual Audio-Visual Speech Recognition with Hybrid CTC/RNN-T Fast Conformer
Maxime Burchi
Krishna C. Puvvada
Jagadeesh Balam
Boris Ginsburg
Radu Timofte
42
8
0
14 Mar 2024
Cosine Scoring with Uncertainty for Neural Speaker Embedding
Qiongqiong Wang
Kong Aik Lee
30
1
0
11 Mar 2024
Video-Driven Animation of Neural Head Avatars
Wolfgang Paier
Paul Hinzer
A. Hilsmann
Peter Eisert
3DH
34
0
0
07 Mar 2024
Multi-Level Attention Aggregation for Language-Agnostic Speaker Replication
Yejin Jeon
Gary Geunbae Lee
26
2
0
06 Mar 2024
Feel the Bite: Robot-Assisted Inside-Mouth Bite Transfer using Robust Mouth Perception and Physical Interaction-Aware Control
Rajat Kumar Jenamani
Daniel Stabile
Ziang Liu
Abrar Anwar
Katherine Dimitropoulou
T. Bhattacharjee
34
20
0
06 Mar 2024
Contrastive Learning of Person-independent Representations for Facial Action Unit Detection
Yong Li
Shiguang Shan
42
9
0
06 Mar 2024
FaceChain-ImagineID: Freely Crafting High-Fidelity Diverse Talking Faces from Disentangled Audio
Chao Xu
Yang Liu
Jiazheng Xing
Weida Wang
Mingze Sun
...
Tianxin Huang
Siyuan Li
Zhi-Qi Cheng
Ying Tai
Baigui Sun
CVBM
54
11
0
04 Mar 2024
From Speech to Data: Unraveling Google's Use of Voice Data for User Profiling
Xinhang Ma
Sirui Chen
25
0
0
03 Mar 2024
Probing the Information Encoded in Neural-based Acoustic Models of Automatic Speech Recognition Systems
Quentin Raymondaud
Mickael Rouvier
Richard Dufour
25
1
0
29 Feb 2024
Unraveling Adversarial Examples against Speaker Identification -- Techniques for Attack Detection and Victim Model Classification
Sonal Joshi
Thomas Thebaud
Jesús Villalba
Najim Dehak
AAML
27
1
0
29 Feb 2024
ChildAugment: Data Augmentation Methods for Zero-Resource Children's Speaker Verification
Vishwanath Pratap Singh
Md. Sahidullah
Tomi Kinnunen
22
2
0
23 Feb 2024
Where Visual Speech Meets Language: VSP-LLM Framework for Efficient and Context-Aware Visual Speech Processing
Jeong Hun Yeo
Seunghee Han
Minsu Kim
Y. Ro
53
22
0
23 Feb 2024
Codec-SUPERB: An In-Depth Analysis of Sound Codec Models
Haibin Wu
Ho-Lam Chung
Yi-Cheng Lin
Yuan-Kuei Wu
Xuanjun Chen
Yu-Chi Pai
Hsiu-Hsuan Wang
Kai-Wei Chang
Alexander H. Liu
Hung-yi Lee
52
18
0
20 Feb 2024
One2Avatar: Generative Implicit Head Avatar For Few-shot User Adaptation
Zhixuan Yu
Ziqian Bai
Abhimitra Meka
Feitong Tan
Qiangeng Xu
Rohit Pandey
S. Fanello
Hyun Soo Park
Yinda Zhang
28
4
0
19 Feb 2024
Cross-Attention Fusion of Visual and Geometric Features for Large Vocabulary Arabic Lipreading
Samar Daou
Ahmed Rekik
A. Ben-Hamadou
Abdelaziz Kallel
31
3
0
18 Feb 2024
LightCAM: A Fast and Light Implementation of Context-Aware Masking based D-TDNN for Speaker Verification
Di Cao
Xianchen Wang
Junfeng Zhou
Jiakai Zhang
Yanjing Lei
Wenpeng Chen
19
0
0
08 Feb 2024
One-shot Neural Face Reenactment via Finding Directions in GAN's Latent Space
Stella Bounareli
Christos Tzelepis
Vasileios Argyriou
Ioannis Patras
Georgios Tzimiropoulos
CVBM
3DH
45
8
0
05 Feb 2024
Adversarial Data Augmentation for Robust Speaker Verification
Zhenyu Zhou
Junhui Chen
Namin Wang
Lantian Li
Dong Wang
14
2
0
05 Feb 2024
ESPnet-SPK: full pipeline speaker embedding toolkit with reproducible recipes, self-supervised front-ends, and off-the-shelf models
Jee-weon Jung
Wangyou Zhang
Jiatong Shi
Zakaria Aldeneh
Takuya Higuchi
B. Theobald
Ahmed Hussen Abdelaziz
Shinji Watanabe
79
21
0
30 Jan 2024
Diffusion Facial Forgery Detection
Harry Cheng
Yangyang Guo
Tianyi Wang
L. Nie
Mohan S. Kankanhalli
61
16
0
29 Jan 2024
Adversarial speech for voice privacy protection from Personalized Speech generation
Shihao Chen
Liping Chen
Jie Zhang
KongAik Lee
Zhenhua Ling
Lirong Dai
AAML
13
1
0
22 Jan 2024
Detecting Multimedia Generated by Large AI Models: A Survey
Li Lin
Neeraj Gupta
Yue Zhang
Hainan Ren
Chun-Hao Liu
Feng Ding
Xin Wang
Xin Li
Luisa Verdoliva
Shu Hu
86
57
0
22 Jan 2024
Revealing Emotional Clusters in Speaker Embeddings: A Contrastive Learning Strategy for Speech Emotion Recognition
Ismail Rasim Ulgen
Zongyang Du
Carlos Busso
Berrak Sisman
21
2
0
19 Jan 2024
Efficient Training for Multilingual Visual Speech Recognition: Pre-training with Discretized Visual Speech Representation
Minsu Kim
Jeong Hun Yeo
Se Jin Park
J. Choi
Y. Ro
27
5
0
18 Jan 2024
Tri$^{2}$-plane: Thinking Head Avatar via Feature Pyramid
Luchuan Song
Pinxin Liu
Lele Chen
Guojun Yin
Chenliang Xu
3DH
26
6
0
17 Jan 2024
Real3D-Portrait: One-shot Realistic 3D Talking Portrait Synthesis
Zhenhui Ye
Tianyun Zhong
Yi Ren
Jiaqi Yang
Weichuang Li
...
Jinglin Liu
Chen Zhang
Xiang Yin
Zejun Ma
Zhou Zhao
29
45
0
16 Jan 2024
HiCMAE: Hierarchical Contrastive Masked Autoencoder for Self-Supervised Audio-Visual Emotion Recognition
Guoying Zhao
Zheng Lian
Bin Liu
Jianhua Tao
53
29
0
11 Jan 2024
MERBench: A Unified Evaluation Benchmark for Multimodal Emotion Recognition
Zheng Lian
Guoying Zhao
Yong Ren
Hao Gu
Haiyang Sun
Lan Chen
Bin Liu
Jianhua Tao
21
12
0
07 Jan 2024
Gradient weighting for speaker verification in extremely low Signal-to-Noise Ratio
Yi Ma
Kong Aik Lee
Ville Hautamaki
Meng Ge
Haizhou Li
31
0
0
05 Jan 2024
Self-supervised Reflective Learning through Self-distillation and Online Clustering for Speaker Representation Learning
Danwei Cai
Zexin Cai
Ming Li
25
0
0
03 Jan 2024
SVFAP: Self-supervised Video Facial Affect Perceiver
Guoying Zhao
Zheng Lian
Kexin Wang
Yu He
Ming Xu
Haiyang Sun
Bin Liu
Jianhua Tao
56
14
0
31 Dec 2023
Jeffreys divergence-based regularization of neural network output distribution applied to speaker recognition
Pierre-Michel Bousquet
Mickael Rouvier
UQCV
14
2
0
28 Dec 2023
HeadCraft: Modeling High-Detail Shape Variations for Animated 3DMMs
Artem Sevastopolsky
Philip-William Grassal
Simon Giebenhain
ShahRukh Athar
Luisa Verdoliva
Matthias Niessner
3DH
58
4
0
21 Dec 2023
Leveraging Visual Supervision for Array-based Active Speaker Detection and Localization
Davide Berghi
Philip J. B. Jackson
48
5
0
21 Dec 2023
DREAM-Talk: Diffusion-based Realistic Emotional Audio-driven Method for Single Image Talking Face Generation
Chenxu Zhang
Chao Wang
Jianfeng Zhang
Hongyi Xu
Guoxian Song
You Xie
Linjie Luo
Yapeng Tian
Xiaohu Guo
Jiashi Feng
35
19
0
21 Dec 2023
Voxceleb-ESP: preliminary experiments detecting Spanish celebrities from their voices
Beltrán Labrador
Manuel Otero-Gonzalez
Alicia Lozano-Diez
D. Ramos-Castro
Doroteo T. Toledano
Joaquín González-Rodríguez
16
0
0
20 Dec 2023
Learning Dense Correspondence for NeRF-Based Face Reenactment
Songlin Yang
Wei Wang
Yushi Lan
Xiangyu Fan
Bo Peng
Lei Yang
Jing Dong
CVBM
3DH
21
6
0
16 Dec 2023
LiteVSR: Efficient Visual Speech Recognition by Learning from Speech Representations of Unlabeled Data
Hendrik Laux
Emil Mededovic
Ahmed Hallawa
Lukas Martin
A. Peine
Anke Schmeink
VLM
26
4
0
15 Dec 2023
Audio-visual fine-tuning of audio-only ASR models
Avner May
Dmitriy Serdyuk
Ankit Parag Shah
Otavio Braga
Olivier Siohan
23
3
0
14 Dec 2023
Scalable Ensemble-based Detection Method against Adversarial Attacks for speaker verification
Haibin Wu
Heng-Cheng Kuo
Yu Tsao
Hung-yi Lee
AAML
26
1
0
14 Dec 2023
NeXt-TDNN: Modernizing Multi-Scale Temporal Convolution Backbone for Speaker Verification
Hyunjun Heo
U.H Shin
Ran Lee
YoungJu Cheon
Hyung-Min Park
26
9
0
14 Dec 2023
GMTalker: Gaussian Mixture-based Audio-Driven Emotional Talking Video Portraits
Yibo Xia
Lizhen Wang
Xiang Deng
Xiaoyan Luo
Yunhong Wang
Yebin Liu
VGen
45
1
0
12 Dec 2023