VoxCeleb: a large-scale speaker identification dataset


26 June 2017
Arsha Nagrani
Joon Son Chung
Andrew Zisserman

Papers citing "VoxCeleb: a large-scale speaker identification dataset"

50 / 1,098 papers shown
Listen Then See: Video Alignment with Speaker Attention
Aviral Agrawal
Carlos Mateo Samudio Lezcano
Iqui Balam Heredia-Marin
P. Sethi
35
2
0
21 Apr 2024
FSRT: Facial Scene Representation Transformer for Face Reenactment from Factorized Appearance, Head-pose, and Facial Expression Features
Andre Rochow
Max Schwarz
Sven Behnke
ViT
48
6
0
15 Apr 2024
Fuse after Align: Improving Face-Voice Association Learning via Multimodal Encoder
Chong Peng
Liqiang He
Dan Su
CVBM
39
0
0
15 Apr 2024
S3Editor: A Sparse Semantic-Disentangled Self-Training Framework for Face Video Editing
Guangzhi Wang
Tianyi Chen
Kamran Ghasedi
HsiangTao Wu
Tianyu Ding
Chris Nuesmeyer
Ilya Zharkov
Mohan Kankanhalli
Luming Liang
42
1
0
11 Apr 2024
PEAVS: Perceptual Evaluation of Audio-Visual Synchrony Grounded in Viewers' Opinion Scores
Lucas Goncalves
Prashant Mathur
Chandrashekhar Lavania
Metehan Cekic
Marcello Federico
Kyu J. Han
20
4
0
10 Apr 2024
Masked Modeling Duo: Towards a Universal Audio Pre-training Framework
Daisuke Niizumi
Daiki Takeuchi
Yasunori Ohishi
Noboru Harada
K. Kashino
42
10
0
09 Apr 2024
HyperTTS: Parameter Efficient Adaptation in Text to Speech using Hypernetworks
Yingting Li
Rishabh Bhardwaj
Ambuj Mehrish
Bo Cheng
Soujanya Poria
46
2
0
06 Apr 2024
Neural Radiance Field-based Visual Rendering: A Comprehensive Review
Mingyuan Yao
Yukang Huo
Yang Ran
Qingbin Tian
Ruifeng Wang
Haihua Wang
AI4CE
41
8
0
31 Mar 2024
WavLLM: Towards Robust and Adaptive Speech Large Language Model
Shujie Hu
Long Zhou
Shujie Liu
Sanyuan Chen
Hongkun Hao
...
Xunying Liu
Jinyu Li
S. Sivasankaran
Linquan Liu
Furu Wei
AuLLM
23
45
0
31 Mar 2024
Beyond Talking -- Generating Holistic 3D Human Dyadic Motion for Communication
Mingze Sun
Chao Xu
Xinyu Jiang
Yang Liu
Baigui Sun
Ruqi Huang
49
3
0
28 Mar 2024
Deepfake Generation and Detection: A Benchmark and Survey
Gan Pei
Jiangning Zhang
Menghan Hu
Zhenyu Zhang
Chengjie Wang
Yunsheng Wu
Guangtao Zhai
Jian Yang
Chunhua Shen
Dacheng Tao
52
25
0
26 Mar 2024
DiffFAE: Advancing High-fidelity One-shot Facial Appearance Editing with Space-sensitive Customization and Semantic Preservation
Qilin Wang
Jiangning Zhang
Chengming Xu
Weijian Cao
Ying Tai
Yue Han
Yanhao Ge
Hong Gu
Chengjie Wang
Yanwei Fu
DiffM
45
0
0
26 Mar 2024
DiffusionAct: Controllable Diffusion Autoencoder for One-shot Face Reenactment
Stella Bounareli
Christos Tzelepis
Vasileios Argyriou
Ioannis Patras
Georgios Tzimiropoulos
DiffM
45
7
0
25 Mar 2024
Adaptive Super Resolution For One-Shot Talking-Head Generation
Luchuan Song
Pinxin Liu
Guojun Yin
Chenliang Xu
32
7
0
23 Mar 2024
Privacy-Preserving End-to-End Spoken Language Understanding
Ying-Gui Wang
Wei Huang
Le Yang
PILM
43
5
0
22 Mar 2024
Assessing the Robustness of Spectral Clustering for Deep Speaker Diarization
Nikhil Raghav
Md Sahidullah
28
2
0
21 Mar 2024
KunquDB: An Attempt for Speaker Verification in the Chinese Opera Scenario
Huali Zhou
Yuke Lin
Dongxi Liu
Ming Li
37
0
0
20 Mar 2024
Enhancing Bandwidth Efficiency for Video Motion Transfer Applications using Deep Learning Based Keypoint Prediction
Xue Bai
Tasmiah Haque
S. Mohan
Yuliang Cai
Byungheon Jeong
Adam Halasz
Srinjoy Das
31
1
0
17 Mar 2024
RID-TWIN: An end-to-end pipeline for automatic face de-identification in videos
Anirban Mukherjee
Monjoy Narayan Choudhury
D. Jayagopi
37
1
0
15 Mar 2024
uaMix-MAE: Efficient Tuning of Pretrained Audio Transformers with Unsupervised Audio Mixtures
Afrina Tabassum
Dung N. Tran
Trung D. Q. Dang
Ismini Lourentzou
K. Koishida
50
0
0
14 Mar 2024
An Efficient End-to-End Approach to Noise Invariant Speech Features via Multi-Task Learning
Heitor R. Guimarães
Arthur Pimentel
Anderson R. Avila
Mehdi Rezagholizadeh
Boxing Chen
Tiago H. Falk
67
1
0
13 Mar 2024
Cosine Scoring with Uncertainty for Neural Speaker Embedding
Qiongqiong Wang
Kong Aik Lee
38
1
0
11 Mar 2024
Federated Learning Method for Preserving Privacy in Face Recognition System
Enoch Solomon
Abraham Woubie
FedML
44
3
0
08 Mar 2024
Dynamic Cross Attention for Audio-Visual Person Verification
R Gnana Praveen
Jahangir Alam
40
1
0
07 Mar 2024
Audio-Visual Person Verification based on Recursive Fusion of Joint Cross-Attention
R Gnana Praveen
Jahangir Alam
49
2
0
07 Mar 2024
From Speech to Data: Unraveling Google's Use of Voice Data for User Profiling
Xinhang Ma
Sirui Chen
27
1
0
03 Mar 2024
Efficient Adapter Tuning of Pre-trained Speech Models for Automatic Speaker Verification
Mufan Sang
John H. L. Hansen
49
6
0
01 Mar 2024
Probing the Information Encoded in Neural-based Acoustic Models of Automatic Speech Recognition Systems
Quentin Raymondaud
Mickael Rouvier
Richard Dufour
25
1
0
29 Feb 2024
Unraveling Adversarial Examples against Speaker Identification -- Techniques for Attack Detection and Victim Model Classification
Sonal Joshi
Thomas Thebaud
Jesús Villalba
Najim Dehak
AAML
27
1
0
29 Feb 2024
ChildAugment: Data Augmentation Methods for Zero-Resource Children's Speaker Verification
Vishwanath Pratap Singh
Md. Sahidullah
Tomi Kinnunen
30
2
0
23 Feb 2024
AnnoTheia: A Semi-Automatic Annotation Toolkit for Audio-Visual Speech Technologies
José-M. Acosta-Triana
David Gimeno-Gómez
Carlos David Martínez Hinarejos
VLM
VGen
47
2
0
20 Feb 2024
Codec-SUPERB: An In-Depth Analysis of Sound Codec Models
Haibin Wu
Ho-Lam Chung
Yi-Cheng Lin
Yuan-Kuei Wu
Xuanjun Chen
Yu-Chi Pai
Hsiu-Hsuan Wang
Kai-Wei Chang
Alexander H. Liu
Hung-yi Lee
55
19
0
20 Feb 2024
Significance of Chirp MFCC as a Feature in Speech and Audio Applications
S. J. Joysingh
P. Vijayalakshmi
T. Nagarajan
21
5
0
19 Feb 2024
One2Avatar: Generative Implicit Head Avatar For Few-shot User Adaptation
Zhixuan Yu
Ziqian Bai
Abhimitra Meka
Feitong Tan
Qiangeng Xu
Rohit Pandey
S. Fanello
Hyun Soo Park
Yinda Zhang
28
4
0
19 Feb 2024
Cross-Attention Fusion of Visual and Geometric Features for Large Vocabulary Arabic Lipreading
Samar Daou
Ahmed Rekik
A. Ben-Hamadou
Abdelaziz Kallel
31
3
0
18 Feb 2024
Probing Self-supervised Learning Models with Target Speech Extraction
Junyi Peng
Marc Delcroix
Tsubasa Ochiai
Oldrich Plchot
Takanori Ashihara
Shoko Araki
J. Černocký
40
2
0
17 Feb 2024
One-shot Neural Face Reenactment via Finding Directions in GAN's Latent Space
Stella Bounareli
Christos Tzelepis
Vasileios Argyriou
Ioannis Patras
Georgios Tzimiropoulos
CVBM
3DH
45
8
0
05 Feb 2024
ESPnet-SPK: full pipeline speaker embedding toolkit with reproducible recipes, self-supervised front-ends, and off-the-shelf models
Jee-weon Jung
Wangyou Zhang
Jiatong Shi
Zakaria Aldeneh
Takuya Higuchi
B. Theobald
Ahmed Hussen Abdelaziz
Shinji Watanabe
81
21
0
30 Jan 2024
Generalizing Speaker Verification for Spoof Awareness in the Embedding Space
Xuechen Liu
Md. Sahidullah
K. Lee
Tomi Kinnunen
AAML
39
7
0
20 Jan 2024
Revealing Emotional Clusters in Speaker Embeddings: A Contrastive Learning Strategy for Speech Emotion Recognition
Ismail Rasim Ulgen
Zongyang Du
Carlos Busso
Berrak Sisman
29
2
0
19 Jan 2024
Continuous Piecewise-Affine Based Motion Model for Image Animation
Hexiang Wang
Fengqi Liu
Qianyu Zhou
Ran Yi
Xin Tan
Lizhuang Ma
VGen
29
9
0
17 Jan 2024
Deep Learning in Physical Layer: Review on Data Driven End-to-End Communication Systems and their Enabling Semantic Applications
Nazmul Islam
Seokjoo Shin
AI4CE
29
3
0
08 Jan 2024
Self-supervised Reflective Learning through Self-distillation and Online Clustering for Speaker Representation Learning
Danwei Cai
Zexin Cai
Ming Li
35
0
0
03 Jan 2024
A Generalist FaceX via Learning Unified Facial Representation
Yue Han
Jiangning Zhang
Junwei Zhu
Xiangtai Li
Yanhao Ge
Wei Li
Chengjie Wang
Yong Liu
Xiaoming Liu
Ying Tai
DiffM
32
13
0
31 Dec 2023
EFHQ: Multi-purpose ExtremePose-Face-HQ dataset
T. Dao
D. Vu
Cuong Pham
Anh Tran
29
1
0
28 Dec 2023
Jeffreys divergence-based regularization of neural network output distribution applied to speaker recognition
Pierre-Michel Bousquet
Mickael Rouvier
UQCV
16
2
0
28 Dec 2023
SAIC: Integration of Speech Anonymization and Identity Classification
Ming Cheng
Xingjian Diao
Shitong Cheng
Wenjun Liu
53
6
0
23 Dec 2023
Voxceleb-ESP: preliminary experiments detecting Spanish celebrities from their voices
Beltrán Labrador
Manuel Otero-Gonzalez
Alicia Lozano-Diez
D. Ramos-Castro
Doroteo T. Toledano
Joaquín González-Rodríguez
24
0
0
20 Dec 2023
Learning Dense Correspondence for NeRF-Based Face Reenactment
Songlin Yang
Wei Wang
Yushi Lan
Xiangyu Fan
Bo Peng
Lei Yang
Jing Dong
CVBM
3DH
29
6
0
16 Dec 2023
Efficient speech detection in environmental audio using acoustic recognition and knowledge distillation
Drew Priebe
Burooj Ghani
Dan Stowell
19
5
0
14 Dec 2023