VoxCeleb: a large-scale speaker identification dataset

26 June 2017
Arsha Nagrani
Joon Son Chung
Andrew Zisserman

Papers citing "VoxCeleb: a large-scale speaker identification dataset"

50 / 1,111 papers shown
SSAMBA: Self-Supervised Audio Representation Learning with Mamba State Space Model
Siavash Shams
Sukru Samet Dindar
Xilin Jiang
N. Mesgarani
Mamba
124
22
0
20 May 2024
Generative Artificial Intelligence: A Systematic Review and Applications
S. S. Sengar
Affan Bin Hasan
Sanjay Kumar
Fiona Carroll
MedIm
76
74
0
17 May 2024
Speaker Embeddings With Weakly Supervised Voice Activity Detection For Efficient Speaker Diarization
Jenthe Thienpondt
Kris Demuynck
74
3
0
15 May 2024
Investigating Design Choices in Joint-Embedding Predictive Architectures for General Audio Representation Learning
Alain Riou
Stefan Lattner
Gaëtan Hadjeres
Geoffroy Peeters
69
2
0
14 May 2024
Open Implementation and Study of BEST-RQ for Speech Processing
Ryan Whetten
Titouan Parcollet
Marco Dinarelli
Yannick Esteve
99
7
0
07 May 2024
Speaker Characterization by means of Attention Pooling
Federico Costa
Miquel India
Javier Hernando
62
1
0
07 May 2024
AniTalker: Animate Vivid and Diverse Talking Faces through Identity-Decoupled Facial Motion Encoding
Tao Liu
Feilong Chen
Shuai Fan
Chenpeng Du
Qi Chen
Xie Chen
Kai Yu
DiffM
PINN
71
31
0
06 May 2024
In Anticipation of Perfect Deepfake: Identity-anchored Artifact-agnostic Detection under Rebalanced Deepfake Detection Protocol
Wei-Han Wang
Chin-Yuan Yeh
Hsi-Wen Chen
De-Nian Yang
Ming-Syan Chen
82
0
0
01 May 2024
Certification of Speaker Recognition Models to Additive Perturbations
Dmitrii Korzh
Elvir Karimov
Mikhail Aleksandrovich Pautov
Oleg Y. Rogov
Ivan Oseledets
85
3
0
29 Apr 2024
Towards Dog Bark Decoding: Leveraging Human Speech Processing for Automated Bark Classification
Artem Abzaliev
Humberto Pérez Espinosa
Rada Mihalcea
VLM
81
1
0
29 Apr 2024
A Comparison of Differential Performance Metrics for the Evaluation of Automatic Speaker Verification Fairness
Oubaïda Chouchane
Christoph Busch
Chiara Galdi
Nicholas W. D. Evans
Massimiliano Todisco
59
2
0
27 Apr 2024
A Semi-Automatic Approach to Create Large Gender- and Age-Balanced Speaker Corpora: Usefulness of Speaker Diarization & Identification
Rémi Uro
D. Doukhan
Albert Rilliard
Laëtitia Larcher
Anissa-Claire Adgharouamane
Marie Tahon
Antoine Laurent
108
4
0
26 Apr 2024
3DFlowRenderer: One-shot Face Re-enactment via Dense 3D Facial Flow Estimation
Siddharth Nijhawan
T. Yashima
Tamaki Kojima
3DH
63
0
0
23 Apr 2024
AudioRepInceptionNeXt: A lightweight single-stream architecture for efficient audio recognition
Kin Wai Lau
Yasar Abbas Ur Rehman
L. Po
85
1
0
21 Apr 2024
Listen Then See: Video Alignment with Speaker Attention
Aviral Agrawal
Carlos Mateo Samudio Lezcano
Iqui Balam Heredia-Marin
P. Sethi
58
2
0
21 Apr 2024
FSRT: Facial Scene Representation Transformer for Face Reenactment from Factorized Appearance, Head-pose, and Facial Expression Features
Andre Rochow
Max Schwarz
Sven Behnke
ViT
100
8
0
15 Apr 2024
Fuse after Align: Improving Face-Voice Association Learning via Multimodal Encoder
Chong Peng
Liqiang He
Jane Polak Scowcroft
CVBM
107
0
0
15 Apr 2024
S3Editor: A Sparse Semantic-Disentangled Self-Training Framework for Face Video Editing
Guangzhi Wang
Tianyi Chen
Kamran Ghasedi
HsiangTao Wu
Tianyu Ding
Chris Nuesmeyer
Ilya Zharkov
Mohan Kankanhalli
Luming Liang
75
1
0
11 Apr 2024
PEAVS: Perceptual Evaluation of Audio-Visual Synchrony Grounded in Viewers' Opinion Scores
Lucas Goncalves
Prashant Mathur
Chandrashekhar Lavania
Metehan Cekic
Marcello Federico
Kyu J. Han
62
4
0
10 Apr 2024
Masked Modeling Duo: Towards a Universal Audio Pre-training Framework
Daisuke Niizumi
Daiki Takeuchi
Yasunori Ohishi
Noboru Harada
K. Kashino
105
15
0
09 Apr 2024
HyperTTS: Parameter Efficient Adaptation in Text to Speech using Hypernetworks
Yingting Li
Rishabh Bhardwaj
Ambuj Mehrish
Bo Cheng
Soujanya Poria
71
2
0
06 Apr 2024
Neural Radiance Field-based Visual Rendering: A Comprehensive Review
Mingyuan Yao
Yukang Huo
Yang Ran
Qingbin Tian
Ruifeng Wang
Haihua Wang
AI4CE
84
9
0
31 Mar 2024
WavLLM: Towards Robust and Adaptive Speech Large Language Model
Shujie Hu
Long Zhou
Shujie Liu
Sanyuan Chen
Hongkun Hao
...
Xunying Liu
Jinyu Li
S. Sivasankaran
Linquan Liu
Furu Wei
AuLLM
104
68
0
31 Mar 2024
Beyond Talking -- Generating Holistic 3D Human Dyadic Motion for Communication
Mingze Sun
Chao Xu
Xinyu Jiang
Yang Liu
Baigui Sun
Ruqi Huang
84
5
0
28 Mar 2024
Deepfake Generation and Detection: A Benchmark and Survey
Gan Pei
Jiangning Zhang
Menghan Hu
Zhenyu Zhang
Chengjie Wang
Yunsheng Wu
Guangtao Zhai
Jian Yang
Chunhua Shen
Dacheng Tao
104
36
0
26 Mar 2024
DiffFAE: Advancing High-fidelity One-shot Facial Appearance Editing with Space-sensitive Customization and Semantic Preservation
Qilin Wang
Jiangning Zhang
Chengming Xu
Weijian Cao
Ying Tai
Yue Han
Yanhao Ge
Hong Gu
Chengjie Wang
Yanwei Fu
DiffM
69
0
0
26 Mar 2024
DiffusionAct: Controllable Diffusion Autoencoder for One-shot Face Reenactment
Stella Bounareli
Christos Tzelepis
Vasileios Argyriou
Ioannis Patras
Georgios Tzimiropoulos
DiffM
120
7
0
25 Mar 2024
Adaptive Super Resolution For One-Shot Talking-Head Generation
Luchuan Song
Pinxin Liu
Guojun Yin
Chenliang Xu
80
8
0
23 Mar 2024
Privacy-Preserving End-to-End Spoken Language Understanding
Ying-Gui Wang
Wei Huang
Le Yang
PILM
96
5
0
22 Mar 2024
Assessing the Robustness of Spectral Clustering for Deep Speaker Diarization
Nikhil Raghav
Md Sahidullah
79
2
0
21 Mar 2024
KunquDB: An Attempt for Speaker Verification in the Chinese Opera Scenario
Huali Zhou
Yuke Lin
Dongxi Liu
Ming Li
57
0
0
20 Mar 2024
Enhancing Bandwidth Efficiency for Video Motion Transfer Applications using Deep Learning Based Keypoint Prediction
Xue Bai
Tasmiah Haque
S. Mohan
Yuliang Cai
Byungheon Jeong
Adam Halasz
Srinjoy Das
67
1
0
17 Mar 2024
RID-TWIN: An end-to-end pipeline for automatic face de-identification in videos
Anirban Mukherjee
Monjoy Narayan Choudhury
D. Jayagopi
68
1
0
15 Mar 2024
uaMix-MAE: Efficient Tuning of Pretrained Audio Transformers with Unsupervised Audio Mixtures
Afrina Tabassum
Dung N. Tran
Trung D. Q. Dang
Ismini Lourentzou
K. Koishida
82
0
0
14 Mar 2024
An Efficient End-to-End Approach to Noise Invariant Speech Features via Multi-Task Learning
Heitor R. Guimarães
Arthur Pimentel
Anderson R. Avila
Mehdi Rezagholizadeh
Boxing Chen
Tiago H. Falk
111
1
0
13 Mar 2024
Cosine Scoring with Uncertainty for Neural Speaker Embedding
Qiongqiong Wang
Kong Aik Lee
65
2
0
11 Mar 2024
Federated Learning Method for Preserving Privacy in Face Recognition System
Enoch Solomon
Abraham Woubie
FedML
82
6
0
08 Mar 2024
Dynamic Cross Attention for Audio-Visual Person Verification
R Gnana Praveen
Jahangir Alam
131
1
0
07 Mar 2024
Audio-Visual Person Verification based on Recursive Fusion of Joint Cross-Attention
R Gnana Praveen
Jahangir Alam
75
3
0
07 Mar 2024
From Speech to Data: Unraveling Google's Use of Voice Data for User Profiling
Xinhang Ma
Sirui Chen
39
1
0
03 Mar 2024
Efficient Adapter Tuning of Pre-trained Speech Models for Automatic Speaker Verification
Mufan Sang
John H. L. Hansen
101
6
0
01 Mar 2024
Probing the Information Encoded in Neural-based Acoustic Models of Automatic Speech Recognition Systems
Quentin Raymondaud
Mickael Rouvier
Richard Dufour
51
2
0
29 Feb 2024
Unraveling Adversarial Examples against Speaker Identification -- Techniques for Attack Detection and Victim Model Classification
Sonal Joshi
Thomas Thebaud
Jesús Villalba
Najim Dehak
AAML
55
1
0
29 Feb 2024
ChildAugment: Data Augmentation Methods for Zero-Resource Children's Speaker Verification
Vishwanath Pratap Singh
Md. Sahidullah
Tomi Kinnunen
57
6
0
23 Feb 2024
AnnoTheia: A Semi-Automatic Annotation Toolkit for Audio-Visual Speech Technologies
José-M. Acosta-Triana
David Gimeno-Gómez
Carlos David Martínez Hinarejos
VLM
VGen
125
2
0
20 Feb 2024
Codec-SUPERB: An In-Depth Analysis of Sound Codec Models
Haibin Wu
Ho-Lam Chung
Yi-Cheng Lin
Yuan-Kuei Wu
Xuanjun Chen
Yu-Chi Pai
Hsiu-Hsuan Wang
Kai-Wei Chang
Alexander H. Liu
Hung-yi Lee
113
29
0
20 Feb 2024
Significance of Chirp MFCC as a Feature in Speech and Audio Applications
S. J. Joysingh
P. Vijayalakshmi
T. Nagarajan
28
6
0
19 Feb 2024
One2Avatar: Generative Implicit Head Avatar For Few-shot User Adaptation
Zhixuan Yu
Ziqian Bai
Abhimitra Meka
Feitong Tan
Qiangeng Xu
Rohit Pandey
S. Fanello
Hyun Soo Park
Yinda Zhang
76
5
0
19 Feb 2024
Cross-Attention Fusion of Visual and Geometric Features for Large Vocabulary Arabic Lipreading
Samar Daou
Ahmed Rekik
A. Ben-Hamadou
Abdelaziz Kallel
63
3
0
18 Feb 2024
Probing Self-supervised Learning Models with Target Speech Extraction
Junyi Peng
Marc Delcroix
Tsubasa Ochiai
Oldrich Plchot
Takanori Ashihara
Shoko Araki
J. Černocký
105
4
0
17 Feb 2024