1706.08612
VoxCeleb: a large-scale speaker identification dataset
26 June 2017
Arsha Nagrani
Joon Son Chung
Andrew Zisserman
Papers citing "VoxCeleb: a large-scale speaker identification dataset"
50 / 1,111 papers shown
Title
Accelerating Codec-based Speech Synthesis with Multi-Token Prediction and Speculative Decoding
Tan Dat Nguyen
Ji-Hoon Kim
Jeongsoo Choi
Shukjae Choi
Jinseok Park
Younglo Lee
Joon Son Chung
86
3
0
17 Oct 2024
DART: Disentanglement of Accent and Speaker Representation in Multispeaker Text-to-Speech
J. Melechovský
Ambuj Mehrish
Berrak Sisman
Dorien Herremans
57
2
0
17 Oct 2024
DAWN: Dynamic Frame Avatar with Non-autoregressive Diffusion Framework for Talking Head Video Generation
Hanbo Cheng
Limin Lin
Chenyu Liu
Pengcheng Xia
Pengfei Hu
Jiefeng Ma
Jun Du
Jia Pan
DiffM
VGen
459
0
0
17 Oct 2024
HeightCeleb - an enrichment of VoxCeleb dataset with speaker height information
Stanisław Kacprzak
K. Kowalczyk
MDE
42
0
0
16 Oct 2024
Generative Human Video Compression with Multi-granularity Temporal Trajectory Factorization
Shanzhi Yin
Bolin Chen
Shiqi Wang
Yan Ye
VGen
DiffM
72
4
0
14 Oct 2024
Beyond GFVC: A Progressive Face Video Compression Framework with Adaptive Visual Tokens
Bolin Chen
Shanzhi Yin
Zihan Zhang
Jie Chen
Ru-Ling Liao
Lingyu Zhu
Shiqi Wang
Yan Ye
55
6
0
11 Oct 2024
SAKA: An Intelligent Platform for Semi-automated Knowledge Graph Construction and Application
Hanrong Zhang
Xiang Wang
Jiabao Pan
Hongwei Wang
284
7
0
10 Oct 2024
Mamba-based Segmentation Model for Speaker Diarization
Alexis Plaquet
Naohiro Tawara
Marc Delcroix
Shota Horiguchi
Atsushi Ando
Shoko Araki
Mamba
73
4
0
09 Oct 2024
FINALLY: fast and universal speech enhancement with studio-like quality
Nicholas Babaev
Kirill Tamogashev
Azat Saginbaev
Ivan Shchekotov
Hanbin Bae
Hosang Sung
WonJun Lee
Hoon-Young Cho
Pavel Andreev
121
5
0
08 Oct 2024
Improving Speaker Representations Using Contrastive Losses on Multi-scale Features
Satvik Dixit
Massa Baali
Rita Singh
Bhiksha Raj
115
0
0
07 Oct 2024
Recent Advances in Speech Language Models: A Survey
Wenqian Cui
Dianzhi Yu
Xiaoqi Jiao
Ziqiao Meng
Guangyan Zhang
Qichao Wang
Yiwen Guo
Irwin King
AuLLM
203
25
0
01 Oct 2024
Incorporating Spatial Cues in Modular Speaker Diarization for Multi-channel Multi-party Meetings
Ruoyu Wang
Shutong Niu
Gaobin Yang
Jun Du
Shuangqing Qian
Tian Gao
Jia Pan
96
2
0
25 Sep 2024
MT2KD: Towards A General-Purpose Encoder for Speech, Speaker, and Audio Events
Xiaoyu Yang
Qiujia Li
Chao Zhang
P. Woodland
130
1
0
25 Sep 2024
Disentangling Age and Identity with a Mutual Information Minimization Approach for Cross-Age Speaker Verification
Fengrun Zhang
Wangjin Zhou
Yiming Liu
Wang Geng
Yahui Shan
Chen Zhang
65
0
0
24 Sep 2024
WeSep: A Scalable and Flexible Toolkit Towards Generalizable Target Speaker Extraction
Shuai Wang
Ke Zhang
Shaoxiong Lin
Junjie Li
Xuefei Wang
Meng Ge
Jianwei Yu
Yanmin Qian
Haizhou Li
75
10
0
24 Sep 2024
MIMAFace: Face Animation via Motion-Identity Modulated Appearance Feature Learning
Yue Han
Junwei Zhu
Yuxiang Feng
Xiaozhong Ji
Keke He
Xiangtai Li
Zhucun Xue
Yong Liu
85
0
0
23 Sep 2024
Codec-SUPERB @ SLT 2024: A lightweight benchmark for neural audio codec models
Haibin Wu
Xuanjun Chen
Yi-Cheng Lin
Kaiwei Chang
Jiawei Du
...
Yi-Chiao Wu
Xu Tan
James Glass
Shinji Watanabe
Hung-yi Lee
87
8
0
21 Sep 2024
FreeAvatar: Robust 3D Facial Animation Transfer by Learning an Expression Foundation Model
Feng Qiu
Wei Zhang
Chen Liu
Rudong An
Lincheng Li
Yu Ding
Changjie Fan
Zhipeng Hu
Xin Yu
SLR
3DH
86
0
0
20 Sep 2024
SpoofCeleb: Speech Deepfake Detection and SASV In The Wild
Jee-weon Jung
Yihan Wu
Xin Wang
Ji-Hoon Kim
Soumi Maiti
...
Joon Son Chung
Wangyou Zhang
Seyun Um
Shinnosuke Takamichi
Shinji Watanabe
155
4
0
18 Sep 2024
Towards Automatic Assessment of Self-Supervised Speech Models using Rank
Zakaria Aldeneh
Vimal Thilak
Takuya Higuchi
B. Theobald
Tatiana Likhomanenko
SSL
153
1
0
16 Sep 2024
Self-Supervised Syllable Discovery Based on Speaker-Disentangled HuBERT
Ryota Komatsu
Takahiro Shinozaki
SSL
108
1
0
16 Sep 2024
Speaker Contrastive Learning for Source Speaker Tracing
Qing Wang
Hongmei Guo
Jian Kang
Mengjie Du
Jie Li
Xiao-Lei Zhang
Lei Xie
114
0
0
16 Sep 2024
TBDM-Net: Bidirectional Dense Networks with Gender Information for Speech Emotion Recognition
Vlad Striletchi
Cosmin Striletchi
Adriana Stan
68
1
0
16 Sep 2024
Self-Tuning Spectral Clustering for Speaker Diarization
Nikhil Raghav
Avisek Gupta
Md Sahidullah
Swagatam Das
129
0
0
16 Sep 2024
ESPnet-EZ: Python-only ESPnet for Easy Fine-tuning and Integration
Masao Someki
Kwanghee Choi
Siddhant Arora
William Chen
Samuele Cornell
Jionghao Han
Yifan Peng
Jiatong Shi
Vaibhav Srivastav
Shinji Watanabe
VLM
103
0
0
14 Sep 2024
Integrated Multi-Level Knowledge Distillation for Enhanced Speaker Verification
Wenhao Yang
Jianguo Wei
Wenhuan Lu
Xugang Lu
Lei Li
57
0
0
14 Sep 2024
LawDNet: Enhanced Audio-Driven Lip Synthesis via Local Affine Warping Deformation
Deng Junli
Luo Yihao
Yang Xueting
Li Siyou
Wang Wei
Guo Jinyang
Shi Ping
43
0
0
14 Sep 2024
StyleTalk++: A Unified Framework for Controlling the Speaking Styles of Talking Heads
Suzhen Wang
Yifeng Ma
Yu Ding
Zhipeng Hu
Changjie Fan
Tangjie Lv
Zhidong Deng
Xin Yu
108
12
0
14 Sep 2024
Channel Adaptation for Speaker Verification Using Optimal Transport with Pseudo Label
Wenhao Yang
Jianguo Wei
Wenhuan Lu
Lei Li
Xugang Lu
OT
116
0
0
14 Sep 2024
HLTCOE JHU Submission to the Voice Privacy Challenge 2024
Lin Zhang
Zexin Cai
Ashi Garg
Kevin Duh
Leibny Paola García-Perera
Sanjeev Khudanpur
Nicholas Andrews
50
4
0
13 Sep 2024
Text-To-Speech Synthesis In The Wild
Jee-weon Jung
Wangyou Zhang
Soumi Maiti
Yihan Wu
Xin Eric Wang
...
Hye-jin Shim
Nicholas W. D. Evans
Joon Son Chung
Shinnosuke Takamichi
Shinji Watanabe
100
2
0
13 Sep 2024
FedHide: Federated Learning by Hiding in the Neighbors
Hyunsin Park
Sungrack Yun
FedML
58
0
0
12 Sep 2024
Universal Pooling Method of Multi-layer Features from Pretrained Models for Speaker Verification
Jin Sob Kim
Hyun Joon Park
Wooseok Shin
Sung Won Han
SLR
93
0
0
12 Sep 2024
EMOdiffhead: Continuously Emotional Control in Talking Head Generation via Diffusion
Jian Zhang
Weijian Mai
Zhijun Zhang
VGen
62
0
0
11 Sep 2024
Spoofing-Aware Speaker Verification Robust Against Domain and Channel Mismatches
Chang Zeng
Xiaoxiao Miao
Xin Wang
Erica Cooper
Junichi Yamagishi
AAML
75
1
0
10 Sep 2024
Estimating the Completeness of Discrete Speech Units
Sung-Lin Yeh
Hao Tang
105
2
0
09 Sep 2024
Privacy versus Emotion Preservation Trade-offs in Emotion-Preserving Speaker Anonymization
Zexin Cai
Lin Zhang
Ashi Garg
Leibny Paola García-Perera
Kevin Duh
Sanjeev Khudanpur
Nicholas Andrews
41
3
0
05 Sep 2024
An Analysis of Linear Complexity Attention Substitutes with BEST-RQ
Ryan Whetten
Titouan Parcollet
Adel Moumen
Marco Dinarelli
Yannick Esteve
114
1
0
04 Sep 2024
STAB: Speech Tokenizer Assessment Benchmark
Shikhar Vashishth
Harman Singh
Shikhar Bharadwaj
Sriram Ganapathy
Chulayuth Asawaroengchai
Kartik Audhkhasi
Andrew Rosenberg
Ankur Bapna
Bhuvana Ramabhadran
93
1
0
04 Sep 2024
Progressive Residual Extraction based Pre-training for Speech Representation Learning
Tianrui Wang
Jin Li
Ziyang Ma
Rui Cao
Xie Chen
...
Meng Ge
Xiaobao Wang
Yuguang Wang
Jianwu Dang
Nyima Tashi
SSL
110
0
0
31 Aug 2024
EmoAttack: Utilizing Emotional Voice Conversion for Speech Backdoor Attacks on Deep Speech Classification Models
Wenhan Yao
Zedong Xing
Xiarun Chen
Jia Liu
Yongqiang He
Weiping Wen
AAML
94
1
0
28 Aug 2024
MegActor-Σ: Unlocking Flexible Mixed-Modal Control in Portrait Animation with Diffusion Transformer
Shurong Yang
Huadong Li
Juhao Wu
Minhao Jing
Linze Li
Renhe Ji
Jiajun Liang
Haoqiang Fan
Jin Wang
VGen
DiffM
94
8
0
27 Aug 2024
The VoxCeleb Speaker Recognition Challenge: A Retrospective
Jaesung Huh
Joon Son Chung
Arsha Nagrani
A. Brown
Jee-weon Jung
Daniel Garcia-Romero
Andrew Zisserman
82
5
0
27 Aug 2024
NEST: Self-supervised Fast Conformer as All-purpose Seasoning to Speech Processing Tasks
He Huang
Taejin Park
Kunal Dhawan
Ivan Medennikov
Krishna Puvvada
Nithin Rao Koluguri
Weiqing Wang
Jagadeesh Balam
Boris Ginsburg
SSL
AI4TS
79
1
0
23 Aug 2024
Meta-Learning in Audio and Speech Processing: An End to End Comprehensive Review
Athul Raimon
Shubha Masti
Shyam K Sateesh
Siyani Vengatagiri
Bhaskarjyoti Das
VLM
AI4TS
82
2
0
19 Aug 2024
FD2Talk: Towards Generalized Talking Head Generation with Facial Decoupled Diffusion Model
Ziyu Yao
Xuxin Cheng
Zhiqi Huang
DiffM
80
4
0
18 Aug 2024
ASVspoof 5: Crowdsourced Speech Data, Deepfakes, and Adversarial Attacks at Scale
Xin Wang
Héctor Delgado
Hemlata Tak
Jee-weon Jung
Hye-jin Shim
...
Md. Sahidullah
Tomi Kinnunen
Nicholas W. D. Evans
K. Lee
Junichi Yamagishi
AAML
96
50
0
16 Aug 2024
Supervised and Unsupervised Alignments for Spoofing Behavioral Biometrics
Thomas Thebaud
Gaël Le Lan
Anthony Larcher
AAML
64
0
0
14 Aug 2024
Modality Invariant Multimodal Learning to Handle Missing Modalities: A Single-Branch Approach
Muhammad Saad Saeed
Shah Nawaz
Muhammad Zaigham Zaheer
Muhammad Haris Khan
Karthik Nandakumar
Muhammad Haroon Yousaf
Hassan Sajjad
Tom De Schepper
Markus Schedl
93
0
0
14 Aug 2024
High-fidelity and Lip-synced Talking Face Synthesis via Landmark-based Diffusion Model
Weizhi Zhong
Junfan Lin
Peixin Chen
Liang Lin
Guanbin Li
69
1
0
10 Aug 2024