arXiv:1911.01255
pyannote.audio: neural building blocks for speaker diarization
4 November 2019
H. Bredin
Ruiqing Yin
Juan Manuel Coria
G. Gelly
Pavel Korshunov
Marvin Lavechin
D. Fustes
Hadrien Titeux
Wassim Bouaziz
Marie-Philippe Gill
Papers citing
"pyannote.audio: neural building blocks for speaker diarization"
44 / 144 papers shown
Title
The YiTrans End-to-End Speech Translation System for IWSLT 2022 Offline Shared Task
Ziqiang Zhang
Junyi Ao
Long Zhou
Shujie Liu
Furu Wei
Jinyu Li
22
9
0
12 Jun 2022
Speech Detection For Child-Clinician Conversations In Danish For Low-Resource In-The-Wild Conditions: A Case Study
Sneha Das
N. Lønfeldt
A. Pagsberg
Line H. Clemmensen
16
3
0
25 Apr 2022
Generative Spoken Dialogue Language Modeling
Tu Nguyen
Eugene Kharitonov
Jade Copet
Yossi Adi
Wei-Ning Hsu
...
Paden Tomasello
Robin Algayres
Benoît Sagot
Abdel-rahman Mohamed
Emmanuel Dupoux
AuLLM
38
80
0
30 Mar 2022
Multi-scale Speaker Diarization with Dynamic Scale Weighting
Tae Jin Park
Nithin Rao Koluguri
Jagadeesh Balam
Boris Ginsburg
21
19
0
30 Mar 2022
Audio visual character profiles for detecting background characters in entertainment media
Rahul Sharma
Shrikanth Narayanan
17
5
0
21 Mar 2022
Automated detection of foreground speech with wearable sensing in everyday home environments: A transfer learning approach
Dawei Liang
Zifan Xu
Yinuo Chen
Rebecca Adaimi
David Harwath
Edison Thomaz
48
1
0
21 Mar 2022
DialogueNeRF: Towards Realistic Avatar Face-to-Face Conversation Video Generation
Yichao Yan
Zanwei Zhou
Zi Wang
Chen-Ning Yang
Xiaokang Yang
CVBM
21
19
0
15 Mar 2022
Magnitude-aware Probabilistic Speaker Embeddings
Nikita Kuzmin
Igor Fedorov
A. Sholokhov
27
7
0
28 Feb 2022
The xmuspeech system for multi-channel multi-party meeting transcription challenge
Jie Wang
Yuji Liu
Binling Wang
Yiming Zhi
Song Li
Shipeng Xia
Jiayang Zhang
Lin Li
Q. Hong
Feng Tong
16
0
0
11 Feb 2022
Summary On The ICASSP 2022 Multi-Channel Multi-Party Meeting Transcription Grand Challenge
Fan Yu
Shiliang Zhang
Pengcheng Guo
Yihui Fu
Zhihao Du
...
Kong Aik Lee
Zhijie Yan
B. Ma
Xin Xu
Hui Bu
18
28
0
08 Feb 2022
VoxSRC 2021: The Third VoxCeleb Speaker Recognition Challenge
A. Brown
Jaesung Huh
Joon Son Chung
Arsha Nagrani
Daniel Garcia-Romero
Andrew Zisserman
31
40
0
12 Jan 2022
Discretization and Re-synthesis: an alternative method to solve the Cocktail Party Problem
Jing Shi
Xuankai Chang
Tomoki Hayashi
Yen-Ju Lu
Shinji Watanabe
Bo Xu
32
19
0
17 Dec 2021
The People's Speech: A Large-Scale Diverse English Speech Recognition Dataset for Commercial Usage
Daniel Galvez
G. Diamos
Juan Ciro
Juan Felipe Cerón
Keith Achorn
Anjali Gopi
David Kanter
Maximilian Lam
Mark Mazumder
Vijay Janapa Reddi
22
95
0
17 Nov 2021
LiMuSE: Lightweight Multi-modal Speaker Extraction
Qinghua Liu
Yating Huang
Yunzhe Hao
Jiaming Xu
Bo Xu
43
6
0
07 Nov 2021
TitaNet: Neural Model for speaker representation with 1D Depth-wise separable convolutions and global context
Nithin Rao Koluguri
Taejin Park
Boris Ginsburg
ViT
33
94
0
08 Oct 2021
Fast-MD: Fast Multi-Decoder End-to-End Speech Translation with Non-Autoregressive Hidden Intermediates
Hirofumi Inaguma
Siddharth Dalmia
Brian Yan
Shinji Watanabe
65
11
0
27 Sep 2021
Non-autoregressive End-to-end Speech Translation with Parallel Autoregressive Rescoring
Hirofumi Inaguma
Yosuke Higuchi
Kevin Duh
Tatsuya Kawahara
Shinji Watanabe
63
11
0
09 Sep 2021
XMUSPEECH System for VoxCeleb Speaker Recognition Challenge 2021
Jie Wang
Fuchuan Tong
Zhi-Cong Chen
Lin Li
Q. Hong
Haodong Zhou
34
1
0
06 Sep 2021
The ByteDance Speaker Diarization System for the VoxCeleb Speaker Recognition Challenge 2021
Keke Wang
Xudong Mao
Hao Wu
Chen Ding
Chuxiang Shang
Rui Xia
Yuxuan Wang
20
13
0
05 Sep 2021
ESPnet-ST IWSLT 2021 Offline Speech Translation System
Hirofumi Inaguma
Shun Kiyono
Nelson Enrique Yalta Soplin
Pengcheng Guo
Jun Suzuki
Kevin Duh
Shinji Watanabe
3DV
37
2
0
01 Jul 2021
SpeechBrain: A General-Purpose Speech Toolkit
Mirco Ravanelli
Titouan Parcollet
Peter William VanHarn Plantinga
Aku Rouhe
Samuele Cornell
...
William Aris
Hwidong Na
Yan Gao
R. Mori
Yoshua Bengio
24
752
0
08 Jun 2021
End-to-End Diarization for Variable Number of Speakers with Local-Global Networks and Discriminative Speaker Embeddings
Soumi Maiti
Hakan Erdogan
K. Wilson
Scott Wisdom
Shinji Watanabe
J. Hershey
27
21
0
05 May 2021
End-to-End Speech Recognition from Federated Acoustic Models
Yan Gao
Titouan Parcollet
Salah Zaiem
Javier Fernandez-Marques
Pedro Porto Buarque de Gusmão
Daniel J. Beutel
Nicholas D. Lane
28
43
0
29 Apr 2021
End-to-end speaker segmentation for overlap-aware resegmentation
H. Bredin
Antoine Laurent
VLM
209
163
0
08 Apr 2021
An Initial Investigation for Detecting Partially Spoofed Audio
Lin Zhang
Xin Wang
Erica Cooper
Junichi Yamagishi
J. Patino
Nicholas W. D. Evans
15
45
0
06 Apr 2021
Learning spectro-temporal representations of complex sounds with parameterized neural networks
Rachid Riad
Julien Karadayi
Anne-Catherine Bachoud-Lévi
Emmanuel Dupoux
29
7
0
12 Mar 2021
Incorporating VAD into ASR System by Multi-task Learning
Meng Li
Xiai Yan
Feng Lin
VLM
6
3
0
02 Mar 2021
The Hitachi-JHU DIHARD III System: Competitive End-to-End Neural Diarization and X-Vector Clustering Systems Combined by DOVER-Lap
Shota Horiguchi
Nelson Yalta
Leibny Paola García-Perera
Yuki Takashima
Yawen Xue
Desh Raj
Zili Huang
Yusuke Fujita
Shinji Watanabe
Sanjeev Khudanpur
BDL
27
36
0
02 Feb 2021
Speech Enhancement for Wake-Up-Word detection in Voice Assistants
David Bonet
Guillermo Cámbara
Fernando López
Pablo Gómez
Carlos Segura
Jordi Luque
27
11
0
29 Jan 2021
VoxPopuli: A Large-Scale Multilingual Speech Corpus for Representation Learning, Semi-Supervised Learning and Interpretation
Changhan Wang
M. Rivière
Ann Lee
Anne Wu
Chaitanya Talnikar
Daniel Haziza
Mary Williamson
J. Pino
Emmanuel Dupoux
SSL
25
462
0
02 Jan 2021
Bayesian HMM clustering of x-vector sequences (VBx) in speaker diarization: theory, implementation and analysis on standard tasks
Federico Landini
Jan Profant
Mireia Díez
L. Burget
216
199
0
29 Dec 2020
VoxSRC 2020: The Second VoxCeleb Speaker Recognition Challenge
Arsha Nagrani
Joon Son Chung
Jaesung Huh
Andrew Brown
Ernesto Coto
Weidi Xie
Mitchell McLaren
D. Reynolds
Andrew Zisserman
21
74
0
12 Dec 2020
Comparison of Speaker Role Recognition and Speaker Enrollment Protocol for conversational Clinical Interviews
Rachid Riad
Hadrien Titeux
Laurie Lemoine
Justine Montillot
A. Sliwinski
J. Bagnou
Xuan-Nga Cao
Anne-Catherine Bachoud-Lévi
Emmanuel Dupoux
15
0
0
30 Oct 2020
Speech Activity Detection Based on Multilingual Speech Recognition System
Seyyed Saeed Sarfjoo
S. Madikeri
P. Motlícek
39
7
0
23 Oct 2020
Compositional embedding models for speaker identification and diarization with simultaneous speech from 2+ speakers
Zeqian Li
Jacob Whitehill
17
11
0
22 Oct 2020
Analysis of the BUT Diarization System for VoxConverse Challenge
Federico Landini
O. Glembek
P. Matejka
Johan Rohdin
L. Burget
Mireia Díez
Anna Silnova
16
32
0
22 Oct 2020
Data Augmenting Contrastive Learning of Speech Representations in the Time Domain
Eugene Kharitonov
M. Rivière
Gabriel Synnaeve
Lior Wolf
Pierre-Emmanuel Mazaré
Matthijs Douze
Emmanuel Dupoux
25
117
0
02 Jul 2020
A Comparison of Metric Learning Loss Functions for End-To-End Speaker Verification
Juan Manuel Coria
H. Bredin
Sahar Ghannay
S. Rosset
23
15
0
31 Mar 2020
Cross modal video representations for weakly supervised active speaker localization
Rahul Sharma
Krishna Somandepalli
Shrikanth Narayanan
19
8
0
09 Mar 2020
Seshat: A tool for managing and verifying annotation campaigns of audio data
Hadrien Titeux
Rachid Riad
Xuan-Nga Cao
Nicolas Hamilakis
Kris Madden
Alejandrina Cristià
Anne-Catherine Bachoud-Lévi
Emmanuel Dupoux
VLM
8
7
0
03 Mar 2020
Speaker detection in the wild: Lessons learned from JSALT 2019
Leibny Paola García-Perera
Jesus Villalba
H. Bredin
Jun Du
Diego Castán
...
Wassim Bouaziz
Hadrien Titeux
Emmanuel Dupoux
Kong Aik Lee
Najim Dehak
16
29
0
02 Dec 2019
The Speed Submission to DIHARD II: Contributions & Lessons Learned
Md. Sahidullah
J. Patino
Samuele Cornell
Ruiqing Yin
S. Sivasankaran
...
Emmanuel Vincent
Nicholas W. D. Evans
Sébastien Marcel
S. Squartini
C. Barras
VLM
14
16
0
06 Nov 2019
Overlap-aware diarization: resegmentation using neural end-to-end overlapped speech detection
Latané Bullock
H. Bredin
Leibny Paola García-Perera
22
94
0
25 Oct 2019
VoxCeleb2: Deep Speaker Recognition
Joon Son Chung
Arsha Nagrani
Andrew Zisserman
266
2,238
0
14 Jun 2018