1911.01255
pyannote.audio: neural building blocks for speaker diarization
4 November 2019
H. Bredin
Ruiqing Yin
Juan Manuel Coria
G. Gelly
Pavel Korshunov
Marvin Lavechin
D. Fustes
Hadrien Titeux
Wassim Bouaziz
Marie-Philippe Gill
Papers citing
"pyannote.audio: neural building blocks for speaker diarization"
50 / 144 papers shown
Title
Discrete Audio Representation as an Alternative to Mel-Spectrograms for Speaker and Speech Recognition
Krishna C. Puvvada
Nithin Rao Koluguri
Kunal Dhawan
Jagadeesh Balam
Boris Ginsburg
32
12
0
19 Sep 2023
Attention-based Encoder-Decoder End-to-End Neural Diarization with Embedding Enhancer
Zhengyang Chen
Bing Han
Shuai Wang
Yan-min Qian
28
18
0
13 Sep 2023
Can Language Models Learn to Listen?
Evonne Ng
Sanjay Subramanian
Dan Klein
Angjoo Kanazawa
Trevor Darrell
Shiry Ginosar
35
17
0
21 Aug 2023
Home monitoring for frailty detection through sound and speaker diarization analysis
Yannis Tevissen
D. Istrate
V. Zalc
Jérôme Boudy
Gérard Chollet
Frédéric Petitpont
Sami Boutamine
31
0
0
17 Aug 2023
Joint speech and overlap detection: a benchmark over multiple audio setup and speech domains
Martin Lebourdais
Théo Mariotte
Marie Tahon
Anthony Larcher
Antoine Laurent
Silvio Montrésor
S. Meignier
Jean-Hugh Thomas
VLM
33
5
0
24 Jul 2023
OxfordVGG Submission to the EGO4D AV Transcription Challenge
Jaesung Huh
Max Bain
Andrew Zisserman
37
0
0
18 Jul 2023
Domain-Agnostic Neural Architecture for Class Incremental Continual Learning in Document Processing Platform
Mateusz Wójcik
Witold Kościukiewicz
Mateusz Baran
Tomasz Kajdanowicz
Adam Gonczarek
CLL
24
1
0
11 Jul 2023
Enrollment-stage Backdoor Attacks on Speaker Recognition Systems via Adversarial Ultrasound
Xinfeng Li
Junning Ze
Chen Yan
Yushi Cheng
Xiaoyu Ji
Wenyuan Xu
AAML
23
11
0
28 Jun 2023
Wespeaker baselines for VoxSRC2023
Shuai Wang
Che-Yuan Liang
Xu Xiang
Bing Han
Zhengyang Chen
Hongji Wang
Wen Ding
32
0
0
27 Jun 2023
The CHiME-7 DASR Challenge: Distant Meeting Transcription with Multiple Devices in Diverse Scenarios
Samuele Cornell
Matthew Wiesner
Shinji Watanabe
Desh Raj
Xuankai Chang
...
Matthew Maciejewski
Yoshiki Masuyama
Zhong-Qiu Wang
S. Squartini
Sanjeev Khudanpur
24
51
0
23 Jun 2023
A Novel Scheme to classify Read and Spontaneous Speech
Sunil Kumar Kopparapu
6
0
0
13 Jun 2023
Zambezi Voice: A Multilingual Speech Corpus for Zambian Languages
Claytone Sikasote
Kalinda Siaminwe
Stanly Mwape
Bangiwe Zulu
Mofya Phiri
Martin Phiri
David Zulu
Mayumbo Nyirenda
Antonios Anastasopoulos
28
6
0
07 Jun 2023
Multi-microphone Automatic Speech Segmentation in Meetings Based on Circular Harmonics Features
Théo Mariotte
Anthony Larcher
Silvio Montrésor
Jean-Hugh Thomas
25
1
0
07 Jun 2023
Mega-TTS: Zero-Shot Text-to-Speech at Scale with Intrinsic Inductive Bias
Ziyue Jiang
Yi Ren
Zhe Ye
Jinglin Liu
Chen Zhang
...
Rongjie Huang
Chunfeng Wang
Xiang Yin
Zejun Ma
Zhou Zhao
DiffM
32
73
0
06 Jun 2023
On the Robustness of Arabic Speech Dialect Identification
Peter Sullivan
AbdelRahim Elmadany
Muhammad Abdul-Mageed
25
8
0
01 Jun 2023
Encoder-decoder multimodal speaker change detection
Jee-weon Jung
Soonshin Seo
Hee-Soo Heo
Geon-min Kim
You Jin Kim
Youngki Kwon
Min-Ji Lee
Bong-Jin Lee
37
2
0
01 Jun 2023
An Experimental Review of Speaker Diarization methods with application to Two-Speaker Conversational Telephone Speech recordings
L. Serafini
Samuele Cornell
Giovanni Morrone
Enrico Zovato
A. Brutti
S. Squartini
47
9
0
29 May 2023
Evaluating OpenAI's Whisper ASR for Punctuation Prediction and Topic Modeling of life histories of the Museum of the Person
L. Gris
R. Marcacini
Arnaldo Cândido Júnior
Edresson Casanova
A. S. Soares
S. Aluísio
21
7
0
23 May 2023
Multi-Stream Extension of Variational Bayesian HMM Clustering (MS-VBx) for Combined End-to-End and Vector Clustering-based Diarization
Marc Delcroix
Naohiro Tawara
Mireia Díez
Federico Landini
Anna Silnova
A. Ogawa
Tomohiro Nakatani
L. Burget
S. Araki
40
5
0
23 May 2023
AutoAD: Movie Description in Context
Tengda Han
Max Bain
Arsha Nagrani
Gül Varol
Weidi Xie
Andrew Zisserman
VGen
21
34
0
29 Mar 2023
End-to-End Integration of Speech Separation and Voice Activity Detection for Low-Latency Diarization of Telephone Conversations
Giovanni Morrone
Samuele Cornell
L. Serafini
Enrico Zovato
A. Brutti
S. Squartini
23
4
0
21 Mar 2023
A processing framework to access large quantities of whispered speech found in ASMR
Pablo Pérez Zarazaga
G. Henter
Zofia Malisz
44
1
0
13 Mar 2023
Improving Transformer-based End-to-End Speaker Diarization by Assigning Auxiliary Losses to Attention Heads
Ye-Rin Jeoung
Joon-Young Yang
Jeong-Hwan Choi
Joon-Hyuk Chang
11
12
0
02 Mar 2023
WhisperX: Time-Accurate Speech Transcription of Long-Form Audio
Max Bain
Jaesung Huh
Tengda Han
Andrew Zisserman
36
209
0
01 Mar 2023
VoxSRC 2022: The Fourth VoxCeleb Speaker Recognition Challenge
Jaesung Huh
A. Brown
Jee-weon Jung
Joon Son Chung
Arsha Nagrani
D. Garcia-Romero
Andrew Zisserman
23
26
0
20 Feb 2023
Anchorage: Visual Analysis of Satisfaction in Customer Service Videos via Anchor Events
Kamkwai Wong
Xingbo Wang
Yong Wang
Jianben He
Rongzheng Zhang
Huamin Qu
28
16
0
14 Feb 2023
ASR Bundestag: A Large-Scale political debate dataset in German
Johannes Wirth
René Peinl
32
1
0
12 Feb 2023
Residual Information in Deep Speaker Embedding Architectures
Adriana Stan
34
5
0
06 Feb 2023
The Newsbridge - Telecom SudParis VoxCeleb Speaker Recognition Challenge 2022 System Description
Yannis Tevissen
Jérôme Boudy
Frédéric Petitpont
28
1
0
17 Jan 2023
Dubbing in Practice: A Large Scale Study of Human Localization With Insights for Automatic Dubbing
William Brannon
Yogesh Virkar
Brian Thompson
42
21
0
23 Dec 2022
Audio-Visual Activity Guided Cross-Modal Identity Association for Active Speaker Detection
Rahul Sharma
Shrikanth Narayanan
37
8
0
01 Dec 2022
Token-level Speaker Change Detection Using Speaker Difference and Speech Content via Continuous Integrate-and-fire
Zhiyun Fan
Zhenlin Liang
Linhao Dong
Yi Liu
Shiyu Zhou
Meng Cai
Jun Zhang
Zejun Ma
Bo Xu
29
2
0
17 Nov 2022
Exploring Detection-based Method For Speaker Diarization @ Ego4D Audio-only Diarization Challenge 2022
Jiahao Wang
Guo Chen
Yin-Dong Zheng
Tong Lu
17
0
0
16 Nov 2022
Absolute decision corrupts absolutely: conservative online speaker diarisation
Youngki Kwon
Hee-Soo Heo
Bong-Jin Lee
You Jin Kim
Jee-weon Jung
25
2
0
09 Nov 2022
BER: Balanced Error Rate For Speaker Diarization
Tao Liu
K. Yu
20
4
0
08 Nov 2022
High-resolution embedding extractor for speaker diarisation
Hee-Soo Heo
Youngki Kwon
Bong-Jin Lee
You Jin Kim
Jee-weon Jung
32
5
0
08 Nov 2022
No-audio speaking status detection in crowded settings via visual pose-based filtering and wearable acceleration
Jose Vargas-Quiros
Laura Cabrera-Quiros
Hayley Hung
29
1
0
01 Nov 2022
Multitask Detection of Speaker Changes, Overlapping Speech and Voice Activity Using wav2vec 2.0
Marie Kunesova
Zbynek Zajíc
SSL
VLM
18
15
0
26 Oct 2022
In search of strong embedding extractors for speaker diarisation
Jee-weon Jung
Hee-Soo Heo
Bong-Jin Lee
Jaesung Huh
A. Brown
Youngki Kwon
Shinji Watanabe
Joon Son Chung
44
16
0
26 Oct 2022
Brouhaha: multi-task training for voice activity detection, speech-to-noise ratio, and C50 room acoustics estimation
Marvin Lavechin
Marianne Métais
Hadrien Titeux
Alodie Boissonnet
Jade Copet
M. Rivière
Elika Bergelson
Alejandrina Cristià
Emmanuel Dupoux
H. Bredin
37
24
0
24 Oct 2022
Spatial-aware Speaker Diarization for Multi-channel Multi-party Meeting
Jie Wang
Yuji Liu
Binling Wang
Yiming Zhi
Song Li
Shipeng Xia
Jiayang Zhang
Feng Tong
Lin Li
Q. Hong
23
6
0
24 Sep 2022
Joint Speech Activity and Overlap Detection with Multi-Exit Architecture
Ziqing Du
Kai Liu
Xucheng Wan
Huan Zhou
25
0
0
24 Sep 2022
Unsupervised active speaker detection in media content using cross-modal information
Rahul Sharma
Shrikanth Narayanan
24
3
0
24 Sep 2022
The Kriston AI System for the VoxCeleb Speaker Recognition Challenge 2022
Qutang Cai
Guoqiang Hong
Zhijian Ye
Ximin Li
Haizhou Li
38
7
0
23 Sep 2022
The BUCEA Speaker Diarization System for the VoxCeleb Speaker Recognition Challenge 2022
R. Zhou
Yu Du
Che-Ming Hu
22
0
0
20 Sep 2022
Overlapped speech and gender detection with WavLM pre-trained features
Martin Lebourdais
Marie Tahon
Antoine Laurent
S. Meignier
38
17
0
09 Sep 2022
Dyadic Interaction Assessment from Free-living Audio for Depression Severity Assessment
Bishal Lamichhane
N. Moukaddam
Ankit B. Patel
Ashutosh Sabharwal
14
1
0
08 Sep 2022
Unsupervised Speaker Diarization that is Agnostic to Language, Overlap-Aware, and Tuning Free
Md. Iftekhar Tanveer
Diego Casabuena
Jussi Karlgren
Rosie Jones
BDL
11
4
0
25 Jul 2022
Sequence-level Speaker Change Detection with Difference-based Continuous Integrate-and-fire
Zhiyun Fan
Linhao Dong
Meng Cai
Zejun Ma
Bo Xu
31
4
0
27 Jun 2022
DP-Parse: Finding Word Boundaries from Raw Speech with an Instance Lexicon
Robin Algayres
Tristan Ricoul
Julien Karadayi
Hugo Laurençon
Salah Zaiem
Abdel-rahman Mohamed
Benoît Sagot
Emmanuel Dupoux
14
13
0
22 Jun 2022