pyannote.audio: neural building blocks for speaker diarization

4 November 2019

Papers citing "pyannote.audio: neural building blocks for speaker diarization"

44 / 144 papers shown

The YiTrans End-to-End Speech Translation System for IWSLT 2022 Offline Shared Task Ziqiang Zhang Junyi Ao Long Zhou Shujie Liu Furu Wei Jinyu Li 22 9 0 12 Jun 2022
Speech Detection For Child-Clinician Conversations In Danish For Low-Resource In-The-Wild Conditions: A Case Study Sneha Das N. Lønfeldt A. Pagsberg Line H. Clemmensen 16 3 0 25 Apr 2022
Generative Spoken Dialogue Language Modeling Tu Nguyen Eugene Kharitonov Jade Copet Yossi Adi Wei-Ning Hsu ... Paden Tomasello Robin Algayres Benoît Sagot Abdel-rahman Mohamed Emmanuel Dupoux AuLLM 38 80 0 30 Mar 2022
Multi-scale Speaker Diarization with Dynamic Scale Weighting Tae Jin Park Nithin Rao Koluguri Jagadeesh Balam Boris Ginsburg 21 19 0 30 Mar 2022
Audio visual character profiles for detecting background characters in entertainment media Rahul Sharma Shrikanth Narayanan 17 5 0 21 Mar 2022
Automated detection of foreground speech with wearable sensing in everyday home environments: A transfer learning approach Dawei Liang Zifan Xu Yinuo Chen Rebecca Adaimi David Harwath Edison Thomaz 48 1 0 21 Mar 2022
DialogueNeRF: Towards Realistic Avatar Face-to-Face Conversation Video Generation Yichao Yan Zanwei Zhou Zi Wang Chen-Ning Yang Xiaokang Yang CVBM 21 19 0 15 Mar 2022
Magnitude-aware Probabilistic Speaker Embeddings Nikita Kuzmin Igor Fedorov A. Sholokhov 27 7 0 28 Feb 2022
The xmuspeech system for multi-channel multi-party meeting transcription challenge Jie Wang Yuji Liu Binling Wang Yiming Zhi Song Li Shipeng Xia Jiayang Zhang Lin Li Q. Hong Feng Tong 16 0 0 11 Feb 2022
Summary On The ICASSP 2022 Multi-Channel Multi-Party Meeting Transcription Grand Challenge Fan Yu Shiliang Zhang Pengcheng Guo Yihui Fu Zhihao Du ... Kong Aik Lee Zhijie Yan B. Ma Xin Xu Hui Bu 18 28 0 08 Feb 2022
VoxSRC 2021: The Third VoxCeleb Speaker Recognition Challenge A. Brown Jaesung Huh Joon Son Chung Arsha Nagrani Daniel Garcia-Romero Andrew Zisserman 31 40 0 12 Jan 2022
Discretization and Re-synthesis: an alternative method to solve the Cocktail Party Problem Jing Shi Xuankai Chang Tomoki Hayashi Yen-Ju Lu Shinji Watanabe Bo Xu 32 19 0 17 Dec 2021
The People's Speech: A Large-Scale Diverse English Speech Recognition Dataset for Commercial Usage Daniel Galvez G. Diamos Juan Ciro Juan Felipe Cerón Keith Achorn Anjali Gopi David Kanter Maximilian Lam Mark Mazumder Vijay Janapa Reddi 22 95 0 17 Nov 2021
LiMuSE: Lightweight Multi-modal Speaker Extraction Qinghua Liu Yating Huang Yunzhe Hao Jiaming Xu Bo Xu 43 6 0 07 Nov 2021
TitaNet: Neural Model for speaker representation with 1D Depth-wise separable convolutions and global context Nithin Rao Koluguri Taejin Park Boris Ginsburg ViT 33 94 0 08 Oct 2021
Fast-MD: Fast Multi-Decoder End-to-End Speech Translation with Non-Autoregressive Hidden Intermediates Hirofumi Inaguma Siddharth Dalmia Brian Yan Shinji Watanabe 65 11 0 27 Sep 2021
Non-autoregressive End-to-end Speech Translation with Parallel Autoregressive Rescoring Hirofumi Inaguma Yosuke Higuchi Kevin Duh Tatsuya Kawahara Shinji Watanabe 63 11 0 09 Sep 2021
XMUSPEECH System for VoxCeleb Speaker Recognition Challenge 2021 Jie Wang Fuchuan Tong Zhi-Cong Chen Lin Li Q. Hong Haodong Zhou 34 1 0 06 Sep 2021
The ByteDance Speaker Diarization System for the VoxCeleb Speaker Recognition Challenge 2021 Keke Wang Xudong Mao Hao Wu Chen Ding Chuxiang Shang Rui Xia Yuxuan Wang 20 13 0 05 Sep 2021
ESPnet-ST IWSLT 2021 Offline Speech Translation System Hirofumi Inaguma Shun Kiyono Nelson Enrique Yalta Soplin Pengcheng Guo Jun Suzuki Kevin Duh Shinji Watanabe 3DV 37 2 0 01 Jul 2021
SpeechBrain: A General-Purpose Speech Toolkit Mirco Ravanelli Titouan Parcollet Peter William VanHarn Plantinga Aku Rouhe Samuele Cornell ... William Aris Hwidong Na Yan Gao R. Mori Yoshua Bengio 24 752 0 08 Jun 2021
End-to-End Diarization for Variable Number of Speakers with Local-Global Networks and Discriminative Speaker Embeddings Soumi Maiti Hakan Erdogan K. Wilson Scott Wisdom Shinji Watanabe J. Hershey 27 21 0 05 May 2021
End-to-End Speech Recognition from Federated Acoustic Models Yan Gao Titouan Parcollet Salah Zaiem Javier Fernandez-Marques Pedro Porto Buarque de Gusmão Daniel J. Beutel Nicholas D. Lane 28 43 0 29 Apr 2021
End-to-end speaker segmentation for overlap-aware resegmentation H. Bredin Antoine Laurent VLM 209 163 0 08 Apr 2021
An Initial Investigation for Detecting Partially Spoofed Audio Lin Zhang Xin Wang Erica Cooper Junichi Yamagishi J. Patino Nicholas W. D. Evans 15 45 0 06 Apr 2021
Learning spectro-temporal representations of complex sounds with parameterized neural networks Rachid Riad Julien Karadayi Anne-Catherine Bachoud-Lévi Emmanuel Dupoux 29 7 0 12 Mar 2021
Incorporating VAD into ASR System by Multi-task Learning Meng Li Xiai Yan Feng Lin VLM 6 3 0 02 Mar 2021
The Hitachi-JHU DIHARD III System: Competitive End-to-End Neural Diarization and X-Vector Clustering Systems Combined by DOVER-Lap Shota Horiguchi Nelson Yalta Leibny Paola García-Perera Yuki Takashima Yawen Xue Desh Raj Zili Huang Yusuke Fujita Shinji Watanabe Sanjeev Khudanpur BDL 27 36 0 02 Feb 2021
Speech Enhancement for Wake-Up-Word detection in Voice Assistants David Bonet Guillermo Cámbara Fernando López Pablo Gómez Carlos Segura Jordi Luque 27 11 0 29 Jan 2021
VoxPopuli: A Large-Scale Multilingual Speech Corpus for Representation Learning, Semi-Supervised Learning and Interpretation Changhan Wang M. Rivière Ann Lee Anne Wu Chaitanya Talnikar Daniel Haziza Mary Williamson J. Pino Emmanuel Dupoux SSL 25 462 0 02 Jan 2021
Bayesian HMM clustering of x-vector sequences (VBx) in speaker diarization: theory, implementation and analysis on standard tasks Federico Landini Jan Profant Mireia Díez L. Burget 216 199 0 29 Dec 2020
VoxSRC 2020: The Second VoxCeleb Speaker Recognition Challenge Arsha Nagrani Joon Son Chung Jaesung Huh Andrew Brown Ernesto Coto Weidi Xie Mitchell McLaren D. Reynolds Andrew Zisserman 21 74 0 12 Dec 2020
Comparison of Speaker Role Recognition and Speaker Enrollment Protocol for conversational Clinical Interviews Rachid Riad Hadrien Titeux Laurie Lemoine Justine Montillot A. Sliwinski J. Bagnou Xuan-Nga Cao Anne-Catherine Bachoud-Lévi Emmanuel Dupoux 15 0 0 30 Oct 2020
Speech Activity Detection Based on Multilingual Speech Recognition System Seyyed Saeed Sarfjoo S. Madikeri P. Motlícek 39 7 0 23 Oct 2020
Compositional embedding models for speaker identification and diarization with simultaneous speech from 2+ speakers Zeqian Li Jacob Whitehill 17 11 0 22 Oct 2020
Analysis of the BUT Diarization System for VoxConverse Challenge Federico Landini O. Glembek P. Matejka Johan Rohdin L. Burget Mireia Díez Anna Silnova 16 32 0 22 Oct 2020
Data Augmenting Contrastive Learning of Speech Representations in the Time Domain Eugene Kharitonov M. Rivière Gabriel Synnaeve Lior Wolf Pierre-Emmanuel Mazaré Matthijs Douze Emmanuel Dupoux 25 117 0 02 Jul 2020
A Comparison of Metric Learning Loss Functions for End-To-End Speaker Verification Juan Manuel Coria H. Bredin Sahar Ghannay S. Rosset 23 15 0 31 Mar 2020
Cross modal video representations for weakly supervised active speaker localization Rahul Sharma Krishna Somandepalli Shrikanth Narayanan 19 8 0 09 Mar 2020
Seshat: A tool for managing and verifying annotation campaigns of audio data Hadrien Titeux Rachid Riad Xuan-Nga Cao Nicolas Hamilakis Kris Madden Alejandrina Cristià Anne-Catherine Bachoud-Lévi Emmanuel Dupoux VLM 8 7 0 03 Mar 2020
Speaker detection in the wild: Lessons learned from JSALT 2019 Leibny Paola García-Perera Jesus Villalba H. Bredin Jun Du Diego Castán ... Wassim Bouaziz Hadrien Titeux Emmanuel Dupoux Kong Aik Lee Najim Dehak 16 29 0 02 Dec 2019
The Speed Submission to DIHARD II: Contributions & Lessons Learned Md. Sahidullah J. Patino Samuele Cornell Ruiqing Yin S. Sivasankaran ... Emmanuel Vincent Nicholas W. D. Evans Sébastien Marcel S. Squartini C. Barras VLM 14 16 0 06 Nov 2019
Overlap-aware diarization: resegmentation using neural end-to-end overlapped speech detection Latané Bullock H. Bredin Leibny Paola García-Perera 22 94 0 25 Oct 2019
VoxCeleb2: Deep Speaker Recognition Joon Son Chung Arsha Nagrani Andrew Zisserman 266 2,238 0 14 Jun 2018