arXiv: 1911.01255
pyannote.audio: neural building blocks for speaker diarization
4 November 2019
H. Bredin
Ruiqing Yin
Juan Manuel Coria
G. Gelly
Pavel Korshunov
Marvin Lavechin
D. Fustes
Hadrien Titeux
Wassim Bouaziz
Marie-Philippe Gill
ArXiv
Papers citing "pyannote.audio: neural building blocks for speaker diarization"
50 / 144 papers shown
Can Language Models Understand Social Behavior in Clinical Conversations?
Manas Satish Bedmutha
Feng Chen
Andrea Hartzler
Trevor Cohen
Nadir Weibel
LM&MA
AI4MH
43
0
0
07 May 2025
Automatic Proficiency Assessment in L2 English Learners
Armita Mohammadi
Alessandro Lameiras Koerich
Laureano Moro-Velazquez
P. Cardinal
30
0
0
05 May 2025
Co³Gesture: Towards Coherent Concurrent Co-speech 3D Gesture Generation with Interactive Diffusion
Xingqun Qi
Yatian Wang
Hengyuan Zhang
J. Pan
Wei Xue
Shanghang Zhang
Wenhan Luo
Qifeng Liu
Yike Guo
SLR
66
0
0
03 May 2025
Early Exit and Multi Stage Knowledge Distillation in VLMs for Video Summarization
Anas Anwarul Haq Khan
Utkarsh Verma
Prateek Chanda
Ganesh Ramakrishnan
VLM
53
0
0
30 Apr 2025
Speaker Retrieval in the Wild: Challenges, Effectiveness and Robustness
Erfan Loweimi
Mengjie Qian
Kate Knill
Mark J. F. Gales
46
0
0
26 Apr 2025
Talking Turns: Benchmarking Audio Foundation Models on Turn-Taking Dynamics
Siddhant Arora
Zhiyun Lu
Chung-Cheng Chiu
Ruoming Pang
Shinji Watanabe
43
2
0
03 Mar 2025
Afrispeech-Dialog: A Benchmark Dataset for Spontaneous English Conversations in Healthcare and Beyond
Mardhiyah Sanni
Tassallah Abdullahi
Devendra D. Kayande
Emmanuel Ayodele
Naome A. Etori
...
Chibuzor Okocha
L. Ismaila
Folafunmi Omofoye
Boluwatife A. Adewale
Tobi Olatunji
89
1
0
06 Feb 2025
Guided Speaker Embedding
Shota Horiguchi
Takafumi Moriya
Atsushi Ando
Takanori Ashihara
Hiroshi Sato
Naohiro Tawara
Marc Delcroix
45
0
0
03 Jan 2025
Automating Feedback Analysis in Surgical Training: Detection, Categorization, and Assessment
Firdavs Nasriddinov
Rafal Kocielnik
Arushi Gupta
Cherine Yang
Elyssa Y. Wong
Anima Anandkumar
Andrew J. Hung
68
0
0
01 Dec 2024
Titanic Calling: Low Bandwidth Video Conference from the Titanic Wreck
Fevziye Irem Eyiokur
Christian Huber
Thai-Binh Nguyen
T. Nguyen
Fabian Retkowski
Enes Yavuz Ugan
Dogucan Yaman
Alexander Waibel
27
0
0
15 Oct 2024
Casablanca: Data and Models for Multidialectal Arabic Speech Recognition
Bashar Talafha
Karima Kadaoui
Samar Magdy
Mariem Habiboullah
Chafei Mohamed Chafei
...
Yousra Berrachedi
Mustafa Jarrar
Shady Shehata
Ismail Berrada
Muhammad Abdul-Mageed
35
3
0
06 Oct 2024
Beyond Turn-Based Interfaces: Synchronous LLMs as Full-Duplex Dialogue Agents
Bandhav Veluri
Benjamin Peloquin
Bokai Yu
Hongyu Gong
Shyamnath Gollakota
AuLLM
OffRL
48
13
0
23 Sep 2024
oboVox Far Field Speaker Recognition: A Novel Data Augmentation Approach with Pretrained Models
Muhammad Sudipto Siam Dip
Md Anik Hasan
Sapnil Sarker Bipro
Md Abdur Raiyan
M. A. Motin
29
0
0
16 Sep 2024
The VoxCeleb Speaker Recognition Challenge: A Retrospective
Jaesung Huh
Joon Son Chung
Arsha Nagrani
A. Brown
Jee-weon Jung
Daniel Garcia-Romero
Andrew Zisserman
36
3
0
27 Aug 2024
LiveFC: A System for Live Fact-Checking of Audio Streams
Venktesh V
Vinay Setty
28
3
0
14 Aug 2024
Style-Talker: Finetuning Audio Language Model and Style-Based Text-to-Speech Model for Fast Spoken Dialogue Generation
Yinghao Aaron Li
Xilin Jiang
Jordan Darefsky
Ge Zhu
N. Mesgarani
34
2
0
13 Aug 2024
An approach to optimize inference of the DIART speaker diarization pipeline
Roman Aperdannier
Sigurd Schacht
Alexander Piazza
35
0
0
05 Aug 2024
The CHiME-8 DASR Challenge for Generalizable and Array Agnostic Distant Automatic Speech Recognition and Diarization
Samuele Cornell
Taejin Park
Steve Huang
Christoph Boeddeker
Xuankai Chang
Matthew Maciejewski
Matthew Wiesner
Paola García
Shinji Watanabe
29
9
0
23 Jul 2024
Investigating Confidence Estimation Measures for Speaker Diarization
Anurag Chowdhury
Abhinav Misra
Mark C. Fuhs
Monika Woszczyna
24
0
0
24 Jun 2024
Song Data Cleansing for End-to-End Neural Singer Diarization Using Neural Analysis and Synthesis Framework
Hokuto Munakata
Ryo Terashima
Yusuke Fujita
31
0
0
24 Jun 2024
SCDNet: Self-supervised Learning Feature-based Speaker Change Detection
Yue Li
Xinsheng Wang
Li Zhang
Lei Xie
37
1
0
12 Jun 2024
LLM-based speaker diarization correction: A generalizable approach
Georgios Efstathiadis
Vijay Yadav
Anzar Abbas
41
3
0
07 Jun 2024
ASoBO: Attentive Beamformer Selection for Distant Speaker Diarization in Meetings
Théo Mariotte
Anthony Larcher
Silvio Montrésor
Jean-Hugh Thomas
27
0
0
05 Jun 2024
Estimating Speech Duration by Measuring the Abdominal Movement Using a Barometric Sensor
Rintaro Katagiri
Yutaka Arakawa
Yugo Nakamura
28
0
0
10 May 2024
USAT: A Universal Speaker-Adaptive Text-to-Speech Approach
Wenbin Wang
Yang Song
Sanjay Jha
32
10
0
28 Apr 2024
A Semi-Automatic Approach to Create Large Gender- and Age-Balanced Speaker Corpora: Usefulness of Speaker Diarization & Identification
Rémi Uro
D. Doukhan
Albert Rilliard
Laëtitia Larcher
Anissa-Claire Adgharouamane
Marie Tahon
Antoine Laurent
39
4
0
26 Apr 2024
Listen Then See: Video Alignment with Speaker Attention
Aviral Agrawal
Carlos Mateo Samudio Lezcano
Iqui Balam Heredia-Marin
P. Sethi
28
2
0
21 Apr 2024
Speech-Aware Neural Diarization with Encoder-Decoder Attractor Guided by Attention Constraints
PeiYing Lee
HauYun Guo
Berlin Chen
34
0
0
21 Mar 2024
MIntRec2.0: A Large-scale Benchmark Dataset for Multimodal Intent Recognition and Out-of-scope Detection in Conversations
Hanlei Zhang
Xin Wang
Hua Xu
Qianrui Zhou
Kai Gao
Jianhua Su
Jinyue Zhao
Wenrui Li
Yanting Chen
32
2
0
16 Mar 2024
REWIND Dataset: Privacy-preserving Speaking Status Segmentation from Multimodal Body Movement Signals in the Wild
Jose Vargas-Quiros
Chirag Raman
Stephanie Tan
Ekin Gedik
Laura Cabrera-Quiros
Hayley Hung
24
3
0
02 Mar 2024
Supporting Experts with a Multimodal Machine-Learning-Based Tool for Human Behavior Analysis of Conversational Videos
Riku Arakawa
Kiyosu Maeda
Hiromu Yakura
20
3
0
17 Feb 2024
Channel-Combination Algorithms for Robust Distant Voice Activity and Overlapped Speech Detection
Théo Mariotte
Anthony Larcher
Silvio Montrésor
Jean-Hugh Thomas
16
2
0
13 Feb 2024
Parameter Selection for Analyzing Conversations with Autism Spectrum Disorder
Tahiya Chowdhury
Verónica Romero
Amanda Stent
21
3
0
18 Jan 2024
Reading Between the Frames: Multi-Modal Depression Detection in Videos from Non-Verbal Cues
David Gimeno-Gómez
Ana-Maria Bucur
Adrian Cosma
Carlos David Martínez Hinarejos
Paolo Rosso
30
11
0
05 Jan 2024
Speaker Mask Transformer for Multi-talker Overlapped Speech Recognition
Peng Shen
Xugang Lu
Hisashi Kawai
27
1
0
18 Dec 2023
Path Signature Representation of Patient-Clinician Interactions as a Predictor for Neuropsychological Tests Outcomes in Children: A Proof of Concept
Giulio Falcioni
A. Georgescu
Emilia Molimpakis
Lev Gottlieb
Taylor Kuhn
S. Goria
11
1
0
12 Dec 2023
LaCour!: Enabling Research on Argumentation in Hearings of the European Court of Human Rights
Lena Held
Ivan Habernal
AILaw
25
0
0
08 Dec 2023
DiaPer: End-to-End Neural Diarization with Perceiver-Based Attractors
Federico Landini
Mireia Díez
Themos Stafylakis
Lukáš Burget
25
11
0
07 Dec 2023
Deep Multimodal Fusion for Surgical Feedback Classification
Rafal Kocielnik
Elyssa Y. Wong
Timothy N. Chu
Lydia Lin
De-An Huang
Jiayun Wang
A. Anandkumar
Andrew J. Hung
27
2
0
06 Dec 2023
WhisBERT: Multimodal Text-Audio Language Modeling on 100M Words
Lukas Wolf
Greta Tuckute
Klemen Kotar
Eghbal Hosseini
Tamar I. Regev
Ethan Gotlieb Wilcox
Alex Warstadt
41
3
0
05 Dec 2023
Summary of the DISPLACE Challenge 2023 - DIarization of SPeaker and LAnguage in Conversational Environments
Shikha Baghel
Shreyas Ramoji
Somil Jain
Pratik Roy Chowdhuri
Prachi Singh
Deepu Vijayasenan
Sriram Ganapathy
12
6
0
21 Nov 2023
End-to-End Single-Channel Speaker-Turn Aware Conversational Speech Translation
Juan Pablo Zuluaga
Zhaocheng Huang
Xing Niu
Rohit Paturi
S. Srinivasan
Prashant Mathur
Brian Thompson
Marcello Federico
BDL
25
2
0
01 Nov 2023
Show from Tell: Audio-Visual Modelling in Clinical Settings
Jianbo Jiao
M. Alsharid
L. Drukker
A. Papageorghiou
Andrew Zisserman
J. A. Noble
14
0
0
25 Oct 2023
Powerset multi-class cross entropy loss for neural speaker diarization
Alexis Plaquet
H. Bredin
99
91
0
19 Oct 2023
Take the aTrain. Introducing an Interface for the Accessible Transcription of Interviews
Armin Haberl
Jürgen Fleiß
Dominik Kowald
Stefan Thalmann
8
3
0
18 Oct 2023
AutoAD II: The Sequel -- Who, When, and What in Movie Audio Description
Tengda Han
Max Bain
Arsha Nagrani
Gül Varol
Weidi Xie
Andrew Zisserman
VGen
DiffM
19
36
0
10 Oct 2023
Temporally Aligning Long Audio Interviews with Questions: A Case Study in Multimodal Data Integration
Piyush Singh Pasi
Karthikeya Battepati
P. Jyothi
Ganesh Ramakrishnan
T. Mahapatra
Manoj Singh
51
0
0
10 Oct 2023
XLS-R fine-tuning on noisy word boundaries for unsupervised speech segmentation into words
Robin Algayres
Pablo Diego-Simon
Benoît Sagot
Emmanuel Dupoux
36
1
0
08 Oct 2023
Does My Dog "Speak" Like Me? The Acoustic Correlation between Pet Dogs and Their Human Owners
Jieyi Huang
Chunhao Zhang
Yufei Wang
Mengyue Wu
Ke Zhu
6
0
0
21 Sep 2023
Leveraging Data Collection and Unsupervised Learning for Code-switched Tunisian Arabic Automatic Speech Recognition
Ahmed Amine Ben Abdallah
Ata Kabboudi
Amir Kanoun
Salah Zaiem
14
1
0
20 Sep 2023