Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2107.02852
Cited By
v1
v2 (latest)
A Comparative Study of Modular and Joint Approaches for Speaker-Attributed ASR on Monaural Long-Form Audio
6 July 2021
Naoyuki Kanda
Xiong Xiao
Jian Wu
Tianyan Zhou
Yashesh Gaur
Xiaofei Wang
Zhong Meng
Zhuo Chen
Takuya Yoshioka
Re-assign community
ArXiv (abs)
PDF
HTML
Papers citing
"A Comparative Study of Modular and Joint Approaches for Speaker-Attributed ASR on Monaural Long-Form Audio"
39 / 39 papers shown
Title
MSA-ASR: Efficient Multilingual Speaker Attribution with frozen ASR Models
Thai-Binh Nguyen
Alexander Waibel
114
3
0
27 Nov 2024
End-to-End Speaker-Attributed ASR with Transformer
Naoyuki Kanda
Guoli Ye
Yashesh Gaur
Xiaofei Wang
Zhong Meng
Zhuo Chen
Takuya Yoshioka
54
49
0
05 Apr 2021
Large-Scale Pre-Training of End-to-End Multi-Talker ASR for Meeting Transcription with Single Distant Microphone
Naoyuki Kanda
Guoli Ye
Yu-Huan Wu
Yashesh Gaur
Xiaofei Wang
Zhong Meng
Zhuo Chen
Takuya Yoshioka
90
42
0
31 Mar 2021
A Review of Speaker Diarization: Recent Advances with Deep Learning
Tae Jin Park
Naoyuki Kanda
Dimitrios Dimitriadis
Kyu Jeong Han
Shinji Watanabe
Shrikanth Narayanan
VLM
326
332
0
24 Jan 2021
Hypothesis Stitcher for End-to-End Speaker-attributed ASR on Long-form Multi-talker Recordings
Xuankai Chang
Naoyuki Kanda
Yashesh Gaur
Xiaofei Wang
Zhong Meng
Takuya Yoshioka
RALM
53
15
0
06 Jan 2021
VoxSRC 2020: The Second VoxCeleb Speaker Recognition Challenge
Arsha Nagrani
Joon Son Chung
Jaesung Huh
Andrew Brown
Ernesto Coto
Weidi Xie
Mitchell McLaren
D. Reynolds
Andrew Zisserman
45
74
0
12 Dec 2020
Streaming end-to-end multi-talker speech recognition
Liang Lu
Naoyuki Kanda
Jinyu Li
Jiawei Liu
58
43
0
26 Nov 2020
Streaming Multi-speaker ASR with RNN-T
Ilya Sklyar
A. Piunova
Yulan Liu
63
37
0
23 Nov 2020
Minimum Bayes Risk Training for End-to-End Speaker-Attributed ASR
Naoyuki Kanda
Zhong Meng
Liang Lu
Yashesh Gaur
Xiaofei Wang
Zhuo Chen
Takuya Yoshioka
62
17
0
03 Nov 2020
Integration of speech separation, diarization, and recognition for multi-speaker meetings: System description, comparison, and analysis
Desh Raj
Pavel Denisov
Zhuo Chen
Hakan Erdogan
Zili Huang
...
Yi Luo
Naoyuki Kanda
Jinyu Li
Scott Wisdom
J. Hershey
44
88
0
03 Nov 2020
Microsoft Speaker Diarization System for the VoxCeleb Speaker Recognition Challenge 2020
Xiong Xiao
Naoyuki Kanda
Zhuo Chen
Tianyan Zhou
Takuya Yoshioka
...
Yu-Huan Wu
Jian Wu
Shujie Liu
Jinyu Li
Jiawei Liu
58
63
0
22 Oct 2020
Continuous Speech Separation with Conformer
Sanyuan Chen
Yu-Huan Wu
Zhuo Chen
Jian Wu
Jinyu Li
Takuya Yoshioka
Chengyi Wang
Shujie Liu
M. Zhou
56
128
0
13 Aug 2020
Investigation of End-To-End Speaker-Attributed ASR for Continuous Multi-Talker Recordings
Naoyuki Kanda
Xuankai Chang
Yashesh Gaur
Xiaofei Wang
Zhong Meng
Zhuo Chen
Takuya Yoshioka
52
48
0
11 Aug 2020
Joint Speaker Counting, Speech Recognition, and Speaker Identification for Overlapped Speech of Any Number of Speakers
Naoyuki Kanda
Yashesh Gaur
Xiaofei Wang
Zhong Meng
Zhuo Chen
Tianyan Zhou
Takuya Yoshioka
53
77
0
19 Jun 2020
Conformer: Convolution-augmented Transformer for Speech Recognition
Anmol Gulati
James Qin
Chung-Cheng Chiu
Niki Parmar
Yu Zhang
...
Wei Han
Shibo Wang
Zhengdong Zhang
Yonghui Wu
Ruoming Pang
224
3,139
0
16 May 2020
Speech Recognition and Multi-Speaker Diarization of Long Conversations
H. H. Mao
Shuyang Li
Julian McAuley
G. Cottrell
VLM
61
40
0
16 May 2020
Target-Speaker Voice Activity Detection: a Novel Approach for Multi-Speaker Diarization in a Dinner Party Scenario
Ivan Medennikov
M. Korenevsky
Tatiana Prisyach
Yuri Y. Khokhlov
Mariya Korenevskaya
...
Anton Mitrofanov
A. Andrusenko
Ivan Podluzhny
A. Laptev
A. Romanenko
45
199
0
14 May 2020
CHiME-6 Challenge:Tackling Multispeaker Speech Recognition for Unsegmented Recordings
Shinji Watanabe
Michael I. Mandel
Jon Barker
Emmanuel Vincent
Ashish Arora
...
Emmanuel Vincent
Shota Horiguchi
Naoyuki Kanda
Takuya Yoshioka
Neville Ryant
61
308
0
20 Apr 2020
Serialized Output Training for End-to-End Overlapped Speech Recognition
Naoyuki Kanda
Yashesh Gaur
Xiaofei Wang
Zhong Meng
Takuya Yoshioka
76
119
0
28 Mar 2020
Tackling real noisy reverberant meetings with all-neural source separation, counting, and diarization system
K. Kinoshita
Marc Delcroix
S. Araki
Tomohiro Nakatani
229
30
0
09 Mar 2020
Auto-Tuning Spectral Clustering for Speaker Diarization Using Normalized Maximum Eigengap
Tae Jin Park
Kyu Jeong Han
Manoj Kumar
Shrikanth Narayanan
151
118
0
05 Mar 2020
End-to-End Multi-speaker Speech Recognition with Transformer
Xuankai Chang
Wangyou Zhang
Y. Qian
Jonathan Le Roux
Shinji Watanabe
ViT
68
104
0
10 Feb 2020
Continuous speech separation: dataset and analysis
Zhuo Chen
Takuya Yoshioka
Liang Lu
Tianyan Zhou
Zhong Meng
Yi Luo
Jian Wu
Xiong Xiao
Jinyu Li
71
214
0
30 Jan 2020
Advances in Online Audio-Visual Meeting Transcription
Takuya Yoshioka
Igor Abramovski
Cem Aksoylar
Zhuo Chen
Moshe David
...
Huaming Wang
Zhenghao Wang
Jun Zhang
Yong Zhao
Tianyan Zhou
87
75
0
10 Dec 2019
MIMO-SPEECH: End-to-End Multi-Channel Multi-Speaker Speech Recognition
Xuankai Chang
Wangyou Zhang
Y. Qian
Jonathan Le Roux
Shinji Watanabe
71
120
0
15 Oct 2019
Simultaneous Speech Recognition and Speaker Diarization for Monaural Dialogue Recordings with Target-Speaker Acoustic Models
Naoyuki Kanda
Shota Horiguchi
Yusuke Fujita
Yawen Xue
Kenji Nagamatsu
Shinji Watanabe
37
36
0
17 Sep 2019
Joint Speech Recognition and Speaker Diarization via Sequence Transduction
Laurent El Shafey
H. Soltau
Izhak Shafran
69
104
0
09 Jul 2019
Auxiliary Interference Speaker Loss for Target-Speaker Speech Recognition
Naoyuki Kanda
Shota Horiguchi
R. Takashima
Yusuke Fujita
Kenji Nagamatsu
Shinji Watanabe
27
33
0
26 Jun 2019
Guided Source Separation Meets a Strong ASR Backend: Hitachi/Paderborn University Joint Investigation for Dinner Party ASR
Naoyuki Kanda
Christoph Boeddeker
Jens Heitkaemper
Yusuke Fujita
Shota Horiguchi
Kenji Nagamatsu
Reinhold Häb-Umbach
56
62
0
29 May 2019
Res2Net: A New Multi-scale Backbone Architecture
Shanghua Gao
Ming-Ming Cheng
Kai Zhao
Xinyu Zhang
Ming-Hsuan Yang
Philip Torr
109
2,392
0
02 Apr 2019
End-to-End Monaural Multi-speaker ASR System without Pretraining
Xuankai Chang
Y. Qian
Yi Liang
Deming Chen
65
77
0
05 Nov 2018
VoxCeleb2: Deep Speaker Recognition
Joon Son Chung
Arsha Nagrani
Andrew Zisserman
356
2,279
0
14 Jun 2018
A Purely End-to-end System for Multi-speaker Speech Recognition
Hiroshi Seki
Takaaki Hori
Shinji Watanabe
Jonathan Le Roux
J. Hershey
44
89
0
15 May 2018
Subword Regularization: Improving Neural Network Translation Models with Multiple Subword Candidates
Taku Kudo
226
1,169
0
29 Apr 2018
The fifth 'CHiME' Speech Separation and Recognition Challenge: Dataset, task and baselines
Jon Barker
Shinji Watanabe
Emmanuel Vincent
J. Trmal
59
683
0
28 Mar 2018
Squeeze-and-Excitation Networks
Jie Hu
Li Shen
Samuel Albanie
Gang Sun
Enhua Wu
424
26,500
0
05 Sep 2017
VoxCeleb: a large-scale speaker identification dataset
Arsha Nagrani
Joon Son Chung
Andrew Zisserman
127
2,274
0
26 Jun 2017
Attention Is All You Need
Ashish Vaswani
Noam M. Shazeer
Niki Parmar
Jakob Uszkoreit
Llion Jones
Aidan Gomez
Lukasz Kaiser
Illia Polosukhin
3DV
713
131,652
0
12 Jun 2017
Recognizing Multi-talker Speech with Permutation Invariant Training
Dong Yu
Xuankai Chang
Y. Qian
54
93
0
22 Mar 2017
1