ResearchTrend.AI
Joint Speech Recognition and Speaker Diarization via Sequence Transduction

9 July 2019
Laurent El Shafey
H. Soltau
Izhak Shafran

Papers citing "Joint Speech Recognition and Speaker Diarization via Sequence Transduction"

50 / 63 papers shown
USED: Universal Speaker Extraction and Diarization
Junyi Ao
Mehmet Sinan Yildirim
Ruijie Tao
Mengyao Ge
Shuai Wang
Yan-min Qian
Haizhou Li
41
6
0
17 Jan 2025
SEAL: Speaker Error Correction using Acoustic-conditioned Large Language Models
Anurag Kumar
Rohit Paturi
Amber Afshan
S. Srinivasan
43
0
0
14 Jan 2025
Reverb: Open-Source ASR and Diarization from Rev
Nishchal Bhandari
Danny Chen
Miguel Ángel del Río Fernández
Natalie Delworth
Jennifer Drexler Fox
...
Ondrej Novotný
Jan Profant
Nan Qin
Martin Ratajczak
Jean-Philippe Robichaud
VLM
33
1
0
04 Oct 2024
Large Language Model Based Generative Error Correction: A Challenge and Baselines for Speech Recognition, Speaker Tagging, and Emotion Recognition
Chao-Han Huck Yang
Taejin Park
Yuan Gong
Yuanchao Li
Zhehuai Chen
...
E. Chng
Peter Bell
Catherine Lai
Shinji Watanabe
A. Stolcke
AuLLM
ELM
37
4
0
15 Sep 2024
Sortformer: Seamless Integration of Speaker Diarization and ASR by Bridging Timestamps and Tokens
Taejin Park
Ivan Medennikov
Kunal Dhawan
Weiqing Wang
He Huang
Nithin Rao Koluguri
Krishna C. Puvvada
Jagadeesh Balam
Boris Ginsburg
40
3
0
10 Sep 2024
Resource-Efficient Adaptation of Speech Foundation Models for Multi-Speaker ASR
Weiqing Wang
Kunal Dhawan
Taejin Park
Krishna C. Puvvada
Ivan Medennikov
Somshubra Majumdar
He Huang
Jagadeesh Balam
Boris Ginsburg
40
2
0
02 Sep 2024
Speaker Tagging Correction With Non-Autoregressive Language Models
Grigor Kirakosyan
Davit Karamyan
3DV
33
0
0
30 Aug 2024
TokenVerse: Unifying Speech and NLP Tasks via Transducer-based ASR
Shashi Kumar
S. Madikeri
Juan Zuluaga-Gomez
Iuliia Nigmatulina
Esaú Villatoro-Tello
Sergio Burdisso
P. Motlícek
Karthik Pandia
A. Ganapathiraju
46
0
0
05 Jul 2024
AG-LSEC: Audio Grounded Lexical Speaker Error Correction
Rohit Paturi
Xiang Li
S. Srinivasan
36
1
0
25 Jun 2024
Joint vs Sequential Speaker-Role Detection and Automatic Speech Recognition for Air-traffic Control
Alexander Blatt
Aravind Krishnan
Dietrich Klakow
27
0
0
19 Jun 2024
Unsupervised Speaker Diarization in Distributed IoT Networks Using Federated Learning
Amit Kumar Bhuyan
H. Dutta
Subir Biswas
FedML
29
1
0
16 Apr 2024
On Speaker Attribution with SURT
Desh Raj
Matthew Wiesner
Matthew Maciejewski
Leibny Paola García-Perera
Daniel Povey
Sanjeev Khudanpur
32
3
0
28 Jan 2024
DiarizationLM: Speaker Diarization Post-Processing with Large Language Models
Quan Wang
Yiling Huang
Guanlong Zhao
Evan Clark
Wei Xia
Hank Liao
AuLLM
33
8
0
07 Jan 2024
Frame-wise streaming end-to-end speaker diarization with non-autoregressive self-attention-based attractors
Di Liang
Nian Shao
Xiaofei Li
33
4
0
25 Sep 2023
Towards Word-Level End-to-End Neural Speaker Diarization with Auxiliary Network
Yiling Huang
Weiran Wang
Guanlong Zhao
Hank Liao
Wei Xia
Quan Wang
24
4
0
15 Sep 2023
Aligning Speakers: Evaluating and Visualizing Text-based Diarization Using Efficient Multiple Sequence Alignment (Extended Version)
Chen Gong
Peilin Wu
Jinho Choi
20
1
0
14 Sep 2023
Enhancing Speaker Diarization with Large Language Models: A Contextual Beam Search Approach
T. Park
Kunal Dhawan
Nithin Rao Koluguri
Jagadeesh Balam
42
15
0
11 Sep 2023
Text Injection for Capitalization and Turn-Taking Prediction in Speech Models
Shaan Bijwadia
Shuo-yiin Chang
Weiran Wang
Zhong Meng
Hao Zhang
Tara N. Sainath
24
1
0
14 Aug 2023
Lexical Speaker Error Correction: Leveraging Language Models for Speaker Diarization Error Correction
Rohit Paturi
S. Srinivasan
Xiang Li
23
13
0
15 Jun 2023
An Experimental Review of Speaker Diarization methods with application to Two-Speaker Conversational Telephone Speech recordings
L. Serafini
Samuele Cornell
Giovanni Morrone
Enrico Zovato
A. Brutti
S. Squartini
47
9
0
29 May 2023
Unified Modeling of Multi-Talker Overlapped Speech Recognition and Diarization with a Sidecar Separator
Lingwei Meng
Jiawen Kang
Mingyu Cui
Haibin Wu
Xixin Wu
Helen M. Meng
39
10
0
25 May 2023
Speaker Change Detection for Transformer Transducer ASR
Jian Wu
Zhuo Chen
Min Hu
Xiong Xiao
Jinyu Li
20
4
0
16 Feb 2023
Augmenting Transformer-Transducer Based Speaker Change Detection With Token-Level Training Loss
Guanlong Zhao
Quan Wang
Han Lu
Yiling Huang
Ignacio López Moreno
19
14
0
11 Nov 2022
The Conversational Short-phrase Speaker Diarization (CSSD) Task: Dataset, Evaluation Metric and Baselines
Gaofeng Cheng
Yifan Chen
Runyan Yang
Qingxu Li
Zehui Yang
...
Qingqing Zhang
Linfu Xie
Y. Qian
Kong Aik Lee
Yonghong Yan
13
9
0
17 Aug 2022
Extending RNN-T-based speech recognition systems with emotion and language classification
Zvi Kons
Hagai Aronowitz
E. Morais
Matheus Damasceno
H. Kuo
Samuel Thomas
G. Saon
14
5
0
28 Jul 2022
Unsupervised Speaker Diarization that is Agnostic to Language, Overlap-Aware, and Tuning Free
Md. Iftekhar Tanveer
Diego Casabuena
Jussi Karlgren
Rosie Jones
BDL
11
4
0
25 Jul 2022
End-to-end speech recognition modeling from de-identified data
M. Flechl
Shou-Chun Yin
Junho Park
Peter Skala
17
4
0
12 Jul 2022
A Multi-tasking Model of Speaker-Keyword Classification for Keeping Human in the Loop of Drone-assisted Inspection
Yu Li
Anisha Parsan
Bill Wang
Penghao Dong
Shanshan Yao
Ruwen Qin
29
5
0
08 Jul 2022
Streaming Speaker-Attributed ASR with Token-Level Speaker Embeddings
Naoyuki Kanda
Jian Wu
Yu Wu
Xiong Xiao
Zhong Meng
Xiaofei Wang
Yashesh Gaur
Zhuo Chen
Jinyu Li
Takuya Yoshioka
13
26
0
30 Mar 2022
Towards Reducing the Need for Speech Training Data To Build Spoken Language Understanding Systems
Samuel Thomas
H. Kuo
Brian Kingsbury
G. Saon
14
24
0
26 Feb 2022
Integrating Text Inputs For Training and Adapting RNN Transducer ASR Models
Samuel Thomas
Brian Kingsbury
G. Saon
H. Kuo
36
25
0
26 Feb 2022
Streaming Multi-Talker ASR with Token-Level Serialized Output Training
Naoyuki Kanda
Jian Wu
Yu Wu
Xiong Xiao
Zhong Meng
Xiaofei Wang
Yashesh Gaur
Zhuo Chen
Jinyu Li
Takuya Yoshioka
34
54
0
02 Feb 2022
Improving End-to-End Models for Set Prediction in Spoken Language Understanding
H. Kuo
Zoltán Tüske
Samuel Thomas
Brian Kingsbury
G. Saon
21
0
0
28 Jan 2022
PickNet: Real-Time Channel Selection for Ad Hoc Microphone Arrays
Takuya Yoshioka
Xiaofei Wang
Dongmei Wang
35
3
0
24 Jan 2022
Recent Advances in End-to-End Automatic Speech Recognition
Jinyu Li
VLM
35
363
0
02 Nov 2021
Speech Summarization using Restricted Self-Attention
Roshan S. Sharma
Shruti Palaskar
A. Black
Florian Metze
30
33
0
12 Oct 2021
BERTraffic: BERT-based Joint Speaker Role and Speaker Change Detection for Air Traffic Control Communications
Juan Pablo Zuluaga
Seyyed Saeed Sarfjoo
Amrutha Prasad
Iuliia Nigmatulina
P. Motlícek
Karel Ondrej
Oliver Ohneiser
H. Helmke
52
17
0
12 Oct 2021
Turn-to-Diarize: Online Speaker Diarization Constrained by Transformer Transducer Speaker Turn Detection
Wei Xia
Han Lu
Quan Wang
Anshuman Tripathi
Yiling Huang
Ignacio López Moreno
Hasim Sak
41
51
0
23 Sep 2021
Integrating Dialog History into End-to-End Spoken Language Understanding Systems
Jatin Ganhotra
Samuel Thomas
H. Kuo
Sachindra Joshi
G. Saon
Zoltán Tüske
Brian Kingsbury
30
10
0
18 Aug 2021
A Comparative Study of Modular and Joint Approaches for Speaker-Attributed ASR on Monaural Long-Form Audio
Naoyuki Kanda
Xiong Xiao
Jian Wu
Tianyan Zhou
Yashesh Gaur
Xiaofei Wang
Zhong Meng
Zhuo Chen
Takuya Yoshioka
19
14
0
06 Jul 2021
Unified Autoregressive Modeling for Joint End-to-End Multi-Talker Overlapped Speech Recognition and Speaker Attribute Estimation
Ryo Masumura
Daiki Okamura
Naoki Makishima
Mana Ihori
Akihiko Takashima
Tomohiro Tanaka
Shota Orihashi
25
7
0
04 Jul 2021
Towards Neural Diarization for Unlimited Numbers of Speakers Using Global and Local Attractors
Shota Horiguchi
Shinji Watanabe
Leibny Paola García-Perera
Yawen Xue
Yuki Takashima
Y. Kawaguchi
36
37
0
04 Jul 2021
Pretext Tasks selection for multitask self-supervised speech representation learning
Salah Zaiem
Titouan Parcollet
S. Essid
Abdel Heba
SSL
14
12
0
01 Jul 2021
End-to-End Diarization for Variable Number of Speakers with Local-Global Networks and Discriminative Speaker Embeddings
Soumi Maiti
Hakan Erdogan
K. Wilson
Scott Wisdom
Shinji Watanabe
J. Hershey
27
21
0
05 May 2021
RNN Transducer Models For Spoken Language Understanding
Samuel Thomas
H. Kuo
G. Saon
Zoltán Tüske
Brian Kingsbury
Gakuto Kurata
Zvi Kons
R. Hoory
19
14
0
08 Apr 2021
Understanding Medical Conversations: Rich Transcription, Confidence Scores & Information Extraction
H. Soltau
Mingqiu Wang
Izhak Shafran
Laurent El Shafey
MedIm
LM&MA
20
12
0
06 Apr 2021
End-to-End Speaker-Attributed ASR with Transformer
Naoyuki Kanda
Guoli Ye
Yashesh Gaur
Xiaofei Wang
Zhong Meng
Zhuo Chen
Takuya Yoshioka
21
47
0
05 Apr 2021
Streaming Multi-talker Speech Recognition with Joint Speaker Identification
Liang Lu
Naoyuki Kanda
Jinyu Li
Jiawei Liu
20
19
0
05 Apr 2021
A Review of Speaker Diarization: Recent Advances with Deep Learning
Tae Jin Park
Naoyuki Kanda
Dimitrios Dimitriadis
Kyu Jeong Han
Shinji Watanabe
Shrikanth Narayanan
VLM
274
326
0
24 Jan 2021
Hypothesis Stitcher for End-to-End Speaker-attributed ASR on Long-form Multi-talker Recordings
Xuankai Chang
Naoyuki Kanda
Yashesh Gaur
Xiaofei Wang
Zhong Meng
Takuya Yoshioka
RALM
11
15
0
06 Jan 2021