ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2505.13971
  4. Cited By
The Multimodal Information Based Speech Processing (MISP) 2025 Challenge: Audio-Visual Diarization and Recognition

The Multimodal Information Based Speech Processing (MISP) 2025 Challenge: Audio-Visual Diarization and Recognition

20 May 2025
Ming Gao
Shilong Wu
Hang Chen
Jun Du
Chin-Hui Lee
Shinji Watanabe
Jingdong Chen
Siniscalchi Sabato Marco
O. Scharenborg
ArXivPDFHTML

Papers citing "The Multimodal Information Based Speech Processing (MISP) 2025 Challenge: Audio-Visual Diarization and Recognition"

18 / 18 papers shown
Title
MossFormer2: Combining Transformer and RNN-Free Recurrent Network for
  Enhanced Time-Domain Monaural Speech Separation
MossFormer2: Combining Transformer and RNN-Free Recurrent Network for Enhanced Time-Domain Monaural Speech Separation
Shengkui Zhao
Yukun Ma
Chongjia Ni
Chong Zhang
Hao Wang
Trung Hieu Nguyen
Kun Zhou
J. Yip
Dianwen Ng
Bin Ma
67
26
0
19 Dec 2023
The Multimodal Information Based Speech Processing (MISP) 2023
  Challenge: Audio-Visual Target Speaker Extraction
The Multimodal Information Based Speech Processing (MISP) 2023 Challenge: Audio-Visual Target Speaker Extraction
Shilong Wu
Chenxi Wang
Hang Chen
Yusheng Dai
Chenyue Zhang
...
Sabato Marco Siniscalchi
O. Scharenborg
Zhong-Qiu Wang
Jia Pan
Jianqing Gao
42
12
0
15 Sep 2023
Improving Audio-Visual Speech Recognition by Lip-Subword Correlation
  Based Visual Pre-training and Cross-Modal Fusion Encoder
Improving Audio-Visual Speech Recognition by Lip-Subword Correlation Based Visual Pre-training and Cross-Modal Fusion Encoder
Yusheng Dai
Hang Chen
Jun Du
xiao-ying Ding
Ning Ding
Feijun Jiang
Chin-Hui Lee
56
8
0
14 Aug 2023
3D-Speaker: A Large-Scale Multi-Device, Multi-Distance, and
  Multi-Dialect Corpus for Speech Representation Disentanglement
3D-Speaker: A Large-Scale Multi-Device, Multi-Distance, and Multi-Dialect Corpus for Speech Representation Disentanglement
Siqi Zheng
Luyao Cheng
Yafeng Chen
Haibo Wang
Qian Chen
50
19
0
27 Jun 2023
GPU-accelerated Guided Source Separation for Meeting Transcription
GPU-accelerated Guided Source Separation for Meeting Transcription
Desh Raj
Daniel Povey
Sanjeev Khudanpur
55
39
0
10 Dec 2022
Robust Speech Recognition via Large-Scale Weak Supervision
Robust Speech Recognition via Large-Scale Weak Supervision
Alec Radford
Jong Wook Kim
Tao Xu
Greg Brockman
C. McLeavey
Ilya Sutskever
OffRL
178
3,655
0
06 Dec 2022
Paraformer: Fast and Accurate Parallel Transformer for
  Non-autoregressive End-to-End Speech Recognition
Paraformer: Fast and Accurate Parallel Transformer for Non-autoregressive End-to-End Speech Recognition
Zhifu Gao
Shiliang Zhang
Ian Mcloughlin
Zhijie Yan
40
106
0
16 Jun 2022
WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech
  Processing
WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing
Sanyuan Chen
Chengyi Wang
Zhengyang Chen
Yu-Huan Wu
Shujie Liu
...
Yao Qian
Jian Wu
Micheal Zeng
Xiangzhan Yu
Furu Wei
SSL
239
1,857
0
26 Oct 2021
M2MeT: The ICASSP 2022 Multi-Channel Multi-Party Meeting Transcription
  Challenge
M2MeT: The ICASSP 2022 Multi-Channel Multi-Party Meeting Transcription Challenge
Fan Yu
Shiliang Zhang
Yihui Fu
Lei Xie
Siqi Zheng
...
Pengcheng Guo
Zhijie Yan
B. Ma
Xin Xu
Hui Bu
37
113
0
14 Oct 2021
WenetSpeech: A 10000+ Hours Multi-domain Mandarin Corpus for Speech
  Recognition
WenetSpeech: A 10000+ Hours Multi-domain Mandarin Corpus for Speech Recognition
Binbin Zhang
Hang Lv
Pengcheng Guo
Qijie Shao
Chao Yang
...
Hui Bu
Xiaoyu Chen
Chenchen Zeng
Di Wu
Zhendong Peng
59
228
0
07 Oct 2021
AISHELL-4: An Open Source Dataset for Speech Enhancement, Separation,
  Recognition and Speaker Diarization in Conference Scenario
AISHELL-4: An Open Source Dataset for Speech Enhancement, Separation, Recognition and Speaker Diarization in Conference Scenario
Yihui Fu
Luyao Cheng
Shubo Lv
Yukai Jv
Yuxiang Kong
...
Jian Wu
Hui Bu
Xin Xu
Jun Du
Jingdong Chen
50
92
0
08 Apr 2021
DOVER-Lap: A Method for Combining Overlap-aware Diarization Outputs
DOVER-Lap: A Method for Combining Overlap-aware Diarization Outputs
Desh Raj
Leibny Paola García-Perera
Zili Huang
Shinji Watanabe
Daniel Povey
A. Stolcke
Sanjeev Khudanpur
77
67
0
03 Nov 2020
Lip-reading with Densely Connected Temporal Convolutional Networks
Lip-reading with Densely Connected Temporal Convolutional Networks
Pingchuan Ma
Yujiang Wang
Jie Shen
Stavros Petridis
Maja Pantic
48
56
0
29 Sep 2020
Conformer: Convolution-augmented Transformer for Speech Recognition
Conformer: Convolution-augmented Transformer for Speech Recognition
Anmol Gulati
James Qin
Chung-Cheng Chiu
Niki Parmar
Yu Zhang
...
Wei Han
Shibo Wang
Zhengdong Zhang
Yonghui Wu
Ruoming Pang
220
3,131
0
16 May 2020
ECAPA-TDNN: Emphasized Channel Attention, Propagation and Aggregation in
  TDNN Based Speaker Verification
ECAPA-TDNN: Emphasized Channel Attention, Propagation and Aggregation in TDNN Based Speaker Verification
Brecht Desplanques
Jenthe Thienpondt
Kris Demuynck
74
1,333
0
14 May 2020
CHiME-6 Challenge:Tackling Multispeaker Speech Recognition for
  Unsegmented Recordings
CHiME-6 Challenge:Tackling Multispeaker Speech Recognition for Unsegmented Recordings
Shinji Watanabe
Michael I. Mandel
Jon Barker
Emmanuel Vincent
Ashish Arora
...
Emmanuel Vincent
Shota Horiguchi
Naoyuki Kanda
Takuya Yoshioka
Neville Ryant
53
306
0
20 Apr 2020
Continuous speech separation: dataset and analysis
Continuous speech separation: dataset and analysis
Zhuo Chen
Takuya Yoshioka
Liang Lu
Tianyan Zhou
Zhong Meng
Yi Luo
Jian Wu
Xiong Xiao
Jinyu Li
63
214
0
30 Jan 2020
VoxCeleb2: Deep Speaker Recognition
VoxCeleb2: Deep Speaker Recognition
Joon Son Chung
Arsha Nagrani
Andrew Zisserman
350
2,276
0
14 Jun 2018
1