M2R-Whisper: Multi-stage and Multi-scale Retrieval Augmentation for Enhancing Whisper

13 March 2025
Jiaming Zhou, Songtao Zhao, Jiabei He, Hui Wang, Wenjia Zeng, Yong Chen, Haoqin Sun, Aobo Kong, Yong Qin

Papers citing "M2R-Whisper: Multi-stage and Multi-scale Retrieval Augmentation for Enhancing Whisper"

Improving Zero-Shot Chinese-English Code-Switching ASR with kNN-CTC and Gated Monolingual Datastores
Jiaming Zhou, Songtao Zhao, Hui Wang, Tian-Hao Zhang, Haoqin Sun, Xuechen Wang, Yong Qin
20 Jan 2025

Findings of the 2024 Mandarin Stuttering Event Detection and Automatic Speech Recognition Challenge
Hongfei Xue, Rong Gong, Mingchen Shao, Xin Xu, L. Wang, ..., Yong Qin, Jun Du, Ming Li, Binbin Zhang, Bin Jia
09 Sep 2024

PB-LRDWWS System for the SLT 2024 Low-Resource Dysarthria Wake-Up Word Spotting Challenge
Shiyao Wang, Jiaming Zhou, Shiwan Zhao, Yong Qin
07 Sep 2024

Enhancing Dysarthric Speech Recognition for Unseen Speakers via Prototype-Based Adaptation
Shiyao Wang, Shiwan Zhao, Jiaming Zhou, Aobo Kong, Yong Qin
26 Jul 2024

AS-70: A Mandarin stuttered speech dataset for automatic speech recognition and stuttering event detection
Rong Gong, Hongfei Xue, L. Wang, Xin Xu, Qisheng Li, ..., Yong Qin, Binbin Zhang, Jun Du, Jia Bin, Ming Li
11 Jun 2024

Audio Flamingo: A Novel Audio Language Model with Few-Shot Learning and Dialogue Abilities
Zhifeng Kong, Arushi Goel, Rohan Badlani, Ming-Yu Liu, Rafael Valle, Bryan Catanzaro
02 Feb 2024

kNN-CTC: Enhancing ASR via Retrieval of CTC Pseudo Labels
Jiaming Zhou, Shiwan Zhao, Yaqi Liu, Wenjia Zeng, Yong Chen, Yong Qin
21 Dec 2023

SALM: Speech-augmented Language Model with In-context Learning for Speech Recognition and Translation
Zhehuai Chen, He Huang, A. Andrusenko, Oleksii Hrinchuk, Krishna C. Puvvada, Jason Chun Lok Li, Subhankar Ghosh, Jagadeesh Balam, Boris Ginsburg
13 Oct 2023

Can Whisper perform speech-based in-context learning?
Siyin Wang, Chao-Han Huck Yang, Ji Wu, Chao Zhang
13 Sep 2023

RAMP: Retrieval-Augmented MOS Prediction via Confidence-based Dynamic Weighting
Haibo Wang, Shiwan Zhao, Xiguang Zheng, Yong Qin
31 Aug 2023

CIF-T: A Novel CIF-based Transducer Architecture for Automatic Speech Recognition
Tian-Hao Zhang, Dinghao Zhou, Guiping Zhong, Jiaming Zhou, Baoxiang Li
26 Jul 2023

Scaling Speech Technology to 1,000+ Languages
Vineel Pratap, Andros Tjandra, Bowen Shi, Paden Tomasello, Arun Babu, ..., Yossi Adi, Xiaohui Zhang, Wei-Ning Hsu, Alexis Conneau, Michael Auli
22 May 2023

End-to-End Speech Recognition: A Survey
Rohit Prabhavalkar, Takaaki Hori, Tara N. Sainath, Ralf Schlüter, Shinji Watanabe
03 Mar 2023

Google USM: Scaling Automatic Speech Recognition Beyond 100 Languages
Yu Zhang, Wei Han, James Qin, Yongqiang Wang, Ankur Bapna, ..., Pedro J. Moreno, Chung-Cheng Chiu, J. Schalkwyk, Françoise Beaufays, Yonghui Wu
02 Mar 2023

MADI: Inter-domain Matching and Intra-domain Discrimination for Cross-domain Speech Recognition
Jiaming Zhou, Shiwan Zhao, Ning Jiang, Guoqing Zhao, Yong Qin
22 Feb 2023

Why do Nearest Neighbor Language Models Work?
Frank F. Xu, Uri Alon, Graham Neubig
07 Jan 2023

Robust Speech Recognition via Large-Scale Weak Supervision
Alec Radford, Jong Wook Kim, Tao Xu, Greg Brockman, C. McLeavey, Ilya Sutskever
06 Dec 2022

In-context Examples Selection for Machine Translation
Sweta Agrawal, Chunting Zhou, M. Lewis, Luke Zettlemoyer, Marjan Ghazvininejad
05 Dec 2022

Large-scale Contrastive Language-Audio Pretraining with Feature Fusion and Keyword-to-Caption Augmentation
Yusong Wu, Kai Chen, Tianyu Zhang, Yuchen Hui, Marianna Nezhurina, Taylor Berg-Kirkpatrick, Shlomo Dubnov
12 Nov 2022

Boosting Cross-Domain Speech Recognition with Self-Supervision
Hanjing Zhu, Gaofeng Cheng, Jindong Wang, Wenxin Hou, Pengyuan Zhang, Yonghong Yan
20 Jun 2022

Accented Speech Recognition: Benchmarking, Pre-training, and Diverse Data
Alena Aksenova, Zhehuai Chen, Chung-Cheng Chiu, D. Esch, Pavel Golik, ..., Levi King, Bhuvana Ramabhadran, Andrew Rosenberg, Suzan Schwartz, Gary Wang
16 May 2022

WAVPROMPT: Towards Few-Shot Spoken Language Understanding with Frozen Language Models
Heting Gao, Junrui Ni, Kaizhi Qian, Yang Zhang, Shiyu Chang, M. Hasegawa-Johnson
29 Mar 2022

WeNet 2.0: More Productive End-to-End Speech Recognition Toolkit
Binbin Zhang, Di Wu, Zhendong Peng, Xingcheng Song, Zhuoyuan Yao, Hang Lv, Linfu Xie, Chao Yang, Fuping Pan, Jianwei Niu
29 Mar 2022

Rethinking the Role of Demonstrations: What Makes In-Context Learning Work?
Sewon Min, Xinxi Lyu, Ari Holtzman, Mikel Artetxe, M. Lewis, Hannaneh Hajishirzi, Luke Zettlemoyer
25 Feb 2022

Learning To Retrieve Prompts for In-Context Learning
Ohad Rubin, Jonathan Herzig, Jonathan Berant
16 Dec 2021

DITTO: Data-efficient and Fair Targeted Subset Selection for ASR Accent Adaptation
Suraj Kothawade, Anmol Reddy Mekala, D. ChandraSekhara, Mayank Kothyari, Rishabh K. Iyer, Ganesh Ramakrishnan, Preethi Jyothi
10 Oct 2021

Nearest Neighbor Machine Translation
Urvashi Khandelwal, Angela Fan, Dan Jurafsky, Luke Zettlemoyer, M. Lewis
01 Oct 2020

Conformer: Convolution-augmented Transformer for Speech Recognition
Anmol Gulati, James Qin, Chung-Cheng Chiu, Niki Parmar, Yu Zhang, ..., Wei Han, Shibo Wang, Zhengdong Zhang, Yonghui Wu, Ruoming Pang
16 May 2020

Generalization through Memorization: Nearest Neighbor Language Models
Urvashi Khandelwal, Omer Levy, Dan Jurafsky, Luke Zettlemoyer, M. Lewis
01 Nov 2019

ESPnet: End-to-End Speech Processing Toolkit
Shinji Watanabe, Takaaki Hori, Shigeki Karita, Tomoki Hayashi, Jiro Nishitoba, ..., Jahn Heymann, Sanjeev Khudanpur, Nanxin Chen, Adithya Renduchintala, Tsubasa Ochiai
30 Mar 2018

AISHELL-1: An Open-Source Mandarin Speech Corpus and A Speech Recognition Baseline
Hui Bu, Jiayu Du, Xingyu Na, Bengu Wu, Hao Zheng
16 Sep 2017

Billion-scale similarity search with GPUs
Jeff Johnson, Matthijs Douze, Hervé Jégou
28 Feb 2017