Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2305.07677
Cited By
v1
v2 (latest)
Masked Audio Text Encoders are Effective Multi-Modal Rescorers
11 May 2023
Jason (Jinglun) Cai
Monica Sunkara
Xilai Li
Anshu Bhatia
Xiao Pan
S. Bodapati
Re-assign community
ArXiv (abs)
PDF
HTML
Papers citing
"Masked Audio Text Encoders are Effective Multi-Modal Rescorers"
40 / 40 papers shown
Title
KG-ECO: Knowledge Graph Enhanced Entity Correction for Query Rewriting
Jason (Jinglun) Cai
Mingda Li
Ziyan Jiang
Eunah Cho
Zheng Chen
Yang Liu
Xing Fan
Chenlei Guo
73
7
0
21 Feb 2023
Improving Deliberation by Text-Only and Semi-Supervised Training
Ke Hu
Tara N. Sainath
Yanzhang He
Rohit Prabhavalkar
Trevor Strohman
S. Mavandadi
Weiran Wang
64
13
0
29 Jun 2022
MAESTRO: Matched Speech Text Representations through Modality Matching
Zhehuai Chen
Yu Zhang
Andrew Rosenberg
Bhuvana Ramabhadran
Pedro J. Moreno
Ankur Bapna
Heiga Zen
70
108
0
07 Apr 2022
Effect and Analysis of Large-scale Language Model Rescoring on Competitive ASR Systems
Takuma Udagawa
Masayuki Suzuki
Gakuto Kurata
N. Itoh
G. Saon
99
24
0
01 Apr 2022
WAVPROMPT: Towards Few-Shot Spoken Language Understanding with Frozen Language Models
Heting Gao
Junrui Ni
Kaizhi Qian
Yang Zhang
Shiyu Chang
M. Hasegawa-Johnson
VLM
151
31
0
29 Mar 2022
RescoreBERT: Discriminative Speech Recognition Rescoring with BERT
Liyan Xu
Yile Gu
J. Kolehmainen
Haidar Khan
Ankur Gandhe
Ariya Rastrow
A. Stolcke
I. Bulyko
94
48
0
02 Feb 2022
WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing
Sanyuan Chen
Chengyi Wang
Zhengyang Chen
Yu-Huan Wu
Shujie Liu
...
Yao Qian
Jian Wu
Micheal Zeng
Xiangzhan Yu
Furu Wei
SSL
265
1,898
0
26 Oct 2021
SLAM: A Unified Encoder for Speech and Language Modeling via Speech-Text Joint Pre-Training
Ankur Bapna
Yu-An Chung
Na Wu
Anmol Gulati
Ye Jia
J. Clark
Melvin Johnson
Jason Riesa
Alexis Conneau
Yu Zhang
VLM
103
96
0
20 Oct 2021
Wav-BERT: Cooperative Acoustic and Linguistic Representation Learning for Low-Resource Speech Recognition
Guolin Zheng
Yubei Xiao
Ke Gong
Pan Zhou
Xiaodan Liang
Liang Lin
67
26
0
19 Sep 2021
Multimodal Few-Shot Learning with Frozen Language Models
Maria Tsimpoukelli
Jacob Menick
Serkan Cabi
S. M. Ali Eslami
Oriol Vinyals
Felix Hill
MLLM
183
788
0
25 Jun 2021
HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units
Wei-Ning Hsu
Benjamin Bolte
Yao-Hung Hubert Tsai
Kushal Lakhotia
Ruslan Salakhutdinov
Abdel-rahman Mohamed
SSL
184
2,993
0
14 Jun 2021
FastCorrect: Fast Error Correction with Edit Alignment for Automatic Speech Recognition
Yichong Leng
Xu Tan
Linchen Zhu
Jin Xu
Renqian Luo
Linquan Liu
Tao Qin
Xiang-Yang Li
Ed Lin
Tie-Yan Liu
KELM
68
64
0
09 May 2021
SUPERB: Speech processing Universal PERformance Benchmark
Shu-Wen Yang
Po-Han Chi
Yung-Sung Chuang
Cheng-I Jeff Lai
Kushal Lakhotia
...
Shuyan Dong
Shang-Wen Li
Shinji Watanabe
Abdel-rahman Mohamed
Hung-yi Lee
SSL
108
941
0
03 May 2021
Societal Biases in Retrieved Contents: Measurement Framework and Adversarial Mitigation for BERT Rankers
Navid Rekabsaz
Simone Kopeinik
Markus Schedl
64
62
0
28 Apr 2021
Transformer Based Deliberation for Two-Pass Speech Recognition
Ke Hu
Ruoming Pang
Tara N. Sainath
Trevor Strohman
52
38
0
27 Jan 2021
VoxPopuli: A Large-Scale Multilingual Speech Corpus for Representation Learning, Semi-Supervised Learning and Interpretation
Changhan Wang
M. Rivière
Ann Lee
Anne Wu
Chaitanya Talnikar
Daniel Haziza
Mary Williamson
J. Pino
Emmanuel Dupoux
SSL
102
492
0
02 Jan 2021
Parameter-Efficient Transfer Learning with Diff Pruning
Demi Guo
Alexander M. Rush
Yoon Kim
84
406
0
14 Dec 2020
SLURP: A Spoken Language Understanding Resource Package
E. Bastianelli
Andrea Vanzo
P. Swietojanski
Verena Rieser
VLM
93
231
0
26 Nov 2020
Align-Refine: Non-Autoregressive Speech Recognition via Iterative Realignment
Ethan A. Chi
Julian Salazar
Katrin Kirchhoff
AI4TS
75
52
0
24 Oct 2020
wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations
Alexei Baevski
Henry Zhou
Abdel-rahman Mohamed
Michael Auli
SSL
297
5,837
0
20 Jun 2020
Language Models are Few-Shot Learners
Tom B. Brown
Benjamin Mann
Nick Ryder
Melanie Subbiah
Jared Kaplan
...
Christopher Berner
Sam McCandlish
Alec Radford
Ilya Sutskever
Dario Amodei
BDL
877
42,379
0
28 May 2020
Conformer: Convolution-augmented Transformer for Speech Recognition
Anmol Gulati
James Qin
Chung-Cheng Chiu
Niki Parmar
Yu Zhang
...
Wei Han
Shibo Wang
Zhengdong Zhang
Yonghui Wu
Ruoming Pang
229
3,155
0
16 May 2020
Deliberation Model Based Two-Pass End-to-End Speech Recognition
Ke Hu
Tara N. Sainath
Ruoming Pang
Rohit Prabhavalkar
81
87
0
17 Mar 2020
A Density Ratio Approach to Language Model Fusion in End-To-End Automatic Speech Recognition
Erik McDermott
Hasim Sak
Ehsan Variani
57
113
0
26 Feb 2020
Transformer Transducer: A Streamable Speech Recognition Model with Transformer Encoders and RNN-T Loss
Qian Zhang
Han Lu
Hasim Sak
Anshuman Tripathi
Erik McDermott
Stephen Koo
Shankar Kumar
92
481
0
07 Feb 2020
Audio-attention discriminative language model for ASR rescoring
Ankur Gandhe
Ariya Rastrow
46
24
0
06 Dec 2019
vq-wav2vec: Self-Supervised Learning of Discrete Speech Representations
Alexei Baevski
Steffen Schneider
Michael Auli
SSL
163
667
0
12 Oct 2019
Two-Pass End-to-End Speech Recognition
Tara N. Sainath
Ruoming Pang
David Rybach
Yanzhang He
Rohit Prabhavalkar
...
Qiao Liang
Trevor Strohman
Yonghui Wu
Ian McGraw
Chung-Cheng Chiu
80
148
0
29 Aug 2019
BERT has a Mouth, and It Must Speak: BERT as a Markov Random Field Language Model
Alex Jinpeng Wang
Kyunghyun Cho
VLM
88
358
0
11 Feb 2019
Parameter-Efficient Transfer Learning for NLP
N. Houlsby
A. Giurgiu
Stanislaw Jastrzebski
Bruna Morrone
Quentin de Laroussilhe
Andrea Gesmundo
Mona Attariyan
Sylvain Gelly
221
4,514
0
02 Feb 2019
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Jacob Devlin
Ming-Wei Chang
Kenton Lee
Kristina Toutanova
VLM
SSL
SSeg
1.8K
95,175
0
11 Oct 2018
Cold Fusion: Training Seq2Seq Models Together with Language Models
Anuroop Sriram
Heewoo Jun
S. Satheesh
Adam Coates
VLM
90
281
0
21 Aug 2017
Attention Is All You Need
Ashish Vaswani
Noam M. Shazeer
Niki Parmar
Jakob Uszkoreit
Llion Jones
Aidan Gomez
Lukasz Kaiser
Illia Polosukhin
3DV
786
132,363
0
12 Jun 2017
End-to-End Attention-based Large Vocabulary Speech Recognition
Dzmitry Bahdanau
J. Chorowski
Dmitriy Serdyuk
Philemon Brakel
Yoshua Bengio
89
1,152
0
18 Aug 2015
Listen, Attend and Spell
William Chan
Navdeep Jaitly
Quoc V. Le
Oriol Vinyals
RALM
156
2,269
0
05 Aug 2015
EESEN: End-to-End Speech Recognition using Deep RNN Models and WFST-based Decoding
Yajie Miao
M. Gowayyed
Florian Metze
100
755
0
29 Jul 2015
Adam: A Method for Stochastic Optimization
Diederik P. Kingma
Jimmy Ba
ODL
2.0K
150,312
0
22 Dec 2014
Deep Speech: Scaling up end-to-end speech recognition
Awni Y. Hannun
Carl Case
Jared Casper
Bryan Catanzaro
G. Diamos
...
R. Prenger
S. Satheesh
Shubho Sengupta
Adam Coates
A. Ng
188
2,128
0
17 Dec 2014
End-to-end Continuous Speech Recognition using Attention-based Recurrent NN: First Results
J. Chorowski
Dzmitry Bahdanau
Kyunghyun Cho
Yoshua Bengio
103
471
0
04 Dec 2014
Sequence Transduction with Recurrent Neural Networks
Alex Graves
195
1,871
0
14 Nov 2012
1