mSLAM: Massively multilingual joint pre-training for speech and text

3 February 2022
Ankur Bapna
Colin Cherry
Yu Zhang
Ye Jia
Melvin Johnson
Yong Cheng
Simran Khanuja
Jason Riesa
Alexis Conneau
    VLM

Papers citing "mSLAM: Massively multilingual joint pre-training for speech and text"

50 / 87 papers shown
Exploiting Vulnerabilities in Speech Translation Systems through Targeted Adversarial Attacks
Chang-rui Liu
Haolin Wu
Xi Yang
Kui Zhang
Cong Wu
Wenqi Zhang
Nenghai Yu
Tianwei Zhang
Qing-Wu Guo
Jie Zhang
AAML
34
0
0
02 Mar 2025
Graph Perceiver IO: A General Architecture for Graph Structured Data
Seyun Bae
Hoyoon Byun
Changdae Oh
Yoon-Sik Cho
Kyungwoo Song
GNN
98
2
0
24 Feb 2025
STORM: Strategic Orchestration of Modalities for Rare Event Classification
Payal Kamboj
Ayan Banerjee
Sandeep K. S. Gupta
69
1
0
03 Dec 2024
BhasaAnuvaad: A Speech Translation Dataset for 13 Indian Languages
Sparsh Jain
Ashwin Sankar
Devilal Choudhary
Dhairya Suman
Nikhil Narasimhan
Mohammed Safi Ur Rahman Khan
Anoop Kunchukuttan
Mitesh M. Khapra
Raj Dabre
42
2
0
07 Nov 2024
EMMeTT: Efficient Multimodal Machine Translation Training
Piotr Żelasko
Zhehuai Chen
Mengru Wang
Daniel Galvez
Oleksii Hrinchuk
Shuoyang Ding
Ke Hu
Jagadeesh Balam
Vitaly Lavrukhin
Boris Ginsburg
38
1
0
20 Sep 2024
LAST: Language Model Aware Speech Tokenization
A. Turetzky
Yossi Adi
37
2
0
05 Sep 2024
CoSTA: Code-Switched Speech Translation using Aligned Speech-Text Interleaving
Bhavani Shankar
P. Jyothi
Pushpak Bhattacharyya
48
1
0
16 Jun 2024
ASTRA: Aligning Speech and Text Representations for Asr without Sampling
Neeraj Gaur
Rohan Agrawal
Gary Wang
Parisa Haghani
Andrew Rosenberg
Bhuvana Ramabhadran
42
0
0
10 Jun 2024
Let's Fuse Step by Step: A Generative Fusion Decoding Algorithm with LLMs for Multi-modal Text Recognition
Chan-Jan Hsu
Yi-Chang Chen
Feng-Ting Liao
Pei-Chen Ho
Yu-Hsiang Wang
Po-Chun Hsu
Da-shan Shiu
31
2
0
23 May 2024
SpeechVerse: A Large-scale Generalizable Audio Language Model
Nilaksh Das
Saket Dingliwal
S. Ronanki
Rohit Paturi
David Huang
...
Monica Sunkara
S. Srinivasan
Kyu J. Han
Katrin Kirchhoff
41
37
0
14 May 2024
Transforming LLMs into Cross-modal and Cross-lingual Retrieval Systems
Frank Palma Gomez
Ramon Sanabria
Yun-hsuan Sung
Daniel Cer
Siddharth Dalmia
Gustavo Hernández Ábrego
VLM
41
4
0
02 Apr 2024
Extending Multilingual Speech Synthesis to 100+ Languages without Transcribed Data
Takaaki Saeki
Gary Wang
Nobuyuki Morioka
Isaac Elias
Kyle Kastner
...
Andrew Rosenberg
Bhuvana Ramabhadran
Heiga Zen
Francoise Beaufays
Hadar Shemtov
38
13
0
29 Feb 2024
Efficient Adapter Finetuning for Tail Languages in Streaming Multilingual ASR
Junwen Bai
Bo-wen Li
Qiujia Li
Tara N. Sainath
Trevor Strohman
38
3
0
17 Jan 2024
Order Matters in the Presence of Dataset Imbalance for Multilingual Learning
Dami Choi
Derrick Xin
Hamid Dadkhahi
Justin Gilmer
Ankush Garg
Orhan Firat
Chih-Kuan Yeh
Andrew M. Dai
Behrooz Ghorbani
55
3
0
11 Dec 2023
A Quantitative Approach to Understand Self-Supervised Models as Cross-lingual Feature Extractors
Shuyue Stella Li
Beining Xu
Xiangyu Zhang
Hexin Liu
Wen-Han Chao
Leibny Paola García
SSL
34
4
0
27 Nov 2023
End-to-End Single-Channel Speaker-Turn Aware Conversational Speech Translation
Juan Pablo Zuluaga
Zhaocheng Huang
Xing Niu
Rohit Paturi
S. Srinivasan
Prashant Mathur
Brian Thompson
Marcello Federico
BDL
35
2
0
01 Nov 2023
Toward Joint Language Modeling for Speech Units and Text
Ju-Chieh Chou
Chung-Ming Chien
Wei-Ning Hsu
Karen Livescu
Arun Babu
Alexis Conneau
Alexei Baevski
Michael Auli
VLM
26
20
0
12 Oct 2023
Few-Shot Spoken Language Understanding via Joint Speech-Text Models
Chung-Ming Chien
Mingjiamei Zhang
Ju-Chieh Chou
Karen Livescu
34
3
0
09 Oct 2023
Modular Speech-to-Text Translation for Zero-Shot Cross-Modal Transfer
Paul-Ambroise Duquenne
Holger Schwenk
Benoît Sagot
42
3
0
05 Oct 2023
SLM: Bridge the thin gap between speech and text foundation models
Mingqiu Wang
Wei Han
Izhak Shafran
Zelin Wu
Chung-Cheng Chiu
...
Zhong Meng
Golan Pundak
Nikhil Siddhartha
J. Schalkwyk
Yonghui Wu
AuLLM
39
56
0
30 Sep 2023
Cross-Modal Multi-Tasking for Speech-to-Text Translation via Hard Parameter Sharing
B. Grimstad
Xuankai Chang
Antonios Anastasopoulos
Yuya Fujita
Shinji Watanabe
26
2
0
27 Sep 2023
Multimodal Modeling For Spoken Language Identification
Shikhar Bharadwaj
Min Ma
Shikhar Vashishth
Ankur Bapna
Sriram Ganapathy
...
Yu Zhang
D. Esch
Sandy Ritchie
Partha P. Talukdar
Jason Riesa
30
0
0
19 Sep 2023
Direct Text to Speech Translation System using Acoustic Units
Victoria Mingote
Pablo Gimeno
Luis Vicente
Sameer Khurana
Antoine Laurent
J. Duret
28
3
0
14 Sep 2023
Using Text Injection to Improve Recognition of Personal Identifiers in Speech
Yochai Blau
Rohan Agrawal
Lior Madmony
Gary Wang
Andrew Rosenberg
Zhehuai Chen
Zorik Gekhman
Genady Beryozkin
Parisa Haghani
Bhuvana Ramabhadran
46
3
0
14 Aug 2023
Improving Joint Speech-Text Representations Without Alignment
Cal Peyser
Zhong Meng
Ke Hu
Rohit Prabhavalkar
Andrew Rosenberg
Tara N. Sainath
M. Picheny
Kyunghyun Cho
VLM
31
4
0
11 Aug 2023
AudioPaLM: A Large Language Model That Can Speak and Listen
Paul Kishan Rubenstein
Chulayuth Asawaroengchai
D. Nguyen
Ankur Bapna
Zalan Borsos
...
Neil Zeghidour
Yu Zhang
Zhishuai Zhang
Lukás Zilka
Christian Frank
LM&MA
AuLLM
VLM
35
259
0
22 Jun 2023
Recent Advances in Direct Speech-to-text Translation
Chen Xu
Rong Ye
Qianqian Dong
Chengqi Zhao
Tom Ko
Mingxuan Wang
Tong Xiao
Jingbo Zhu
21
18
0
20 Jun 2023
SpeechGLUE: How Well Can Self-Supervised Speech Models Capture Linguistic Knowledge?
Takanori Ashihara
Takafumi Moriya
Kohei Matsuura
Tomohiro Tanaka
Yusuke Ijima
Taichi Asami
Marc Delcroix
Yukinori Honma
SSL
ELM
27
11
0
14 Jun 2023
Efficient Adapters for Giant Speech Models
Nanxin Chen
Izhak Shafran
Yu Zhang
Chung-Cheng Chiu
H. Soltau
James Qin
Yonghui Wu
22
10
0
13 Jun 2023
Text-only Domain Adaptation using Unified Speech-Text Representation in Transducer
Lu Huang
Yangqiu Song
Jun Zhang
Lu Lu
Zejun Ma
29
2
0
07 Jun 2023
Improved Cross-Lingual Transfer Learning For Automatic Speech Translation
Sameer Khurana
Nauman Dawalatabad
Antoine Laurent
Luis Vicente
Pablo Gimeno
Victoria Mingote
James R. Glass
VLM
20
1
0
01 Jun 2023
Simple yet Effective Code-Switching Language Identification with Multitask Pre-Training and Transfer Learning
Shuyue Stella Li
Cihan Xiao
Tianjian Li
Bismarck Odoom
28
3
0
31 May 2023
Translatotron 3: Speech to Speech Translation with Monolingual Data
Eliya Nachmani
Alon Levkovitch
Yi-Yang Ding
Chulayuth Asawaroengchai
Heiga Zen
Michelle Tadmor Ramanovich
23
14
0
27 May 2023
ComSL: A Composite Speech-Language Model for End-to-End Speech-to-Text Translation
Chenyang Le
Yao Qian
Long Zhou
Shujie Liu
Yanmin Qian
Michael Zeng
Xuedong Huang
24
13
0
24 May 2023
Scaling Speech Technology to 1,000+ Languages
Vineel Pratap
Andros Tjandra
Bowen Shi
Paden Tomasello
Arun Babu
...
Yossi Adi
Xiaohui Zhang
Wei-Ning Hsu
Alexis Conneau
Michael Auli
VLM
77
300
0
22 May 2023
Textually Pretrained Speech Language Models
Michael Hassid
Tal Remez
Tu Nguyen
Itai Gat
Alexis Conneau
...
Alexandre Défossez
Gabriel Synnaeve
Emmanuel Dupoux
Roy Schwartz
Yossi Adi
VLM
SyDa
31
53
0
22 May 2023
Comparison of Multilingual Self-Supervised and Weakly-Supervised Speech Pre-Training for Adaptation to Unseen Languages
Andrew Rouditchenko
Sameer Khurana
Samuel Thomas
Rogerio Feris
Leonid Karlinsky
Hilde Kuehne
David Harwath
Brian Kingsbury
James R. Glass
VLM
37
22
0
21 May 2023
ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities
Peng Wang
Shijie Wang
Junyang Lin
Shuai Bai
Xiaohuan Zhou
Jingren Zhou
Xinggang Wang
Chang Zhou
VLM
MLLM
ObjD
48
115
0
18 May 2023
Back Translation for Speech-to-text Translation Without Transcripts
Qingkai Fang
Yang Feng
35
13
0
15 May 2023
Understanding and Bridging the Modality Gap for Speech Translation
Qingkai Fang
Yang Feng
29
25
0
15 May 2023
SLTUNET: A Simple Unified Model for Sign Language Translation
Biao Zhang
Mathias Müller
Rico Sennrich
SLR
43
33
0
02 May 2023
Understanding Shared Speech-Text Representations
Gary Wang
Kyle Kastner
Ankur Bapna
Zhehuai Chen
Andrew Rosenberg
Bhuvana Ramabhadran
Yu Zhang
AuLLM
69
7
0
27 Apr 2023
Adaptive Knowledge Distillation between Text and Speech Pre-trained Models
Jinjie Ni
Yukun Ma
Wen Wang
Qian Chen
Dianwen Ng
Han Lei
Trung Hieu Nguyen
Chong Zhang
B. Ma
Erik Cambria
11
2
0
07 Mar 2023
End-to-End Speech Recognition: A Survey
Rohit Prabhavalkar
Takaaki Hori
Tara N. Sainath
Ralf Schluter
Shinji Watanabe
VLM
26
149
0
03 Mar 2023
Google USM: Scaling Automatic Speech Recognition Beyond 100 Languages
Yu Zhang
Wei Han
James Qin
Yongqiang Wang
Ankur Bapna
...
Pedro J. Moreno
Chung-Cheng Chiu
J. Schalkwyk
Françoise Beaufays
Yonghui Wu
VLM
79
253
0
02 Mar 2023
Improving Massively Multilingual ASR With Auxiliary CTC Objectives
William Chen
Brian Yan
Jiatong Shi
Yifan Peng
Soumi Maiti
Shinji Watanabe
39
38
0
24 Feb 2023
Efficient CTC Regularization via Coarse Labels for End-to-End Speech Translation
Biao Zhang
Barry Haddow
Rico Sennrich
17
3
0
21 Feb 2023
Pre-training for Speech Translation: CTC Meets Optimal Transport
Hang Le
Hongyu Gong
Changhan Wang
J. Pino
Benjamin Lecouteux
D. Schwab
OT
13
20
0
27 Jan 2023
The Decades Progress on Code-Switching Research in NLP: A Systematic Survey on Trends and Challenges
Genta Indra Winata
Alham Fikri Aji
Zheng-Xin Yong
Thamar Solorio
37
33
0
19 Dec 2022
Mu$^{2}$SLAM: Multitask, Multilingual Speech and Language Models
Yong Cheng
Yu Zhang
Melvin Johnson
Wolfgang Macherey
Ankur Bapna
33
8
0
19 Dec 2022